Timestamps of the connect/disconnect events pushed when a device reconnects within the keepalive interval

Environment

  • EMQX version: 5.7.2
  • OS version: CentOS 7

Steps to reproduce the issue

Steps:

  1. Power off the device
  2. Power it back on and reconnect, within the keepalive interval (i.e. before the console has detected the device as offline)

Problem:
At this point EMQX pushes a connect event and a disconnect event,
but in some cases I observe that the connect event's connected_at is smaller than the disconnect event's disconnected_at.

For example: connected_at: 1756839646560, disconnected_at: 1756839646561

c.s.i.mqtt.callback.ConnectCallback - ConnectEventStr:{"ipaddress":"","expiry_interval":7200000,"clean_start":false,"sockport":1883,"connected_at":1756839646560,"proto_ver":4,"proto_name":"MQTT","clientid":"8zlj9szy5nrmr13f56thwb","username":"8zlj9szy5nrmr13f56thwb|productId=nrbhc2ebpfvqktjx3u5n,signMethod=hmacSha256,authType=1,timestamp=1756839645244","ts":1756839646560,"protocol":"mqtt","keepalive":60}
c.s.i.m.callback.DisconnectCallback - DisconnectEventStr:{"ipaddress":"","disconnected_at":1756839646561,"sockport":1883,"connected_at":1756758482404,"proto_ver":4,"proto_name":"MQTT","clientid":"8zlj9szy5nrmr13f56thwb","username":"8zlj9szy5nrmr13f56thwb|productId=nrbhc2ebpfvqktjx3u5n,signMethod=hmacSha256,authType=1,timestamp=1756753268184","ts":1756839646561,"protocol":"mqtt","reason":"takenover"}

However, I cannot reproduce this reliably; under normal conditions connected_at >= disconnected_at.

Expected behavior

When a device reconnects within the keepalive interval, the pushed connect/disconnect events should satisfy connected_at >= disconnected_at.

Actual behavior

Isn't this expected? A device always connects first and only disconnects afterwards, so connected_at < disconnected_at.

You are matching up the wrong pair of events. The usual flow is:
connected first, then disconnected (those two form a pair).
If you are seeing connected_at > disconnected_at, the sequence is probably:
connected1 → disconnected1 → connected2 → ...
and you compared connected2 against disconnected1 (which are not a pair), hence the connected_at >= disconnected_at you observed.

To make it possible to identify matching pairs reliably, EMQX 5.8.6 added a connected_at field to the disconnected event, so you can determine (locate) which connected event a given disconnected event corresponds to.
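As a sketch of how that field can be used, the following hypothetical Python (not EMQX code; only the field names `clientid`, `connected_at`, and `disconnected_at` are taken from the webhook payloads shown in this thread) pairs each disconnected event with its connected event via the key `(clientid, connected_at)`:

```python
def pair_events(events):
    """events: webhook payload dicts in arrival order (which may not match
    generation order). Returns (pairs, open_sessions): each pair is
    (connect_event_or_None, disconnect_event); open_sessions are connect
    events that never saw a matching disconnect."""
    # Index connect events by (clientid, connected_at) -- the pairing key
    # that the connected_at field on disconnected events makes possible.
    connects = {(e["clientid"], e["connected_at"]): e
                for e in events if "disconnected_at" not in e}
    pairs = []
    for e in events:
        if "disconnected_at" in e:
            key = (e["clientid"], e["connected_at"])
            # pop() so a connect is consumed by exactly one disconnect;
            # None means the matching connect was never delivered.
            pairs.append((connects.pop(key, None), e))
    return pairs, list(connects.values())
```

Because the match is by key rather than by arrival order, the result is the same even if the webhook delivers the two events of a pair out of order.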

I probably wasn't clear enough in my description.

Flow:
connected1 → disconnected1 → connected2 → ...

My scenario is: the connected1 session is active, then the device power-cycles and reconnects (before EMQX has detected it as offline). The reconnect effectively "takes over" the session, and EMQX pushes disconnected1 and connected2.

But I found that disconnected1's disconnected_at (1756839646561) is greater than connected2's connected_at (1756839646560), which is abnormal: the device should go offline before it comes back online.

Moreover, the events were also pushed in the order disconnected1 then connected2, so the expectation is disconnected1's disconnected_at <= connected2's connected_at.

Here is the connect/disconnect event data from my session take-over test.

messageArrived topic:$SYS/brokers/emqx@172.17.0.3/clients/p4on0scli6nq8syk0i76sa/connected
messageArrived Qos:0
messageArrived context:{"ipaddress":"192.168.46.119","expiry_interval":0,"clean_start":true,"sockport":1883,"proto_ver":4,"proto_name":"MQTT","connected_at":1756894131290,"clientid":"p4on0scli6nq8syk0i76sa","username":"p4on0scli6nq8syk0i76sa|productId=nrbhc2ebpfvqktjx3u5n,signMethod=hmacSha256,authType=1,timestamp=11111111111111","ts":1756894131290,"protocol":"mqtt","keepalive":60}

messageArrived topic:$SYS/brokers/emqx@172.17.0.3/clients/p4on0scli6nq8syk0i76sa/disconnected
messageArrived Qos:0
messageArrived context:{"ipaddress":"192.168.46.119","disconnected_at":1756894151733,"sockport":1883,"proto_ver":4,"proto_name":"MQTT","connected_at":1756894131290,"clientid":"p4on0scli6nq8syk0i76sa","username":"p4on0scli6nq8syk0i76sa|productId=nrbhc2ebpfvqktjx3u5n,signMethod=hmacSha256,authType=1,timestamp=11111111111111","ts":1756894151733,"protocol":"mqtt","reason":"discarded"}

messageArrived topic:$SYS/brokers/emqx@172.17.0.3/clients/p4on0scli6nq8syk0i76sa/connected
messageArrived Qos:0
messageArrived context:{"ipaddress":"192.168.46.119","expiry_interval":0,"clean_start":true,"sockport":1883,"proto_ver":4,"proto_name":"MQTT","connected_at":1756894151735,"clientid":"p4on0scli6nq8syk0i76sa","username":"p4on0scli6nq8syk0i76sa|productId=nrbhc2ebpfvqktjx3u5n,signMethod=hmacSha256,authType=1,timestamp=11111111111111","ts":1756894151735,"protocol":"mqtt","keepalive":60}

messageArrived topic:$SYS/brokers/emqx@172.17.0.3/clients/p4on0scli6nq8syk0i76sa/disconnected
messageArrived Qos:0
messageArrived context:{"ipaddress":"192.168.46.119","disconnected_at":1756894404089,"sockport":1883,"proto_ver":4,"proto_name":"MQTT","connected_at":1756894151735,"clientid":"p4on0scli6nq8syk0i76sa","username":"p4on0scli6nq8syk0i76sa|productId=nrbhc2ebpfvqktjx3u5n,signMethod=hmacSha256,authType=1,timestamp=11111111111111","ts":1756894404089,"protocol":"mqtt","reason":"discarded"}

messageArrived topic:$SYS/brokers/emqx@172.17.0.3/clients/p4on0scli6nq8syk0i76sa/connected
messageArrived Qos:0
messageArrived context:{"ipaddress":"192.168.46.119","expiry_interval":0,"clean_start":true,"sockport":1883,"proto_ver":4,"proto_name":"MQTT","connected_at":1756894404090,"clientid":"p4on0scli6nq8syk0i76sa","username":"p4on0scli6nq8syk0i76sa|productId=nrbhc2ebpfvqktjx3u5n,signMethod=hmacSha256,authType=1,timestamp=11111111111111","ts":1756894404090,"protocol":"mqtt","keepalive":60}

Older versions have no field to tell which events truly form a pair, and you cannot rely on arrival order to decide: event delivery order is not guaranteed, especially over the HTTP webhook (it uses a process pool plus pipelining, so when two events are very close in time, the chance of reordering is higher). P.S. MQTT subscriptions do not guarantee order either; HTTP just reorders with higher probability.
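One way to make online/offline tracking independent of delivery order is to decide state from the event's `ts` field rather than from arrival order. This is a hypothetical sketch (not EMQX code), and it assumes `ts` values are comparable across a client's events; it would still need the reason-based filtering discussed elsewhere in this thread to handle take-over disconnects:

```python
def update_status(state, event):
    """state: dict mapping clientid -> (ts, online_bool).
    Applies one connect/disconnect event, ignoring any event older than
    the newest one already seen for that client. Mutates and returns state."""
    cid = event["clientid"]
    ts = event["ts"]
    online = "disconnected_at" not in event  # connect events carry no disconnected_at
    prev = state.get(cid)
    # On an exact ts tie, prefer "online": in a take-over the new connect is
    # generated after the old disconnect (an assumption, not documented behavior).
    if prev is None or ts > prev[0] or (ts == prev[0] and online):
        state[cid] = (ts, online)
    return state
```

With this scheme, a connect event that arrives late but carries an older `ts` no longer flips the client back to "online" incorrectly.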

Hello,

1. I see that in this version the disconnect event's connected_at does match the connect event's connected_at. In 5.7, are these two values not meant to correspond to each other?

2. I don't care about the delivery order of the pushed events. What I care about is the order in which the EMQX cluster *generates* disconnected1 and connected2 during a take-over. Per your earlier description, disconnected1 is generated first and connected2 afterwards. If that is the logic, why do I receive a disconnected1 whose ts is greater than connected2's ts? Even though disconnected1 and connected2 are not a pair, they must still have a definite order on the timeline.

Could you clarify once more? Thanks :pray:

I have the same problem.

Was this resolved in the end?

Hi, the same situation has occurred again on my side:

Device client1 connects first, then power-cycles and restarts quickly within the keepalive interval, and client1 reconnects.

At this point client1 gets kicked offline (session take-over).
The overall event sequence is: connect1 → disconnect1 → connect2.

Observed scenario: when client1 reconnects and takes over the session, EMQX pushes a disconnect1 event and a connect2 event. Logically, disconnect1 should be generated before connect2, so disconnect1's disconnected_at timestamp should be smaller than connect2's connected_at timestamp (setting aside the delivery order; only the two timestamps matter here).
The problem: disconnect1's disconnected_at timestamp is *greater* than connect2's connected_at timestamp, whereas it should clearly be smaller, since disconnect1 is generated before connect2. I also captured logs confirming that disconnect1's connected_at matches connect1's connected_at.

We currently rely on these connect/disconnect timestamps to keep device online/offline status accurate.

Requests: 1. could the team help locate this issue; 2. is there any workaround that avoids impacting our business logic?

@zhongwencool @heeejianbo

Additional logs below: connect1, disconnect1, and connect2 respectively.

2026-02-27 16:29:50.194 [MQTT Call: sub_connect_172.31.0.1:18081_3] INFO  c.s.i.mqtt.callback.ConnectCallback - ConnectEventStr:{"ipaddress":"219.133.191.30","expiry_interval":7200000,"clean_start":false,"sockport":1883,"connected_at":1772180990194,"proto_ver":4,"proto_name":"MQTT","clientid":"*****","username":"*****","ts":1772180990194,"protocol":"mqtt","keepalive":60}
2026-02-27 16:43:03.593 [MQTT Call: sub_disconnect_172.31.0.1:18081_3] INFO  c.s.i.m.callback.DisconnectCallback - DisconnectEventStr:{"ipaddress":"219.133.191.30","disconnected_at":1772181783592,"sockport":1883,"connected_at":1772180990194,"proto_ver":4,"proto_name":"MQTT","clientid":"*****","username":"*****","ts":1772181783592,"protocol":"mqtt","reason":"takenover"}
2026-02-27 16:43:03.593 [MQTT Call: sub_connect_192.168.144.1:18081_4] INFO  c.s.i.mqtt.callback.ConnectCallback - ConnectEventStr:{"ipaddress":"219.133.191.30","expiry_interval":7200000,"clean_start":false,"sockport":1883,"connected_at":1772181783591,"proto_ver":4,"proto_name":"MQTT","clientid":"*****","username":"*****","ts":1772181783591,"protocol":"mqtt","keepalive":60}

You can refer to this: "Subscribing to client online/offline messages: after a client connects to the broker, a disconnected message is received first, then a connected message".

1. The suggestion there to "filter out disconnect events with reason = takenover or reason = discarded" is one workaround, thanks!

2. But the other part does not match my case: in the logs above, disconnected_at1 is *later* than connected_at2.
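The filtering workaround from point 1 can be sketched as follows (hypothetical Python, not EMQX code; the `reason` values "takenover" and "discarded" are the ones appearing in the logs in this thread). The idea is that a disconnect caused by session replacement carries no real "device went offline" signal, since a matching connect for the new session exists:

```python
# Disconnect reasons produced when a new session replaces an old one,
# rather than when the device actually drops off the network.
SESSION_REPLACED_REASONS = {"takenover", "discarded"}

def is_real_offline(event):
    """True only for disconnected events that indicate the device actually
    went offline; False for take-over/discard disconnects and for connect
    events (which have no disconnected_at field)."""
    return ("disconnected_at" in event
            and event.get("reason") not in SESSION_REPLACED_REASONS)
```

Applying this filter before updating device status sidesteps the timestamp inversion described in this thread, because the problematic disconnect1/connect2 comparison only arises for take-over disconnects in the first place.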