Help needed: retrieving device offline (disconnect) events

Mqtt · March 14, 2025, 03:32

EMQX version in use: 5.8.4 (Open Source)

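The connected and disconnected events on the system topics ($SYS/brokers/${node}/clients/connected and disconnected) carry no connection properties, even when the client connects over MQTT v5. Is this expected?

The $events/client_connected and $events/client_disconnected events do carry connection properties, but the disconnect event does not include the corresponding connect time (connected_at), whereas the disconnect event on the system topic does. Is this expected?

Could the $events/client_disconnected event include the corresponding connect time? In real-world testing, a client on a poor network disconnects and reconnects repeatedly, and a disconnect message can arrive after the latest connect message and overwrite the latest state.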

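Measured data is shown below:
Disconnect event No. 1 belongs to the session opened by connect event No. 3.
No. 2 is the device's final state, but unless the business logic handles it specially, disconnect event No. 1 overwrites that final state.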

 1. {"ipaddress":"36.110.46.106","disconnected_at":1740160870627,"sockport":8883,"connected_at":1740160813688,"proto_name":"MQTT","proto_ver":5,"clientid":"80482C7B0FE8","username":"80482C7B0FE8","ts":1740160870627,"protocol":"mqtt","reason":"ssl_closed"}
 2. {"ipaddress":"36.110.46.106","expiry_interval":5000,"clean_start":false,"sockport":8883,"connected_at":1740160856111,"proto_name":"MQTT","proto_ver":5,"clientid":"80482C7B0FE8","username":"80482C7B0FE8","ts":1740160856111,"protocol":"mqtt","keepalive":60}
 3. {"ipaddress":"36.110.46.106","expiry_interval":5000,"clean_start":false,"sockport":8883,"connected_at":1740160813688,"proto_name":"MQTT","proto_ver":5,"clientid":"80482C7B0FE8","username":"80482C7B0FE8","ts":1740160813688,"protocol":"mqtt","keepalive":60}
 4. {"ipaddress":"36.110.46.106","disconnected_at":1740160807434,"sockport":8883,"connected_at":1740139101779,"proto_name":"MQTT","proto_ver":5,"clientid":"80482C7B0FE8","username":"80482C7B0FE8","ts":1740160807434,"protocol":"mqtt","reason":"ssl_closed"}

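On the system topic, every disconnect event carries the connect time, so that time can be used to decide whether to discard the disconnect event. The disconnect data delivered through the rule engine has no connect time, so this check is not possible.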

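zhongwencool · March 14, 2025, 07:29

That is expected, but it could be added as a feature in the next release.

That is also expected. We can add the connected_at time to the disconnect event later.
The processing flow would then be:

1. When a disconnect event arrives, store its connected_at time.
2. When the next disconnect event arrives, compare its connected_at time: if it is greater than the time stored in step 1, update the state; otherwise ignore the event.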
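
The flow above can be sketched roughly as follows. This is only an illustration on the consumer side, not EMQX code: `PresenceTracker` and `ClientState` are hypothetical names, and the sketch assumes every event carries `clientid` and `connected_at`, and that a disconnect event is recognizable by its `disconnected_at` field, as in the payloads posted earlier in this thread. It also extends the comparison to connect events, which is an interpretation rather than something stated above.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ClientState:
    online: bool
    connected_at: int  # connected_at (ms) of the newest session seen so far


@dataclass
class PresenceTracker:
    """Hypothetical consumer-side tracker; not part of EMQX."""
    states: Dict[str, ClientState] = field(default_factory=dict)

    def apply(self, event: dict) -> bool:
        """Apply one event. Return True if the state changed, False if ignored."""
        clientid = event["clientid"]
        connected_at = event["connected_at"]
        # Assumption: only disconnect events carry a disconnected_at field.
        is_disconnect = "disconnected_at" in event

        current = self.states.get(clientid)
        if current is not None:
            # The event belongs to an older session than the one on record.
            if connected_at < current.connected_at:
                return False
            # Same session, already marked offline: a late connect event
            # must not bring an ended session back online.
            if connected_at == current.connected_at and not current.online:
                return False

        self.states[clientid] = ClientState(
            online=not is_disconnect, connected_at=connected_at
        )
        return True
```

With the four events from the first post, and assuming they arrived in the order 4, 3, 2, 1, the late disconnect (No. 1) is ignored because its `connected_at` is older than the one stored from No. 2, so the device stays online.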

I think this is a very good feature suggestion! Is there anything else you would like to add?

Mqtt · March 14, 2025, 07:44

Nothing more to add. Looking forward to these two updates.

zhongwencool · March 14, 2025, 08:06

github.com/emqx/emqx

feat: add connected_at field to events/client_disconnected payload

emqx:release-58 ← zhongwencool:add-connected_at-into-disconnect-event

opened 08:05AM - 14 Mar 25 UTC

zhongwencool

+3 -1

Fixes <issue-or-jira-number>

Release version: v/e5.8.6

In EMQX 5.8.4 Open Source:

- System topics ($SYS/brokers/${node}/clients/connected/disconnected) include connected_at in disconnect events, but this field is missing in Rule Engine's $events/client_disconnected events.
- When clients reconnect frequently due to poor networks, delayed disconnect events may overwrite newer connection statuses.

Example sequence:

- Disconnect event (1) contains old connected_at: 1740160813688
- New connect event (3) has connected_at: 1740160856111

Without connected_at in disconnect events, business logic cannot determine which connection session the disconnect belongs to.

Root Cause: Rule Engine's client_disconnected event payload does not include the connected_at timestamp from the original connection session, while the system topics serialize this field correctly.

Proposed Solution: Add connected_at field to the payload of `$events/client_disconnected` events to match the behavior of system topics. This also fix https://github.com/emqx/emqx/issues/14837

1. Upon receiving a client_disconnected event, store its connected_at timestamp.
2. For subsequent client_disconnected events:
   - Compare the new event's connected_at with the stored value.
   - Update the status if the new connected_at is newer.
   - Ignore the event if the new connected_at is older or equal.

zhongwencool · March 14, 2025, 09:06

github.com/emqx/emqx

feat: add disconn_props/conn_props into sys topic

emqx:release-58 ← zhongwencool:add-prop-into-sys-topic

opened 09:06AM - 14 Mar 25 UTC

zhongwencool

+103 -94

Fixes <issue-or-jira-number>

Release version: v/e5.8.6

- rule_event(connected) has `conn_prop/receive_maximum/client_attrs`, but sys_topic don't have.
- rule_event(disconnected) has `disconn_prop/client_attrs`, but sys_topic don't have.
- rename `emqx_rule_event:printable_maps/1` to `emqx_utils_maps:printable_props/1`

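yooka · July 2, 2025, 11:35

This problem still occurs when a large number of devices repeatedly disconnect and reconnect at the same time: the disconnected_at and connected_at timestamps come out identical.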

yooka · July 2, 2025, 11:36

I am using an HTTP webhook. In this case, do I need to configure the webhook to send requests synchronously?

yooka · July 2, 2025, 11:37

[connectHook,63] - Connection, clientid: IN1008146, connected at: 1751450998554, disconnect at: null
[connectHook,63] - Connection, clientid: IN1008146, connected at: null, disconnect at: 1751450998554
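As in the case above, when many devices repeatedly go offline and come back online, connected_at and disconnected_at for the same clientid are both 1751450998554, so it is impossible to tell whether the final state is online or offline.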

zhongwencool · July 2, 2025, 12:39

Take a look at how this user solved it. I think he explained it quite clearly.

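yooka · July 3, 2025, 01:09

That user's problem is the case where the disconnect and connect times differ, so the final state can be determined from the timestamps. My problem is that the connect and disconnect times are equal, so the timestamps cannot tell them apart. There is a workaround, of course: when the times are equal, call the API to query the final state and correct it.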
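
The workaround described here could look something like the sketch below. It is a hedged illustration only: `resolve_online` is a hypothetical helper, and `query_broker` is an injected callable standing in for whatever API call returns the client's current state. No specific endpoint or client library is assumed.

```python
from typing import Callable, Optional


def resolve_online(
    last_connected_at: Optional[int],
    last_disconnected_at: Optional[int],
    query_broker: Callable[[], bool],
) -> bool:
    """Decide whether a client is online from its latest two timestamps (ms).

    query_broker is a hypothetical callable that asks the broker for the
    client's current state; it is only invoked when the timestamps tie.
    """
    if last_disconnected_at is None:
        # Never saw a disconnect: online only if a connect was seen.
        return last_connected_at is not None
    if last_connected_at is None:
        # Saw a disconnect but no connect.
        return False
    if last_connected_at != last_disconnected_at:
        # Normal case: the newer timestamp wins.
        return last_connected_at > last_disconnected_at
    # Tie at millisecond resolution: fall back to asking the broker.
    return query_broker()
```

Injecting the lookup keeps the extra API call limited to the rare tie case, so the normal path stays purely timestamp-based.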

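zhongwencool · July 3, 2025, 01:37

Equality down to the millisecond should be an extremely rare event.
I am guessing the log you provided comes from an earlier version. After upgrading, you can see the connected_at time in the disconnect event, and that is the right value to compare, rather than matching on the clientid alone, because clients with the same clientid can kick each other offline.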

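yooka · July 3, 2025, 02:15

Right, it is a rare event under extreme conditions.
The problem can be reproduced as follows:
1. Put nginx in front of EMQX as a forwarding proxy.
2. Use a client tool to create 10,000 connections.
3. Restart nginx before all 10,000 connections have been established.
4. Repeat step 3 a few times at irregular intervals, for example restarting once about 8,000 connections are up. Around 10 to 20 connections will then show this behavior.
I am on version 5.8.4. I see that 5.8.6 adds connected_at to the disconnect event, but the logic behind connected_at and disconnected_at has not changed, so this extreme case of identical millisecond timestamps should still exist. For now I handle it by calling the API to fetch the final state and correct it.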

zhongwencool · July 3, 2025, 02:32

I don't think it still exists. Try upgrading.