With one particular clientId, publishing succeeds but the subscriber does not receive the messages

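Problem description:
The server publishes messages, but the device does not receive them.
The server side had worked fine for several months and recently stopped working all of a sudden. Troubleshooting showed that the problem is tied to the clientID used by the server: messages published with any other, new clientID are received normally. Using MQTTX to simulate both the device and the server reproduces the same problem.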

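The EMQX version is 5.8.6.
No changes had been made to the deployment.
I reproduced the problem with MQTTX and collected debug-level logs. The simulation went as follows: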

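device1 subscribes;
config_center_admin_prod_239 connects and publishes to topic p2p/device1; three messages are sent in a row and device1 receives all of them;
config_center_admin_prod_239 disconnects;
config_center_admin_prod_235 connects and publishes to topic p2p/device1; three messages are sent in a row and device1 receives none of them.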

Note: the log was collected in the production environment, so it contains many entries from other clients.
log.zip (14.1 KB)
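Since the production log mixes in entries from unrelated clients, it can help to keep only the lines for the clients involved in the test. The following is a minimal sketch, assuming the text log format seen in this thread (a `clientid: <id>,` field followed later by `msg: <name>`) and client IDs that contain no commas or spaces. The helper names `clientid_of` and `filter_by_clientid` are made up for illustration; the sample lines are copied from the log excerpt in this post.

```python
import re

# Assumed field layout: "... clientid: <id>, msg: <name>, ..."
CLIENTID_RE = re.compile(r"\bclientid: ([^,\s]+)")
MSG_RE = re.compile(r"\bmsg: (\w+)")

# Three lines taken verbatim from the log excerpt in this post.
SAMPLE = [
    "2025-08-18T15:23:49.164201+08:00 [debug] clientid: config_center_admin_prod_235, msg: insert_channel_info, peername: 113.105.236.111:52874, username: config_center",
    "2025-08-18T15:23:49.179659+08:00 [debug] clientid: FSLJ009ZZX, msg: sessds_push, peername: 223.104.83.79:16196, username: device",
    "2025-08-18T15:23:49.299659+08:00 [debug] clientid: device1, msg: sessds_push, peername: 113.105.236.111:52179, username: device",
]


def clientid_of(line):
    """Return the clientid field of a log line, or None if it has none."""
    match = CLIENTID_RE.search(line)
    return match.group(1) if match else None


def filter_by_clientid(lines, wanted):
    """Keep only the lines whose clientid field is in `wanted`."""
    wanted = set(wanted)
    return [line for line in lines if clientid_of(line) in wanted]


if __name__ == "__main__":
    kept = filter_by_clientid(SAMPLE, {"device1", "config_center_admin_prod_235"})
    for line in kept:
        print(clientid_of(line), MSG_RE.search(line).group(1))
```

To run it against a real file, pass `open(path, encoding="utf-8")` as `lines`; lines without a `clientid:` field (such as `raw_bin_received`) are dropped by this filter.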

2025-08-18T15:23:49.162901+08:00 [debug] tag: MQTT, msg: raw_bin_received, peername: 113.105.236.111:52874, size: 83, type: hex, bin: 105100044D51545405C2003C051100000000001C636F6E6669675F63656E7465725F61646D696E5F70726F645F323335000D636F6E6669675F63656E7465720012436F6E666967323332337364664D474A2370
2025-08-18T15:23:49.163109+08:00 [debug] tag: MQTT, clientid: config_center_admin_prod_235, msg: mqtt_packet_received, peername: 113.105.236.111:52874, username: config_center, packet: CONNECT(Q0, R0, D0, ClientId=config_center_admin_prod_235, ProtoName=MQTT, ProtoVsn=5, CleanStart=true, KeepAlive=60, Username=config_center, Password=******)
2025-08-18T15:23:49.163291+08:00 [debug] tag: AUTHN, clientid: config_center_admin_prod_235, msg: authenticator_result, peername: 113.105.236.111:52874, username: config_center, result: {ok,#{is_superuser => true}}, authenticator: <<"password_based:built_in_database">>
2025-08-18T15:23:49.163390+08:00 [debug] tag: AUTHN, clientid: config_center_admin_prod_235, msg: authentication_result, peername: 113.105.236.111:52874, username: config_center, reason: chain_result, result: {stop,{ok,#{is_superuser => true}}}
2025-08-18T15:23:49.163805+08:00 [debug] tag: RULE_SQL_EXEC, clientid: config_center_admin_prod_235, msg: rule_activated, peername: 113.105.236.111:52874, username: config_center, input: #{node => 'emqx@127.0.0.1',timestamp => 1755501829163,peername => <<"113.105.236.111:52874">>,sockname => <<"172.19.120.197:2994">>,keepalive => 60,event => 'client.connected',username => <<"config_center">>,clientid => <<"config_center_admin_prod_235">>,proto_ver => 5,client_attrs => #{},connected_at => 1755501829163,is_bridge => false,proto_name => <<"MQTT">>,mountpoint => undefined,clean_start => true,expiry_interval => 0,conn_props => #{'User-Property' => #{},'Session-Expiry-Interval' => 0},receive_maximum => 32}, environment: #{}
2025-08-18T15:23:49.164031+08:00 [debug] tag: RULE_SQL_EXEC, clientid: config_center_admin_prod_235, msg: SQL_yielded_no_result, peername: 113.105.236.111:52874, username: config_center
2025-08-18T15:23:49.164201+08:00 [debug] clientid: config_center_admin_prod_235, msg: insert_channel_info, peername: 113.105.236.111:52874, username: config_center
2025-08-18T15:23:49.164300+08:00 [debug] tag: MQTT, clientid: config_center_admin_prod_235, msg: mqtt_packet_sent, peername: 113.105.236.111:52874, username: config_center, packet: CONNACK(Q0, R0, D0, AckFlags=0, ReasonCode=0)
2025-08-18T15:23:49.178722+08:00 [debug] clientid: FSLJ009ZZX, msg: sess_poll_timeout, peername: 223.104.83.79:16196, username: device, ref: #Ref<0.0.21630835.3852116763.3568369665.35613>
2025-08-18T15:23:49.179659+08:00 [debug] clientid: FSLJ009ZZX, msg: sessds_push, peername: 223.104.83.79:16196, username: device
2025-08-18T15:23:49.298730+08:00 [debug] clientid: device1, msg: sess_poll_timeout, peername: 113.105.236.111:52179, username: device, ref: #Ref<0.0.59932547.3852116763.3494707202.205714>
2025-08-18T15:23:49.299659+08:00 [debug] clientid: device1, msg: sessds_push, peername: 113.105.236.111:52179, username: device
2025-08-18T15:23:49.449675+08:00 [debug] clientid: device1, msg: sess_poll_timeout, peername: 113.105.236.111:52179, username: device, ref: #Ref<0.0.59932547.3852116763.3568369665.35655>
2025-08-18T15:23:49.450658+08:00 [debug] clientid: device1, msg: sessds_push, peername: 113.105.236.111:52179, username: device
2025-08-18T15:23:49.458717+08:00 [debug] clientid: FSGM030007, msg: sess_poll_timeout, peername: 36.151.218.74:33966, username: device, ref: #Ref<0.0.60926851.3852116763.3568369665.35663>
2025-08-18T15:23:49.459657+08:00 [debug] clientid: FSGM030007, msg: sessds_push, peername: 36.151.218.74:33966, username: device
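To cross-check what the publisher actually sent, the `raw_bin_received` hex can be decoded by hand. The sketch below is a minimal MQTT CONNECT parser, not EMQX code: the function names are hypothetical, it only looks at a Session Expiry Interval property (identifier 0x11) when that property comes first, and it does not handle will messages. The hex constant is the start of the `bin` value from the first log line, deliberately cut off before the password field.

```python
import struct

# Start of the logged CONNECT packet, truncated before the password field.
CONNECT_HEX = (
    "105100044D51545405C2003C051100000000001C"
    "636F6E6669675F63656E7465725F61646D696E5F70726F645F323335"
    "000D636F6E6669675F63656E746572"
)


def read_varint(buf, pos):
    """Decode an MQTT variable byte integer; return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def read_utf8(buf, pos):
    """Decode a 2-byte length-prefixed UTF-8 string; return (text, next_pos)."""
    (length,) = struct.unpack_from(">H", buf, pos)
    start = pos + 2
    return buf[start:start + length].decode("utf-8"), start + length


def parse_connect(buf):
    """Parse the leading fields of a CONNECT packet into a dict."""
    if buf[0] >> 4 != 1:
        raise ValueError("not a CONNECT packet")
    remaining_length, pos = read_varint(buf, 1)
    proto_name, pos = read_utf8(buf, pos)
    proto_ver, flags = buf[pos], buf[pos + 1]
    (keepalive,) = struct.unpack_from(">H", buf, pos + 2)
    pos += 4
    session_expiry = None
    if proto_ver == 5:
        props_len, pos = read_varint(buf, pos)
        # Simplification: only read Session Expiry Interval if it is first.
        if props_len >= 5 and buf[pos] == 0x11:
            (session_expiry,) = struct.unpack_from(">I", buf, pos + 1)
        pos += props_len
    client_id, pos = read_utf8(buf, pos)
    has_will = bool(flags & 0x04)
    username = None
    if flags & 0x80 and not has_will:  # will fields are not parsed here
        username, pos = read_utf8(buf, pos)
    return {
        "remaining_length": remaining_length,
        "proto_name": proto_name,
        "proto_ver": proto_ver,
        "clean_start": bool(flags & 0x02),
        "keepalive": keepalive,
        "session_expiry": session_expiry,
        "client_id": client_id,
        "username": username,
    }


if __name__ == "__main__":
    for key, value in parse_connect(bytes.fromhex(CONNECT_HEX)).items():
        print(f"{key}: {value}")
```

The decoded client ID, protocol version, CleanStart flag and KeepAlive can be compared with the `mqtt_packet_received` line, and the Session Expiry Interval with the `conn_props` in the `rule_activated` line.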

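The sess_poll_timeout entries are probably caused by some ds session feature being enabled. (How is your ds configured? Turning it off should restore normal sending and receiving...)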

This looks like a bug. Please file a bug report on GitHub and describe how ds is configured so that my colleagues can take a look (I am not familiar with the ds internals…)

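Does ds mean session persistence? I did observe the following: both the device and the server have session persistence enabled, and if the device disables session persistence, the problem does not occur.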