Frequent conn_congestion alarms; subscriber disconnects right after connecting

Environment

  • EMQX version: 5.0.21
  • OS: CentOS

(1) Frequent conn_congestion alarms
2024-12-16T19:29:17.338399+08:00 [warning] msg: alarm_is_activated, mfa: emqx_alarm:do_actions/3, line: 416, message: <<"connection congested: #{buffer => 4096,clientid => <<"UTMQTT_CLIENT_ASYNC_172.20.0.36_7116182459584348773">>,conn_state => connected,connected_at => 1734348547840,high_msgq_watermark => 8192,high_watermark => 1048576,memory => 29720,message_queue_len => 0,peername => <<"172.20.0.1:58162">>,pid => <<"<0.10188.0>">>,proto_name => <<"MQTT">>,proto_ver => 5,recbuf => 374400,recv_cnt => 2,recv_oct "…>>, name: <<"conn_congestion/UTMQTT_CLIENT_ASYNC_172.20.0.36_7116182459584348773/admin">>

2024-12-16T19:30:17.501433+08:00 [warning] msg: alarm_is_deactivated, mfa: emqx_alarm:do_actions/3, line: 422, name: <<"conn_congestion/UTMQTT_CLIENT_ASYNC_172.20.0.36_7116182459584348773/admin">>

2024-12-16T19:30:31.045819+08:00 [warning] msg: alarm_is_activated, mfa: emqx_alarm:do_actions/3, line: 416, message: <<"connection congested: #{buffer => 4096,clientid => <<"UTMQTT_CLIENT_ASYNC_172.20.0.36_7116182459584348773">>,conn_state => connected,connected_at => 1734348547840,high_msgq_watermark => 8192,high_watermark => 1048576,memory => 42512,message_queue_len => 0,peername => <<"172.20.0.1:58162">>,pid => <<"<0.10188.0>">>,proto_name => <<"MQTT">>,proto_ver => 5,recbuf => 374400,recv_cnt => 3,recv_oct "…>>, name: <<"conn_congestion/UTMQTT_CLIENT_ASYNC_172.20.0.36_7116182459584348773/admin">>

2024-12-16T19:30:42.652575+08:00 [warning] msg: busy_port, mfa: emqx_sys_mon:handle_info/2, line: 178, portinfo: [{port,#Port<0.350>},{name,"tcp_inet"},{links,[<0.10188.0>]},{id,2800},{connected,<0.10188.0>},{input,0},{output,38294881},{os_pid,undefined}], procinfo: [{pid,<0.10188.0>},{memory,42512},{total_heap_size,5172},{heap_size,2586},{stack_size,34},{min_heap_size,233},{proc_lib_initial_call,{emqx_connection,init,['Argument__1','Argument__2','Argument__3','Argument__4']}},{initial_call,{proc_lib,init_p,5}},{current_stacktrace,[{erlang,port_command,3,[]},{esockd_transport,async_send,3,[{file,"esockd_transport.erl"},{line,153}]},{emqx_connection,send,2,[{file,"emqx_connection.erl"},{line,902}]},{emqx_connection,process_msg,2,[{file,"emqx_connection.erl"},{line,487}]},{emqx_connection,process_msg,2,[{file,"emqx_connection.erl"},{line,493}]},{emqx_connection,handle_recv,3,[{file,"emqx_connection.erl"},{line,449}]},{proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,236}]}]},{registered_name,[]},{status,suspended},{message_queue_len,2},{group_leader,<0.1964.0>},{priority,normal},{trap_exit,false},{reductions,21211658},{last_calls,false},{catchlevel,4},{trace,0},{suspending,[]},{sequential_trace_token,[]},{error_handler,error_handler}]

2024-12-16T19:30:45.294649+08:00 [warning] msg: alarm_is_activated, mfa: emqx_alarm:do_actions/3, line: 416, message: <<"connection congested: #{buffer => 4096,clientid => <<"admin;cloud;1734338961">>,conn_state => connected,connected_at => 1734348519769,high_msgq_watermark => 8192,high_watermark => 1048576,memory => 143456,message_queue_len => 0,peername => <<"172.20.0.1:57770">>,pid => <<"<0.9896.0>">>,proto_name => <<"MQTT">>,proto_ver => 4,recbuf => 3119368,recv_cnt => 55440,recv_oct => 26036210,reductions ="…>>, name: <<"conn_congestion/admin;cloud;1734338961/admin">>

2024-12-16T19:30:51.533751+08:00 [warning] msg: alarm_is_deactivated, mfa: emqx_alarm:do_actions/3, line: 422, name: <<"conn_congestion/UTMQTT_CLIENT_ASYNC_172.20.0.36_7116182459584348773/admin">>

2024-12-16T19:30:51.827774+08:00 [warning] msg: busy_port, mfa: emqx_sys_mon:handle_info/2, line: 178, portinfo: [{port,#Port<0.335>},{name,"tcp_inet"},{links,[<0.9896.0>]},{id,2680},{connected,<0.9896.0>},{input,0},{output,33053184},{os_pid,undefined}], procinfo: [{pid,<0.9896.0>},{memory,111640},{total_heap_size,13544},{heap_size,2586},{stack_size,34},{min_heap_size,233},{proc_lib_initial_call,{emqx_connection,init,['Argument__1','Argument__2','Argument__3','Argument__4']}},{initial_call,{proc_lib,init_p,5}},{current_stacktrace,[{erlang,port_command,3,[]},{esockd_transport,async_send,3,[{file,"esockd_transport.erl"},{line,153}]},{emqx_connection,send,2,[{file,"emqx_connection.erl"},{line,902}]},{emqx_connection,process_msg,2,[{file,"emqx_connection.erl"},{line,487}]},{emqx_connection,process_msg,2,[{file,"emqx_connection.erl"},{line,493}]},{emqx_connection,handle_recv,3,[{file,"emqx_connection.erl"},{line,449}]},{proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,236}]}]},{registered_name,[]},{status,suspended},{message_queue_len,19},{group_leader,<0.1964.0>},{priority,normal},{trap_exit,false},{reductions,473813009},{last_calls,false},{catchlevel,4},{trace,0},{suspending,[]},{sequential_trace_token,[]},{error_handler,error_handler}]

2024-12-16T19:31:01.344889+08:00 [warning] msg: alarm_is_deactivated, mfa: emqx_alarm:do_actions/3, line: 422, name: <<"conn_congestion/admin;cloud;1734338961/admin">>
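When comparing many of these alarms, it helps to pull the key/value map embedded in each conn_congestion message into a dictionary. The following is a minimal sketch in Python; parse_congestion_fields is a hypothetical helper (not part of EMQX), and it only recovers the fields printed before the log truncates the message at "…".

```python
import re

# Each alarm message embeds an Erlang map dump made of "key => value" pairs.
# A value is either a binary such as <<"...">> or a bare token
# (integer or atom) that runs until the next comma or closing brace.
PAIR_RE = re.compile(r'(\w+) => (<<"[^"]*">>|[^,}]+)')


def parse_congestion_fields(line: str) -> dict:
    """Hypothetical helper: extract key/value pairs from a conn_congestion
    alarm line. Fields cut off by the truncation marker are skipped."""
    fields = {}
    for key, raw in PAIR_RE.findall(line):
        raw = raw.strip()
        if raw.startswith('<<"') and raw.endswith('">>'):
            fields[key] = raw[3:-3]      # strip the <<" and ">> delimiters
        elif raw.isdigit():
            fields[key] = int(raw)       # sizes, counters, timestamps
        else:
            fields[key] = raw            # atoms such as connected
    return fields


if __name__ == "__main__":
    sample = ('message: <<"connection congested: #{buffer => 4096,'
              'clientid => <<"UTMQTT_CLIENT_ASYNC_172.20.0.36_7116182459584348773">>,'
              'conn_state => connected,memory => 29720,message_queue_len => 0,'
              'peername => <<"172.20.0.1:58162">>,recv_cnt => 2,recv_oct "…>>')
    fields = parse_congestion_fields(sample)
    print(fields["clientid"], fields["peername"], fields["message_queue_len"])
```

Parsing with a regular expression keeps the sketch dependency-free; it is not a general Erlang term parser and would need adjusting for nested maps.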

(2) Log trace for MQTT_CLIENT_ASYNC_172.20.0.36_7116182459584348773: the client disconnects right after connecting
2024-12-16T19:31:02+08:00 [MQTT] MQTT_CLIENT_ASYNC_172.20.0.36_7116182459584348773@172.20.0.1:59790 msg: mqtt_packet_received, packet: CONNECT(Q0, R0, D0, ClientId=MQTT_CLIENT_ASYNC_172.20.0.36_7116182459584348773, ProtoName=MQTT, ProtoVsn=5, CleanStart=true, KeepAlive=60, Username=admin, Password=******)
2024-12-16T19:31:02+08:00 [MQTT] MQTT_CLIENT_ASYNC_172.20.0.36_7116182459584348773@172.20.0.1:59790 msg: mqtt_packet_sent, packet: CONNACK(Q0, R0, D0, AckFlags=0, ReasonCode=0)
2024-12-16T19:31:58+08:00 [MQTT] MQTT_CLIENT_ASYNC_172.20.0.36_7116182459584348773@172.20.0.1:59790 msg: mqtt_packet_received, packet: DISCONNECT(Q0, R0, D0, ReasonCode=0)
2024-12-16T19:31:58+08:00 [SOCKET] MQTT_CLIENT_ASYNC_172.20.0.36_7116182459584348773@172.20.0.1:59790 msg: socket_force_closed, reason: normal
2024-12-16T19:31:58+08:00 [SOCKET] MQTT_CLIENT_ASYNC_172.20.0.36_7116182459584348773@172.20.0.1:59790 msg: emqx_connection_terminated, reason: {shutdown,normal}

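  1. The conn_congestion alarms are caused by the connect-then-disconnect behavior shown below: the client reconnects too frequently.
  2. The msg: busy_port warnings show that this connection is definitely transferring a large volume of messages.
  3. The log trace shows only the connect and the disconnect, with no messages in between. Combined with point 2, this means the session must be holding many messages in its message queue and inflight window. As soon as the client comes online, the broker pushes all of these undelivered messages to it; the client cannot keep up and closes the TCP connection.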
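Point 2 rests on the busy_port warnings in section (1). A minimal sketch in Python that extracts the fields relevant to that reasoning; busy_port_summary is a hypothetical helper, not an EMQX tool. The output counter is, per erlang:port_info/2, the total number of bytes written to the port, and a status of suspended with erlang:port_command at the top of the stack is consistent with the connection process waiting on a busy socket.

```python
import re

# busy_port lines print Erlang proplists made of {key,value} tuples.
# Only the simple fields used in the diagnosis are extracted here.
FIELD_RE = re.compile(r'\{(input|output|memory|status|message_queue_len),(\w+)\}')


def busy_port_summary(line: str) -> dict:
    """Hypothetical helper: summarise a busy_port warning line."""
    summary = {}
    for key, raw in FIELD_RE.findall(line):
        # Keep the first occurrence of each key.
        summary.setdefault(key, int(raw) if raw.isdigit() else raw)
    if "output" in summary:
        # Total bytes written to the port, expressed in MiB.
        summary["output_mib"] = round(summary["output"] / 2**20, 1)
    return summary


if __name__ == "__main__":
    line = ('portinfo: [{port,#Port<0.350>},{name,"tcp_inet"},{input,0},'
            '{output,38294881},{os_pid,undefined}], procinfo: [{pid,<0.10188.0>},'
            '{memory,42512},{status,suspended},{message_queue_len,2}]')
    print(busy_port_summary(line))
```

For the first busy_port entry this gives roughly 36.5 MiB written to the socket with input at 0, which matches the "heavy outbound traffic" reading in point 2.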

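You could check the client's logs to see whether it received a large number of messages, could not process them, and then disconnected.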

Hi, is the message_queue_len value the number of backlogged messages that have not been processed yet?


Yes.

For this situation, which parameters could be added or tuned in the configuration?

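There are already 50,000 unprocessed messages; at this point it is no longer possible to change the configuration without affecting functionality.
You could try reducing the message volume somewhat.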
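The reply implies that a backlog only shrinks if the subscriber consumes faster than new messages arrive. A back-of-the-envelope sketch in Python; drain_seconds is a hypothetical helper, and the rates below are purely illustrative assumptions, not measurements from this deployment.

```python
from typing import Optional


def drain_seconds(backlog: int, consume_rate: float, incoming_rate: float) -> Optional[float]:
    """Seconds for a subscriber to clear `backlog` messages while new ones
    keep arriving. Rates are messages per second (illustrative only).
    Returns None if the subscriber never catches up."""
    net_rate = consume_rate - incoming_rate
    if net_rate <= 0:
        return None  # the backlog stays flat or keeps growing
    return backlog / net_rate


if __name__ == "__main__":
    # 50,000 messages as reported in the thread; the rates are hypothetical.
    print(drain_seconds(50_000, consume_rate=1_000, incoming_rate=500))  # 100.0
    print(drain_seconds(50_000, consume_rate=500, incoming_rate=600))    # None
```

Reducing the message volume lowers incoming_rate, which is the only lever in this model that helps when the subscriber's consume rate is fixed.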

OK, thanks.

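Hi, one more question: Clean Start is already set to true, so in theory a new session should be created on reconnect. Why would the old session's inflight window and message queue data still be delivered to the reconnected client?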

Hi, I haven't looked into this; you could ask the official EMQX team.
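For reference, the MQTT 5.0 specification states that when a CONNECT arrives with Clean Start = 1, the server discards any existing session and starts a new one, and the CONNACK's Session Present flag is 0. The CONNACK in the trace above shows AckFlags=0, which is consistent with a fresh session, so it may be worth confirming which client ID the backlog actually belongs to. The following is a toy Python model of that rule only; ToyBroker and Session are hypothetical names, it ignores Session Expiry Interval, and it is not EMQX's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    # Undelivered messages held for an offline client (a toy stand-in
    # for the broker's message queue plus inflight window).
    pending: list = field(default_factory=list)


class ToyBroker:
    """Toy model of MQTT 5.0 Clean Start handling (session expiry ignored)."""

    def __init__(self):
        self.sessions = {}

    def connect(self, client_id: str, clean_start: bool):
        """Return (session_present, messages delivered on connect)."""
        if clean_start:
            # Clean Start = 1: discard any existing session, start a new one.
            self.sessions[client_id] = Session()
            return False, []
        if client_id in self.sessions:
            # Clean Start = 0 with an existing session: resume and flush it.
            session = self.sessions[client_id]
            backlog, session.pending = session.pending, []
            return True, backlog
        self.sessions[client_id] = Session()
        return False, []


if __name__ == "__main__":
    broker = ToyBroker()
    broker.connect("c1", clean_start=False)
    broker.sessions["c1"].pending.extend(["m1", "m2"])
    print(broker.connect("c1", clean_start=True))   # (False, [])
```

In this model only a Clean Start = 0 reconnect can receive the old backlog; with Clean Start = 1 the queued messages are dropped together with the session.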