Port 1883 on a cluster node stops serving (5-node cluster)

Every so often one node in the cluster stops serving on port 1883. The dashboard shows roughly 800k total connections and 300k live connections cluster-wide, but the affected node's connection count reads 0.

EMQX 5.8.3, open-source edition

etc/emqx.conf
node {
  name = "emqx@xxx1"
  cookie = "xxxxxxxxxxxxx"
  max_ports = 2097152
  data_dir = "/emqx/data"
}

cluster {
  name = emqxcl
  autoheal = true
  discovery_strategy = static
  static {
    seeds = ["emqx@xxx1", "emqx@xxx2", "emqx@xxx3", "emqx@xxx4", "emqx@xxx5"]
  }
}

dashboard {
  listeners {
    http.bind = 18083
    # https.bind = 18084
    https {
      ssl_options {
        certfile = "${EMQX_ETC_DIR}/certs/cert.pem"
        keyfile = "${EMQX_ETC_DIR}/certs/key.pem"
      }
    }
  }
}

Fri Feb 27 13:43:13: the dashboard shows the node's connection count as 0, and telnet to the node's port 1883 fails, yet the service process is alive and ss -l still shows the port listening.
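A scriptable equivalent of the telnet probe, handy for polling during an incident (a sketch using bash's /dev/tcp pseudo-device; host and port below are placeholders):

```shell
# Hypothetical helper, not part of EMQX: report whether a TCP port
# accepts connections, with a 3-second cap on the attempt.
port_check() {
  if timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
    echo open
  else
    echo blocked
  fi
}
port_check 127.0.0.1 1883
```

Running it in a loop against the affected node timestamps exactly when accept stops working.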

Logs
cat run_erl.log
run_erl [85660] Fri Feb 27 13:43:13 2026
Args before exec of shell:
run_erl [85660] Fri Feb 27 13:43:13 2026
argv[0] = sh
run_erl [85660] Fri Feb 27 13:43:13 2026
argv[1] = -c
run_erl [85660] Fri Feb 27 13:43:13 2026
argv[2] = exec "/usr/local/emqx-5.8.3/emqx-5.8.3/bin/emqx" "console"

cat emqx.log.14 | grep -E 'error|fail'

2026-02-27T12:00:02.483722+08:00 [warning] msg: long_schedule, info: [{timeout,254},{in,{emqx_connection,recvloop,2}},{out,{erlang,port_command,3}}], procinfo: [{pid,<0.78328272.2>},{memory,22008},{total_heap_size,2586},{heap_size,2586},{stack_size,9},{min_heap_size,233},{proc_lib_initial_call,{emqx_connection,init,['Argument__1','Argument__2','Argument__3','Argument__4']}},{initial_call,{proc_lib,init_p,5}},{current_stacktrace,[{emqx_connection,recvloop,2,[{file,"emqx_connection.erl"},{line,411}]},{proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,251}]}]},{registered_name,[]},{status,waiting},{message_queue_len,0},{group_leader,<0.4176.0>},{priority,normal},{trap_exit,false},{reductions,8999775},{last_calls,false},{catchlevel,1},{trace,0},{suspending,[]},{sequential_trace_token,[]},{error_handler,error_handler}]
2026-02-27T12:00:02.493724+08:00 [warning] msg: long_schedule, info: [{timeout,255},{in,{emqx_connection,recvloop,2}},{out,{erlang,port_command,3}}], procinfo: [{pid,<0.67323392.2>},{memory,47424},{total_heap_size,5783},{heap_size,1598},{stack_size,9},{min_heap_size,233},{proc_lib_initial_call,{emqx_connection,init,['Argument__1','Argument__2','Argument__3','Argument__4']}},{initial_call,{proc_lib,init_p,5}},{current_stacktrace,[{emqx_connection,recvloop,2,[{file,"emqx_connection.erl"},{line,411}]},{proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,251}]}]},{registered_name,[]},{status,waiting},{message_queue_len,0},{group_leader,<0.4176.0>},{priority,normal},{trap_exit,false},{reductions,19910},{last_calls,false},{catchlevel,1},{trace,0},{suspending,[]},{sequential_trace_token,[]},{error_handler,error_handler}]
2026-02-27T12:00:02.494906+08:00 [warning] msg: long_schedule, info: [{timeout,296},{in,{emqx_connection,recvloop,2}},{out,{erts_internal,await_result,1}}], procinfo: [{pid,<0.76308925.2>},{memory,34688},{total_heap_size,4191},{heap_size,2586},{stack_size,9},{min_heap_size,233},{proc_lib_initial_call,{emqx_connection,init,['Argument__1','Argument__2','Argument__3','Argument__4']}},{initial_call,{proc_lib,init_p,5}},{current_stacktrace,[{emqx_connection,recvloop,2,[{file,"emqx_connection.erl"},{line,411}]},{proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,251}]}]},{registered_name,[]},{status,waiting},{message_queue_len,0},{group_leader,<0.4176.0>},{priority,normal},{trap_exit,false},{reductions,61351},{last_calls,false},{catchlevel,1},{trace,0},{suspending,[]},{sequential_trace_token,[]},{error_handler,error_handler}]
2026-02-27T12:00:38.491105+08:00 [warning] msg: alarm_is_activated, message: <<"resource down: #{error => timeout,status => disconnected}">>, name: <<"action:http:action-data-mqtt-webhook:connector:http:data-mqtt-webhook">>
2026-02-27T12:00:40.084149+08:00 [error] tag: ERROR, msg: send_error, id: <<"action:http:action-data-mqtt-webhook:connector:http:data-mqtt-webhook">>, reason: {recoverable_error,<<"channel: \"action:http:action-data-mqtt-webhook:connector:http:data-mqtt-webhook\" not operational">>}, rule_id: <<"rule-data-mqtt-conn">>, rule_trigger_ts: [1772164840083]
2026-02-27T12:00:43.110512+08:00 [error] tag: ERROR, msg: send_error, id: <<"action:http:action-data-mqtt-webhook:connector:http:data-mqtt-webhook">>, reason: {recoverable_error,<<"channel: \"action:http:action-data-mqtt-webhook:connector:http:data-mqtt-webhook\" not operational">>}, rule_id: <<"rule-data-mqtt-conn">>, rule_trigger_ts: [1772164843110]
2026-02-27T12:00:53.428459+08:00 [warning] tag: CONNECTOR/WEBHOOK, msg: health_check_failed, status: disconnected, resource_id: <<"connector:http:data-mqtt-webhook">>
2026-02-27T12:01:56.442542+08:00 [warning] tag: CONNECTOR/WEBHOOK, msg: start_resource_failed, reason: timeout, resource_id: <<"connector:http:data-mqtt-webhook">>
... (the same start_resource_failed warning repeats roughly every 63 seconds; 13 similar entries omitted through 12:15:35) ...
2026-02-27T12:16:00.260417+08:00 [error] msg: gen_rpc_error, error: channel_error, driver: tcp, reason: econnreset, socket: #Port<0.4977228>, peer: xxx4:22156, action: stopping
2026-02-27T12:16:00.261771+08:00 [error] msg: gen_rpc_error, error: channel_error, driver: tcp, reason: econnreset, socket: #Port<0.6024>, peer: xxx3:50072, action: stopping
2026-02-27T12:16:00.270762+08:00 [error] msg: gen_rpc_error, error: channel_error, driver: tcp, reason: econnreset, socket: #Port<0.2929>, peer: xxx1:33516, action: stopping
2026-02-27T12:16:00.275601+08:00 [error] msg: gen_rpc_error, error: channel_error, driver: tcp, reason: econnreset, socket: #Port<0.20361877>, peer: xxx2:10964, action: stopping
2026-02-27T12:16:38.697371+08:00 [warning] tag: CONNECTOR/WEBHOOK, msg: start_resource_failed, reason: timeout, resource_id: <<"connector:http:data-mqtt-webhook">>
... (repeats roughly every 63 seconds; 81 similar entries omitted through 13:41:42) ...
2026-02-27T13:42:45.741022+08:00 [warning] tag: CONNECTOR/WEBHOOK, msg: start_resource_failed, reason: timeout, resource_id: <<"connector:http:data-mqtt-webhook">>
2026-02-27T13:57:28.093645+08:00 [error] Process: <0.238030.0> on node 'emqx@xxx5', Context: maximum heap size reached, Max Heap Size: 6291456, Total Heap Size: 7545111, Kill: true, Error Logger: true, Message Queue Len: 0, GC Info: [{old_heap_block_size,2984878},{heap_block_size,4560232},{mbuf_size,6037},{recent_size,873792},{stack_size,9},{old_heap_size,0},{heap_size,2066797},{bin_vheap_size,589758},{bin_vheap_block_size,832883},{bin_old_vheap_size,0},{bin_old_vheap_block_size,440983}]
2026-02-27T14:09:50.270273+08:00 [error] Process: <0.528436.0> on node 'emqx@xxx5', Context: maximum heap size reached, Max Heap Size: 6291456, Total Heap Size: 10865073, Kill: true, Error Logger: true, Message Queue Len: 0, GC Info: [{old_heap_block_size,4298223},{heap_block_size,6566731},{mbuf_size,7240},{recent_size,1604225},{stack_size,29},{old_heap_size,0},{heap_size,2977757},{bin_vheap_size,1089380},{bin_vheap_block_size,2180423},{bin_old_vheap_size,0},{bin_old_vheap_block_size,713510}]

Server info:
Red Hat Enterprise Linux Server release 7.9 (Maipo)
32 GB RAM / 16 cores

uname -a
5.4.17-2136.312.3.4.el7uek.x86_64 #2 SMP Wed Oct 19 17:45:00 PDT 2022 x86_64 x86_64 x86_64 GNU/Linux

/etc/sysctl.conf
fs.file-max = 1048576
net.ipv4.tcp_keepalive_time = 1200
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_sack = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_rmem = 4096 87380 4194304
net.ipv4.tcp_wmem = 4096 16384 4194304
net.ipv4.tcp_max_syn_backlog = 16384
net.core.netdev_max_backlog = 32768
net.core.somaxconn = 32768
net.core.wmem_default = 8388608
net.core.rmem_default = 8388608
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 2
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_tw_reuse = 1
#net.ipv4.tcp_mem = 94500000 915000000 927000000
#net.ipv4.tcp_max_orphans = 3276800
net.ipv4.ip_local_port_range = 1024 65000
#net.nf_conntrack_max = 655350
#net.netfilter.nf_conntrack_max = 655350
#net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60
#net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 120
#net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
#net.netfilter.nf_conntrack_tcp_timeout_established = 3600
vm.swappiness=10

/etc/systemd/system.conf
# added:
DefaultLimitCORE=infinity
DefaultLimitNOFILE=1048576
DefaultLimitNPROC=120000

/etc/security/limits.conf

* soft nproc 65535
* hard nproc 65535
* soft nofile 1048576
* hard nofile 1048576

This looks like "the listener is still up, but accept/scheduling is starved". On 5.8.3 at this connection volume, the first suggestion is to upgrade to the latest patch release of the same major version and observe.
The key facts you gave: the process is alive and ss -l shows the listener, yet 1883 is unreachable, with long_schedule warnings in the logs. That usually means the port is open but the node's connection-handling capacity is saturated or wedged, not that the listener is gone.
First, capture this set on the failing node, while the fault is happening:

emqx ctl listeners
emqx ctl broker stats
emqx ctl vm all
ss -lntp | grep 1883
ss -s
cat /proc/sys/net/core/somaxconn
cat /proc/sys/net/ipv4/tcp_max_syn_backlog
ulimit -n
cat /proc/$(pgrep -f beam.smp | head -1)/limits | grep -i "open files"
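If it helps, the capture can be bundled into one script so every output lands in a single timestamped file (a sketch; EMQX_BIN is an assumed install path, override it via the environment):

```shell
#!/bin/sh
# Incident-capture sketch: run on the failing node while the fault is active.
EMQX_BIN="${EMQX_BIN:-/usr/local/emqx-5.8.3/emqx-5.8.3/bin/emqx}"
OUT="emqx_diag_$(date +%Y%m%d_%H%M%S).txt"

run() {
  echo "===== $* =====" >>"$OUT"   # section header before each command
  "$@" >>"$OUT" 2>&1               # keep going even if a command fails
}

run "$EMQX_BIN" ctl listeners
run "$EMQX_BIN" ctl broker stats
run "$EMQX_BIN" ctl vm all
run ss -lntp
run ss -s
run cat /proc/sys/net/core/somaxconn
run cat /proc/sys/net/ipv4/tcp_max_syn_backlog
run sh -c 'ulimit -n'
echo "wrote $OUT"
```

Errors (for example when EMQX is down and ctl can't reach the node) are captured in the file too, which is itself useful evidence.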

Then grab two slices of the logs:

grep -E "long_schedule|alarm|too_many|accept|emfile|enomem|closed" /usr/local/emqx-5.8.3/log/emqx.log* | tail -n 200
grep -E "Kernel poll|smp|scheduler|system_limit" /usr/local/emqx-5.8.3/log/emqx.log* | tail -n 100

Also, the config you pasted rendered with curly quotes (“”); double-check that the actual file uses straight ASCII quotes (") to avoid parsing ambiguity.
Paste the output of the commands above and we'll take a look.

Hi, the config file does use straight quotes (").

grep -E "long_schedule|alarm|too_many|accept|emfile|enomem|closed" emqx.log.13 | tail -n 200
2026-02-27T08:00:02.419854+08:00 [warning] msg: long_schedule, info: [{timeout,416},{in,{emqx_connection,recvloop,2}},{out,{emqx_connection,recvloop,2}}], procinfo: [{pid,<0.70880017.2>},{memory,34744},{total_heap_size,4198},{heap_size,2586},{stack_size,9},{min_heap_size,233},{proc_lib_initial_call,{emqx_connection,init,['Argument__1','Argument__2','Argument__3','Argument__4']}},{initial_call,{proc_lib,init_p,5}},{current_stacktrace,[{emqx_connection,recvloop,2,[{file,"emqx_connection.erl"},{line,411}]},{proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,251}]}]},{registered_name,[]},{status,waiting},{message_queue_len,0},{group_leader,<0.4176.0>},{priority,normal},{trap_exit,false},{reductions,221629},{last_calls,false},{catchlevel,1},{trace,0},{suspending,[]},{sequential_trace_token,[]},{error_handler,error_handler}]
2026-02-27T08:00:09.188160+08:00 [warning] msg: long_schedule, info: [{timeout,291},{port_op,input}], portinfo: [{port,#Port<0.30149573>},{name,"tcp_inet"},{links,[<0.22149008.2>]},{id,24107},{connected,<0.22149008.2>},{input,0},{output,2015},{os_pid,undefined}]
grep -E "long_schedule|alarm|too_many|accept|emfile|enomem|closed" emqx.log.14 | tail -n 200
(The first four matches are the same long_schedule and alarm entries from 12:00 already quoted above.)
2026-02-27T12:00:53.427937+08:00 [warning] msg: alarm_is_activated, message: <<"resource down: timeout">>, name: <<"connector:http:data-mqtt-webhook">>

grep -E "long_schedule|alarm|too_many|accept|emfile|enomem|closed" emqx.log*
For the full output, see the attached mqtt_case.zip.
mqtt_case.zip (6.0 KB)

grep -E "Kernel poll|smp|scheduler|system_limit" emqx.log* | tail -n 100
No matches.

Conclusion: the 1883 listener almost certainly never went away; the node's accept/scheduling was starved during the fault window. The long_schedule warnings point at emqx_connection:recvloop and tcp_inet port operations, i.e. scheduling latency / connection-handling capacity being exhausted.
Stop the bleeding first: pull the failing node out of the LB, then restart it to restore service; do root-cause capture after recovery.

While the fault is happening, capture this set of outputs (same time window; don't just paste a few tail lines):

emqx ctl listeners
emqx ctl vm all
emqx ctl broker stats
cat /proc/net/netstat | egrep "ListenOverflows|ListenDrops"
ss -lntp | grep :1883
ss -s

Also grab two minutes of full logs (not just long_schedule):

grep -E "long_schedule|accept|emfile|enomem|alarm|too_many|busy_dist_port" /usr/local/emqx-5.8.3/log/emqx.log* | tail -n 400

And describe the deployment topology:

  • Is there an L4 LB/Nginx/SLB in front of 1883, and what is its health-check policy?
  • At the failure point, were the abnormal node's CPU, load, disk IO, or network saturated?

At your scale (800k total connections), two conservative tuning changes are worth making before observing further:
listeners.tcp.default.acceptors = 64
listeners.tcp.default.max_connections = 1024000
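In emqx.conf those two settings sit in the TCP listener block, e.g. (a sketch; the bind address shown is the default, and the values are the suggestion above, not verified optimums):

```hocon
listeners.tcp.default {
  bind = "0.0.0.0:1883"
  acceptors = 64
  max_connections = 1024000
}
```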

You can also try raising the kernel queue parameters net.core.somaxconn and net.ipv4.tcp_max_syn_backlog.

Your long_schedule (and runq_overload) warnings are plainly saying the machine's CPU can't keep up.
But since the dashboard shows the node's connection count as 0, you need to find out what is actually burning the CPU. Take a look with htop.
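Besides htop, per-thread CPU of the beam.smp process shows whether the Erlang scheduler threads themselves are pegged (a sketch assuming Linux procps; busiest_threads is a made-up helper name):

```shell
# Hypothetical helper: list the ten busiest threads of a pid, by CPU.
# Erlang scheduler threads appear individually, so pegged schedulers stand out.
busiest_threads() {
  ps -L -o tid,pcpu,comm -p "$1" --no-headers | sort -k2 -rn | head -10
}
busiest_threads "$(pgrep -f beam.smp | head -1)"
```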

[root@HDDC-CLWAITJ-APP07 ~]# ps -ef|grep emqx
root 40457 1 0 Jan17 ? 00:00:01 /usr/local/emqx-5.8.3/emqx-5.8.3/erts-14.2.5.2/bin/run_erl -daemon //emqx/data/root_erl_pipes/emqx@172.26.244.192/ /usr/local/emqx-5.8.3/emqx-5.8.3/log exec "/usr/local/emqx-5.8.3/emqx-5.8.3/bin/emqx" "console"
root 40480 40457 99 Jan17 pts/1 206-01:14:40 emqx -spp true -A 4 -IOt 4 -SDio 8 -C multi_time_warp -c true -pc unicode -hmax 33554432 -e 262144 -zdbbl 8192 -Q 2097152 -P 4194304 -- -root /usr/local/emqx-5.8.3/emqx-5.8.3 -bindir /usr/local/emqx-5.8.3/emqx-5.8.3/erts-14.2.5.2/bin -progname /usr/local/emqx-5.8.3/emqx-5.8.3/bin/emqx -- -home /root -- -enable-feature maybe_expr -boot /usr/local/emqx-5.8.3/emqx-5.8.3/releases/5.8.3/start -boot_var RELEASE_LIB /usr/local/emqx-5.8.3/emqx-5.8.3/lib -boot_var ERTS_LIB_DIR /usr/local/emqx-5.8.3/emqx-5.8.3/lib -mode embedded -config /emqx/data/configs/app.2026.01.17.21.24.57.config -stdlib restricted_shell emqx_restricted_shell -shutdown_time 30000 -cache_boot_paths false -pa data/patches -mnesia dump_log_write_threshold 5000 -mnesia dump_log_time_threshold 60000 -os_mon start_disksup false -pa /usr/local/emqx-5.8.3/emqx-5.8.3/releases/5.8.3/consolidated -kernel prevent_overlapping_partitions false -kernel net_ticktime 120 -setcookie 8dnew8q5r60g108c7ru32g4w -name emqx@172.26.244.192 -mnesia dir "/emqx/data/mnesia/emqx@172.26.244.192" -- -start_epmd false -epmd_module ekka_epmd -proto_dist ekka -- console -ekka_proto_dist inet_tcp -emqx_data_dir /emqx/data --
root 40859 40761 0 Jan17 ? 00:00:01 /usr/local/emqx-5.8.3/emqx-5.8.3/erts-14.2.5.2/bin/inet_gethost 4
root 40860 40859 0 Jan17 ? 00:00:01 /usr/local/emqx-5.8.3/emqx-5.8.3/erts-14.2.5.2/bin/inet_gethost 4
root 40903 40761 0 Jan17 ? 00:11:22 /usr/local/emqx-5.8.3/emqx-5.8.3/lib/os_mon-2.9.1/priv/bin/memsup
root 40904 40761 0 Jan17 ? 00:00:29 /usr/local/emqx-5.8.3/emqx-5.8.3/lib/os_mon-2.9.1/priv/bin/cpu_sup
root 109571 109382 0 08:40 pts/0 00:00:00 grep --color=auto emqx
[root@HDDC-CLWAITJ-APP07 ~]# cd /usr/local/emqx/emqx-5.8.3/
[root@HDDC-CLWAITJ-APP07 emqx-5.8.3]# emqx ctl listeners
-bash: emqx: command not found
[root@HDDC-CLWAITJ-APP07 emqx-5.8.3]# ./bin/emqx ctl listeners
Node 'emqx@172.26.244.192' not responding to pings.
[root@HDDC-CLWAITJ-APP07 emqx-5.8.3]# ss -s
Total: 1280
TCP: 1247 (estab 83, closed 529, orphaned 0, timewait 119)

Transport Total IP IPv6
RAW 0 0 0
UDP 0 0 0
TCP 718 716 2
INET 718 716 2
FRAG 0 0 0

[root@HDDC-CLWAITJ-APP07 emqx-5.8.3]# ./bin/emqx ctl vm all
Node 'emqx@172.26.244.192' not responding to pings.
[root@HDDC-CLWAITJ-APP07 emqx-5.8.3]# ./bin/emqx ctl broker stats
Node 'emqx@172.26.244.192' not responding to pings.
[root@HDDC-CLWAITJ-APP07 emqx-5.8.3]# cat /proc/net/netstat | egrep "ListenOverflows|ListenDrops"
TcpExt: SyncookiesSent SyncookiesRecv SyncookiesFailed EmbryonicRsts PruneCalled RcvPruned OfoPruned OutOfWindowIcmps LockDroppedIcmps ArpFilter TW TWRecycled TWKilled PAWSActive PAWSEstab DelayedACKs DelayedACKLocked DelayedACKLost ListenOverflows ListenDrops TCPHPHits TCPPureAcks TCPHPAcks TCPRenoRecovery TCPSackRecovery TCPSACKReneging TCPSACKReorder TCPRenoReorder TCPTSReorder TCPFullUndo TCPPartialUndo TCPDSACKUndo TCPLossUndo TCPLostRetransmit TCPRenoFailures TCPSackFailures TCPLossFailures TCPFastRetrans TCPSlowStartRetrans TCPTimeouts TCPLossProbes TCPLossProbeRecovery TCPRenoRecoveryFail TCPSackRecoveryFail TCPRcvCollapsed TCPBacklogCoalesce TCPDSACKOldSent TCPDSACKOfoSent TCPDSACKRecv TCPDSACKOfoRecv TCPAbortOnData TCPAbortOnClose TCPAbortOnMemory TCPAbortOnTimeout TCPAbortOnLinger TCPAbortFailed TCPMemoryPressures TCPMemoryPressuresChrono TCPSACKDiscard TCPDSACKIgnoredOld TCPDSACKIgnoredNoUndo TCPSpuriousRTOs TCPMD5NotFound TCPMD5Unexpected TCPMD5Failure TCPSackShifted TCPSackMerged TCPSackShiftFallback TCPBacklogDrop PFMemallocDrop TCPMinTTLDrop TCPDeferAcceptDrop IPReversePathFilter TCPTimeWaitOverflow TCPReqQFullDoCookies TCPReqQFullDrop TCPRetransFail TCPRcvCoalesce TCPOFOQueue TCPOFODrop TCPOFOMerge TCPChallengeACK TCPSYNChallenge TCPFastOpenActive TCPFastOpenActiveFail TCPFastOpenPassive TCPFastOpenPassiveFail TCPFastOpenListenOverflow TCPFastOpenCookieReqd TCPFastOpenBlackhole TCPSpuriousRtxHostQueues BusyPollRxPackets TCPAutoCorking TCPFromZeroWindowAdv TCPToZeroWindowAdv TCPWantZeroWindowAdv TCPSynRetrans TCPOrigDataSent TCPHystartTrainDetect TCPHystartTrainCwnd TCPHystartDelayDetect TCPHystartDelayCwnd TCPACKSkippedSynRecv TCPACKSkippedPAWS TCPACKSkippedSeq TCPACKSkippedFinWait2 TCPACKSkippedTimeWait TCPACKSkippedChallenge TCPWinProbe TCPKeepAlive TCPMTUPFail TCPMTUPSuccess TCPDelivered TCPDeliveredCE TCPAckCompressed TCPZeroWindowDrop TCPRcvQDrop TCPWqueueTooBig TCPFastOpenPassiveAltKey
[root@HDDC-CLWAITJ-APP07 emqx-5.8.3]# ss -lntp | grep :1883
LISTEN 1025 1024 0.0.0.0:1883 0.0.0.0:* users:((“beam.smp”,pid=40480,fd=97))
[root@HDDC-CLWAITJ-APP07 emqx-5.8.3]# ss -s
Total: 1284
TCP: 1245 (estab 83, closed 523, orphaned 0, timewait 113)

Transport Total IP IPv6
RAW 0 0 0
UDP 0 0 0
TCP 722 720 2
INET 722 720 2
FRAG 0 0 0

[root@HDDC-CLWAITJ-APP07 emqx-5.8.3]# free -m
total used free shared buff/cache available
Mem: 32354 4714 13317 842 14322 26340
Swap: 8063 0 8063

[root@HDDC-CLWAITJ-APP07 log]# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 0 13626684 5528 14666836 0 0 0 1 0 0 7 3 90 0 0
0 0 0 13626456 5528 14666836 0 0 0 0 6025 3030 1 1 99 0 0
1 0 0 13626480 5528 14666836 0 0 0 0 7072 3620 1 1 99 0 0
0 0 0 13626772 5528 14666836 0 0 0 0 6805 4538 1 1 99 0 0
1 0 0 13626620 5528 14666836 0 0 0 0 6135 2985 1 1 99 0 0
1 0 0 13626780 5528 14666836 0 0 0 0 6520 3273 0 1 99 0 0
0 0 0 13626684 5528 14666836 0 0 0 8 6323 4138 1 1 99 0 0

[root@HDDC-CLWAITJ-APP07 log]# grep -E "long_schedule|accept|emfile|enomem|alarm|too_many|busy_dist_port" emqx.log*
See the attachment for full details.

emqx.log.8:2026-02-20T21:48:33.483782+08:00 [error] crasher: initial call: gen_rpc_acceptor:init/1, pid: <0.19753764.13>, registered_name: , exit: {{badtcp,closed},[{gen_statem,loop_state_callback_result,11,[{file,“gen_statem.erl”},{line,1524}]},{proc_lib,init_p_do_apply,3,[{file,“proc_lib.erl”},{line,241}]}]}, ancestors: [gen_rpc_acceptor_sup,gen_rpc_sup,<0.2236.0>], message_queue_len: 0, messages: , links: [<0.2240.0>], dictionary: , trap_exit: true, status: running, heap_size: 6772, stack_size: 28, reductions: 10992; neighbours:
emqx.log.8:2026-02-20T21:48:33.485391+08:00 [error] Supervisor: {local,gen_rpc_acceptor_sup}. Context: child_terminated. Reason: {badtcp,closed}. Offender: id=gen_rpc_acceptor,pid=<0.19753764.13>.
emqx.log.8:2026-02-20T21:48:42.080249+08:00 [error] State machine {acceptor,{{172,26,244,29},64302}} terminating. Reason: {badtcp,closed}. Stack: [{gen_statem,loop_state_callback_result,11,[{file,“gen_statem.erl”},{line,1524}]},{proc_lib,init_p_do_apply,3,[{file,“proc_lib.erl”},{line,241}]}]. Last event: {{call,{<0.2239.0>,#Ref<0.100585327.584581126.101485>}},{socket_ready,#Port<0.124109861>}}. State: {waiting_for_socket,{state,#Port<0.124109861>,tcp,gen_rpc_driver_tcp,tcp_closed,tcp_error,{{172,26,244,29},64302},disabled,disabled}}. Client gen_rpc_server_tcp stacktrace: [{prim_inet,accept0,3,},{inet_tcp,accept,2,[{file,“inet_tcp.erl”},{line,227}]},{gen_rpc_server,waiting_for_connection,3,[{file,“gen_rpc_server.erl”},{line,71}]},{gen_statem,loop_state_callback,11,[{file,“gen_statem.erl”},{line,1395}]}].
emqx.log.8:2026-02-20T21:48:42.080970+08:00 [error] crasher: initial call: gen_rpc_acceptor:init/1, pid: <0.35323426.13>, registered_name: , exit: {{badtcp,closed},[{gen_statem,loop_state_callback_result,11,[{file,“gen_statem.erl”},{line,1524}]},{proc_lib,init_p_do_apply,3,[{file,“proc_lib.erl”},{line,241}]}]}, ancestors: [gen_rpc_acceptor_sup,gen_rpc_sup,<0.2236.0>], message_queue_len: 0, messages: , links: [<0.2240.0>], dictionary: , trap_exit: true, status: running, heap_size: 6772, stack_size: 28, reductions: 10996; neighbours:
emqx.log.8:2026-02-20T21:48:42.084149+08:00 [error] Supervisor: {local,gen_rpc_acceptor_sup}. Context: child_terminated. Reason: {badtcp,closed}. Offender: id=gen_rpc_acceptor,pid=<0.35323426.13>.
emqx.log.8:2026-02-20T21:48:48.073356+08:00 [error] State machine {acceptor,{{172,26,244,29},47314}} terminating. Reason: {badtcp,closed}. Stack: [{gen_statem,loop_state_callback_result,11,[{file,“gen_statem.erl”},{line,1524}]},{proc_lib,init_p_do_apply,3,[{file,“proc_lib.erl”},{line,241}]}]. Last event: {{call,{<0.2239.0>,#Ref<0.100585326.1154482190.208146>}},{socket_ready,#Port<0.125380425>}}. State: {waiting_for_socket,{state,#Port<0.125380425>,tcp,gen_rpc_driver_tcp,tcp_closed,tcp_error,{{172,26,244,29},47314},disabled,disabled}}. Client gen_rpc_server_tcp stacktrace: [{prim_inet,accept0,3,},{inet_tcp,accept,2,[{file,“inet_tcp.erl”},{line,227}]},{gen_rpc_server,waiting_for_connection,3,[{file,“gen_rpc_server.erl”},{line,71}]},{gen_statem,loop_state_callback,11,[{file,“gen_statem.erl”},{line,1395}]}].
emqx.log.8:2026-02-20T21:48:48.073711+08:00 [error] crasher: initial call: gen_rpc_acceptor:init/1, pid: <0.259690523.12>, registered_name: , exit: {{badtcp,closed},[{gen_statem,loop_state_callback_result,11,[{file,“gen_statem.erl”},{line,1524}]},{proc_lib,init_p_do_apply,3,[{file,“proc_lib.erl”},{line,241}]}]}, ancestors: [gen_rpc_acceptor_sup,gen_rpc_sup,<0.2236.0>], message_queue_len: 0, messages: , links: [<0.2240.0>], dictionary: , trap_exit: true, status: running, heap_size: 6772, stack_size: 28, reductions: 11267; neighbours:
emqx.log.8:2026-02-20T21:48:48.074180+08:00 [error] Supervisor: {local,gen_rpc_acceptor_sup}. Context: child_terminated. Reason: {badtcp,closed}. Offender: id=gen_rpc_acceptor,pid=<0.259690523.12>.
emqx.log.8:2026-02-20T21:48:57.819449+08:00 [error] State machine {acceptor,{{172,26,244,29},7704}} terminating. Reason: {badtcp,closed}. Stack: [{gen_statem,loop_state_callback_result,11,[{file,“gen_statem.erl”},{line,1524}]},{proc_lib,init_p_do_apply,3,[{file,“proc_lib.erl”},{line,241}]}]. Last event: {{call,{<0.2239.0>,#Ref<0.100585326.2178416652.190243>}},{socket_ready,#Port<0.125246090>}}. State: {waiting_for_socket,{state,#Port<0.125246090>,tcp,gen_rpc_driver_tcp,tcp_closed,tcp_error,{{172,26,244,29},7704},disabled,disabled}}. Client gen_rpc_server_tcp stacktrace: [{prim_inet,accept0,3,},{inet_tcp,accept,2,[{file,“inet_tcp.erl”},{line,227}]},{gen_rpc_server,waiting_for_connection,3,[{file,“gen_rpc_server.erl”},{line,71}]},{gen_statem,loop_state_callback,11,[{file,“gen_statem.erl”},{line,1395}]}].
emqx.log.8:2026-02-20T21:48:57.822071+08:00 [error] crasher: initial call: gen_rpc_acceptor:init/1, pid: <0.47991783.13>, registered_name: , exit: {{badtcp,closed},[{gen_statem,loop_state_callback_result,11,[{file,“gen_statem.erl”},{line,1524}]},{proc_lib,init_p_do_apply,3,[{file,“proc_lib.erl”},{line,241}]}]}, ancestors: [gen_rpc_acceptor_sup,gen_rpc_sup,<0.2236.0>], message_queue_len: 0, messages: , links: [<0.2240.0>], dictionary: , trap_exit: true, status: running, heap_size: 6772, stack_size: 28, reductions: 10988; neighbours:
emqx.log.8:2026-02-20T21:48:57.823823+08:00 [error] Supervisor: {local,gen_rpc_acceptor_sup}. Context: child_terminated. Reason: {badtcp,closed}. Offender: id=gen_rpc_acceptor,pid=<0.47991783.13>.
emqx.log.8:2026-02-20T21:49:04.466659+08:00 [error] State machine {acceptor,{{172,26,244,29},47090}} terminating. Reason: {badtcp,closed}. Stack: [{gen_statem,loop_state_callback_result,11,[{file,“gen_statem.erl”},{line,1524}]},{proc_lib,init_p_do_apply,3,[{file,“proc_lib.erl”},{line,241}]}]. Last event: {{call,{<0.2239.0>,#Ref<0.100585326.3078619146.115716>}},{socket_ready,#Port<0.124550264>}}. State: {waiting_for_socket,{state,#Port<0.124550264>,tcp,gen_rpc_driver_tcp,tcp_closed,tcp_error,{{172,26,244,29},47090},disabled,disabled}}. Client gen_rpc_server_tcp stacktrace: [{prim_inet,accept0,3,},{inet_tcp,accept,2,[{file,“inet_tcp.erl”},{line,227}]},{gen_rpc_server,waiting_for_connection,3,[{file,“gen_rpc_server.erl”},{line,71}]},{gen_statem,loop_state_callback,11,[{file,“gen_statem.erl”},{line,1395}]}].
emqx.log.8:2026-02-20T21:49:04.479678+08:00 [error] crasher: initial call: gen_rpc_acceptor:init/1, pid: <0.251367004.12>, registered_name: , exit: {{badtcp,closed},[{gen_statem,loop_state_callback_result,11,[{file,“gen_statem.erl”},{line,1524}]},{proc_lib,init_p_do_apply,3,[{file,“proc_lib.erl”},{line,241}]}]}, ancestors: [gen_rpc_acceptor_sup,gen_rpc_sup,<0.2236.0>], message_queue_len: 0, messages: , links: [<0.2240.0>], dictionary: , trap_exit: true, status: running, heap_size: 6772, stack_size: 28, reductions: 11228; neighbours:
emqx.log.8:2026-02-20T21:49:04.480617+08:00 [error] Supervisor: {local,gen_rpc_acceptor_sup}. Context: child_terminated. Reason: {badtcp,closed}. Offender: id=gen_rpc_acceptor,pid=<0.251367004.12>.
emqx.log.8:2026-02-20T21:49:11.314089+08:00 [error] State machine {acceptor,{{172,26,244,29},47092}} terminating. Reason: {badtcp,closed}. Stack: [{gen_statem,loop_state_callback_result,11,[{file,“gen_statem.erl”},{line,1524}]},{proc_lib,init_p_do_apply,3,[{file,“proc_lib.erl”},{line,241}]}]. Last event: {{call,{<0.2239.0>,#Ref<0.100585326.3078619146.132033>}},{socket_ready,#Port<0.125577902>}}. State: {waiting_for_socket,{state,#Port<0.125577902>,tcp,gen_rpc_driver_tcp,tcp_closed,tcp_error,{{172,26,244,29},47092},disabled,disabled}}. Client gen_rpc_server_tcp stacktrace: [{prim_inet,accept0,3,},{inet_tcp,accept,2,[{file,“inet_tcp.erl”},{line,227}]},{gen_rpc_server,waiting_for_connection,3,[{file,“gen_rpc_server.erl”},{line,71}]},{gen_statem,loop_state_callback,11,[{file,“gen_statem.erl”},{line,1395}]}].
emqx.log.8:2026-02-20T21:49:11.314870+08:00 [error] crasher: initial call: gen_rpc_acceptor:init/1, pid: <0.39946389.13>, registered_name: , exit: {{badtcp,closed},[{gen_statem,loop_state_callback_result,11,[{file,“gen_statem.erl”},{line,1524}]},{proc_lib,init_p_do_apply,3,[{file,“proc_lib.erl”},{line,241}]}]}, ancestors: [gen_rpc_acceptor_sup,gen_rpc_sup,<0.2236.0>], message_queue_len: 0, messages: , links: [<0.2240.0>], dictionary: , trap_exit: true, status: running, heap_size: 6772, stack_size: 28, reductions: 11264; neighbours:
emqx.log.8:2026-02-20T21:49:11.341158+08:00 [error] Supervisor: {local,gen_rpc_acceptor_sup}. Context: child_terminated. Reason: {badtcp,closed}. Offender: id=gen_rpc_acceptor,pid=<0.39946389.13>.
emqx.log.8:2026-02-20T21:49:18.208464+08:00 [error] State machine {acceptor,{{172,26,244,29},8952}} terminating. Reason: {badtcp,closed}. Stack: [{gen_statem,loop_state_callback_result,11,[{file,“gen_statem.erl”},{line,1524}]},{proc_lib,init_p_do_apply,3,[{file,“proc_lib.erl”},{line,241}]}]. Last event: {{call,{<0.2239.0>,#Ref<0.100585328.654835713.212440>}},{socket_ready,#Port<0.124743562>}}. State: {waiting_for_socket,{state,#Port<0.124743562>,tcp,gen_rpc_driver_tcp,tcp_closed,tcp_error,{{172,26,244,29},8952},disabled,disabled}}. Client gen_rpc_server_tcp stacktrace: [{prim_inet,accept0,3,},{inet_tcp,accept,2,[{file,“inet_tcp.erl”},{line,227}]},{gen_rpc_server,waiting_for_connection,3,[{file,“gen_rpc_server.erl”},{line,71}]},{gen_statem,loop_state_callback,11,[{file,“gen_statem.erl”},{line,1395}]}].
emqx.log.8:2026-02-20T21:49:18.208950+08:00 [error] crasher: initial call: gen_rpc_acceptor:init/1, pid: <0.70781413.13>, registered_name: , exit: {{badtcp,closed},[{gen_statem,loop_state_callback_result,11,[{file,“gen_statem.erl”},{line,1524}]},{proc_lib,init_p_do_apply,3,[{file,“proc_lib.erl”},{line,241}]}]}, ancestors: [gen_rpc_acceptor_sup,gen_rpc_sup,<0.2236.0>], message_queue_len: 0, messages: , links: [<0.2240.0>], dictionary: , trap_exit: true, status: running, heap_size: 6772, stack_size: 28, reductions: 11225; neighbours:
emqx.log.8:2026-02-20T21:49:18.210069+08:00 [error] Supervisor: {local,gen_rpc_acceptor_sup}. Context: child_terminated. Reason: {badtcp,closed}. Offender: id=gen_rpc_acceptor,pid=<0.70781413.13>.
errorlog.zip (180.6 KB)

Deployment topology: LVS > 5 Nginx nodes > 5-node EMQX cluster

The badtcp entries in the log are themselves a symptom, not the root cause.
{badtcp,closed} comes from gen_rpc_acceptor and means a TCP socket used for intra-cluster RPC was closed by the peer during accept/handshake. It usually happens in one of two scenarios: the network path is flapping, or the node is already too busy to service the connection, so it gets closed passively.
The more direct root-cause evidence in this thread is this line:
ss -lntp | grep :1883 shows LISTEN 1025 1024 ... beam.smp
This means the accept queue on port 1883 is full (current queue >= backlog limit), which is exactly why "the port is listening, but new connections cannot get in".
So the chain of events here is:

  1. The accept queue on 1883 fills up.
  2. New connections pile up and time out.
  3. Intra-cluster RPC connections then start failing too, with {badtcp,closed} / gen_rpc_error ... econnreset (a knock-on effect).

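Step 1 above can be read straight off `ss -lnt`: for a LISTEN socket, Recv-Q is the current accept-queue depth and Send-Q is the backlog limit. As a minimal sketch (the `check_backlog` helper is made up for illustration, not an EMQX tool):

```shell
# Hypothetical helper: flag LISTEN sockets whose accept queue (Recv-Q, $2)
# has reached the configured backlog (Send-Q, $3) in `ss -lnt` output.
# Recv-Q >= Send-Q on a LISTEN socket means the kernel is dropping or
# refusing new connections for that listener.
check_backlog() {
  awk '$1 == "LISTEN" && $2 >= $3 {
    printf "SATURATED %s (queue %d / backlog %d)\n", $4, $2, $3
  }'
}

# Normally: ss -lnt | check_backlog
# Here, fed the exact line captured in this thread:
echo 'LISTEN 1025 1024 0.0.0.0:1883 0.0.0.0:*' | check_backlog
# → SATURATED 0.0.0.0:1883 (queue 1025 / backlog 1024)
```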
The most fundamental cause is the long_schedule warnings that keep appearing in your attachment: the machine's CPU really is overloaded.
Use top, htop, or your cloud vendor's monitoring platform to find out what is actually eating the CPU.
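One hedged way to do that per-thread, sketched with procps `top` (the `busiest_beam_threads` name is invented here for illustration):

```shell
# Hypothetical sketch: show the busiest threads of the EMQX VM once, to check
# whether erts scheduler threads (erts_sched_*) dominate CPU time.
# Assumes procps `top`; prints a notice instead when no beam.smp is found.
busiest_beam_threads() {
  pid=$(pgrep -o beam.smp 2>/dev/null) || { echo "beam.smp not running here"; return 0; }
  top -H -b -n 1 -p "$pid" | head -n 20
}
busiest_beam_threads
```

If the top entries are all `erts_sched_*`, the load is inside the Erlang VM itself (connection churn, routing, rule engine) rather than some unrelated process on the host.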



Load on the node while it was in the failed state:

From these two load graphs, the most we can tell is that CPU scheduling is saturated; the 1883 accept queue then filled up as a result, so new connections could not get in. {badtcp,closed} / gen_rpc econnreset are downstream knock-on effects, not the cause.
The graphs show several erts_sched_* threads pegged for long stretches, which is consistent with the long_schedule warnings in the earlier logs.

That is still not enough to pin down the root cause. The next time the fault occurs, capture the output of the following:


# EMQX
emqx ctl listeners
emqx ctl broker stats
emqx ctl vm all

# kernel / network
mpstat -P ALL 1 120
pidstat -t -p $(pgrep -f beam.smp | head -1) 1 120
vmstat 1 120
cat /proc/net/netstat | egrep "ListenOverflows|ListenDrops"
ss -s

# logs
grep -E "long_schedule|accept|busy_dist_port|emfile|enomem|too_many|badtcp|econnreset" /usr/local/emqx-5.8.3/log/emqx.log* | tail -n 400
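One caveat on the `ListenOverflows|ListenDrops` check above: those counters are cumulative since boot, so a single snapshot only proves they fired at some point. During the fault, what matters is whether they are still climbing. A small sketch (the `snap` helper is hypothetical):

```shell
# Hypothetical sketch: sample the cumulative ListenOverflows/ListenDrops
# counters from /proc/net/netstat twice; a growing delta during the fault
# confirms the kernel is currently dropping connections off a full accept queue.
snap() {
  awk '/^TcpExt:/ {
    if (!h) { split($0, a); h = 1 }            # first TcpExt line: field names
    else for (i = 2; i <= NF; i++)             # second TcpExt line: values
      if (a[i] == "ListenOverflows" || a[i] == "ListenDrops") print a[i], $i
  }' /proc/net/netstat
}
snap; sleep 1; snap
```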