EMQX 4.3.3 crashes unexpectedly on Windows Server 2008 R2 and stops accepting connections

Environment

  • EMQX version: 4.3.3
  • Operating system: Windows Server 2008 R2

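Steps to reproduce

  1. I cannot reproduce it reliably yet. It is very likely to occur once the connection count exceeds 8,000, but it has also occurred at around 4,000 connections.
  2. Restarting EMQX restores service, but after running for a while EMQX fails again. When that happens, new connections cannot be established, while existing connections are unaffected and stay connected.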
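
The symptom in step 2 (existing sessions stay up while new clients cannot get in) can be checked from outside the broker. Below is a minimal Python sketch, not part of the original report, that opens a fresh TCP connection, sends a bare MQTT 3.1.1 CONNECT packet, and treats any CONNACK as evidence that the listener still accepts connections. The function names, the client ID "probe-client", and the timeout values are illustrative assumptions. A CONNACK that rejects the credentials still counts, because it shows that an acceptor picked up the socket.

```python
import socket
import struct


def build_connect(client_id: str, keepalive: int = 30) -> bytes:
    """Build a minimal MQTT 3.1.1 CONNECT packet (clean session, no credentials)."""
    cid = client_id.encode("utf-8")
    # Variable header: protocol name "MQTT", level 4, flags 0x02 (clean session), keepalive.
    var_header = b"\x00\x04MQTT" + b"\x04" + b"\x02" + struct.pack("!H", keepalive)
    payload = struct.pack("!H", len(cid)) + cid
    remaining = len(var_header) + len(payload)
    if remaining > 127:
        # Keeps the sketch simple: only the single-byte remaining-length form is supported.
        raise ValueError("client_id too long for this sketch")
    return bytes([0x10, remaining]) + var_header + payload


def broker_answers(host: str, port: int = 1883, timeout: float = 5.0,
                   client_id: str = "probe-client") -> bool:
    """Return True if a new connection receives a CONNACK (packet type 0x20)."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            sock.sendall(build_connect(client_id))
            reply = sock.recv(4)
    except OSError:
        # Covers refused connections, resets, and timeouts.
        return False
    return reply[:1] == b"\x20"
```

Calling broker_answers("127.0.0.1") periodically and logging the result alongside the current connection count would show when the listener stops responding, which may help narrow down the trigger.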

Expected behavior

Actual behavior

2023-12-15T14:51:36.934000+08:00 [error] crasher: initial call: esockd_acceptor:init/1, pid: <0.10538.4>, registered_name: , exit: {system_limit,[{gen_statem,loop_state_callback_result,11,[{file,“gen_statem.erl”},{line,1360}]},{proc_lib,init_p_do_apply,3,[{file,“proc_lib.erl”},{line,226}]}]}, ancestors: [<0.552.0>,<0.550.0>,esockd_sup,<0.159.0>], message_queue_len: 0, messages: , links: [<0.552.0>], dictionary: [{rand_seed,{#{jump => #Fun<rand.3.47293030>,max => 288230376151711743,next => #Fun<rand.5.47293030>,type => exsplus},[141377235351073971|147891784259438830]}}], trap_exit: false, status: running, heap_size: 4185, stack_size: 28, reductions: 8939; neighbours:
2023-12-15T14:51:36.934000+08:00 [error] Supervisor: {<0.552.0>,esockd_acceptor_sup}. Context: child_terminated. Reason: system_limit. Offender: id=acceptor,pid=<0.10538.4>.
2023-12-15T14:51:36.996000+08:00 [error] State machine <0.10539.4> terminating. Reason: system_limit. Stack: [{gen_statem,loop_state_callback_result,11,[{file,“gen_statem.erl”},{line,1360}]},{proc_lib,init_p_do_apply,3,[{file,“proc_lib.erl”},{line,226}]}]. Last event: {info,{inet_async,#Port<0.16>,62457,{error,system_limit}}}. State: {accepting,{state,‘mqtt:tcp’,{{0,0,0,0},1883},#Port<0.16>,inet_tcp,{{0,0,0,0},1883},{fun esockd_listener_sup:tune_socket/2,[[{tune_buffer,false}]]},,{listener,‘mqtt:tcp’,{{0,0,0,0},1883}},<0.551.0>,62457}}.
2023-12-15T14:51:36.996000+08:00 [error] crasher: initial call: esockd_acceptor:init/1, pid: <0.10539.4>, registered_name: , exit: {system_limit,[{gen_statem,loop_state_callback_result,11,[{file,“gen_statem.erl”},{line,1360}]},{proc_lib,init_p_do_apply,3,[{file,“proc_lib.erl”},{line,226}]}]}, ancestors: [<0.552.0>,<0.550.0>,esockd_sup,<0.159.0>], message_queue_len: 0, messages: , links: [<0.552.0>], dictionary: [{rand_seed,{#{jump => #Fun<rand.3.47293030>,max => 288230376151711743,next => #Fun<rand.5.47293030>,type => exsplus},[141377235351073971|148484796388592406]}}], trap_exit: false, status: running, heap_size: 4185, stack_size: 28, reductions: 8938; neighbours:
2023-12-15T14:51:36.996000+08:00 [error] Supervisor: {<0.552.0>,esockd_acceptor_sup}. Context: child_terminated. Reason: system_limit. Offender: id=acceptor,pid=<0.10539.4>.
2023-12-15T14:51:37.762000+08:00 [error] 869576056157281@39.144.2.81:11371 [Auth http] Deny connection from path: /api/mqtt/auth, response http code: 400
2023-12-15T14:51:37.762000+08:00 [warning] 869576056157281@39.144.2.81:11371 [Channel] Client 869576056157281 (Username: ‘869576056157281’) login failed for bad_username_or_password
2023-12-15T14:51:37.809000+08:00 [error] 869576059827674@117.132.193.67:39598 [Auth http] Deny connection from path: /api/mqtt/auth, response http code: 400
2023-12-15T14:51:37.809000+08:00 [warning] 869576059827674@117.132.193.67:39598 [Channel] Client 869576059827674 (Username: ‘869576059827674’) login failed for bad_username_or_password
2023-12-15T14:51:37.965000+08:00 [warning] 869020063110751@36.113.67.240:10013 [Channel] The PUBREL PacketId 201 is not found.
2023-12-15T14:51:38.481000+08:00 [warning] 869861064962900@36.113.35.101:10013 [Channel] The PUBREL PacketId 0 is not found.
2023-12-15T14:51:38.825000+08:00 [warning] 869861064967057@36.113.33.223:30999 [Channel] The PUBREL PacketId 201 is not found.
2023-12-15T14:51:38.903000+08:00 [error] State machine <0.10536.4> terminating. Reason: system_limit. Stack: [{gen_statem,loop_state_callback_result,11,[{file,“gen_statem.erl”},{line,1360}]},{proc_lib,init_p_do_apply,3,[{file,“proc_lib.erl”},{line,226}]}]. Last event: {info,{inet_async,#Port<0.16>,62462,{error,system_limit}}}. State: {accepting,{state,‘mqtt:tcp’,{{0,0,0,0},1883},#Port<0.16>,inet_tcp,{{0,0,0,0},1883},{fun esockd_listener_sup:tune_socket/2,[[{tune_buffer,false}]]},,{listener,‘mqtt:tcp’,{{0,0,0,0},1883}},<0.551.0>,62462}}.
2023-12-15T14:51:38.903000+08:00 [error] crasher: initial call: esockd_acceptor:init/1, pid: <0.10536.4>, registered_name: , exit: {system_limit,[{gen_statem,loop_state_callback_result,11,[{file,“gen_statem.erl”},{line,1360}]},{proc_lib,init_p_do_apply,3,[{file,“proc_lib.erl”},{line,226}]}]}, ancestors: [<0.552.0>,<0.550.0>,esockd_sup,<0.159.0>], message_queue_len: 0, messages: , links: [<0.552.0>], dictionary: [{rand_seed,{#{jump => #Fun<rand.3.47293030>,max => 288230376151711743,next => #Fun<rand.5.47293030>,type => exsplus},[141377243941008433|145489686015487738]}}], trap_exit: false, status: running, heap_size: 4185, stack_size: 28, reductions: 9255; neighbours:
2023-12-15T14:51:38.903000+08:00 [error] Supervisor: {<0.552.0>,esockd_acceptor_sup}. Context: child_terminated. Reason: system_limit. Offender: id=acceptor,pid=<0.10536.4>.
2023-12-15T14:51:38.903000+08:00 [error] Supervisor: {<0.552.0>,esockd_acceptor_sup}. Context: shutdown. Reason: reached_max_restart_intensity. Offender: id=acceptor,pid=<0.10536.4>.
2023-12-15T14:51:39.419000+08:00 [error] 866156056431714@39.144.3.249:64825 [Auth http] Deny connection from path: /api/mqtt/auth, response http code: 400
2023-12-15T14:51:39.419000+08:00 [warning] 866156056431714@39.144.3.249:64825 [Channel] Client 866156056431714 (Username: ‘866156056431714’) login failed for bad_username_or_password
2023-12-15T14:51:39.544000+08:00 [warning] 869020063101669@39.144.129.84:37635 [Channel] The PUBREL PacketId 0 is not found.
2023-12-15T14:51:39.919000+08:00 [warning] 869020063107138@36.113.118.176:5945 [Channel] The PUBREL PacketId 201 is not found.
2023-12-15T14:51:40.044000+08:00 [warning] 869861064942712@36.113.67.254:10002 [Channel] The PUBREL PacketId 193 is not found.
2023-12-15T14:51:40.169000+08:00 [warning] 869861064945939@36.113.30.71:10014 [Channel] The PUBREL PacketId 18176 is not found.
2023-12-15T14:51:40.216000+08:00 [warning] 869861064962769@36.113.30.135:10013 [Channel] The PUBREL PacketId 18176 is not found.
2023-12-15T14:51:40.232000+08:00 [warning] 869861064961373@36.113.66.96:10014 [Channel] The PUBREL PacketId 12342 is not found.
2023-12-15T14:51:40.529000+08:00 [warning] 869861064958460@36.113.68.55:10013 [Channel] The PUBREL PacketId 193 is not found.
2023-12-15T14:51:41.669000+08:00 [warning] 869861064963635@36.113.69.69:10012 [Channel] The PUBREL PacketId 0 is not found.
2023-12-15T14:51:42.700000+08:00 [warning] 869861064941060@36.113.38.74:10014 [Channel] The PUBREL PacketId 201 is not found.
2023-12-15T14:51:42.700000+08:00 [warning] 869020063126476@36.113.70.183:10017 [Channel] The PUBREL PacketId 0 is not found.
2023-12-15T14:51:42.715000+08:00 [warning] 869020063110538@36.113.116.242:10013 [Channel] The PUBREL PacketId 0 is not found.
2023-12-15T14:51:42.920000+08:00 [warning] 869020063098360@36.113.112.217:10014 [Channel] The PUBREL PacketId 18176 is not found.
2023-12-15T14:51:43.015000+08:00 [warning] 869020063097719@36.113.38.208:10015 [Channel] The PUBREL PacketId 201 is not found.
2023-12-15T14:51:43.836000+08:00 [warning] 869861064967446@36.113.113.18:10013 [Channel] The PUBREL PacketId 201 is not found.

If anyone has run into a similar problem or knows a solution, please reply. Thank you very much.

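You have run out of file handles. Please refer to:

In addition, the 4.3 series is no longer supported, and deploying on Windows is not recommended.
You could try the latest release, 5.3.2, deployed with Docker, WSL, or similar.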

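Thank you very much for the answer. Last Friday evening I set USERProcessHandleQuota to 18000 and restarted the ECS instance. The registry on the server shows that the value was changed successfully.
However, the problem occurred again on Saturday afternoon, and once again service only recovered after I restarted EMQX, so I am not sure what to do next.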

Does the new log show the same error?
exit: {system_limit, [XXX]}

Yes. When the failure happened I checked the log, confirmed it was the same error, and then restarted EMQX.
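
To confirm that a new incident matches the earlier one without reading the log by eye, something like the following Python sketch could be used. It is an illustration only: the function name summarize is hypothetical, and it assumes the log lines start with an ISO-style timestamp as in the excerpt posted earlier in this thread. It counts lines mentioning system_limit per minute and reports whether the acceptor supervisor logged reached_max_restart_intensity, the message that appears in the excerpt just before new connections stop being accepted.

```python
import re
from collections import Counter

# Matches the leading timestamp down to the minute, e.g. "2023-12-15T14:51".
TIMESTAMP = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2})")


def summarize(lines):
    """Return (system_limit count per minute, whether the supervisor gave up)."""
    per_minute = Counter()
    gave_up = False
    for line in lines:
        match = TIMESTAMP.match(line)
        if match and "system_limit" in line:
            per_minute[match.group(1)] += 1
        if "reached_max_restart_intensity" in line:
            gave_up = True
    return per_minute, gave_up
```

A typical call would pass an open log file, for example summarize(open(path, encoding="utf-8")), where path points at the EMQX log file on the server.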

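There was also one occasion where the emqx restart command returned ok but the restart actually failed, and emqx start could not bring it up either; both commands reported no response. I had to reboot the ECS instance and then start EMQX again to recover.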