EMQX out of memory

  • Version: 5.2.1
  • Problem: out of memory
  • OS: 5.10.210-201.852.amzn2.aarch64 #1 SMP Tue Feb 27 17:09:24 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux
  • Service discovery method: static
  • Core nodes: 3; replicant nodes: 7

Problem description

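The cluster holds connections on the order of millions. One AWS virtual machine had a hardware failure, as shown in Figure 1,
and its devices then reconnected to other replicant nodes.
Some time later, two of the core nodes ran out of memory, as shown in Figure 2.
Investigation showed that the cluster no longer included the failed node,
but the logs showed that the core nodes were still trying to reach the machine with the hardware failure.

Logs: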

 [error] event=connect_to_remote_server peer="emqx-8l4sEI@xx.xx.2.246" result=failure reason="timeout"
 [error] crasher: initial call: gen_rpc_client:init/1, pid: <0.13258.7331>, registered_name: [], exit: {{badrpc,timeout},[{gen_server,init_it,6,[{file,"gen_server.erl"},{line,835}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}, ancestors: [gen_rpc_client_sup,gen_rpc_sup,<0.2135.0>], message_queue_len: 0, messages: [], links: [<0.2141.0>], dictionary: [], trap_exit: true, status: running, heap_size: 1598, stack_size: 28, reductions: 4908; neighbours: []
 [error] event=connect_to_remote_server peer="emqx-8l4sEI@xx.xx.2.246" result=failure reason="ehostunreach"
 [error] crasher: initial call: gen_rpc_client:init/1, pid: <0.14788.7328>, registered_name: [], exit: {{badrpc,ehostunreach},[{gen_server,init_it,6,[{file,"gen_server.erl"},{line,835}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}, ancestors: [gen_rpc_client_sup,gen_rpc_sup,<0.2135.0>], message_queue_len: 0, messages: [], links: [<0.2141.0>], dictionary: [], trap_exit: true, status: running, heap_size: 2586, stack_size: 28, reductions: 5000; neighbours: []
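
To check how often the core nodes keep dialing a node that has already left the cluster, the connect failures in the log can be tallied by peer and reason. Below is a minimal sketch in Python, assuming the log lines have the same shape as the two `connect_to_remote_server` entries above. The function names and the member name `emqx-core-1@xx.xx.1.10` are hypothetical and only for illustration; in practice the member list would come from the cluster's own status output.

```python
import re
from collections import Counter

# Matches the connect-failure lines in the shape shown in the logs above.
LINE_RE = re.compile(
    r'event=connect_to_remote_server\s+peer="(?P<peer>[^"]+)"\s+'
    r'result=failure\s+reason="(?P<reason>[^"]+)"'
)


def tally_connect_failures(lines):
    """Count (peer, reason) pairs across connect_to_remote_server failures."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[(match.group("peer"), match.group("reason"))] += 1
    return counts


def stale_peers(counts, cluster_members):
    """Return peers that are still being dialed but are not cluster members."""
    dialed = {peer for peer, _reason in counts}
    return sorted(dialed - set(cluster_members))


# The two failure lines from the report, used as sample input.
SAMPLE = [
    '[error] event=connect_to_remote_server peer="emqx-8l4sEI@xx.xx.2.246" '
    'result=failure reason="timeout"',
    '[error] event=connect_to_remote_server peer="emqx-8l4sEI@xx.xx.2.246" '
    'result=failure reason="ehostunreach"',
]

if __name__ == "__main__":
    counts = tally_connect_failures(SAMPLE)
    for (peer, reason), n in sorted(counts.items()):
        print(peer, reason, n)
    # Hypothetical member list, for illustration only.
    print(stale_peers(counts, ["emqx-core-1@xx.xx.1.10"]))
```

If a peer shows up in the output of `stale_peers`, the core node is still retrying connections to a node the cluster no longer lists, which matches the behavior described above.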

Figure 1

Figure 2

We recommend upgrading to 5.7.0; early 5.x versions can run into this problem.