emqx-replicant crash: requesting help from the EMQX team to analyze the cause

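EMQX version: 5.0.20
High-availability setup: etcd
Cluster: 2 core + 2 replicant
OS: CentOS 7.9
Symptoms:
1. Manually stopping any one core node with "bin/emqx stop" leaves the cluster healthy.
2. Powering off the server of any one core node, or running kill -9 PID on it, makes both replicant nodes crash, leaving the cluster with a single core node.
3. Another cluster with the same version and architecture behaves normally.
The crash dump is attached below; please help analyze the cause.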
erl_crash.dump.zip (645.2 KB)

The log shows the following errors:

2024-07-26T19:12:30.612057+08:00 [info] msg: terminate, mfa: emqx_connection:terminate/2, line: 668, peername: 10.216.0.148:13596, reason: {shutdown,tcp_closed}
2024-07-26T19:12:30.770800+08:00 [info] msg: terminate, mfa: emqx_connection:terminate/2, line: 668, peername: 10.216.0.148:27756, reason: {shutdown,tcp_closed}
2024-07-26T19:12:35.612174+08:00 [info] msg: terminate, mfa: emqx_connection:terminate/2, line: 668, peername: 10.216.0.148:13597, reason: {shutdown,tcp_closed}
2024-07-26T19:12:35.624355+08:00 [info] msg: terminate, mfa: emqx_connection:terminate/2, line: 668, peername: 10.216.0.18:40032, reason: {shutdown,tcp_closed}
2024-07-26T19:12:35.770945+08:00 [info] msg: terminate, mfa: emqx_connection:terminate/2, line: 668, peername: 10.216.0.148:27757, reason: {shutdown,tcp_closed}
2024-07-26T19:12:35.892712+08:00 [info] event=client_process_not_found target="{'emqx-core-gece20@10.216.0.250',emqx_cm_shard}" action=spawning_client
2024-07-26T19:12:35.893120+08:00 [info] event=initializing_client driver=tcp node="emqx-core-gece20@10.216.0.250" port=5390
2024-07-26T19:12:35.895872+08:00 [info] msg: gen_rpc_client_process_not_found, mfa: gen_rpc_client:cast_worker/4, line: 484, peername: 10.216.0.56:59768, clientid: bridge:mqtt:qb-bridge-source-cmd:8:emqx-core10@10.216.0.56, target: {'emqx-replicant-gece20@10.216.0.56',3}
2024-07-26T19:12:35.896271+08:00 [info] event=client_connection_received driver=tcp socket="#Port<0.10>" action=starting_acceptor
2024-07-26T19:12:35.896494+08:00 [info] event=initializing_client driver=tcp node="emqx-replicant-gece20@10.216.0.56" port=5390
2024-07-26T19:12:35.896568+08:00 [info] event=start driver=tcp peer="10.216.0.199:51264"
2024-07-26T19:12:35.899480+08:00 [info] msg: authorization_failed_nomatch, mfa: emqx_authz:authorize_non_superuser/5, line: 369, peername: 10.216.0.56:59768, clientid: bridge:mqtt:qb-bridge-source-cmd:8:emqx-core10@10.216.0.56, topic: 210171e2928f4eb8a22b1d51a97a4077/+/cmd, ipaddr: {10,216,0,56}, reason: no-match rule, username: <<"pbbridgeadmin">>
2024-07-26T19:12:35.900103+08:00 [info] event=client_process_not_found target="{'emqx-core-gece20@10.216.0.250',emqx_shared_sub_shard}" action=spawning_client
2024-07-26T19:12:35.900598+08:00 [info] event=initializing_client driver=tcp node="emqx-core-gece20@10.216.0.250" port=5390
2024-07-26T19:12:35.902690+08:00 [info] event=client_process_not_found target="{'emqx-core-gece20@10.216.0.250',route_shard}" action=spawning_client
2024-07-26T19:12:35.903022+08:00 [info] event=initializing_client driver=tcp node="emqx-core-gece20@10.216.0.250" port=5390
2024-07-26T19:12:35.903482+08:00 [info] event=client_connection_received driver=tcp socket="#Port<0.10>" action=starting_acceptor
2024-07-26T19:12:35.903762+08:00 [info] event=start driver=tcp peer="10.216.0.199:51266"
2024-07-26T19:12:35.905660+08:00 [info] event=client_connection_received driver=tcp socket="#Port<0.10>" action=starting_acceptor
2024-07-26T19:12:35.905903+08:00 [info] event=start driver=tcp peer="10.216.0.199:51268"
2024-07-26T19:12:40.516154+08:00 [info] msg: terminate, mfa: emqx_connection:terminate/2, line: 668, peername: 10.216.0.148:41959, reason: {shutdown,tcp_closed}
2024-07-26T19:12:40.611700+08:00 [info] msg: terminate, mfa: emqx_connection:terminate/2, line: 668, peername: 10.216.0.148:13598, reason: {shutdown,tcp_closed}
2024-07-26T19:12:40.771719+08:00 [info] msg: terminate, mfa: emqx_connection:terminate/2, line: 668, peername: 10.216.0.148:27758, reason: {shutdown,tcp_closed}
2024-07-26T19:12:42.749282+08:00 [error] message=channel_closed driver=tcp socket="#Port<0.177>" action=stopping
2024-07-26T19:12:42.749134+08:00 [error] message=channel_closed driver=tcp socket="#Port<0.173>" action=stopping
2024-07-26T19:12:42.749032+08:00 [notice] msg: gen_rpc_channel_closed, mfa: gen_rpc_acceptor:handle_event/4, line: 223, action: stopping, driver: tcp, peer: 10.216.0.250:44644, socket: #Port<0.131>
2024-07-26T19:12:42.749485+08:00 [info] Mria(Membership): Node emqx-core-gece20@10.216.0.250 down
2024-07-26T19:12:42.749452+08:00 [error] message=channel_closed driver=tcp socket="#Port<0.132>" action=stopping
2024-07-26T19:12:42.749541+08:00 [error] message=channel_closed driver=tcp socket="#Port<0.176>" action=stopping
2024-07-26T19:12:42.918154+08:00 [info] event=client_process_not_found target="{'emqx-core-gece20@10.216.0.250',route_shard}" action=spawning_client
2024-07-26T19:12:42.918691+08:00 [info] event=initializing_client driver=tcp node="emqx-core-gece20@10.216.0.250" port=5390
2024-07-26T19:12:42.919247+08:00 [error] event=connect_to_remote_server peer="emqx-core-gece20@10.216.0.250" result=failure reason="econnrefused"
2024-07-26T19:12:42.919408+08:00 [error] crasher: initial call: gen_rpc_client:init/1, pid: <0.3435.0>, registered_name: [], exit: {{badrpc,econnrefused},[{gen_server,init_it,6,[{file,"gen_server.erl"},{line,407}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}, ancestors: [gen_rpc_client_sup,gen_rpc_sup,<0.1876.0>], message_queue_len: 0, messages: [], links: [<0.1882.0>], dictionary: [], trap_exit: true, status: running, heap_size: 2586, stack_size: 28, reductions: 17780; neighbours:
2024-07-26T19:12:42.920294+08:00 [error] Generic server emqx_router_helper terminating. Reason: {{badrpc,econnrefused},[{mria_lib,unwrap_exception,1,[{file,"mria_lib.erl"},{line,126}]},{mria,rpc_to_core_node,5,[{file,"mria.erl"},{line,433}]},{global,trans,4,[{file,"global.erl"},{line,463}]},{emqx_router_helper,handle_info,2,[{file,"emqx_router_helper.erl"},{line,150}]},{gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,695}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,771}]},{proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,236}]}]}. Last message: {membership,{node,down,'emqx-core-gece20@10.216.0.250'}}. State: #{nodes => ['emqx-replicant-gece2-@10.216.0.79','emqx-replicant-gece20@10.216.0.56']}.
2024-07-26T19:12:42.920534+08:00 [error] crasher: initial call: emqx_router_helper:init/1, pid: <0.2315.0>, registered_name: emqx_router_helper, error: {{badrpc,econnrefused},[{mria_lib,unwrap_exception,1,[{file,"mria_lib.erl"},{line,126}]},{mria,rpc_to_core_node,5,[{file,"mria.erl"},{line,433}]},{global,trans,4,[{file,"global.erl"},{line,463}]},{emqx_router_helper,handle_info,2,[{file,"emqx_router_helper.erl"},{line,150}]},{gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,695}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,771}]},{proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,236}]}]}, ancestors: [emqx_router_sup,emqx_sup,<0.2231.0>], message_queue_len: 2, messages: [{mnesia_table_event,{delete,{emqx_routing_node,'emqx-core-gece20@10.216.0.250'},{dirty,<0.2299.0>}}},{mnesia_table_event,{delete,{emqx_routing_node,'emqx-core-gece20@10.216.0.250'},{dirty,<0.2299.0>}}}], links: [<0.2314.0>], dictionary: [{rand_seed,{#{bits => 58,jump => #Fun<rand.3.92093067>,next => #Fun<rand.0.92093067>,type => exsss,uniform => #Fun<rand.1.92093067>,uniform_n => #Fun<rand.2.92093067>},[37537719274300492|31223902360558891]}}], trap_exit: true, status: running, heap_size: 6772, stack_size: 28, reductions: 13166; neighbours:
2024-07-26T19:12:42.920936+08:00 [error] Supervisor: {local,emqx_router_sup}. Context: child_terminated. Reason: {{badrpc,econnrefused},[{mria_lib,unwrap_exception,1,[{file,"mria_lib.erl"},{line,126}]},{mria,rpc_to_core_node,5,[{file,"mria.erl"},{line,433}]},{global,trans,4,[{file,"global.erl"},{line,463}]},{emqx_router_helper,handle_info,2,[{file,"emqx_router_helper.erl"},{line,150}]},{gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,695}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,771}]},{proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,236}]}]}. Offender: id=helper,pid=<0.2315.0>.
2024-07-26T19:12:42.921239+08:00 [error] Supervisor: {local,emqx_router_sup}. Context: shutdown. Reason: reached_max_restart_intensity. Offender: id=helper,pid=<0.2315.0>.
2024-07-26T19:12:42.921437+08:00 [error] Supervisor: {local,emqx_sup}. Context: child_terminated. Reason: shutdown. Offender: id=emqx_router_sup,pid=<0.2314.0>.
2024-07-26T19:12:42.921576+08:00 [error] Supervisor: {local,emqx_sup}. Context: shutdown. Reason: reached_max_restart_intensity. Offender: id=emqx_router_sup,pid=<0.2314.0>.
2024-07-26T19:12:42.924668+08:00 [notice] tcp:default stopped on 0.0.0.0:11883
2024-07-26T19:12:42.931526+08:00 [notice] Application: emqx. Exited: shutdown. Type: permanent.
2024-07-26T19:12:42.960617+08:00 [info] msg: exhook_mgr_terminated, mfa: emqx_exhook_mgr:terminate/2, line: 312, reason: shutdown, servers: #{}
2024-07-26T19:12:43.039752+08:00 [info] msg: mria_lb_core_discovery_new_nodes, mfa: mria_lb:do_update/1, line: 195, ignored_nodes: ['emqx-core-gece20@10.216.0.250'], node: 'emqx-replicant-gece2-@10.216.0.79', previous_cores: ['emqx-core-gece20@10.216.0.199','emqx-core-gece20@10.216.0.250'], returned_cores: ['emqx-core-gece20@10.216.0.199']
2024-07-26T19:12:43.060628+08:00 [info] Lease KeepAlive: <0.2125.0> find gun(<0.2123.0>) process stop shutdown, mfa: eetcd_lease:handle_info/2 line: 144
2024-07-26T19:12:43.093096+08:00 [notice] msg: Mria is stopped, mfa: mria_app:stop/1, line: 45
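
Reading the log above, the shutdown appears to follow a supervisor escalation: emqx_router_helper crashes with {badrpc,econnrefused} while handling the node-down message, emqx_router_sup reports reached_max_restart_intensity, emqx_sup does the same, and the emqx application then exits. A minimal Python sketch for pulling that chain out of a longer log file is shown here. The regular expressions are assumptions based only on the line layout seen in this excerpt, and the function name `escalations` is made up for illustration; it is not part of any EMQX tooling.

```python
import re

# Assumed layout, taken from the excerpt: "<timestamp> [<level>] <body>"
LINE_RE = re.compile(r"^(?P<ts>\S+) \[(?P<level>\w+)\] (?P<body>.*)$")

# Assumed layout of supervisor reports as they appear in the excerpt.
SUP_RE = re.compile(
    r"Supervisor: \{local,(?P<sup>\w+)\}\. "
    r"Context: (?P<ctx>\w+)\. "
    r"Reason: (?P<reason>.*?)\. "
    r"Offender: id=(?P<child>\w+)"
)

def escalations(lines):
    """Return (supervisor, offending child) pairs, in log order, for every
    supervisor report whose reason is reached_max_restart_intensity."""
    found = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group("level") != "error":
            continue
        s = SUP_RE.search(m.group("body"))
        if s and s.group("reason") == "reached_max_restart_intensity":
            found.append((s.group("sup"), s.group("child")))
    return found

# Three lines copied from the log above.
SAMPLE = [
    "2024-07-26T19:12:42.921239+08:00 [error] Supervisor: {local,emqx_router_sup}. Context: shutdown. Reason: reached_max_restart_intensity. Offender: id=helper,pid=<0.2315.0>.",
    "2024-07-26T19:12:42.921437+08:00 [error] Supervisor: {local,emqx_sup}. Context: child_terminated. Reason: shutdown. Offender: id=emqx_router_sup,pid=<0.2314.0>.",
    "2024-07-26T19:12:42.921576+08:00 [error] Supervisor: {local,emqx_sup}. Context: shutdown. Reason: reached_max_restart_intensity. Offender: id=emqx_router_sup,pid=<0.2314.0>.",
]

if __name__ == "__main__":
    for sup, child in escalations(SAMPLE):
        print(f"{sup} gave up restarting {child}")
```

To run it against a full log, pass an open file object instead of SAMPLE. This only summarizes what the log already says and does not explain why the other cluster is unaffected.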

Hello, could you try again with the latest EMQX version (5.7.1)?