Replicant node cannot join the cluster

crash.zip (547.7 KB)

2023-07-19T09:16:08.196074+00:00 [error] Failed to connect ETCD: {"etcd.xxxx.com",2379} by {shutdown,enetunreach} mfa: eetcd_conn:fold_connect/5 line: 205
2023-07-19T09:16:08.196514+00:00 [error] Supervisor: {local,ekka_cluster_sup}. Context: start_error. Reason: {{badmatch,{error,[{{"etcd.xxxx.com",2379},{shutdown,enetunreach}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,348}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}. Offender: id=ekka_cluster_etcd,pid=undefined.
2023-07-19T09:16:08.196362+00:00 [error] crasher: initial call: ekka_cluster_etcd:init/1, pid: <0.2064.0>, registered_name: [], error: {{badmatch,{error,[{{"etcd.xxxx.com",2379},{shutdown,enetunreach}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,348}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}, ancestors: [ekka_cluster_sup,ekka_sup,<0.2061.0>], message_queue_len: 0, messages: [], links: [<0.2063.0>], dictionary: [], trap_exit: true, status: running, heap_size: 376, stack_size: 28, reductions: 363; neighbours: []
2023-07-19T09:16:08.197395+00:00 [error] Supervisor: {local,ekka_sup}. Context: start_error. Reason: {shutdown,{failed_to_start_child,ekka_cluster_etcd,{{badmatch,{error,[{{"etcd.xxxx.com",2379},{shutdown,enetunreach}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,348}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}}}. Offender: id=ekka_cluster_sup,pid=undefined.
2023-07-19T09:16:08.198005+00:00 [error] crasher: initial call: application_master:init/4, pid: <0.2060.0>, registered_name: [], exit: {{{shutdown,{failed_to_start_child,ekka_cluster_sup,{shutdown,{failed_to_start_child,ekka_cluster_etcd,{{badmatch,{error,[{{"etcd.xxxx.com",2379},{shutdown,enetunreach}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,348}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}}}}},{ekka_app,start,[normal,[]]}},[{application_master,init,4,[{file,"application_master.erl"},{line,142}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}, ancestors: [<0.2059.0>], message_queue_len: 1, messages: [{'EXIT',<0.2061.0>,normal}], links: [<0.2059.0>,<0.1786.0>], dictionary: [], trap_exit: true, status: running, heap_size: 610, stack_size: 28, reductions: 205; neighbours: []
2023-07-19T09:16:08.206407+00:00 [error] crasher: initial call: application_master:init/4, pid: <0.2047.0>, registered_name: [], exit: {{bad_return,{{emqx_machine_app,start,[normal,[]]},{'EXIT',{{badmatch,{error,{ekka,{{shutdown,{failed_to_start_child,ekka_cluster_sup,{shutdown,{failed_to_start_child,ekka_cluster_etcd,{{badmatch,{error,[{{"etcd.xxxx.com",2379},{shutdown,enetunreach}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,348}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}}}}},{ekka_app,start,[normal,[]]}}}}},[{ekka,start,0,[{file,"ekka.erl"},{line,97}]},{emqx_machine,start,0,[{file,"emqx_machine.erl"},{line,45}]},{emqx_machine_app,start,2,[{file,"emqx_machine_app.erl"},{line,27}]},{application_master,start_it_old,4,[{file,"application_master.erl"},{line,293}]}]}}}},[{application_master,init,4,[{file,"application_master.erl"},{line,142}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}, ancestors: [<0.2046.0>], message_queue_len: 1, messages: [{'EXIT',<0.2048.0>,normal}], links: [<0.2046.0>,<0.1786.0>], dictionary: [], trap_exit: true, status: running, heap_size: 610, stack_size: 28, reductions: 222; neighbours: []
2023-07-19T09:16:09.624791+00:00 [error] Ekka(AutoCluster): Core node discovery error {noproc,{gen_server,call,[ekka_cluster_etcd,discover,5000]}}: [{gen_server,call,3,[{file,"gen_server.erl"},{line,385}]},{ekka_autocluster,core_node_discovery_callback,0,[{file,"ekka_autocluster.erl"},{line,86}]},{mria_lb,do_update,1,[{file,"mria_lb.erl"},{line,149}]},{mria_lb,handle_info,2,[{file,"mria_lb.erl"},{line,111}]},{gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,1123}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,1200}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]
2023-07-19T09:16:11.217780+00:00 [error] Ekka(AutoCluster): Core node discovery error {noproc,{gen_server,call,[ekka_cluster_etcd,discover,5000]}}: [{gen_server,call,3,[{file,"gen_server.erl"},{line,385}]},{ekka_autocluster,core_node_discovery_callback,0,[{file,"ekka_autocluster.erl"},{line,86}]},{mria_lb,do_update,1,[{file,"mria_lb.erl"},{line,149}]},{mria_lb,handle_info,2,[{file,"mria_lb.erl"},{line,111}]},{gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,1123}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,1200}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]

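@heeejianbo Hello, I have confirmed that 5.1.1 no longer has this problem. However, the replica produces a crash dump on startup, and it works after another restart. Would you like to look into this?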

Does the replica crash on every startup?
5.1.1 has a known issue with etcd autoclustering:
https://github.com/emqx/emqx/issues/11312

It happens intermittently.
OK, I will test again once etcd auto-discovery is fixed.
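
The log above shows `ekka_cluster_etcd:init/1` failing with `enetunreach` when connecting to `etcd.xxxx.com:2379`, and a later restart succeeds. One possible stopgap, separate from the upstream fix, is to wait until the etcd endpoint accepts TCP connections before starting the replicant. The sketch below is only an illustration under that assumption: `wait_for_tcp` is a hypothetical helper, not part of EMQX or ekka, and the retry counts are arbitrary.

```python
import socket
import time


def wait_for_tcp(host, port, attempts=10, delay=1.0, timeout=2.0):
    """Return True once a TCP connection to host:port succeeds, else False.

    Hypothetical pre-start check; it only probes TCP reachability and
    does not speak the etcd protocol.
    """
    for attempt in range(1, attempts + 1):
        try:
            # create_connection resolves the name and opens a TCP socket.
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError as exc:
            # Network unreachable, connection refused, DNS failures and
            # timeouts are all OSError subclasses and end up here.
            print(f"attempt {attempt}/{attempts}: {host}:{port} not reachable ({exc})")
            if attempt < attempts:
                time.sleep(delay)
    return False
```

A startup wrapper could call `wait_for_tcp("etcd.xxxx.com", 2379)` and launch EMQX only when it returns `True`. This does not address the autocluster issue linked above; it only avoids starting the node while the network is not yet up.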