EMQX 5.6 deployed on k8s with etcd auto-clustering: when etcd goes down, nodes can no longer recognize the core nodes

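Environment

  • emqx: 5.6
  • etcd: 3.5.8
  • emqx deployment mode: k8s, mria core + replicant
  • number of emqx-core nodes: 3
  • number of emqx-replicant nodes: 30
  • deployed with: helm chart
  • nginx reverse proxy forwards the connection requests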

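Steps to reproduce

  1. Start the etcd cluster
  2. Deploy emqx with helm, with emqx using etcd auto-clustering
  3. Some time after the emqx cluster is ready, take the etcd cluster offline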

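Expected behavior

Taking etcd offline should not affect the emqx cluster, and the emqx nodes should still recognize each other: the nodes have already registered, so they should not lose track of one another.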

Actual behavior

The core nodes keep printing the following log:
2024-07-18T03:00:28.996949+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:00:28.997138+00:00 [error] Failed to connect ETCD: {"100.88.106.149",2379} by {shutdown,econnrefused}
2024-07-18T03:00:28.997382+00:00 [error] Supervisor: {local,ekka_cluster_sup}. Context: start_error. Reason: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}. Offender: id=ekka_cluster_etcd,pid=<0.4094.0>.
2024-07-18T03:00:28.997722+00:00 [error] crasher: initial call: ekka_cluster_etcd:init/1, pid: <0.5637.26>, registered_name: [], error: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}, ancestors: [ekka_cluster_sup,ekka_sup,<0.4090.0>], message_queue_len: 0, messages: [], links: [<0.4093.0>], dictionary: [], trap_exit: true, status: running, heap_size: 376, stack_size: 28, reductions: 258; neighbours:
2024-07-18T03:00:28.998965+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:00:28.999102+00:00 [error] Failed to connect ETCD: {"100.88.106.149",2379} by {shutdown,econnrefused}
2024-07-18T03:00:28.999312+00:00 [error] Supervisor: {local,ekka_cluster_sup}. Context: start_error. Reason: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}. Offender: id=ekka_cluster_etcd,pid={restarting,<0.4094.0>}.
2024-07-18T03:00:28.999367+00:00 [error] crasher: initial call: ekka_cluster_etcd:init/1, pid: <0.5640.26>, registered_name: [], error: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}, ancestors: [ekka_cluster_sup,ekka_sup,<0.4090.0>], message_queue_len: 0, messages: [], links: [<0.4093.0>], dictionary: [], trap_exit: true, status: running, heap_size: 376, stack_size: 28, reductions: 256; neighbours:
2024-07-18T03:00:29.000436+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:00:29.000570+00:00 [error] Failed to connect ETCD: {"100.88.106.149",2379} by {shutdown,econnrefused}
2024-07-18T03:00:29.000776+00:00 [error] Supervisor: {local,ekka_cluster_sup}. Context: start_error. Reason: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}. Offender: id=ekka_cluster_etcd,pid={restarting,<0.4094.0>}.
2024-07-18T03:00:29.000835+00:00 [error] crasher: initial call: ekka_cluster_etcd:init/1, pid: <0.5643.26>, registered_name: [], error: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}, ancestors: [ekka_cluster_sup,ekka_sup,<0.4090.0>], message_queue_len: 0, messages: [], links: [<0.4093.0>], dictionary: [], trap_exit: true, status: running, heap_size: 376, stack_size: 28, reductions: 256; neighbours:
2024-07-18T03:00:29.001950+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:00:29.002101+00:00 [error] Failed to connect ETCD: {"100.88.106.149",2379} by {shutdown,econnrefused}
2024-07-18T03:00:29.002387+00:00 [error] Supervisor: {local,ekka_cluster_sup}. Context: start_error. Reason: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}. Offender: id=ekka_cluster_etcd,pid={restarting,<0.4094.0>}.
2024-07-18T03:00:29.002531+00:00 [error] crasher: initial call: ekka_cluster_etcd:init/1, pid: <0.5646.26>, registered_name: [], error: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}, ancestors: [ekka_cluster_sup,ekka_sup,<0.4090.0>], message_queue_len: 0, messages: [], links: [<0.4093.0>], dictionary: [], trap_exit: true, status: running, heap_size: 376, stack_size: 28, reductions: 256; neighbours:
2024-07-18T03:00:29.003884+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:00:29.004050+00:00 [error] Failed to connect ETCD: {"100.88.106.149",2379} by {shutdown,econnrefused}
2024-07-18T03:00:29.004318+00:00 [error] Supervisor: {local,ekka_cluster_sup}. Context: start_error. Reason: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}. Offender: id=ekka_cluster_etcd,pid={restarting,<0.4094.0>}.
2024-07-18T03:00:29.004353+00:00 [error] crasher: initial call: ekka_cluster_etcd:init/1, pid: <0.5649.26>, registered_name: [], error: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}, ancestors: [ekka_cluster_sup,ekka_sup,<0.4090.0>], message_queue_len: 0, messages: [], links: [<0.4093.0>], dictionary: [], trap_exit: true, status: running, heap_size: 376, stack_size: 28, reductions: 256; neighbours:
2024-07-18T03:00:29.005458+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:00:29.005579+00:00 [error] Failed to connect ETCD: {"100.88.106.149",2379} by {shutdown,econnrefused}
2024-07-18T03:00:29.005737+00:00 [error] Supervisor: {local,ekka_cluster_sup}. Context: start_error. Reason: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}. Offender: id=ekka_cluster_etcd,pid={restarting,<0.4094.0>}.
2024-07-18T03:00:29.005791+00:00 [error] crasher: initial call: ekka_cluster_etcd:init/1, pid: <0.5652.26>, registered_name: [], error: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}, ancestors: [ekka_cluster_sup,ekka_sup,<0.4090.0>], message_queue_len: 0, messages: [], links: [<0.4093.0>], dictionary: [], trap_exit: true, status: running, heap_size: 376, stack_size: 28, reductions: 256; neighbours:
2024-07-18T03:00:29.006829+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:00:29.006951+00:00 [error] Failed to connect ETCD: {"100.88.106.149",2379} by {shutdown,econnrefused}
2024-07-18T03:00:29.007075+00:00 [error] Supervisor: {local,ekka_cluster_sup}. Context: start_error. Reason: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}. Offender: id=ekka_cluster_etcd,pid={restarting,<0.4094.0>}.
2024-07-18T03:00:29.007135+00:00 [error] crasher: initial call: ekka_cluster_etcd:init/1, pid: <0.5655.26>, registered_name: [], error: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}, ancestors: [ekka_cluster_sup,ekka_sup,<0.4090.0>], message_queue_len: 0, messages: [], links: [<0.4093.0>], dictionary: [], trap_exit: true, status: running, heap_size: 376, stack_size: 28, reductions: 256; neighbours:
2024-07-18T03:00:29.007916+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:00:29.008016+00:00 [error] Failed to connect ETCD: {"100.88.106.149",2379} by {shutdown,econnrefused}
2024-07-18T03:00:29.008176+00:00 [error] Supervisor: {local,ekka_cluster_sup}. Context: start_error. Reason: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}. Offender: id=ekka_cluster_etcd,pid={restarting,<0.4094.0>}.
2024-07-18T03:00:29.008170+00:00 [error] crasher: initial call: ekka_cluster_etcd:init/1, pid: <0.5658.26>, registered_name: [], error: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}, ancestors: [ekka_cluster_sup,ekka_sup,<0.4090.0>], message_queue_len: 0, messages: [], links: [<0.4093.0>], dictionary: [], trap_exit: true, status: running, heap_size: 376, stack_size: 28, reductions: 256; neighbours:
2024-07-18T03:00:29.009302+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:00:29.009413+00:00 [error] Failed to connect ETCD: {"100.88.106.149",2379} by {shutdown,econnrefused}
2024-07-18T03:00:29.009603+00:00 [error] Supervisor: {local,ekka_cluster_sup}. Context: start_error. Reason: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}. Offender: id=ekka_cluster_etcd,pid={restarting,<0.4094.0>}.
2024-07-18T03:00:29.009678+00:00 [error] crasher: initial call: ekka_cluster_etcd:init/1, pid: <0.5661.26>, registered_name: [], error: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}, ancestors: [ekka_cluster_sup,ekka_sup,<0.4090.0>], message_queue_len: 0, messages: [], links: [<0.4093.0>], dictionary: [], trap_exit: true, status: running, heap_size: 376, stack_size: 28, reductions: 256; neighbours:
2024-07-18T03:00:29.010604+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:00:29.010786+00:00 [error] Failed to connect ETCD: {"100.88.106.149",2379} by {shutdown,econnrefused}
2024-07-18T03:00:29.010959+00:00 [error] Supervisor: {local,ekka_cluster_sup}. Context: start_error. Reason: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}. Offender: id=ekka_cluster_etcd,pid={restarting,<0.4094.0>}.
2024-07-18T03:00:29.011138+00:00 [error] Supervisor: {local,ekka_cluster_sup}. Context: shutdown. Reason: reached_max_restart_intensity. Offender: id=ekka_cluster_etcd,pid={restarting,<0.4094.0>}.
2024-07-18T03:00:29.011126+00:00 [error] crasher: initial call: ekka_cluster_etcd:init/1, pid: <0.5664.26>, registered_name: [], error: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}, ancestors: [ekka_cluster_sup,ekka_sup,<0.4090.0>], message_queue_len: 0, messages: [], links: [<0.4093.0>], dictionary: [], trap_exit: true, status: running, heap_size: 376, stack_size: 28, reductions: 256; neighbours:
2024-07-18T03:00:34.999323+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:00:47.801428+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:01:13.403390+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:01:13.605420+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:01:14.007365+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:01:14.809237+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:01:16.411198+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:01:19.613313+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:01:26.015147+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:01:38.817265+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:02:04.419217+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:02:04.621392+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:02:05.023263+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:02:05.825317+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:02:07.427258+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:02:10.629128+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:02:17.031173+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:02:29.833267+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
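When reading the core-node log above, it helps to look at the spacing of the remaining warnings once the supervisor has shut down. The following is a minimal sketch, not part of EMQX, that computes the gaps between consecutive log lines. The function name `retry_gaps` is hypothetical, and the sample lines are the warnings from 03:01:13 to 03:02:04 in the log above. The gaps appear to double each time and then start over, which suggests a reconnect loop with backoff is still running, though the log alone does not confirm which component drives it.

```python
from datetime import datetime


def retry_gaps(lines):
    """Return the seconds between consecutive log lines.

    Each line is expected to start with an ISO 8601 timestamp
    followed by a space, as in the EMQX log above.
    """
    stamps = [datetime.fromisoformat(line.split(" ", 1)[0]) for line in lines]
    return [round((b - a).total_seconds(), 1) for a, b in zip(stamps, stamps[1:])]


# Warning lines copied from the core-node log (message text shortened).
sample = [
    "2024-07-18T03:01:13.403390+00:00 [warning] ekka_cluster_etcd failed to connect",
    "2024-07-18T03:01:13.605420+00:00 [warning] ekka_cluster_etcd failed to connect",
    "2024-07-18T03:01:14.007365+00:00 [warning] ekka_cluster_etcd failed to connect",
    "2024-07-18T03:01:14.809237+00:00 [warning] ekka_cluster_etcd failed to connect",
    "2024-07-18T03:01:16.411198+00:00 [warning] ekka_cluster_etcd failed to connect",
    "2024-07-18T03:01:19.613313+00:00 [warning] ekka_cluster_etcd failed to connect",
    "2024-07-18T03:01:26.015147+00:00 [warning] ekka_cluster_etcd failed to connect",
    "2024-07-18T03:01:38.817265+00:00 [warning] ekka_cluster_etcd failed to connect",
    "2024-07-18T03:02:04.419217+00:00 [warning] ekka_cluster_etcd failed to connect",
]

if __name__ == "__main__":
    print(retry_gaps(sample))
```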
The replicant nodes cannot recognize the core nodes and keep printing the following log:
2024-07-18T03:00:22.998149+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:00:22.999154+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:00:23.800477+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:00:23.831809+00:00 [error] Ekka(AutoCluster): Core node discovery error: eetcd_conn_unavailable
2024-07-18T03:00:23.831986+00:00 [info] msg: mria_lb_core_discovery_new_nodes, ignored_nodes: [], node: 'emqx@10.69.66.245', previous_cores: ['emqx@10.69.77.227','emqx@10.69.79.134','emqx@10.69.89.92'], returned_cores: []
2024-07-18T03:00:25.402457+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:00:25.558729+00:00 [error] Ekka(AutoCluster): Core node discovery error: eetcd_conn_unavailable
2024-07-18T03:00:27.216729+00:00 [error] Ekka(AutoCluster): Core node discovery error: eetcd_conn_unavailable
2024-07-18T03:00:28.382759+00:00 [error] Ekka(AutoCluster): Core node discovery error: eetcd_conn_unavailable
2024-07-18T03:00:28.604408+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:00:28.995641+00:00 [error] Supervisor: {local,ekka_cluster_sup}. Context: child_terminated. Reason: {shutdown,#{event => 'KeepAliveHalted',lease_id => 7587880098853566099,reason => eetcd_conn_unavailable}}. Offender: id=ekka_cluster_etcd,pid=<0.2305.0>.
2024-07-18T03:00:28.996949+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:00:28.997277+00:00 [error] Failed to connect ETCD: {"100.88.106.149",2379} by {shutdown,econnrefused}
2024-07-18T03:00:28.997452+00:00 [error] Supervisor: {local,ekka_cluster_sup}. Context: start_error. Reason: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}. Offender: id=ekka_cluster_etcd,pid=<0.2305.0>.
2024-07-18T03:00:28.998165+00:00 [error] crasher: initial call: ekka_cluster_etcd:init/1, pid: <0.28172.2>, registered_name: [], error: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}, ancestors: [ekka_cluster_sup,ekka_sup,<0.2302.0>], message_queue_len: 0, messages: [], links: [<0.2304.0>], dictionary: [], trap_exit: true, status: running, heap_size: 376, stack_size: 28, reductions: 258; neighbours:
2024-07-18T03:00:28.998843+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:00:28.998944+00:00 [error] Failed to connect ETCD: {"100.88.106.149",2379} by {shutdown,econnrefused}
2024-07-18T03:00:28.999164+00:00 [error] Supervisor: {local,ekka_cluster_sup}. Context: start_error. Reason: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}. Offender: id=ekka_cluster_etcd,pid={restarting,<0.2305.0>}.
2024-07-18T03:00:28.999187+00:00 [error] crasher: initial call: ekka_cluster_etcd:init/1, pid: <0.28175.2>, registered_name: [], error: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}, ancestors: [ekka_cluster_sup,ekka_sup,<0.2302.0>], message_queue_len: 0, messages: [], links: [<0.2304.0>], dictionary: [], trap_exit: true, status: running, heap_size: 376, stack_size: 28, reductions: 256; neighbours:
2024-07-18T03:00:29.000240+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:00:29.000344+00:00 [error] Failed to connect ETCD: {"100.88.106.149",2379} by {shutdown,econnrefused}
2024-07-18T03:00:29.000466+00:00 [error] Supervisor: {local,ekka_cluster_sup}. Context: start_error. Reason: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}. Offender: id=ekka_cluster_etcd,pid={restarting,<0.2305.0>}.
2024-07-18T03:00:29.000520+00:00 [error] crasher: initial call: ekka_cluster_etcd:init/1, pid: <0.28178.2>, registered_name: [], error: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}, ancestors: [ekka_cluster_sup,ekka_sup,<0.2302.0>], message_queue_len: 0, messages: [], links: [<0.2304.0>], dictionary: [], trap_exit: true, status: running, heap_size: 376, stack_size: 28, reductions: 256; neighbours:
2024-07-18T03:00:29.001438+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:00:29.001564+00:00 [error] Failed to connect ETCD: {"100.88.106.149",2379} by {shutdown,econnrefused}
2024-07-18T03:00:29.001782+00:00 [error] Supervisor: {local,ekka_cluster_sup}. Context: start_error. Reason: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}. Offender: id=ekka_cluster_etcd,pid={restarting,<0.2305.0>}.
2024-07-18T03:00:29.001938+00:00 [error] crasher: initial call: ekka_cluster_etcd:init/1, pid: <0.28181.2>, registered_name: [], error: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}, ancestors: [ekka_cluster_sup,ekka_sup,<0.2302.0>], message_queue_len: 0, messages: [], links: [<0.2304.0>], dictionary: [], trap_exit: true, status: running, heap_size: 376, stack_size: 28, reductions: 256; neighbours:
2024-07-18T03:00:29.003057+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:00:29.003145+00:00 [error] Failed to connect ETCD: {"100.88.106.149",2379} by {shutdown,econnrefused}
2024-07-18T03:00:29.003238+00:00 [error] Supervisor: {local,ekka_cluster_sup}. Context: start_error. Reason: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}. Offender: id=ekka_cluster_etcd,pid={restarting,<0.2305.0>}.
2024-07-18T03:00:29.003307+00:00 [error] crasher: initial call: ekka_cluster_etcd:init/1, pid: <0.28184.2>, registered_name: [], error: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}, ancestors: [ekka_cluster_sup,ekka_sup,<0.2302.0>], message_queue_len: 0, messages: [], links: [<0.2304.0>], dictionary: [], trap_exit: true, status: running, heap_size: 376, stack_size: 28, reductions: 256; neighbours:
2024-07-18T03:00:29.004234+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:00:29.004354+00:00 [error] Failed to connect ETCD: {"100.88.106.149",2379} by {shutdown,econnrefused}
2024-07-18T03:00:29.004461+00:00 [error] Supervisor: {local,ekka_cluster_sup}. Context: start_error. Reason: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}. Offender: id=ekka_cluster_etcd,pid={restarting,<0.2305.0>}.
2024-07-18T03:00:29.004638+00:00 [error] crasher: initial call: ekka_cluster_etcd:init/1, pid: <0.28187.2>, registered_name: [], error: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}, ancestors: [ekka_cluster_sup,ekka_sup,<0.2302.0>], message_queue_len: 0, messages: [], links: [<0.2304.0>], dictionary: [], trap_exit: true, status: running, heap_size: 376, stack_size: 28, reductions: 256; neighbours:
2024-07-18T03:00:29.005425+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:00:29.005516+00:00 [error] Failed to connect ETCD: {"100.88.106.149",2379} by {shutdown,econnrefused}
2024-07-18T03:00:29.005670+00:00 [error] Supervisor: {local,ekka_cluster_sup}. Context: start_error. Reason: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}. Offender: id=ekka_cluster_etcd,pid={restarting,<0.2305.0>}.
2024-07-18T03:00:29.005752+00:00 [error] crasher: initial call: ekka_cluster_etcd:init/1, pid: <0.28190.2>, registered_name: [], error: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}, ancestors: [ekka_cluster_sup,ekka_sup,<0.2302.0>], message_queue_len: 0, messages: [], links: [<0.2304.0>], dictionary: [], trap_exit: true, status: running, heap_size: 376, stack_size: 28, reductions: 256; neighbours:
2024-07-18T03:00:29.006693+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:00:29.006827+00:00 [error] Failed to connect ETCD: {"100.88.106.149",2379} by {shutdown,econnrefused}
2024-07-18T03:00:29.006970+00:00 [error] Supervisor: {local,ekka_cluster_sup}. Context: start_error. Reason: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}. Offender: id=ekka_cluster_etcd,pid={restarting,<0.2305.0>}.
2024-07-18T03:00:29.007106+00:00 [error] crasher: initial call: ekka_cluster_etcd:init/1, pid: <0.28193.2>, registered_name: [], error: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}, ancestors: [ekka_cluster_sup,ekka_sup,<0.2302.0>], message_queue_len: 0, messages: [], links: [<0.2304.0>], dictionary: [], trap_exit: true, status: running, heap_size: 376, stack_size: 28, reductions: 256; neighbours:
2024-07-18T03:00:29.008010+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-18T03:00:29.008119+00:00 [error] Failed to connect ETCD: {"100.88.106.149",2379} by {shutdown,econnrefused}

econnrefused means the port on the other side cannot be reached. Usually the service has not been started or has gone down.

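Yes, but as I understand it this should not affect an emqx cluster that has already formed. Judging from my test, does the etcd service going down really affect the existing emqx cluster?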

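Oh, I see. After repeated failed attempts to connect to etcd, the emqx node-discovery component crashes, which may bring the emqx node down. After a restart, joining the cluster fails again.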
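The `reached_max_restart_intensity` line in the log is standard OTP supervisor behavior: if a child is restarted more than a configured number of times within a configured period, the supervisor gives up and shuts down. The sketch below models that rule in Python for illustration only. The class and function names are hypothetical, and the intensity and period values are assumptions, not the values `ekka_cluster_sup` actually uses.

```python
from collections import deque


class RestartWindow:
    """Sliding-window restart counter modelled on the OTP intensity/period rule."""

    def __init__(self, intensity, period):
        self.intensity = intensity  # max restarts tolerated inside the window
        self.period = period        # window length in seconds
        self._restarts = deque()

    def record_restart(self, now):
        """Record a restart at time `now`; return True if the limit is exceeded."""
        self._restarts.append(now)
        # Forget restarts that fell out of the window.
        while now - self._restarts[0] > self.period:
            self._restarts.popleft()
        return len(self._restarts) > self.intensity


def restarts_until_shutdown(restart_times, intensity, period):
    """Return how many restarts happen before the supervisor gives up, or None."""
    window = RestartWindow(intensity, period)
    for count, t in enumerate(restart_times, start=1):
        if window.record_restart(t):
            return count
    return None


if __name__ == "__main__":
    # Restarts roughly 1.4 ms apart, as in the log; limits are assumed values.
    burst = [i * 0.0014 for i in range(20)]
    print(restarts_until_shutdown(burst, intensity=10, period=60))
```

Because every restart attempt fails immediately with econnrefused, the attempts land within a few milliseconds of each other, so any realistic limit is exhausted almost at once. A child that failed only occasionally would never trip it.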

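Can this probing be turned off once the cluster has been established?
1. On startup, a node gets the cluster information from etcd auto-clustering by default
2. Once the cluster has formed, nodes use static addresses to get the cluster information, without going through etcd
3. If a node cannot recognize the core nodes, or after it restarts, it fetches from etcd again
Can this be achieved with the current configuration options?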
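The three steps proposed above can be sketched as follows. This is only an illustration of the requested behavior, not something EMQX provides: `resolve_cores`, `is_reachable`, and `query_registry` are hypothetical names, and the node names are the `previous_cores` taken from the replicant log.

```python
def resolve_cores(known_cores, is_reachable, query_registry):
    """Pick the core nodes to use, preferring peers that are already known.

    known_cores    -- core node names remembered from the last successful discovery
    is_reachable   -- callable(node) -> bool, a hypothetical liveness check
    query_registry -- callable() -> list of node names, a hypothetical etcd lookup
    """
    # Step 2: once the cluster has formed, rely on the cores already known.
    alive = [node for node in known_cores if is_reachable(node)]
    if alive:
        return alive
    # Steps 1 and 3: on first start, or when no known core answers,
    # fall back to the registry. An error raised here is left to the caller.
    return query_registry()


if __name__ == "__main__":
    cores = ["emqx@10.69.77.227", "emqx@10.69.79.134", "emqx@10.69.89.92"]

    def registry_down():
        raise ConnectionRefusedError("etcd unavailable")

    # etcd is down, but two known cores still answer, so etcd is never queried.
    print(resolve_cores(cores, lambda n: n != "emqx@10.69.89.92", registry_down))
```

With this ordering, an etcd outage only matters to a node that has no reachable core left, which is the situation described in step 3.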

In that case, use static mode to build the cluster. As things stand, etcd mode requires the etcd service to stay available, otherwise the nodes will go down.

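OK. We deploy on k8s, so pod IPs change, there are a lot of nodes, and they are spread across several k8s clusters that are isolated from each other. I'll have to think about how to make the nodes recognize each other statically. Thanks for the support.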