EMQX 5.6 deployed on k8s with etcd auto-clustering: does an etcd outage affect the EMQX service?

Cluster deployment architecture

  • EMQX: 5.6
  • etcd: 3.5.8
  • EMQX deployment mode: k8s, Mria core + replicant
  • Number of EMQX core nodes: 3
  • Number of EMQX replicant nodes: 30
  • Nginx reverse proxy forwards client connection requests

Question

While the cluster is running normally, if the etcd service goes down, will it affect the EMQX service?

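Once the cluster has been formed, the nodes already know each other's IP addresses and node names, so communication between them is not affected.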

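However, if a node stays disconnected from the cluster for too long (5 min by default) and is removed from the cluster, it has to rediscover the cluster through etcd before it can rejoin.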
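The rule in the reply above can be written down as a small decision function. This is a hypothetical model for illustration only: `NodeState`, `needs_etcd` and the 300-second threshold (taken from the "5 min by default" in the reply) are assumptions, not EMQX or ekka APIs, and the reply does not say whether the boundary is inclusive.

```python
from dataclasses import dataclass

# Hypothetical model of the behavior described in the reply; these names
# are illustrative and are not EMQX or ekka APIs.
REMOVAL_TIMEOUT_S = 5 * 60  # "5 min by default", as stated in the reply


@dataclass
class NodeState:
    name: str
    in_cluster: bool            # node has already joined and knows its peers
    disconnected_for_s: float   # 0 while the node is connected


def needs_etcd(node: NodeState, timeout_s: float = REMOVAL_TIMEOUT_S) -> bool:
    """Return True if the node would have to (re)discover the cluster via etcd."""
    if not node.in_cluster:
        # A node that has never joined can only find its peers through etcd.
        return True
    # A joined node needs etcd again only after it has been removed,
    # i.e. after staying disconnected longer than the timeout (assumed strict).
    return node.disconnected_for_s > timeout_s
```

Under this model, a healthy core node (`NodeState("emqx@10.69.64.109", True, 0)`) does not depend on etcd, while a node cut off for 400 s does.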

The EMQX cluster was deployed with a Helm chart, not the official Operator. Is it still unaffected?
After etcd went down, the replicant nodes immediately logged the following errors:

2024-07-17T03:09:48.104406+00:00 [info] Lease KeepAlive: <0.2411.0> find gun(<0.2408.0>) process stop {shutdown,closed}
2024-07-17T03:09:48.105426+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-17T03:09:48.106278+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-17T03:09:48.572268+00:00 [error] Ekka(AutoCluster): Core node discovery error: eetcd_conn_unavailable
2024-07-17T03:09:48.572537+00:00 [info] msg: mria_lb_core_discovery_new_nodes, ignored_nodes: [], node: 'emqx@10.69.77.200', previous_cores: ['emqx@10.69.64.109','emqx@10.69.75.235','emqx@10.69.78.125'], returned_cores: []
2024-07-17T03:09:48.907911+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-17T03:09:50.510019+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-17T03:09:50.523179+00:00 [error] Ekka(AutoCluster): Core node discovery error: eetcd_conn_unavailable
2024-07-17T03:09:51.545126+00:00 [error] Ekka(AutoCluster): Core node discovery error: eetcd_conn_unavailable
2024-07-17T03:09:53.407268+00:00 [error] Ekka(AutoCluster): Core node discovery error: eetcd_conn_unavailable
2024-07-17T03:09:53.711787+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-17T03:09:54.110054+00:00 [error] Supervisor: {local,ekka_cluster_sup}. Context: child_terminated. Reason: {shutdown,#{event => 'KeepAliveHalted',lease_id => 7587880083378856998,reason => eetcd_conn_unavailable}}. Offender: id=ekka_cluster_etcd,pid=<0.2406.0>.
2024-07-17T03:09:54.111369+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-17T03:09:54.111509+00:00 [error] Failed to connect ETCD: {"100.88.106.149",2379} by {shutdown,econnrefused}
2024-07-17T03:09:54.111632+00:00 [error] Supervisor: {local,ekka_cluster_sup}. Context: start_error. Reason: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}. Offender: id=ekka_cluster_etcd,pid=<0.2406.0>.
2024-07-17T03:09:54.111959+00:00 [error] crasher: initial call: ekka_cluster_etcd:init/1, pid: <0.47.2>, registered_name: [], error: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}, ancestors: [ekka_cluster_sup,ekka_sup,<0.2403.0>], message_queue_len: 0, messages: [], links: [<0.2405.0>], dictionary: [], trap_exit: true, status: running, heap_size: 610, stack_size: 28, reductions: 248; neighbours:
2024-07-17T03:09:54.112752+00:00 [warning] ekka_cluster_etcd failed to connect [100.88.106.149:2379] by <Gun Down> {shutdown,econnrefused}
2024-07-17T03:09:54.112875+00:00 [error] Failed to connect ETCD: {"100.88.106.149",2379} by {shutdown,econnrefused}
2024-07-17T03:09:54.113038+00:00 [error] Supervisor: {local,ekka_cluster_sup}. Context: start_error. Reason: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}. Offender: id=ekka_cluster_etcd,pid={restarting,<0.2406.0>}.
2024-07-17T03:09:54.113200+00:00 [error] crasher: initial call: ekka_cluster_etcd:init/1, pid: <0.50.2>, registered_name: [], error: {{badmatch,{error,[{{"100.88.106.149",2379},{shutdown,econnrefused}}]}},[{ekka_cluster_etcd,init,1,[{file,"ekka_cluster_etcd.erl"},{line,357}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,851}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,814}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,240}]}]}, ancestors: [ekka_cluster_sup,ekka_sup,<0.2403.0>], message_queue_len: 0, messages: [], links: [<0.2405.0>], dictionary: [], trap_exit: true, status: running, heap_size: 610, stack_size: 28, reductions: 246; neighbours:
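To check across the 30 replicants whether etcd-based core discovery came back empty, a small script can pull the fields out of the `mria_lb_core_discovery_new_nodes` line. In the log above, that line shows `returned_cores: []` while `previous_cores` still lists the three core nodes. This is a sketch: the regex is written against the single line shown above, other EMQX versions may format it differently, and `parse_discovery` is a hypothetical helper.

```python
import re

# Written against the mria_lb_core_discovery_new_nodes line shown above (assumed format).
PATTERN = re.compile(
    r"mria_lb_core_discovery_new_nodes.*?node: '(?P<node>[^']+)'"
    r".*?previous_cores: \[(?P<prev>[^\]]*)\]"
    r".*?returned_cores: \[(?P<ret>[^\]]*)\]"
)


def _names(raw):
    # Turn an Erlang-style list body like "'a@h1','b@h2'" into ["a@h1", "b@h2"].
    return [item.strip().strip("'") for item in raw.split(",") if item.strip()]


def parse_discovery(line):
    """Return the node plus previous/returned core lists, or None if the line does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    returned = _names(m.group("ret"))
    return {
        "node": m.group("node"),
        "previous_cores": _names(m.group("prev")),
        "returned_cores": returned,
        # True when discovery (here: etcd) returned no core nodes at all.
        "discovery_empty": not returned,
    }
```

Running it over each replicant's log shows which nodes hit an empty discovery result and which cores they had before.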