Separate clusters that share one etcd service join each other

Environment

  • EMQX version: 5.8.2
  • Operating system: Ubuntu

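Steps to reproduce

We set up several EMQX clusters in our test environment, all backed by the same etcd service. We found that nodes belonging to these different EMQX clusters join one another.

The data in etcd is as follows: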

$ etcdctl get --prefix=true emqx
emqx/daily/c0/ekkacl/nodes/emqx@10.37.34.93
emqx@10.37.34.93
emqx/daily/c0/ekkacl/nodes/emqx@10.37.4.173
emqx@10.37.4.173
emqx/daily/c0/ekkacl/nodes/emqx@10.37.51.213
emqx@10.37.51.213
emqx/daily/c1/ekkacl/nodes/emqx@10.37.50.115
emqx@10.37.50.115
emqx/daily/c1/ekkacl/nodes/emqx@10.37.51.95
emqx@10.37.51.95
emqx/perf/c0/ekkacl/nodes/emqx@10.37.34.219
emqx@10.37.34.219
emqx/perf/c0/ekkacl/nodes/emqx@10.37.34.94
emqx@10.37.34.94
emqx/perf/c0/ekkacl/nodes/emqx@10.37.49.212
emqx@10.37.49.212
emqx/perf/c1/ekkacl/nodes/emqx@10.37.34.120
emqx@10.37.34.120
emqx/perf/c1/ekkacl/nodes/emqx@10.37.49.34
emqx@10.37.49.34
emqx/perf/c1/ekkacl/nodes/emqx@10.37.50.76
emqx@10.37.50.76

We use emqx/<environment>/<cluster name> as the cluster discovery prefix.
Cluster c0 looks like this:

$ etcdctl get emqx/daily/c0/ekkacl/nodes/ emqx/daily/c0/ekkacl/nodes0
emqx/daily/c0/ekkacl/nodes/emqx@10.37.34.93
emqx@10.37.34.93
emqx/daily/c0/ekkacl/nodes/emqx@10.37.4.173
emqx@10.37.4.173
emqx/daily/c0/ekkacl/nodes/emqx@10.37.51.213
emqx@10.37.51.213
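
The explicit range in the command above, from `nodes/` to `nodes0`, is the usual way to express a prefix query in etcd: the range end is the prefix with its last byte incremented (`/` is 0x2f, `0` is 0x30). A minimal Python sketch of that computation follows. The helper name `prefix_range_end` is hypothetical and is not part of ekka or eetcd.

```python
def prefix_range_end(prefix: bytes) -> bytes:
    """Return the smallest key that sorts after every key starting with prefix."""
    end = bytearray(prefix)
    # Walk back from the last byte; a 0xff byte cannot be incremented, so skip it.
    for i in reversed(range(len(end))):
        if end[i] < 0xFF:
            end[i] += 1
            # Everything after the incremented byte is dropped.
            return bytes(end[: i + 1])
    # Empty or all-0xff prefix: there is no upper bound, so fall back to
    # b"\x00", which etcd interprets as "up to the end of the keyspace".
    return b"\x00"


print(prefix_range_end(b"emqx/daily/c0/ekkacl/nodes/"))
```

For the c0 prefix this yields `emqx/daily/c0/ekkacl/nodes0`, the same range end that was passed to `etcdctl` by hand.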

We ran the following commands on one of the nodes and found that discovery returns every node stored in etcd:

v5.8.2(emqx@10.37.34.93)1> {ok, {etcd, Options}} = ekka:env(cluster_discovery).
{ok,{etcd,[{server,["http://10.37.43.175:2379"]},
           {prefix,"emqx/daily/c0"},
           {node_ttl,60000},
           {ssl_options,[{ciphers,[]},
                         {depth,10},
                         {enable,false},
                         {hibernate_after,5000},
                         {log_level,notice},
                         {reuse_sessions,true},
                         {secure_renegotiate,true},
                         {verify,verify_none},
                         {versions,['tlsv1.3','tlsv1.2']}]}]}}
v5.8.2(emqx@10.37.34.93)2>  ekka_cluster_strategy:discover(ekka_cluster_etcd, Options).
{ok,['emqx@10.37.34.93','emqx@10.37.4.173',
     'emqx@10.37.51.213','emqx@10.37.50.115','emqx@10.37.51.95',
     'emqx@10.37.34.219','emqx@10.37.34.94','emqx@10.37.49.212',
     'emqx@10.37.34.120','emqx@10.37.49.34','emqx@10.37.50.76',
     test,'emqx@10.37.51.64']}

Error log

2025-08-26T11:45:30.778963+08:00 [info] Ekka(AutoCluster): discovered nodes are not responding: ['emqx@10.37.50.115','emqx@10.37.51.95','emqx@10.37.34.219','emqx@10.37.34.94','emqx@10.37.49.212','','emqx@10.37.34.120','emqx@10.37.49.34','emqx@10.37.50.76',test,'emqx@10.37.51.64']

We suspect this is related to how the nodes are fetched from etcd. Below is the relevant code in ekka:

v3_nodes_context(Prefix) ->
    Ctx = eetcd_kv:new(?MODULE),
    Ctx1 = eetcd_kv:with_key(Ctx, v3_nodes_key(Prefix)),
    Ctx2 = eetcd_kv:with_range_end(Ctx1, "\0"),
    eetcd_kv:with_sort(Ctx2, 'KEY', 'ASCEND').
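
In the etcd v3 API, a range end of `"\0"` means "all keys greater than or equal to the key", not "all keys with this prefix". The Python sketch below models that selection rule on a few keys from the listing above. It is a simplified model, not ekka's code, and it assumes that `v3_nodes_key(Prefix)` produces a start key like `emqx/daily/c0/ekkacl/nodes/`. The function `etcd_range` is hypothetical.

```python
def etcd_range(keys, key, range_end):
    """Simplified model of etcd v3 range selection over byte-string keys."""
    if range_end == b"":
        # No range end: exact match on a single key.
        selected = [k for k in keys if k == key]
    elif range_end == b"\x00":
        # "\0" range end: every key greater than or equal to `key`.
        selected = [k for k in keys if k >= key]
    else:
        # Half-open interval [key, range_end).
        selected = [k for k in keys if key <= k < range_end]
    return sorted(selected)


KEYS = [
    b"emqx/daily/c0/ekkacl/nodes/emqx@10.37.34.93",
    b"emqx/daily/c1/ekkacl/nodes/emqx@10.37.50.115",
    b"emqx/perf/c0/ekkacl/nodes/emqx@10.37.34.219",
]
START = b"emqx/daily/c0/ekkacl/nodes/"  # assumed start key for cluster c0

from_key = etcd_range(KEYS, START, b"\x00")
prefix_only = etcd_range(KEYS, START, b"emqx/daily/c0/ekkacl/nodes0")
print(len(from_key), len(prefix_only))
```

Under this model the `"\0"` range end selects the c1 and perf keys as well, while the bounded range end selects only the c0 node. If ekka behaves the same way, that would also explain why entries that sort after the c0 prefix but lie outside the `emqx` listing, such as `test`, show up in the discovery result.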