EMQX cluster status looks fine, but the log keeps showing [discovered nodes outside cluster]

2024-03-22T07:59:03.040000+00:00 [info] Ekka(AutoCluster): joining with 'emqx@node1.emqx.io'
2024-03-22T07:59:03.040551+00:00 [debug] Ekka(AutoCluster): join result: {error,{already_in_cluster,'emqx@node1.emqx.io'}}
2024-03-22T07:59:03.040622+00:00 [warning] Ekka(AutoCluster): discovered nodes outside cluster: [' emqx@node2.emqx.io']
2024-03-22T07:59:03.040692+00:00 [warning] Ekka(AutoCluster): discovery did not succeed; retrying in 5000 ms
2024-03-22T07:59:03.248550+00:00 [info] 10.0.0.3:60720 file: emqx_connection.erl, line: 544, mfa: {emqx_connection,terminate,2}, msg: terminate, pid: <0.3786.16>, reason: {shutdown,tcp_closed}
2024-03-22T07:59:05.340335+00:00 [info] 10.0.0.3:47222 file: emqx_connection.erl, line: 544, mfa: {emqx_connection,terminate,2}, msg: terminate, pid: <0.3787.16>, reason: {shutdown,tcp_closed}
2024-03-22T07:59:07.287570+00:00 [info] 10.0.0.3:25425 file: emqx_connection.erl, line: 544, mfa: {emqx_connection,terminate,2}, msg: terminate, pid: <0.3789.16>, reason: {shutdown,tcp_closed}
2024-03-22T07:59:09.226633+00:00 [info] 10.0.0.3:34578 file: emqx_connection.erl, line: 544, mfa: {emqx_connection,terminate,2}, msg: terminate, pid: <0.3791.16>, reason: {shutdown,tcp_closed}
2024-03-22T07:59:09.602769+00:00 [error] ** Cannot get connection id for node 'emqx@node2.emqx.io'

Which method are you using to discover nodes? How are the cluster.discovery settings configured?

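The node ' emqx@node2.emqx.io' is being discovered, but the cluster cannot connect to it. Note that the node name ' emqx@node2.emqx.io' contains a leading space, so this is most likely a configuration problem.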
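As a rough illustration of how such a space can sneak in, here is a minimal sketch that assumes the node list comes from a comma-separated value (for example a static seeds entry such as cluster.static.seeds) with a space after the comma. The thread does not say which discovery method is used, so the input string and the helper names `split_seeds` and `find_padded_names` are hypothetical.

```python
def split_seeds(raw: str) -> list[str]:
    """Split a comma-separated node list naively, keeping any whitespace."""
    return raw.split(",")


def find_padded_names(raw: str) -> list[str]:
    """Return the entries that carry leading or trailing whitespace."""
    return [name for name in split_seeds(raw) if name != name.strip()]


if __name__ == "__main__":
    # Hypothetical seeds value: note the space after the comma.
    seeds = "emqx@node1.emqx.io, emqx@node2.emqx.io"
    for name in find_padded_names(seeds):
        # repr() makes the stray space visible, as in the warning log.
        print("suspicious node name:", repr(name))
```

A name with a leading space is a different Erlang node name from the one the cluster already knows, which would be consistent with the node being reported as outside the cluster.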

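Yes, that has been fixed and the warning is gone. The new problem is that the first connection attempt fails with an authorization error. With automatic reconnect enabled, the client sometimes reconnects successfully and sometimes takes a long time to do so.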

2024-03-22T08:46:44.117689+00:00 [debug] 10.0.0.3:35729 [MQTT] RECV <<16,59,0,4,77,81,84,84,4,194,0,60,0,5,49,54,48,54,54,0,6,49,49,48,50,52,55,0,32,85,104,101,97,51,68,76,73,102,51,114,120,53,49,88,51,76,119,117,122,99,54,110,108,68,50,48,116,54,97,70,115>>
2024-03-22T08:46:44.117827+00:00 [debug] 10.0.0.3:35729 [MQTT] RECV CONNECT(Q0, R0, D0ClientId=16066, ProtoName=MQTT, ProtoVsn=4, CleanStart=true, KeepAlive=60, Username=110247, Password=******)
2024-03-22T08:46:44.118803+00:00 [error] 16066@10.0.0.3:35729 [Redis] Auth from redis failed: not_authorized
2024-03-22T08:46:44.118856+00:00 [warning] 16066@10.0.0.3:35729 [Channel] Client 16066 (Username: '110247') login failed for not_authorized
2024-03-22T08:46:44.118925+00:00 [debug] 16066@10.0.0.3:35729 [MQTT] SEND CONNACK(Q0, R0, D0AckFlags=0, ReasonCode=5)
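To read the raw RECV bytes in the log above, the following minimal sketch decodes an MQTT 3.1.1 CONNECT packet by hand. It assumes a single-byte remaining length and no Will fields, which holds for this packet; `decode_connect` and `read_utf8` are illustrative helpers, not EMQX functions. It reports only the password length, not its value.

```python
import struct

# Bytes copied from the "[MQTT] RECV" debug line above.
RAW = bytes([
    16, 59, 0, 4, 77, 81, 84, 84, 4, 194, 0, 60, 0, 5, 49, 54, 48, 54, 54,
    0, 6, 49, 49, 48, 50, 52, 55, 0, 32, 85, 104, 101, 97, 51, 68, 76, 73,
    102, 51, 114, 120, 53, 49, 88, 51, 76, 119, 117, 122, 99, 54, 110, 108,
    68, 50, 48, 116, 54, 97, 70, 115,
])


def read_utf8(buf: bytes, pos: int) -> tuple[str, int]:
    """Read a 2-byte length-prefixed UTF-8 string; return (text, next_pos)."""
    (length,) = struct.unpack_from(">H", buf, pos)
    start = pos + 2
    return buf[start:start + length].decode("utf-8"), start + length


def decode_connect(packet: bytes) -> dict:
    """Decode a simple CONNECT packet (no Will, remaining length < 128)."""
    if packet[0] >> 4 != 1:
        raise ValueError("not a CONNECT packet")
    if packet[1] & 0x80:
        raise ValueError("multi-byte remaining length is not handled here")
    body = packet[2:2 + packet[1]]
    proto_name, pos = read_utf8(body, 0)
    proto_level, flags = body[pos], body[pos + 1]
    (keepalive,) = struct.unpack_from(">H", body, pos + 2)
    pos += 4
    if flags & 0x04:
        raise ValueError("Will fields are not handled in this sketch")
    client_id, pos = read_utf8(body, pos)
    username = None
    password_len = None
    if flags & 0x80:  # username flag
        username, pos = read_utf8(body, pos)
    if flags & 0x40:  # password flag
        (password_len,) = struct.unpack_from(">H", body, pos)
    return {
        "proto_name": proto_name,
        "proto_level": proto_level,
        "clean_start": bool(flags & 0x02),
        "keepalive": keepalive,
        "client_id": client_id,
        "username": username,
        "password_len": password_len,
    }


if __name__ == "__main__":
    print(decode_connect(RAW))
```

The decoded fields match the parsed CONNECT line in the log (ClientId=16066, Username=110247, KeepAlive=60). In MQTT 3.1.1, CONNACK return code 5 means "not authorized", which agrees with the Redis auth failure logged just before it.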

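Which EMQX version are you running? Try upgrading to the latest release, since this may be an issue that has already been fixed: 4.4.19 (open source) or 4.4.23 (enterprise).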

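From your description, I think this is quite likely related to the idle-timeout setting in Redis. Check whether Redis is configured to disconnect idle clients, and if so, try turning that off.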
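One way to check this is to ask Redis for its `timeout` parameter (the number of seconds after which an idle client is closed, 0 meaning disabled). The sketch below sends `CONFIG GET timeout` over a plain socket using the RESP wire format. The host, port, and password are placeholders, and a managed Redis service may restrict the CONFIG command and expose this setting through its console instead.

```python
import socket
from typing import Optional


def encode_resp_command(*parts: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [f"*{len(parts)}\r\n".encode()]
    for part in parts:
        data = part.encode()
        out.append(b"$%d\r\n" % len(data) + data + b"\r\n")
    return b"".join(out)


def query_idle_timeout(host: str, port: int = 6379,
                       password: Optional[str] = None) -> bytes:
    """Send CONFIG GET timeout and return the raw, unparsed reply."""
    with socket.create_connection((host, port), timeout=5) as sock:
        if password:
            sock.sendall(encode_resp_command("AUTH", password))
            sock.recv(4096)  # discard the AUTH reply
        sock.sendall(encode_resp_command("CONFIG", "GET", "timeout"))
        return sock.recv(4096)


if __name__ == "__main__":
    # Only shows the bytes that would be sent; call query_idle_timeout()
    # with real connection details to talk to a server.
    print(encode_resp_command("CONFIG", "GET", "timeout"))
```

A non-zero value would fit the symptom described above: a pooled connection that Redis has already closed fails on the first request after a quiet period.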

This is the version in use: emqx/emqx:4.4.10

The Redis instance is on Alibaba Cloud.