EMQX cluster behind nginx load balancing: client disconnects and reconnects every ten minutes. What is the cause?

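Today I set up an EMQX cluster with nginx doing the load balancing. I connected a single test client, and it disconnects and reconnects automatically every 10 minutes. The reconnect logic is my own code, but why does the connection drop in the first place?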

1. The username and password belong to a superuser account.
2. keepAlive is set to 0; reconnect is true.

3. The nginx configuration below implements the load balancing:

stream {
    upstream stream_backend {
        server 192.168.2.116:1883 weight=1;
        server 192.168.2.5:1883 weight=1;
    }

    server {
        listen 1893;
        proxy_pass stream_backend;
        proxy_buffer_size 4k;
    }
}

4. These are the EMQX trace logs:

  1.1 Log from node emqx@192.168.2.5:
2024-01-09T15:32:07.992780+08:00 [MQTT] MQTT-TRANSFER-HUB@192.168.2.5:49364 msg: mqtt_packet_received, packet: CONNECT(Q0, R0, D0, ClientId=MQTT-TRANSFER-HUB, ProtoName=MQTT, ProtoVsn=4, CleanStart=true, KeepAlive=0, Username=admin, Password=******)

2024-01-09T15:32:07.992780+08:00 [AUTHN] MQTT-TRANSFER-HUB@192.168.2.5:49364 msg: http_response, provider: emqx_authn_http, request: [base_url: http://192.168.2.5:8082/login, body: {"username":"admin","password":"[password]"}, headers: [{<<"accept">>,<<"application/json">>},{<<"cache-control">>,<<"no-cache">>},{<<"connection">>,<<"keep-alive">>},{<<"content-type">>,<<"application/json">>},{<<"keep-alive">>,<<"timeout=30, max=1000">>}], method: post, path_query: /login], resource: emqx_authn_http:2, response: [error: {resource_error,#{msg => "resource not connected",reason => not_connected}}]

2024-01-09T15:32:07.992780+08:00 [AUTHN] MQTT-TRANSFER-HUB@192.168.2.5:49364 msg: authenticator_result, authenticator: password_based:http, result: ignore

2024-01-09T15:32:07.992780+08:00 [AUTHN] MQTT-TRANSFER-HUB@192.168.2.5:49364 msg: authenticator_result, authenticator: password_based:built_in_database, result: {ok,#{is_superuser => true}}

2024-01-09T15:32:07.992780+08:00 [AUTHN] MQTT-TRANSFER-HUB@192.168.2.5:49364 msg: authentication_result, reason: chain_result, result: {stop,{ok,#{is_superuser => true}}}

2024-01-09T15:32:07.992780+08:00 [BRIDGE] MQTT-TRANSFER-HUB@192.168.2.5:49364 msg: bridge_action, bridge_id: {bridge_v2,http,'Device_Data_Push_WH_D'}

2024-01-09T15:32:07.992780+08:00 [BRIDGE] MQTT-TRANSFER-HUB@192.168.2.5:49364 msg: bridge_action, bridge_id: {bridge_v2,http,'Device_Data_Push_WH_D'}

2024-01-09T15:32:07.992780+08:00 [MQTT] MQTT-TRANSFER-HUB@192.168.2.5:49364 msg: mqtt_packet_sent, packet: CONNACK(Q0, R0, D0, AckFlags=0, ReasonCode=0)

2024-01-09T15:42:07.992194+08:00 [SOCKET] MQTT-TRANSFER-HUB@192.168.2.5:49364 msg: emqx_connection_terminated, reason: {shutdown,tcp_closed}

2024-01-09T15:52:09.054607+08:00 [MQTT] MQTT-TRANSFER-HUB@192.168.2.5:50048 msg: mqtt_packet_received, packet: CONNECT(Q0, R0, D0, ClientId=MQTT-TRANSFER-HUB, ProtoName=MQTT, ProtoVsn=4, CleanStart=true, KeepAlive=0, Username=admin, Password=******)

2024-01-09T15:52:09.054607+08:00 [QUERY] MQTT-TRANSFER-HUB@192.168.2.5:50048 msg: http_connector_received, connector: emqx_authn_http:2, note: the request body is redacted due to security reasons, request: {"/login",[{<<"accept">>,<<"application/json">>},{<<"cache-control">>,<<"no-cache">>},{<<"connection">>,<<"keep-alive">>},{<<"content-type">>,<<"application/json">>},{<<"keep-alive">>,<<"timeout=30, max=1000">>}],<<"******">>}, state: [base_path: /, connect_timeout: 15000, host: {192,168,2,5}, pool_name: emqx_authn_http:2, pool_type: random, port: 8082, request: undefined]

2024-01-09T15:52:09.069607+08:00 [AUTHN] MQTT-TRANSFER-HUB@192.168.2.5:50048 msg: http_response, provider: emqx_authn_http, request: [base_url: http://192.168.2.5:8082/login, body: {"username":"admin","password":"[password]"}, headers: [{<<"accept">>,<<"application/json">>},{<<"cache-control">>,<<"no-cache">>},{<<"connection">>,<<"keep-alive">>},{<<"content-type">>,<<"application/json">>},{<<"keep-alive">>,<<"timeout=30, max=1000">>}], method: post, path_query: /login], resource: emqx_authn_http:2, response: [body: {"msg":"用户不存在/密码错误","code":500}, headers: [{<<"vary">>,<<"Origin">>},{<<"vary">>,<<"Access-Control-Request-Method">>},{<<"vary">>,<<"Access-Control-Request-Headers">>},{<<"x-content-type-options">>,<<"nosniff">>},{<<"x-xss-protection">>,<<"1; mode=block">>},{<<"content-type">>,<<"application/json;charset=UTF-8">>},{<<"transfer-encoding">>,<<"chunked">>},{<<"date">>,<<"Tue, 09 Jan 2024 07:52:09 GMT">>},{<<"keep-alive">>,<<"timeout=60">>},{<<"connection">>,<<"keep-alive">>}], status: 200]

2024-01-09T15:52:09.069607+08:00 [AUTHN] MQTT-TRANSFER-HUB@192.168.2.5:50048 msg: authenticator_result, authenticator: password_based:http, result: ignore

2024-01-09T15:52:09.069607+08:00 [AUTHN] MQTT-TRANSFER-HUB@192.168.2.5:50048 msg: authenticator_result, authenticator: password_based:built_in_database, result: {ok,#{is_superuser => true}}

2024-01-09T15:52:09.069607+08:00 [AUTHN] MQTT-TRANSFER-HUB@192.168.2.5:50048 msg: authentication_result, reason: chain_result, result: {stop,{ok,#{is_superuser => true}}}

2024-01-09T15:52:09.069607+08:00 [MQTT] MQTT-TRANSFER-HUB@192.168.2.5:50048 msg: mqtt_packet_sent, packet: CONNACK(Q0, R0, D0, AckFlags=0, ReasonCode=0)
  1.2 Log from node emqx@192.168.2.116:
2024-01-09T15:30:04.181642+08:00 [MQTT] MQTT-TRANSFER-HUB@192.168.2.5:49160 msg: mqtt_packet_received, packet: CONNECT(Q0, R0, D0, ClientId=MQTT-TRANSFER-HUB, ProtoName=MQTT, ProtoVsn=4, CleanStart=true, KeepAlive=0, Username=fant, Password=******)

2024-01-09T15:30:04.181642+08:00 [AUTHN] MQTT-TRANSFER-HUB@192.168.2.5:49160 msg: http_response, provider: emqx_authn_http, request: [base_url: http://192.168.2.5:8082/login, body: {"username":"fant","password":"[password]"}, headers: [{<<"accept">>,<<"application/json">>},{<<"cache-control">>,<<"no-cache">>},{<<"connection">>,<<"keep-alive">>},{<<"content-type">>,<<"application/json">>},{<<"keep-alive">>,<<"timeout=30, max=1000">>}], method: post, path_query: /login], resource: emqx_authn_http:2, response: [error: {resource_error,#{msg => "resource not connected",reason => not_connected}}]

2024-01-09T15:30:04.181642+08:00 [AUTHN] MQTT-TRANSFER-HUB@192.168.2.5:49160 msg: authenticator_result, authenticator: password_based:http, result: ignore

2024-01-09T15:30:04.181642+08:00 [AUTHN] MQTT-TRANSFER-HUB@192.168.2.5:49160 msg: authenticator_result, authenticator: password_based:built_in_database, result: {ok,#{is_superuser => false}}

2024-01-09T15:30:04.181642+08:00 [AUTHN] MQTT-TRANSFER-HUB@192.168.2.5:49160 msg: authentication_result, reason: chain_result, result: {stop,{ok,#{is_superuser => false}}}

2024-01-09T15:30:04.196642+08:00 [BRIDGE] MQTT-TRANSFER-HUB@192.168.2.5:49160 msg: bridge_action, bridge_id: {bridge_v2,http,'Device_Data_Push_WH_D'}

2024-01-09T15:30:04.196642+08:00 [BRIDGE] MQTT-TRANSFER-HUB@192.168.2.5:49160 msg: bridge_action, bridge_id: {bridge_v2,http,'Device_OnoffLine_Push_WH_D'}

2024-01-09T15:30:04.196642+08:00 [BRIDGE] MQTT-TRANSFER-HUB@192.168.2.5:49160 msg: bridge_action, bridge_id: {bridge_v2,http,'Device_Data_Push_WH_D'}

2024-01-09T15:30:04.196642+08:00 [MQTT] MQTT-TRANSFER-HUB@192.168.2.5:49160 msg: mqtt_packet_sent, packet: CONNACK(Q0, R0, D0, AckFlags=0, ReasonCode=0)

2024-01-09T15:31:38.103459+08:00 [BRIDGE] MQTT-TRANSFER-HUB@192.168.2.5:49160 msg: bridge_action, bridge_id: {bridge_v2,http,'Device_Data_Push_WH_D'}

2024-01-09T15:31:38.103459+08:00 [BRIDGE] MQTT-TRANSFER-HUB@192.168.2.5:49160 msg: bridge_action, bridge_id: {bridge_v2,http,'Device_OnoffLine_Push_WH_D'}

2024-01-09T15:31:38.118458+08:00 [SOCKET] MQTT-TRANSFER-HUB@192.168.2.5:49160 msg: emqx_connection_terminated, reason: {shutdown,tcp_closed}

2024-01-09T15:42:09.336226+08:00 [MQTT] MQTT-TRANSFER-HUB@192.168.2.5:49719 msg: mqtt_packet_received, packet: CONNECT(Q0, R0, D0, ClientId=MQTT-TRANSFER-HUB, ProtoName=MQTT, ProtoVsn=4, CleanStart=true, KeepAlive=0, Username=admin, Password=******)

2024-01-09T15:42:09.336226+08:00 [QUERY] MQTT-TRANSFER-HUB@192.168.2.5:49719 msg: http_connector_received, connector: emqx_authn_http:2, note: the request body is redacted due to security reasons, request: {"/login",[{<<"accept">>,<<"application/json">>},{<<"cache-control">>,<<"no-cache">>},{<<"connection">>,<<"keep-alive">>},{<<"content-type">>,<<"application/json">>},{<<"keep-alive">>,<<"timeout=30, max=1000">>}],<<"******">>}, state: [base_path: /, connect_timeout: 15000, host: {192,168,2,5}, pool_name: emqx_authn_http:2, pool_type: random, port: 8082, request: undefined]

2024-01-09T15:42:09.742225+08:00 [AUTHN] MQTT-TRANSFER-HUB@192.168.2.5:49719 msg: http_response, provider: emqx_authn_http, request: [base_url: http://192.168.2.5:8082/login, body: {"username":"admin","password":"[password]"}, headers: [{<<"accept">>,<<"application/json">>},{<<"cache-control">>,<<"no-cache">>},{<<"connection">>,<<"keep-alive">>},{<<"content-type">>,<<"application/json">>},{<<"keep-alive">>,<<"timeout=30, max=1000">>}], method: post, path_query: /login], resource: emqx_authn_http:2, response: [body: {"msg":"用户不存在/密码错误","code":500}, headers: [{<<"vary">>,<<"Origin">>},{<<"vary">>,<<"Access-Control-Request-Method">>},{<<"vary">>,<<"Access-Control-Request-Headers">>},{<<"x-content-type-options">>,<<"nosniff">>},{<<"x-xss-protection">>,<<"1; mode=block">>},{<<"content-type">>,<<"application/json;charset=UTF-8">>},{<<"transfer-encoding">>,<<"chunked">>},{<<"date">>,<<"Tue, 09 Jan 2024 07:42:08 GMT">>},{<<"keep-alive">>,<<"timeout=60">>},{<<"connection">>,<<"keep-alive">>}], status: 200]

2024-01-09T15:42:09.742225+08:00 [AUTHN] MQTT-TRANSFER-HUB@192.168.2.5:49719 msg: authenticator_result, authenticator: password_based:http, result: ignore

2024-01-09T15:42:09.758225+08:00 [AUTHN] MQTT-TRANSFER-HUB@192.168.2.5:49719 msg: authenticator_result, authenticator: password_based:built_in_database, result: {ok,#{is_superuser => true}}

2024-01-09T15:42:09.758225+08:00 [AUTHN] MQTT-TRANSFER-HUB@192.168.2.5:49719 msg: authentication_result, reason: chain_result, result: {stop,{ok,#{is_superuser => true}}}

2024-01-09T15:42:09.773225+08:00 [MQTT] MQTT-TRANSFER-HUB@192.168.2.5:49719 msg: mqtt_packet_sent, packet: CONNACK(Q0, R0, D0, AckFlags=0, ReasonCode=0)

2024-01-09T15:52:09.771751+08:00 [SOCKET] MQTT-TRANSFER-HUB@192.168.2.5:49719 msg: emqx_connection_terminated, reason: {shutdown,tcp_closed}

The {shutdown,tcp_closed} reason means that the peer closed the connection to EMQX.
I suggest looking into nginx's keepalive-related settings.

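After repeated testing I found that setting the client's keepAlive connection parameter to a non-zero value makes the problem go away. What I want to know now is: what is the mechanism when keepAlive is set to 0? How is this parameter officially defined?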

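A value of 0 means no heartbeats are sent. This is defined by the MQTT protocol; for the detailed mechanism, search for mqtt + keepalive.
My guess is that in your case, with no heartbeats being sent, nginx closes a connection after it has seen no TCP traffic for a configured period. So nginx closed the connections on both sides.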
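
To make the Keep Alive field concrete, here is a minimal Python sketch of where it sits in an MQTT 3.1.1 CONNECT packet (ProtoVsn=4 in the traces above). The function names are made up for illustration, username and password are left out for brevity, and the client used in this thread may build its packets differently.

```python
import struct


def encode_remaining_length(n: int) -> bytes:
    """Encode the MQTT variable-length 'Remaining Length' field."""
    out = bytearray()
    while True:
        byte, n = n % 128, n // 128
        if n > 0:
            byte |= 0x80  # continuation bit: more length bytes follow
        out.append(byte)
        if n == 0:
            return bytes(out)


def build_connect(client_id: str, keepalive: int) -> bytes:
    """Build a minimal MQTT 3.1.1 CONNECT packet (no username/password)."""
    if not 0 <= keepalive <= 0xFFFF:
        raise ValueError("Keep Alive is a 16-bit value, in seconds")
    variable_header = (
        struct.pack("!H", 4) + b"MQTT"   # protocol name, length-prefixed
        + bytes([4])                      # protocol level 4 = MQTT 3.1.1
        + bytes([0x02])                   # connect flags: Clean Session only
        + struct.pack("!H", keepalive)    # Keep Alive in seconds, 0 = disabled
    )
    cid = client_id.encode("utf-8")
    payload = struct.pack("!H", len(cid)) + cid
    body = variable_header + payload
    return bytes([0x10]) + encode_remaining_length(len(body)) + body


if __name__ == "__main__":
    for ka in (0, 60):
        pkt = build_connect("MQTT-TRANSFER-HUB", ka)
        print(ka, pkt[10:12].hex())  # the two Keep Alive bytes
```

With keepalive=0 the two Keep Alive bytes are 00 00, and the client is not obliged to send any PINGREQ, so an otherwise idle connection carries no traffic at all. With keepalive=60 they are 00 3c.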

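OK, but my nginx has no such timeout configured. My configuration is pasted below. What I saw yesterday was a disconnect every 10 minutes, after which I have the client reconnect automatically.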

http{
    keepalive_timeout  65;
}

stream {
    upstream stream_backend {
        server 192.168.2.116:1883 weight=1;
        server 192.168.2.5:1883 weight=1;
    }

    server {
        listen 1893;
        proxy_pass stream_backend;
        proxy_buffer_size 4k;
    }
}

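Sorry, I'm not very familiar with nginx configuration.
Seeing tcp_closed in the EMQX log is enough to conclude that the connection was closed by the peer sitting in front of EMQX (nginx in this setup). If you want to confirm this, you can inspect the TCP packets with tcpdump.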

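Based on your nginx configuration plus a quick search:
the likely cause is that proxy_timeout is not set (for example, proxy_timeout 1800s;). Its default is 10 minutes.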
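
As a quick cross-check, the timestamps in the traces above can be compared against that 10-minute default. The sketch below copies the CONNACK and emqx_connection_terminated timestamps for the two sessions that used Username=admin; the variable and function names are illustrative only.

```python
from datetime import datetime

# (CONNACK sent, connection terminated) pairs copied from the two node logs.
sessions = {
    "emqx@192.168.2.5": ("2024-01-09T15:32:07.992780+08:00",
                          "2024-01-09T15:42:07.992194+08:00"),
    "emqx@192.168.2.116": ("2024-01-09T15:42:09.773225+08:00",
                            "2024-01-09T15:52:09.771751+08:00"),
}


def lifetime_seconds(connack: str, terminated: str) -> float:
    """Seconds between the last MQTT packet and the TCP close."""
    start = datetime.fromisoformat(connack)
    end = datetime.fromisoformat(terminated)
    return (end - start).total_seconds()


lifetimes = {node: lifetime_seconds(*ts) for node, ts in sessions.items()}

if __name__ == "__main__":
    for node, secs in lifetimes.items():
        print(f"{node}: {secs:.3f} s")
```

Both sessions end almost exactly 600 seconds after the CONNACK, which is consistent with an idle timeout of 10 minutes rather than with a random network failure.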

https://www.emqx.io/docs/zh/latest/deploy/cluster/lb-nginx.html#反向代理-mqtt

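OK, thanks, I'll take a look.

So if I set proxy_timeout 1800s, the connection would simply be dropped after 30 minutes instead of 10, right? That doesn't solve the underlying problem.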

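Having this disconnect mechanism is recommended: if no TCP packets are seen for 30 minutes, closing the connection saves resources.
On the application side you can use the MQTT heartbeat. As long as the heartbeat interval is shorter than 30 minutes, the connection will stay up.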

Do I need to simulate an MQTT client heartbeat packet, or will any data do?

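No need to simulate anything: the MQTT heartbeat is sent periodically by the client code.
It is part of the MQTT standard, so it is worth reading up on.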
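
For reference, the heartbeat is a two-byte PINGREQ packet that the broker answers with PINGRESP. The sketch below shows a simplified view of when a client owes the broker a PINGREQ; next_ping_at is a hypothetical helper, and real client libraries usually run their own timer and may ping somewhat earlier than the deadline.

```python
from typing import Optional

PINGREQ = bytes([0xC0, 0x00])   # fixed header only, no payload
PINGRESP = bytes([0xD0, 0x00])  # the broker's reply


def next_ping_at(last_sent: float, keepalive: int) -> Optional[float]:
    """Return the latest time a PINGREQ is due, or None if Keep Alive is 0.

    last_sent: time (in seconds) the client last sent any MQTT packet.
    keepalive: the Keep Alive value from CONNECT, in seconds.
    """
    if keepalive == 0:
        return None  # Keep Alive disabled: the client never has to ping
    return last_sent + keepalive


if __name__ == "__main__":
    print(next_ping_at(100.0, 60))
    print(next_ping_at(100.0, 0))
```

With keepAlive=0, as in the original setup, the helper returns None: nothing is ever due, so nothing crosses nginx once the CONNACK has been delivered.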

Will any data do? The nginx documentation should answer this. Yes: as long as data is flowing over the TCP connection, nginx will not close it.
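
The idle-timeout behaviour can be modelled in a few lines. This is only a simplified model of a timeout between successive reads or writes, not nginx's actual implementation; proxy_close_time and its parameters are invented for the illustration.

```python
from typing import Iterable, Optional


def proxy_close_time(packet_times: Iterable[float],
                     proxy_timeout: float,
                     horizon: float) -> Optional[float]:
    """Return when an idle-timeout proxy would close the connection.

    packet_times: seconds at which data crosses the proxy, in either direction.
    proxy_timeout: allowed gap between two successive transfers, in seconds.
    horizon: how long we observe; None is returned if the connection survives.
    """
    times = sorted(t for t in packet_times if t <= horizon)
    if not times:
        raise ValueError("need at least one packet (the CONNECT) to start")
    last = times[0]
    for t in times[1:]:
        if t - last >= proxy_timeout:
            return last + proxy_timeout  # the gap was too long: closed here
        last = t  # any transfer resets the idle timer
    close_at = last + proxy_timeout
    return close_at if close_at <= horizon else None


if __name__ == "__main__":
    # keepAlive=0: only the CONNECT/CONNACK exchange at t=0
    print(proxy_close_time([0.0], 600, 3600))
    # keepAlive=60: some packet at least every 60 seconds
    print(proxy_close_time(range(0, 3601, 60), 600, 3600))
```

In this model the silent connection is closed at t=600, while the connection with a packet every 60 seconds survives the whole hour. It does not matter whether those packets are PINGREQ or ordinary PUBLISH traffic.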

Thanks, I learned a lot. :+1:

Hello, I have one more question. Suppose I set proxy_protocol on; in the nginx configuration, for example:

stream {
  upstream mqtt_servers {
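    # down: the server is temporarily excluded from load balancing
    # max_fails: number of failed attempts allowed; default is 1
    # fail_timeout: default 10s; the window in which max_fails failures are counted, and how long the server is then considered unavailable
    # backup: receives requests only when all non-backup servers are down or busy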

    server emqx1-cluster.emqx.io:1883 max_fails=2 fail_timeout=10s;
    server emqx2-cluster.emqx.io:1883 down;
    server emqx3-cluster.emqx.io:1883 backup;
  }

  server {
    listen 1883;
    proxy_pass mqtt_servers;

    # When this is enabled, the backend listener must also enable proxy_protocol
    proxy_protocol on;
    proxy_connect_timeout 10s;   
    # Idle timeout between successive reads/writes; the default is 10 minutes
    proxy_timeout 1800s;
    proxy_buffer_size 3M;
    tcp_nodelay on;       
  }
}

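Do I then need to set Management → Listeners → tcp → Proxy Protocol to true in the dashboard? Besides that, is there anything else I need to configure?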

I have set up nginx load balancing for the cluster.
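
For context on why both sides have to agree on this setting: with proxy_protocol on, the proxy prepends a header to each backend connection before any MQTT bytes. The sketch below builds a PROXY protocol version 1 header as described in the HAProxy specification. It assumes nginx emits the text-based version 1 format, and the addresses are placeholders, not values from this deployment.

```python
import ipaddress


def proxy_v1_header(src_ip: str, dst_ip: str,
                    src_port: int, dst_port: int) -> bytes:
    """Build a PROXY protocol v1 header line for a TCP connection."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    if src.version != dst.version:
        raise ValueError("source and destination must use the same IP version")
    family = "TCP4" if src.version == 4 else "TCP6"
    line = f"PROXY {family} {src} {dst} {src_port} {dst_port}\r\n"
    return line.encode("ascii")


if __name__ == "__main__":
    # Hypothetical client 192.168.2.50 connecting to the proxy on port 1893.
    header = proxy_v1_header("192.168.2.50", "192.168.2.5", 49364, 1893)
    print(header)
    print(hex(header[0]))  # first byte the backend listener would see
```

The first byte of this header is 0x50 (the letter P), not the 0x10 that starts an MQTT CONNECT packet. A listener that does not expect the PROXY header would therefore try to parse it as MQTT and most likely reject the connection, which is why the setting has to be enabled on the EMQX listener whenever nginx sends it.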