EMQX 4.x cluster + load balancer: connections keep dropping and reconnecting once the load test exceeds ~1000 connections

Version: EMQX 4.4.18

Deployment method: zip package

Cluster nodes: 4

OS and number of servers: 1 × CentOS 7, 3 × CentOS 8

Authentication: Redis

Tuning: Linux system tuning has been applied as described in the documentation (a representative subset is sketched after the problem description below)

Problem: The cluster has four nodes. In a connection-only load test against a single standalone node, 60,000 connections can be reached with no disconnect/reconnect. After putting the cluster behind the load balancer, the same connection-only test starts dropping and reconnecting at around 1,000+ connections; at 10,000 connections, clients disconnect and reconnect, and once the retry limit is exceeded they stop connecting altogether.
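
For reference, a representative subset of the Linux kernel tuning from the EMQX tuning guide (the values below are illustrative assumptions, not the exact settings used on these machines):

# /etc/sysctl.conf (illustrative values)
fs.file-max = 2097152
fs.nr_open = 2097152
net.core.somaxconn = 32768
net.ipv4.tcp_max_syn_backlog = 16384
net.ipv4.ip_local_port_range = 1024 65535

# /etc/security/limits.conf (open-file limit for the broker user)
emqx soft nofile 1048576
emqx hard nofile 1048576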

Nginx load balancer configuration

EMQX configuration file

info.log

2023-06-27T09:00:42.629598+08:00 [info] mqttx-a-81@192.168.7.133:44512 file: emqx_connection.erl, line: 544, mfa: {emqx_connection,terminate,2}, msg: terminate, pid: <0.10317.3>, reason: {shutdown,tcp_closed}
2023-06-27T09:00:42.630212+08:00 [info] mqttx-a-92@192.168.7.133:44556 file: emqx_connection.erl, line: 544, mfa: {emqx_connection,terminate,2}, msg: terminate, pid: <0.10333.3>, reason: {shutdown,tcp_closed}
2023-06-27T09:00:42.631249+08:00 [info] mqttx-a-113@192.168.7.133:44640 file: emqx_connection.erl, line: 544, mfa: {emqx_connection,terminate,2}, msg: terminate, pid: <0.10362.3>, reason: {shutdown,tcp_closed}
2023-06-27T09:00:42.631198+08:00 [info] mqttx-a-134@192.168.7.133:44724 file: emqx_connection.erl, line: 544, mfa: {emqx_connection,terminate,2}, msg: terminate, pid: <0.10393.3>, reason: {shutdown,tcp_closed}
2023-06-27T09:00:42.631049+08:00 [info] mqttx-a-105@192.168.7.133:44608 file: emqx_connection.erl, line: 544, mfa: {emqx_connection,terminate,2}, msg: terminate, pid: <0.10352.3>, reason: {shutdown,tcp_closed}
2023-06-27T09:00:42.631175+08:00 [info] mqttx-a-180@192.168.7.133:44908 file: emqx_connection.erl, line: 544, mfa: {emqx_connection,terminate,2}, msg: terminate, pid: <0.10439.3>, reason: {shutdown,tcp_closed}
2023-06-27T09:00:42.631552+08:00 [info] mqttx-a-117@192.168.7.133:44656 file: emqx_connection.erl, line: 544, mfa: {emqx_connection,terminate,2}, msg: terminate, pid: <0.10365.3>, reason: {shutdown,tcp_closed}
2023-06-27T09:00:42.631467+08:00 [info] mqttx-a-163@192.168.7.133:44840 file: emqx_connection.erl, line: 544, mfa: {emqx_connection,terminate,2}, msg: terminate, pid: <0.10419.3>, reason: {shutdown,tcp_closed}
2023-06-27T09:00:42.631407+08:00 [info] mqttx-a-201@192.168.7.133:44992 file: emqx_connection.erl, line: 544, mfa: {emqx_connection,terminate,2}, msg: terminate, pid: <0.10462.3>, reason: {shutdown,tcp_closed}
2023-06-27T09:00:42.631811+08:00 [info] mqttx-a-317@192.168.7.133:45456 file: emqx_connection.erl, line: 544, mfa: {emqx_connection,terminate,2}, msg: terminate, pid: <0.10624.3>, reason: {shutdown,tcp_closed}

debug log

2023-06-27T09:04:00.815000+08:00 [info] mqttx-bench-743@192.168.7.133:51266 file: emqx_connection.erl, line: 544, mfa: {emqx_connection,terminate,2}, msg: terminate, pid: <0.12654.3>, reason: {shutdown,tcp_closed}
2023-06-27T09:04:00.816906+08:00 [info] mqttx-bench-1497@192.168.7.133:54536 file: emqx_connection.erl, line: 544, mfa: {emqx_connection,terminate,2}, msg: terminate, pid: <0.13601.3>, reason: {shutdown,tcp_closed}
2023-06-27T09:04:00.835552+08:00 [debug] client_id: <<"mqttx-bench-87">>, file: emqx_cm.erl, line: 566, mfa: {emqx_cm,clean_down,1}, msg: emqx_cm_clean_down, pid: <0.20747.0>
2023-06-27T09:04:00.816989+08:00 [info] mqttx-bench-1433@192.168.7.133:54720 file: emqx_connection.erl, line: 544, mfa: {emqx_connection,terminate,2}, msg: terminate, pid: <0.13643.3>, reason: {shutdown,tcp_closed}
2023-06-27T09:04:00.835603+08:00 [debug] client_id: <<"mqttx-bench-573">>, file: emqx_cm.erl, line: 566, mfa: {emqx_cm,clean_down,1}, msg: emqx_cm_clean_down, pid: <0.20749.0>
2023-06-27T09:04:00.832916+08:00 [debug] client_id: <<"mqttx-bench-539">>, file: emqx_cm.erl, line: 566, mfa: {emqx_cm,clean_down,1}, msg: emqx_cm_clean_down, pid: <0.20751.0>
2023-06-27T09:04:00.782676+08:00 [info] mqttx-bench-181@192.168.7.133:49018 file: emqx_connection.erl, line: 544, mfa: {emqx_connection,terminate,2}, msg: terminate, pid: <0.11940.3>, reason: {shutdown,tcp_closed}
2023-06-27T09:04:00.836095+08:00 [debug] client_id: <<"mqttx-bench-535">>, file: emqx_cm.erl, line: 566, mfa: {emqx_cm,clean_down,1}, msg: emqx_cm_clean_down, pid: <0.20751.0>
2023-06-27T09:04:00.799400+08:00 [info] mqttx-bench-366@192.168.7.133:49758 file: emqx_connection.erl, line: 544, mfa: {emqx_connection,terminate,2}, msg: terminate, pid: <0.12181.3>, reason: {shutdown,tcp_closed}
2023-06-27T09:04:00.835912+08:00 [debug] client_id: <<"mqttx-bench-709">>, file: emqx_cm.erl, line: 566, mfa: {emqx_cm,clean_down,1}, msg: emqx_cm_clean_down, pid: <0.20749.0>
2023-06-27T09:04:00.815727+08:00 [info] mqttx-bench-1452@192.168.7.133:54224 file: emqx_connection.erl, line: 544, mfa: {emqx_connection,terminate,2}, msg: terminate, pid: <0.13513.3>, reason: {shutdown,tcp_closed}
2023-06-27T09:04:00.836088+08:00 [debug] client_id: <<"mqttx-bench-743">>, file: emqx_cm.erl, line: 566, mfa: {emqx_cm,clean_down,1}, msg: emqx_cm_clean_down, pid: <0.20746.0>
2023-06-27T09:04:00.836111+08:00 [debug] client_id: <<"mqttx-bench-1433">>, file: emqx_cm.erl, line: 566, mfa: {emqx_cm,clean_down,1}, msg: emqx_cm_clean_down, pid: <0.20747.0>
2023-06-27T09:04:00.836391+08:00 [debug] client_id: <<"mqttx-bench-1017">>, file: emqx_cm.erl, line: 566, mfa: {emqx_cm,clean_down,1}, msg: emqx_cm_clean_down, pid: <0.20751.0>
2023-06-27T09:04:00.814948+08:00 [info] mqttx-bench-737@192.168.7.133:51242 file: emqx_connection.erl, line: 544, mfa: {emqx_connection,terminate,2}, msg: terminate, pid: <0.12643.3>, reason: {shutdown,tcp_closed}
2023-06-27T09:04:00.836608+08:00 [debug] client_id: <<"mqttx-bench-1497">>, file: emqx_cm.erl, line: 566, mfa: {emqx_cm,clean_down,1}, msg: emqx_cm_clean_down, pid: <0.20747.0>
2023-06-27T09:04:00.836655+08:00 [debug] client_id: <<"mqttx-bench-302">>, file: emqx_cm.erl, line: 566, mfa: {emqx_cm,clean_down,1}, msg: emqx_cm_clean_down, pid: <0.20751.0>
2023-06-27T09:04:00.815129+08:00 [info] mqttx-bench-1176@192.168.7.133:52884 file: emqx_connection.erl, line: 544, mfa: {emqx_connection,terminate,2}, msg: terminate, pid: <0.13182.3>, reason: {shutdown,tcp_closed}
2023-06-27T09:04:00.836569+08:00 [debug] client_id: <<"mqttx-bench-1019">>, file: emqx_cm.erl, line: 566, mfa: {emqx_cm,clean_down,1}, msg: emqx_cm_clean_down, pid: <0.20749.0>

The logs don't show much helpful information. My feeling is that something is wrong with the Nginx machine.

You could try bypassing Nginx and connecting to each cluster node directly, as a comparison test; an example is sketched below.
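
For example, a direct connection test against a single node on its MQTT port (a sketch only; the node address, credentials, and flags mirror the upstream list and bench command that appear later in this thread):

mqttx bench conn -h 192.168.30.201 -p 1883 -u test -P 123456 -c 10000 -i 10 -I "mqttx-bench-%i"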

Connecting to each node individually I can reach tens of thousands of connections without the problem shown above. Here are the complete info and debug logs; could you take another look and see if anything stands out?
日志.zip (615.7 KB)

The connections were made with the MQTT CLI bench connection command. Testing each node individually there is no problem at all; the issue only appears when going through the Nginx proxy.

I set up Nginx on another machine and hit the same problem.

Then it is almost certainly an Nginx configuration or machine issue, and not really related to EMQX.

  • Has the Nginx machine been tuned?
  • Can you post the full Nginx configuration?
  • Also, what is the client keepalive interval? Or paste the load test command.

Nginx configuration

# For more information on configuration, see:
#   * Official English Documentation: http://nginx.org/en/docs/
#   * Official Russian Documentation: http://nginx.org/ru/docs/

user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log;
pid /run/nginx.pid;

# Load dynamic modules. See /usr/share/doc/nginx/README.dynamic.
include /usr/share/nginx/modules/*.conf;

events {
    worker_connections 1024;
}


stream {
  # Round-robin load balancing configuration
  upstream emqx_cluster {
      server 192.168.30.201:1883 weight=1;
      server 192.168.30.202:1883 weight=1;
      server 192.168.30.203:1883 weight=1;
      server 192.168.7.133:1883 weight=1;
  }
  server {
      # Listen on port 8884
      listen 8884;
      # Reverse proxy to emqx_cluster
      proxy_pass emqx_cluster;
      proxy_buffer_size 3M;
      tcp_nodelay on;
      #ssl_handshake_timeout 15s;
      # Certificate configuration
      #ssl_certificate     /etc/nginx/cert/nginx.pem;
      #ssl_certificate_key /etc/nginx/cert/nginx.key;
  }

}

http {
    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    access_log  /var/log/nginx/access.log  main;

    sendfile            on;
    tcp_nopush          on;
    tcp_nodelay         on;
    keepalive_timeout   65;
    types_hash_max_size 2048;

    include             /etc/nginx/mime.types;
    default_type        application/octet-stream;

    # Load modular configuration files from the /etc/nginx/conf.d directory.
    # See http://nginx.org/en/docs/ngx_core_module.html#include
    # for more information.
    include /etc/nginx/conf.d/*.conf;

    server {
        listen       80 default_server;
        listen       [::]:80 default_server;
        server_name  _;
        root         /usr/share/nginx/html;

        # Load configuration files for the default server block.
        include /etc/nginx/default.d/*.conf;

        location / {
        }

        error_page 404 /404.html;
            location = /40x.html {
        }

        error_page 500 502 503 504 /50x.html;
            location = /50x.html {
        }
    }

# Settings for a TLS enabled server.
#
#    server {
#        listen       443 ssl http2 default_server;
#        listen       [::]:443 ssl http2 default_server;
#        server_name  _;
#        root         /usr/share/nginx/html;
#
#        ssl_certificate "/etc/pki/nginx/server.crt";
#        ssl_certificate_key "/etc/pki/nginx/private/server.key";
#        ssl_session_cache shared:SSL:1m;
#        ssl_session_timeout  10m;
#        ssl_ciphers PROFILE=SYSTEM;
#        ssl_prefer_server_ciphers on;
#
#        # Load configuration files for the default server block.
#        include /etc/nginx/default.d/*.conf;
#
#        location / {
#        }
#
#        error_page 404 /404.html;
#            location = /40x.html {
#        }
#
#        error_page 500 502 503 504 /50x.html;
#            location = /50x.html {
#        }
#    }

}


All of the machines have been tuned with the parameters from the documentation.
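
One quick way to confirm the limits that are actually in effect on the Nginx machine (a generic sketch, not the commands run at the time):

# Dump the configuration Nginx is actually running with and check the connection limits
nginx -T | grep -E 'worker_connections|worker_rlimit_nofile'

# Check the open-file limit of a running Nginx worker process
cat /proc/$(pgrep -f 'nginx: worker' | head -n 1)/limits | grep 'open files'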

The load test command is:

mqttx bench conn -h 192.168.7.133 -p 8884 -u test -P 123456 -c 1000 -i 10 -I "mqttx-bench-%i"

Found the problem. Sorry, I hadn't looked carefully enough; thanks for your trouble.

It was Nginx's connection limit: worker_connections was left at the default 1024, which is far too low and caused the problem shown above.
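
A minimal sketch of the fix, assuming the stock configuration posted above (the values are examples and should be sized to the target connection count; note that each proxied stream connection consumes two worker connections, one to the client and one to the upstream EMQX node):

# /etc/nginx/nginx.conf (excerpt)
worker_processes auto;
# Raise the per-worker open file descriptor limit
worker_rlimit_nofile 65535;

events {
    # Each proxied TCP connection uses two slots (client side + upstream side)
    worker_connections 65535;
}

Depending on how Nginx is started, the system-level open-file limit (for example, a systemd LimitNOFILE setting) may also be worth checking.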

:+1::+1:

Thank you so much :joy:

You're welcome~