Core node runs out of memory (OOM) from high memory usage with DEBUG logging enabled

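How can I find out which core node a replicant node is connected to?
In practice, if I add or remove core nodes, will the replicants be distributed evenly across the core nodes?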

In our tests on version 5.0.25, resource consumption on the core nodes did not match expectations, as follows:

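Resource consumption is uneven across the core nodes. As shown in the figure, while connections were being established, memory rose on two of the core nodes, and one of them hit OOM.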

In addition, after the connections were dropped in bulk, memory on one core node kept rising until it hit OOM.



The nodes actually have no connections, but the dashboard still shows a connection count.

During the disconnection phase, CPU usage on the core node stayed at 100% and memory kept climbing until the process was OOM-killed.

Are your core nodes also accepting connections? Could you briefly describe the test scenario?

Thank you for your reply.

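The current test setup is 3 core nodes (4 cores, 16 GB each) and 20 replicant nodes (8 cores, 16 GB each). The plan is to verify 5 million connections, and the core nodes do not accept client connections.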

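After today's testing, I found that some parameters in the configuration file affect core node performance, but I am not sure which ones. With the default configuration, the core nodes behave normally.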

Below are the memory graphs for the default configuration and for the modified configuration.

Modified configuration file

Default configuration file

Could you please help confirm which configuration parameters have a large impact on core node memory?

The modified configuration file is below:

## NOTE:
## Configs in this file might be overridden by:
## 1. Environment variables which start with 'EMQX_' prefix
## 2. File $EMQX_NODE__DATA_DIR/configs/cluster-override.conf
## 3. File $EMQX_NODE__DATA_DIR/configs/local-override.conf
##
## The *-override.conf files are overwritten at runtime when changes
## are made from EMQX dashboard UI, management HTTP API, or CLI.
## All configuration details can be found in emqx.conf.example

node = {
    name = "emqx@192.168.189.211"
    cookie = "emqxsecretcookie"
    data_dir = "data"
    db_role = core
    process_limit = 2048000
    max_ports = 1024000
    dist_buffer_size = 8192
    max_ets_tables = 256000
    crash_dump_file = "log/crash.dump"
    dist_net_ticktime = 60
    cluster_call = {
      retry_interval = 1m
      max_history = 100
      cleanup_interval = 5m
    }
}

rpc = {
    mode = async
    async_batch_size = 256
    tcp_server_port = 5369
    tcp_client_num = 96
    port_discovery = manual
    connect_timeout = 3s
    send_timeout = 3s
    authentication_timeout = 3s
    call_receive_timeout = 7s
    socket_keepalive_idle = 15m
    socket_keepalive_interval = 75s
    socket_keepalive_count = 9
    socket_sndbuf = 1MB
    socket_recbuf = 1MB
    socket_buffer = 1MB
}

mqtt = {
    max_clientid_len = 1024
    max_topic_levels = 7
    max_qos_allowed = 2
    max_topic_alias = 0
    retain_available = true
    wildcard_subscription = false
    shared_subscription = false
    ignore_loop_deliver = false
}

cluster = {
    name = "emqx_v5_cluster"
    autoheal = true
    autoclean = 5m
    proto_dist = inet_tcp
    discovery_strategy = "etcd"
    etcd = {
      server = "https://etcd.xxxx.com:2379"
      prefix = "emqx-cluster"
      node_ttl = 1m
      ssl = {
        keyfile = "/data/etcdssl/etcd-key.pem"
        cacertfile = "/data/etcdssl/ca.pem"
        certfile = "/data/etcdssl/etcd.pem"
        enable = true
      }
    }
}

log = {
    file_handlers.default = {
      enable = true
      level = debug
      file = "log/emqx.log"
      chars_limit = 8192
      formatter = json
      max_size = 32MB
      rotation.count = 5
    }
}

listeners.tcp.external = {
    bind = "0.0.0.0:1883"
    max_connections = 1024000
    proxy_protocol = true
    proxy_protocol_timeout = 3s
    enable_authn = true
    acceptors = 64
    limiter.connection = {
      rate = "237/s"
      burst = "20"
    }
    access_rules = ["allow all"]
    tcp_options = {
      active_n = 100
      backlog = 1024
      send_timeout = 7s
      send_timeout_close = true
      nodelay = true
      reuseaddr = true
    }
    zone = external
}

zone.external.mqtt = {
    idle_timeout = 15s
    max_packet_size = 128KB
    exclusive_subscription = false
    use_username_as_clientid = false
    wildcard_subscription = false
    shared_subscription = false
    max_subscriptions = 20
    upgrade_qos = false
    keepalive_backoff = 0.75
    max_inflight = 32
    retry_interval = 10s
    max_awaiting_rel = 100
    await_rel_timeout = 50s
    session_expiry_interval = 2h
    max_mqueue_len = 100
    mqueue_priorities = disabled
    mqueue_default_priority = highest
    mqueue_store_qos0 = true
    ignore_loop_deliver = false
}

listeners.tcp.internal = {
  bind = "0.0.0.0:38811"
  acceptors = 64
  max_connections = 102400
  proxy_protocol = false
  enable_authn = true
  limiter.connection = {
    rate = "237/s"
    burst = "20"
  }
  tcp_options = {
    active_n = 300
    backlog = 1024
    send_timeout = 3s
    send_timeout_close = true
    nodelay = true
    reuseaddr = true
  }
  zone = internal
}

zone.internal.mqtt = {
    wildcard_subscription = true
    shared_subscription = true
    max_subscriptions = infinity
    max_inflight = 128
    max_awaiting_rel = 200
    max_mqueue_len = 2000
    mqueue_store_qos0 = true
    use_username_as_clientid = false
    ignore_loop_deliver = false
}

listeners.ssl.default = {
  bind = "0.0.0.0:8883"
  max_connections = 512000
  ssl_options {
    keyfile = "etc/certs/key.pem"
    certfile = "etc/certs/cert.pem"
    cacertfile = "etc/certs/cacert.pem"
  }
}

listeners.ws.default = {
  bind = "0.0.0.0:8083"
  max_connections = 1024
  acceptors = 8
  proxy_protocol = false
  websocket = {
    mqtt_path = "/mqtt"
    proxy_address_header = x-forwarded-for
    proxy_port_header = x-forwarded-port
  }
  limiter.connection = {
    rate = "100/s"
    burst = "20"
  }
  access_rules = ["allow all"]
}

listeners.wss.default = {
  bind = "0.0.0.0:8084"
  max_connections = 512000
  websocket.mqtt_path = "/mqtt"
  ssl_options = {
    keyfile = "etc/certs/key.pem"
    certfile = "etc/certs/cert.pem"
    cacertfile = "etc/certs/cacert.pem"
  }
}

# listeners.quic.default {
#  enabled = true
#  bind = "0.0.0.0:14567"
#  max_connections = 1024000
#  ssl_options {
#   verify = verify_none
#   keyfile = "etc/certs/key.pem"
#   certfile = "etc/certs/cert.pem"
#   cacertfile = "etc/certs/cacert.pem"
#  }
# }

dashboard = {
    listeners.http = {
        bind = 38080
    }
}

authorization = {
  deny_action = ignore
  no_match = deny
  sources = [
    {
      type = file
      enable = true
      path = "etc/acl.conf"
    }
  ]
  cache = {
    enable = true
    max_size = 64
    ttl = 15m
  }
}

authentication = [
  {
    backend = http
    method = post
    mechanism = password_based
    enable = true
    url = "http://xxxxxx/auth"
    body = {
      clientid =  "${clientid}"
      from =  "emqx5"
      ipaddr = "${peerhost}"
      password = "${password}"
      username = "${username}"
    }
    headers = {
      "Content-Type" = "application/json"
      "X-Request-Source" = "EMQX"
      "accept" = "application/json"
      "cache-control" = "no-cache"
      "connection" = "keep-alive"
      "keep-alive" = "timeout=30, max=1000"
    }
  }
]

sysmon = {
  os = {
    cpu_check_interval = 60s
    cpu_high_watermark = 95%
    cpu_low_watermark = 90%
    mem_check_interval = 60s
    sysmem_high_watermark = 80%
    procmem_high_watermark = 5%
  }
  vm = {
    long_gc = disabled
    long_schedule = 240ms
    large_heap = 8MB
    busy_port = false
    busy_dist_port = true
    process_high_watermark = 80%
    process_low_watermark = 70%
  }
}

force_gc = {
    enable = true
    bytes = 1MB
    count = 1000
}

force_shutdown = {
    enable = true
    max_message_queue_len = 1000
    max_heap_size = 100MB
}

It looks like this is caused by DEBUG logging being enabled.
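For illustration, the only change this diagnosis implies in the posted configuration is the `level` of the file log handler. A minimal sketch, assuming `warning` is an acceptable level for the load test (the choice of `warning` is an assumption, not something stated in the thread); all other fields are copied unchanged from the posted file:

```
log = {
    file_handlers.default = {
      enable = true
      ## Assumption: "warning" instead of "debug" for the load test,
      ## so that per-connection events are not logged on the core nodes.
      level = warning
      file = "log/emqx.log"
      chars_limit = 8192
      formatter = json
      max_size = 32MB
      rotation.count = 5
    }
}
```

If DEBUG output is needed for a specific client, tracing only that client is usually cheaper than enabling DEBUG for the whole node.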

That is quite possible. I later tuned the overload kill settings for logging, and the problem has not occurred again. Thank you.
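The thread does not show the tuned settings. As a hedged sketch only: in the EMQX 5.0 log schema as I understand it, each file handler has an `overload_kill` section that restarts the log handler when its memory use or message queue grows too large. The field names below should be checked against `emqx.conf.example` for your version, and the values are illustrative rather than the ones the poster used:

```
log = {
    file_handlers.default = {
      enable = true
      level = debug
      file = "log/emqx.log"
      ## Assumed field names; verify them in emqx.conf.example.
      overload_kill = {
        enable = true
        ## Kill the handler if it holds more than this much memory.
        mem_size = 30MB
        ## Kill the handler if its message queue exceeds this length.
        qlen = 20000
        ## Wait this long before restarting the handler.
        restart_after = 5s
      }
    }
}
```

This bounds how much memory a backed-up log handler can hold, at the cost of dropping log messages while the handler is restarted.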


Thanks for following up. If you have any test results, you are welcome to share them with the community.