EMQX 5.7.2 reports errors when a load test reaches 400,000+ connections

1. Error message
2025-11-24T14:29:56.590577+08:00 [warning] clientid: AP_E0E1001210F2, msg: unclean_terminate, peername: 10.0.0.20:11500, username: E0E1001210F2, context: noimpl, stacktrace: [{emqx_session,maybe_mock_impl_mod,[undefined],[{file,"emqx_session.erl"},{line,622}]},{emqx_channel,is_durable_session,1,[{file,"emqx_channel.erl"},{line,2753}]},{emqx_channel,maybe_publish_will_msg,2,[{file,"emqx_channel.erl"},{line,2528}]},{emqx_channel,terminate,2,[{file,"emqx_channel.erl"},{line,1550}]},{emqx_connection,terminate,2,[{file,"emqx_connection.erl"},{line,637}]},{proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,251}]}], exception: error

2. Second error

2025-11-24T08:54:38.970421+08:00 [error] Process: <0.387030.0> on node 'emqx@10.0.0.14', Context: maximum heap size reached, Max Heap Size: 6291456, Total Heap Size: 16352172, Kill: true, Error Logger: true, Message Queue Len: 398, GC Info: [{old_heap_block_size,1727361},{heap_block_size,8866796},{mbuf_size,5758045},{recent_size,294957},{stack_size,26},{old_heap_size,219845},{heap_size,1439435},{bin_vheap_size,774784},{bin_vheap_block_size,832883},{bin_old_vheap_size,117907},{bin_old_vheap_block_size,196630}]
2025-11-24T08:54:48.131089+08:00 [error] Generic event handler emqx_alarm_handler crashed. Installed: alarm_handler. Last event: {set_alarm,{lc_runq_alarm,#{node => 'emqx@10.0.0.14',runq_length => 174724}}}. State: . Reason: {timeout,{gen_server,call,[emqx_alarm,{activate_alarm,runq_overload,#{node => 'emqx@10.0.0.14',runq_length => 174724},[86,77,32,105,115,32,111,118,101,114,108,111,97,100,101,100,32,111,110,32,110,111,100,101,58,32,"'emqx@10.0.0.14'",58,32,"174724"]}]}}.
2025-11-24T08:54:57.375972+08:00 [warning] msg: long_schedule, info: [{timeout,246},{in,{emqx_utils,drain_down,2}},{out,undefined}], procinfo: [{pid,<0.2720.0>},{memory,88782920},{total_heap_size,10626418},{heap_size,1199557},{stack_size,31},{min_heap_size,233},{proc_lib_initial_call,{emqx_broker_helper,init,['Argument__1']}},{initial_call,{proc_lib,init_p,5}},{current_stacktrace,[{emqx_pmon,erase,2,[{file,"emqx_pmon.erl"},{line,90}]},{emqx_pmon,'-erase_all/2-fun-0-',2,[{file,"emqx_pmon.erl"},{line,99}]},{lists,foldl_1,3,[{file,"lists.erl"},{line,1599}]},{emqx_broker_helper,handle_info,2,[{file,"emqx_broker_helper.erl"},{line,180}]},{gen_server,try_handle_info,3,[{file,"gen_server.erl"},{line,1095}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,1183}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]},{registered_name,emqx_broker_helper},{status,running},{message_queue_len,42839},{group_leader,<0.2591.0>},{priority,normal},{trap_exit,false},{reductions,68513947},{last_calls,false},{catchlevel,2},{trace,0},{suspending,},{sequential_trace_token,},{error_handler,error_handler}]
2025-11-24T08:55:50.133921+08:00 [warning] msg: alarm_is_de

3. Load was ramped up gradually during the test
In steps of 10,000, 30,000, 40,000, and 50,000.

Give it more CPU.

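CPU usage is not high. The machine currently has 8 cores and 32 GB of RAM; I scaled it up gradually from 4 cores and 8 GB as the load increased.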

If CPU usage is low but runq_overload is still reported, that should count as a bug. Please describe a reproducible scenario on GitHub.

image
This is the CPU monitoring from that time.

Also, a large number of messages are being dropped.

If messages are being dropped, then most likely something is wrong with the load-test scenario:

I did the math myself:
Don't put too much load on any single subscribing client:

It is recommended that each subscribing client receive no more than 1,500 messages/second (assuming 1 KB per message).
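The guideline above can be checked with simple arithmetic before running a test. The sketch below is only an illustration: the function names, the 400,000-publisher fan-in scenario, and the 10-second publish interval are hypothetical and do not come from the actual test setup. It assumes every published message is delivered to subscribers on a single topic filter, and that the load can be spread evenly across several subscribers (for example with MQTT shared subscriptions).

```python
import math

# Guideline quoted in the thread: at most 1,500 messages/second
# per subscribing client, assuming 1 KB messages.
MAX_RATE_PER_SUBSCRIBER = 1500


def incoming_rate(publishers: int, publish_interval_s: float) -> float:
    """Total messages/second produced when each publisher sends
    one message every `publish_interval_s` seconds."""
    return publishers / publish_interval_s


def min_subscribers(total_rate: float) -> int:
    """Smallest number of subscribers needed so that, with the load
    spread evenly, none exceeds the per-subscriber guideline."""
    return max(1, math.ceil(total_rate / MAX_RATE_PER_SUBSCRIBER))


if __name__ == "__main__":
    # Hypothetical scenario: 400,000 publishers, one message every 10 s.
    rate = incoming_rate(400_000, 10)
    print(f"total rate: {rate:.0f} msg/s")
    print(f"a single subscriber would exceed the guideline: {rate > MAX_RATE_PER_SUBSCRIBER}")
    print(f"subscribers needed: {min_subscribers(rate)}")
```

If the computed total rate lands on one subscriber, that client's process queue grows faster than it can drain, which matches the dropped messages and heap-size kills described in this thread.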

OK, thanks. I recorded some issues during the load test; could you help explain them?

2025-11-24T16:30:00.643432+08:00 [error] Process: <0.3962.0> on node 'emqx@10.0.0.14', Context: maximum heap size reached, Max Heap Size: 6291456, Total Heap Size: 7572567, Kill: true, Error Logger: true, Message Queue Len: 94, GC Info: [{old_heap_block_size,2984878},{heap_block_size,4560232},{mbuf_size,27588},{recent_size,1655518},{stack_size,130},{old_heap_size,0},{heap_size,2072698},{bin_vheap_size,917136},{bin_vheap_block_size,1427042},{bin_old_vheap_size,0},{bin_old_vheap_block_size,514761}]

The process was killed when about 500,000 clients were online, yet about 10 GB of memory was still free.

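That limit is the memory of a single client process, not of the whole node. When a large number of messages pile up and the process cannot keep up, its memory grows until it is killed. It is the same cause as above: don't make a single client handle too many messages. Even TCP can't send them out fast enough.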
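To see why a process can be killed while the node still has gigabytes free, it helps to convert the heap figures in the logs into bytes. The sketch below assumes the "Max Heap Size" and "Total Heap Size" fields are reported in machine words (Erlang's max_heap_size process flag is specified in words) and that the node runs a 64-bit VM with 8-byte words; the helper name is made up for illustration.

```python
# Assumption: 64-bit Erlang VM, so one word is 8 bytes.
WORD_SIZE_BYTES = 8


def words_to_mib(words: int) -> float:
    """Convert a heap size in machine words to MiB."""
    return words * WORD_SIZE_BYTES / (1024 * 1024)


if __name__ == "__main__":
    # Values copied from the "maximum heap size reached" log lines above.
    max_heap_words = 6291456
    total_heap_words = [16352172, 7572567]

    print(f"per-process limit: {words_to_mib(max_heap_words):.2f} MiB")
    for words in total_heap_words:
        print(f"heap at kill time: {words_to_mib(words):.2f} MiB")
```

Under these assumptions the per-process limit works out to 48 MiB, so a single connection process that buffers too many undelivered messages crosses it long before the node's 32 GB is exhausted.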