How can i make the “Kill: true”, when “maximum heap size reached”?

As time goes, the whole memory of the emqx inscreases。
How can i make the process killed ,when “maximum heap size reached”?
2024-01-23 00:33:21.096 [error] Process: <0.8166.8> on node ‘emqx@127.0.0.1’
Context: maximum heap size reached
Max Heap Size: 2097152
Total Heap Size: 2787348
Kill: false
Error Logger: true
GC Info: [{old_heap_block_size,514838},
{heap_block_size,2272494},
{mbuf_size,16},
{recent_size,305321},
{stack_size,122},
{old_heap_size,455223},
{heap_size,832890},
{bin_vheap_size,33306},
{bin_vheap_block_size,46422},
{bin_old_vheap_size,8341},
{bin_old_vheap_block_size,46422}]

Hi, chris. emqx will automatically kill some client processes that exceed the memory limit. We don’t need to kill it manually.

For example, in 5.x, you can refer to this configuration:

i.e:

force_shutdown {
  enable = true
  max_mailbox_size = 1000
  max_heap_size = "32MB"
}

Isn’t version of 4.2.14 not supported ?

I changed all the parameters i could imagine, but it doesn’t worked, until a system oom-killer occurs.

My configuration as follows:
emqx.conf:

zone.external.force_gc_policy = 100|5MB
zone.external.force_shutdown_policy = 300|10MB
zone.external.max_packet_size = 1MB
zone.external.max_mqueue_len = 300
zone.external.session_expiry_interval = 10m

vm.args:
image

It’s version 5 format. It is equivalent to zone.external.force_shutdown_policy = 1000|10MB

If it still happens to have OOM issues. It might be worth investigating the memory footprint of EMQX’s VM near the time of the crash.

i.e,

  1. using the emqx_ctl vm memory to print the VM memory distribution
  2. using the emqx_ctl recon proc_count memory 100 to print the 100 processes that use the most memory.

BTW, 4.2 is beyond our maintenance cycle, you can try upgrading to 5.x or 4.4 in the future.

emqx_ctl vm memory

memory/total            : 6452771512
memory/processes        : 3279384232
memory/processes_used   : 3279378656
memory/system           : 3173387280
memory/atom             : 1443993
memory/atom_used        : 1431260
memory/binary           : 3108468544
memory/code             : 30387746
memory/ets              : 7269568

emqx_ctl recon proc_count memory 10

[{<0.6096.0>,2091810560,
  [{current_function,{gen,do_call,4}},{initial_call,{proc_lib,init_p,5}}]},
 {<0.12448.13>,1132481760,
  [{current_function,{gen,do_call,4}},{initial_call,{proc_lib,init_p,5}}]},
 {<0.17894.21>,13821256,
  [{current_function,{emqx_connection,recvloop,2}},
   {initial_call,{proc_lib,init_p,5}}]},
 {<0.935.22>,603752,
  [{current_function,{emqx_connection,recvloop,2}},
   {initial_call,{proc_lib,init_p,5}}]},
 {<0.1288.0>,461412,
  [application_controller,
   {current_function,{gen_server,loop,7}},
   {initial_call,{erlang,apply,2}}]},
 {<0.923.22>,426888,
  [{current_function,{emqx_connection,recvloop,2}},
   {initial_call,{proc_lib,init_p,5}}]},
 {<0.918.22>,426888,
  [{current_function,{emqx_connection,recvloop,2}},
   {initial_call,{proc_lib,init_p,5}}]},
 {<0.913.22>,426888,
  [{current_function,{emqx_connection,recvloop,2}},
   {initial_call,{proc_lib,init_p,5}}]},
 {<0.1402.0>,372648,
  [ranch_sup,
   {current_function,{gen_server,loop,7}},
   {initial_call,{proc_lib,init_p,5}}]},
 {<0.2418.0>,286296,
  [{current_function,{emqx_connection,recvloop,2}},
   {initial_call,{proc_lib,init_p,5}}]}]

emqx remote_console information on the remote server.

========================================================================================
 'emqx@127.0.0.1'                                                          01:01:02
 Load:  cpu         1               Memory:  total     5515460    binary    2269778
        procs     381                        processes 3072341    code        29676
        runq        0                        atom         1410    ets          7194

Pid            Name or Initial Func    Time    Reds  Memory    MsgQ Current Function
----------------------------------------------------------------------------------------
<8867.29600.2> emqx_connection:init     '-'      42******** 2246890 gen:do_call/4
<8867.19863.2> emqx_connection:init     '-'      42******** 2320382 gen:do_call/4
<8867.28846.11>emqx_connection:init     '-'     753 4723508       0 emqx_connection:recv
<8867.28855.11>emqx_connection:init     '-'     753 4722476       0 emqx_connection:recv
<8867.28890.11>emqx_connection:init     '-'     753 4721388       0 emqx_connection:recv
<8867.28845.11>emqx_connection:init     '-'     753 4720788       0 emqx_connection:recv
<8867.28851.11>emqx_connection:init     '-'     712 2919140       0 emqx_connection:recv
<8867.28840.11>emqx_connection:init     '-'     753 1948636       0 emqx_connection:recv
<8867.28959.11>emqx_connection:init     '-'     891 1804500       0 emqx_connection:recv
<8867.29000.11>emqx_connection:init     '-'  218185 1804456       0 emqx_connection:recv
<8867.28946.11>emqx_connection:init     '-'     753 1804444       0 emqx_connection:recv
<8867.28862.11>emqx_connection:init     '-'    2343 1803900       0 emqx_connection:recv
<8867.1405.0>  ranch_sup                '-'       1  602176       0 gen_server:loop/7
<8867.28966.11>etop_txt:init/1          '-'   19209  426700       0 etop:update/1
<8867.1288.0>  application_controll     '-'       1  319556       0 gen_server:loop/7
<8867.28942.11>erlang:apply/2           '-'       1  264164       0 shell:get_command1/5
<8867.1621.0>  ranch_conns_sup:init     '-'      42  236152       0 ranch_conns_sup:loop
<8867.1599.0>  ranch_conns_sup:init     '-'       2  230544       0 ranch_conns_sup:loop
<8867.1954.0>  emqx_connection:init     '-'    3434  143280       0 emqx_connection:recv
<8867.1807.0>  emqx_connection:init     '-'    3574  143224       0 emqx_connection:recv
========================================================================================one process information:(emqx@127.0.0.1)3> process_info(pid(0,29600,2), current_stacktrace).
{current_stacktrace,[{gen,do_call,4,
                          [{file,"gen.erl"},{line,167}]},
                     {gen_server,call,3,[{file,"gen_server.erl"},{line,219}]},
                     {emqx_cm,takeover_session,2,
                              [{file,"emqx_cm.erl"},{line,267}]},
                     {emqx_cm,'-open_session/3-fun-1-',5,
                              [{file,"emqx_cm.erl"},{line,222}]},
                     {emqx_cm_locker,trans,3,
                                     [{file,"emqx_cm_locker.erl"},{line,46}]},
                     {emqx_channel,process_connect,2,
                                   [{file,"emqx_channel.erl"},{line,434}]},
                     {emqx_connection,with_channel,3,
                                      [{file,"emqx_connection.erl"},{line,553}]},
                     {emqx_connection,process_msg,3,
                                      [{file,"emqx_connection.erl"},{line,293}]}]}
rp(erlang:process_info(pid(0,19863,2), backtrace)).
{backtrace,<<"Program counter: 0x00007fe26a53d860 (gen:do_call/4 + 3168)\n
CP: 0x0000000000000000 (invalid)\n\n
0x00007fde7c33fa28 Return addr 0x00007fe26a551908 (gen_server:call/3 + 128)\n
y(0)     infinity\n
y(1)     []\n
y(2)     #Ref<0.2499957634.4089446403.17060>\n
y(3)     <0.19863.2>\n\n
0x00007fde7c33fa50 Return addr 0x00007fe2681a03d0 (emqx_cm:takeover_session/2 + 312)\n
y(0)     infinity\n
y(1)     {takeover,begin}\n
y(2)     <0.19863.2>\n
y(3)     Catch 0x00007fe26a551908 (gen_server:call/3 + 128)\n\n
0x00007fde7c33fa78 Return addr 0x00007fe2681a28192 (emqx_cm:'-open_session/3-fun-1-'/5 + 96)\n
y(0)     emqx_connection\n
y(1)     <0.19863.2>\n\n
0x00007fde7c33fa90 Return addr 0x00007fe26819a978 (emqx_cm_locker:trans/3 + 360)\n
y(0)     []\n
y(1)     []\n
y(2)     <0.19863.2>\n
y(3)     <<\"192.168.30.16_1701152524553\">>\n
y(4)     #{clean_start=>false,clientid=><<\"192.168.30.16_1701152524553\">>,conn_mod=>emqx_connection,conn_props=>#{},connected_at=>1701153199092,disconnected_at=>1701153199091,expiry_interval=>71680,keepalive=>60,peercert=>nossl,peername=>{{192,168,30,43},57440},proto_name=><<\"MQTT\">>,proto_ver=>4,receive_maximum=>32,sockname=>{{192,168,30,16},1883},socktype=>tcp,username=><<\"admin\">>}\n
y(5)     #{anonymous=>true,auth_result=>success,clientid=><<\"192.168.30.16_1701152524553\">>,is_bridge=>false,is_superuser=>false,mountpoint=>undefined,peerhost=>{192,168,30,43},protocol=>mqtt,sockport=>1883,username=><<\"admin\">>,zone=>external}\n\n
0x00007fde7c33fac8 Return addr 0x00007fe26818192b8 (emqx_channel:process_connect/2 + 168)\n
y(0)     []\n
y(1)     []\n
y(2)     <<\"192.168.30.16_1701152524553\">>\n
y(3)     Catch 0x00007fe26819a9b8 (emqx_cm_locker:trans/3 + 424)\n\n
0x00007fde7c33faf0 Return addr 0x00007fe268197608 (emqx_connection:with_channel/3 + 232)\n
y(0)     []\n
y(1)     []\n
y(2)     {channel,#{clean_start=>false,clientid=><<\"192.168.30.16_1701152524553\">>,conn_mod=>emqx_connection,conn_props=>#{},connected_at=>1701153199092,disconnected_at=>1701153199091,expiry_interval=>71680,keepalive=>60,peercert=>nossl,peername=>{{192,168,30,43},57440},proto_name=><<\"MQTT\">>,proto_ver=>4,receive_maximum=>32,sockname=>{{192,168,30,16},1883},socktype=>tcp,username=><<\"admin\">>},#{anonymous=>true,auth_result=>success,clientid=><<\"192.168.30.16_1701152524553\">>,is_bridge=>false,is_superuser=>false,mountpoint=>undefined,peerhost=>{192,168,30,43},protocol=>mqtt,sockport=>1883,username=><<\"admin\">>,zone=>external},{session,#{<<\"product/direct/data/#\">>=>#{nl=>0,qos=>2,rap=>0,rh=>0,sub_props=>#{}},<<\"product/direct/info/#\">>=>#{nl=>0,qos=>2,rap=>0,rh=>0,sub_props=>#{}},<<\"product/direct/status/#\">>=>#{nl=>0,qos=>2,rap=>0,rh=>0,sub_props=>#{}},<<\"product/gateway/data/#\">>=>#{nl=>0,qos=>2,rap=>0,rh=>0,sub_props=>#{}},<<\"product/gateway/status/#\">>=>#{nl=>0,qos=>2,rap=>0,rh=>0,sub_props=>#{}},<<\"product/gw/info/#\">>=>#{nl=>0,qos=>2,rap=>0,rh=>0,sub_props=>#{}},<<\"product/gw_chlidev/data/#\">>=>#{nl=>0,qos=>2,rap=>0,rh=>0,sub_props=>#{}},<<\"product/gw_chlidev/status/#\">>=>#{nl=>0,qos=>2,rap=>0,rh=>0,sub_props=>#{}},<<\"product/order_return/#\">>=>#{nl=>0,qos=>2,rap=>0,rh=>0,sub_props=>#{}}},0,false,{inflight,32,{0,nil}},{mqueue,true,1916800,0,0,none,infinity,{queue,[],[],0}},1,30000,#{},19168,300000,1701152524859},{keepalive,45000,0,0},undefined,#{inbound=>#{},outbound=>#{}},undefined,#{},undefined,#{alive_timer=>#Ref<0.2499957634.4089446403.15885>,expire_timer=>#Ref<0.2499957634.4089446403.17052>},connected,false,false,[]}\ny(3)     #{}\n\n0x00007fde7c33fb18 Return addr 0x00007fe2681948b0 (emqx_connection:process_msg/3 + 136)\n
y(0)     []\n
y(1)     {state,esockd_transport,#Port<0.17354>,{{192,168,30,43},57440},{{192,168,30,16},1883},closed,19168,undefined,undefined,{none,#{max_size=>19248576,strict_mode=>false,version=>4}},#Fun<emqx_frame.4.19219284926>,{channel,#{clean_start=>false,clientid=><<\"192.168.30.16_1701152524553\">>,conn_mod=>emqx_connection,conn_props=>#{},connected_at=>1701153199074,disconnected_at=>1701153199091,expiry_interval=>71680,keepalive=>60,peercert=>nossl,peername=>{{192,168,30,43},57440},proto_name=><<\"MQTT\">>,proto_ver=>4,receive_maximum=>32,sockname=>{{192,168,30,16},1883},socktype=>tcp,username=><<\"admin\">>},#{anonymous=>true,auth_result=>success,clientid=><<\"192.168.30.16_1701152524553\">>,is_bridge=>false,is_superuser=>false,mountpoint=>undefined,peerhost=>{192,168,30,43},protocol=>mqtt,sockport=>1883,username=><<\"admin\">>,zone=>external},{session,#{<<\"product/direct/data/#\">>=>#{nl=>0,qos=>2,rap=>0,rh=>0,sub_props=>#{}},<<\"product/direct/info/#\">>=>#{nl=>0,qos=>2,rap=>0,rh=>0,sub_props=>#{}},<<\"product/direct/status/#\">>=>#{nl=>0,qos=>2,rap=>0,rh=>0,sub_props=>#{}},<<\"product/gateway/data/#\">>=>#{nl=>0,qos=>2,rap=>0,rh=>0,sub_props=>#{}},<<\"product/gateway/status/#\">>=>#{nl=>0,qos=>2,rap=>0,rh=>0,sub_props=>#{}},<<\"product/gw/info/#\">>=>#{nl=>0,qos=>2,rap=>0,rh=>0,sub_props=>#{}},<<\"product/gw_chlidev/data/#\">>=>#{nl=>0,qos=>2,rap=>0,rh=>0,sub_props=>#{}},<<\"product/gw_chlidev/status/#\">>=>#{nl=>0,qos=>2,rap=>0,rh=>0,sub_props=>#{}},<<\"product/order_return/#\">>=>#{nl=>0,qos=>2,rap=>0,rh=>0,sub_props=>#{}}},0,false,{inflight,32,{0,nil}},{mqueue,true,1916800,0,0,none,infinity,{queue,[],[],0}},1,30000,#{},19168,300000,1701152524859},{keepalive,45000,0,0},undefined,#{inbound=>#{},outbound=>#{}},undefined,#{},undefined,#{alive_timer=>#Ref<0.2499957634.4089446403.15885>,expire_timer=>#Ref<0.2499957634.4089446403.17052>},disconnected,false,false,[]},{gc_state,#{cnt=>{16000,16000},oct=>{16777216,16777216}}},#Ref<0.2499957634.4089708546.12231>,15000,undefined}\n
y(2)     []\n\n0x00007fde7c33fb38 Return addr 0x00007fe26a4ddee8 (proc_lib:wake_up/3 + 1168)\n
y(0)     []\n
y(1)     {state,esockd_transport,#Port<0.17354>,{{192,168,30,43},57440},{{192,168,30,16},1883},closed,19168,undefined,undefined,{none,#{max_size=>19248576,strict_mode=>false,version=>4}},#Fun<emqx_frame.4.19219284926>,{channel,#{clean_start=>false,clientid=><<\"192.168.30.16_1701152524553\">>,conn_mod=>emqx_connection,conn_props=>#{},connected_at=>1701153199074,disconnected_at=>1701153199091,expiry_interval=>71680,keepalive=>60,peercert=>nossl,peername=>{{192,168,30,43},57440},proto_name=><<\"MQTT\">>,proto_ver=>4,receive_maximum=>32,sockname=>{{192,168,30,16},1883},socktype=>tcp,username=><<\"admin\">>},#{anonymous=>true,auth_result=>success,clientid=><<\"192.168.30.16_1701152524553\">>,is_bridge=>false,is_superuser=>false,mountpoint=>undefined,peerhost=>{192,168,30,43},protocol=>mqtt,sockport=>1883,username=><<\"admin\">>,zone=>external},{session,#{<<\"product/direct/data/#\">>=>#{nl=>0,qos=>2,rap=>0,rh=>0,sub_props=>#{}},<<\"product/direct/info/#\">>=>#{nl=>0,qos=>2,rap=>0,rh=>0,sub_props=>#{}},<<\"product/direct/status/#\">>=>#{nl=>0,qos=>2,rap=>0,rh=>0,sub_props=>#{}},<<\"product/gateway/data/#\">>=>#{nl=>0,qos=>2,rap=>0,rh=>0,sub_props=>#{}},<<\"product/gateway/status/#\">>=>#{nl=>0,qos=>2,rap=>0,rh=>0,sub_props=>#{}},<<\"product/gw/info/#\">>=>#{nl=>0,qos=>2,rap=>0,rh=>0,sub_props=>#{}},<<\"product/gw_chlidev/data/#\">>=>#{nl=>0,qos=>2,rap=>0,rh=>0,sub_props=>#{}},<<\"product/gw_chlidev/status/#\">>=>#{nl=>0,qos=>2,rap=>0,rh=>0,sub_props=>#{}},<<\"product/order_return/#\">>=>#{nl=>0,qos=>2,rap=>0,rh=>0,sub_props=>#{}}},0,false,{inflight,32,{0,nil}},{mqueue,true,1916800,0,0,none,infinity,{queue,[],[],0}},1,30000,#{},19168,300000,1701152524859},{keepalive,45000,0,0},undefined,#{inbound=>#{},outbound=>#{}},undefined,#{},undefined,#{alive_timer=>#Ref<0.2499957634.4089446403.15885>,expire_timer=>#Ref<0.2499957634.4089446403.17052>},disconnected,false,false,[]},{gc_state,#{cnt=>{16000,16000},oct=>{16777216,16777216}}},#Ref<0.2499957634.4089708546.12231>,15000,undefined}\n
y(2)     <0.1652.0>\n
y(3)     Catch 0x00007fe2681948b0 (emqx_connection:process_msg/3 + 136)\n\n0x00007fde7c33fb60 Return addr 0x000055a7bdfc8248 (<terminate process normally>)\n
y(0)     []\n
y(1)     []\n
y(2)     Catch 0x00007fe26a4ddf08 (proc_lib:wake_up/3 + 152)\n">>}

Is there anything wrong?

<8867.29600.2>,<8867.19863.2>
These two processes seem to be stuck with too many messages.
They keep waiting for taken over session and don’t have a chance to check the OOM watermark you set, so the system directly OOM. How many devices are like this? You can try to manually kill these two processes and see if they get stuck again after reconnecting.
erlang:exit(<8867.29600.2>,kill).

No more than 100 devices with 5 seconds one message per device. The memory goes down normal after i killed the two processes.
So What can i do, that the two processes can have a chance to check the OOM watermark and make sure the broker don’t go down when some one using it ?
Thank u very much !

There’s no other better way to do it on your version. You can only check and then manually kill those processes.

Upgrade to the latest version 4.4.23 or 5.5.0 ASAP.
If you have a limited number of devices, it should be straightforward to upgrade. The new version has resolved similar stuck bugs.

where is the memory info in the dashboard of the new version of 5.4.1,i cann’t find it?
Thank u very much for the past replies.


This is the memory of the os, not the memory of Erlang Node.