EMQX 5.0.15 stops on its own and cannot be restarted

Bug report

After running for a few days, EMQX suddenly stopped during the night. The log is as follows:
2023-02-04T04:05:57.403419+08:00 [error] Generic server disksup terminating. Reason: {badarg,[{erlang,port_close,[#Port<0.6116>],[{error_info,#{module => erl_erts_errors}}]},{disksup,terminate,2,[{file,"disksup.erl"},{line,169}]},{gen_server,try_terminate,3,[{file,"gen_server.erl"},{line,733}]},{gen_server,terminate,10,[{file,"gen_server.erl"},{line,918}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}. Last message: timeout. State: [{data,[{"OS",{unix,linux}},{"Timeout",1800000},{"Threshold",80},{"DiskData",[]}]}].
2023-02-04T04:05:57.403741+08:00 [error] crasher: initial call: disksup:init/1, pid: <0.4926.7>, registered_name: disksup, error: {badarg,[{erlang,port_close,[#Port<0.6116>],[{error_info,#{module => erl_erts_errors}}]},{disksup,terminate,2,[{file,"disksup.erl"},{line,169}]},{gen_server,try_terminate,3,[{file,"gen_server.erl"},{line,733}]},{gen_server,terminate,10,[{file,"gen_server.erl"},{line,918}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}, ancestors: [os_mon_sup,<0.1798.0>], message_queue_len: 0, messages: [], links: [<0.1799.0>], dictionary: [], trap_exit: true, status: running, heap_size: 6772, stack_size: 29, reductions: 19283; neighbours:
2023-02-04T04:05:57.404120+08:00 [error] Supervisor: {local,os_mon_sup}. Context: child_terminated. Reason: {badarg,[{erlang,port_close,[#Port<0.6116>],[{error_info,#{module => erl_erts_errors}}]},{disksup,terminate,2,[{file,"disksup.erl"},{line,169}]},{gen_server,try_terminate,3,[{file,"gen_server.erl"},{line,733}]},{gen_server,terminate,10,[{file,"gen_server.erl"},{line,918}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}. Offender: id=disksup,pid=<0.4926.7>.
2023-02-04T04:05:57.404351+08:00 [error] Supervisor: {local,os_mon_sup}. Context: shutdown. Reason: reached_max_restart_intensity. Offender: id=disksup,pid=<0.4926.7>.
[os_mon] memory supervisor port (memsup): Erlang has closed
[os_mon] cpu supervisor port (cpu_sup): Erlang has closed
2023-02-04T04:05:57.410064+08:00 [notice] Application: os_mon. Exited: shutdown. Type: permanent.
2023-02-04T04:05:57.428204+08:00 [info] line: 312, mfa: emqx_exhook_mgr:terminate/2, msg: exhook_mgr_terminated, reason: shutdown, servers: #{}
Stop listener http:dashboard on :18083 successfully.
2023-02-04T04:05:57.520838+08:00 [notice] ssl:default stopped on 0.0.0.0:8884
Listener ssl:default on 0.0.0.0:8884 stopped.
2023-02-04T04:05:57.521799+08:00 [notice] tcp:default stopped on 0.0.0.0:9004
Listener tcp:default on 0.0.0.0:9004 stopped.
Listener ws:default on 0.0.0.0:8084 stopped.
Listener wss:default on 0.0.0.0:1884 stopped.
2023-02-04T04:05:57.590355+08:00 [notice] line: 49, mfa: mria_app:stop/1, msg: Mria is stopped
{"Kernel pid terminated",application_controller,"{application_terminated,os_mon,shutdown}"}
Kernel pid terminated (application_controller) ({application_terminated,os_mon,shutdown})

Crash dump is being written to: log/erl_crash.dump…done
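Read bottom-up, the log gives the failure chain: erlang:port_close/1 raised badarg inside disksup:terminate/2 while disksup was handling a timeout message, os_mon_sup then shut down with reached_max_restart_intensity, and because os_mon runs as a permanent application (Type: permanent), its exit terminated the whole node (Kernel pid terminated). The sketch below is a simplified Python model of the OTP supervisor rule behind reached_max_restart_intensity: if more than MaxR restarts occur within MaxT seconds, the supervisor stops restarting and terminates. The class name RestartIntensity and the max_restarts/period values are illustrative assumptions, not os_mon_sup's real settings.

```python
from collections import deque


class RestartIntensity:
    """Simplified model of an OTP supervisor's restart-intensity check.

    Hypothetical helper: max_restarts and period play the role of the
    supervisor flags intensity (MaxR) and period (MaxT).
    """

    def __init__(self, max_restarts: int, period: float) -> None:
        self.max_restarts = max_restarts
        self.period = period
        self._restarts = deque()  # crash timestamps in seconds, oldest first

    def child_crashed(self, now: float) -> str:
        """Record a child crash at time `now` and return the supervisor's reaction."""
        self._restarts.append(now)
        # Drop restarts that have fallen out of the sliding time window.
        while now - self._restarts[0] > self.period:
            self._restarts.popleft()
        if len(self._restarts) > self.max_restarts:
            # Too many restarts inside the window: give up, as os_mon_sup did.
            return "reached_max_restart_intensity"
        return "restarted"


sup = RestartIntensity(max_restarts=3, period=10.0)
print([sup.child_crashed(t) for t in (0, 1, 2, 3)])
# ['restarted', 'restarted', 'restarted', 'reached_max_restart_intensity']
```

With max_restarts=3 and period=10.0, the fourth crash inside the window triggers the shutdown, whereas crashes spaced further apart than period are always restarted.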

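Afterwards, starting it manually failed with a 120-second timeout. ps showed the emqx process running, but it was not listening on the MQTT port. The log ended with: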

+ export PROGNAME
+ ARGS=console
+ '[' no = no ']'
+ set -- /usr/lib/emqx/erts-12.3.2.2/bin/erlexec -boot /usr/lib/emqx/releases/5.0.15/start -boot_var RELEASE_LIB /usr/lib/emqx/lib -boot_var ERTS_LIB_DIR /usr/lib/emqx/lib -mode embedded -config /var/lib/emqx/configs/app.2023.02.06.09.51.51.config -args_file /var/lib/emqx/configs/vm.2023.02.06.09.51.51.args -start_epmd false -epmd_module ekka_epmd -proto_dist ekka
+ logger -t 'emqx[25982]' 'EXEC: /usr/lib/emqx/erts-12.3.2.2/bin/erlexec -boot /usr/lib/emqx/releases/5.0.15/start -boot_var RELEASE_LIB /usr/lib/emqx/lib -boot_var ERTS_LIB_DIR /usr/lib/emqx/lib -mode embedded -config /var/lib/emqx/configs/app.2023.02.06.09.51.51.config -args_file /var/lib/emqx/configs/vm.2023.02.06.09.51.51.args -start_epmd false -epmd_module ekka_epmd -proto_dist ekka -- console -emqx_data_dir /var/lib/emqx'
+ exec /usr/lib/emqx/erts-12.3.2.2/bin/erlexec -boot /usr/lib/emqx/releases/5.0.15/start -boot_var RELEASE_LIB /usr/lib/emqx/lib -boot_var ERTS_LIB_DIR /usr/lib/emqx/lib -mode embedded -config /var/lib/emqx/configs/app.2023.02.06.09.51.51.config -args_file /var/lib/emqx/configs/vm.2023.02.06.09.51.51.args -start_epmd false -epmd_module ekka_epmd -proto_dist ekka -- console -emqx_data_dir /var/lib/emqx
Erlang/OTP 24 [erts-12.3.2.2] [emqx] [64-bit] [smp:16:16] [ds:16:16:8] [async-threads:4] [jit]

[os_mon] memory supervisor port (memsup): Erlang has closed
[os_mon] cpu supervisor port (cpu_sup): Erlang has closed
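To confirm the symptom above (the emqx process exists but nothing accepts MQTT connections) without parsing netstat output, a plain TCP connect probe is enough. This is a minimal sketch: port 9004 comes from the tcp:default listener in the crash log, while the 127.0.0.1 host, the 2-second timeout and the helper name is_listening are assumptions.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection resolves the host and performs a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, host unreachable, etc.
        return False


# 9004 is the tcp:default (MQTT) listener port seen in the crash log.
print("MQTT tcp:default listening:", is_listening("127.0.0.1", 9004))
```

A False result while ps still lists the emqx process matches the state described above.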

Environment

  • EMQX version: 5.0.15
  • OS version: Ubuntu 18.04.6 LTS

Steps to reproduce

  1. Running env DEBUG=1 /usr/bin/emqx start eventually fails with a startup timeout.

System and network tuning has already been applied; the only change to the configuration file was the listener ports.

Could you check whether the disk is nearly full?

Filesystem 1K-blocks      Used Available Use% Mounted on
udev        16445344         0  16445344   0% /dev
tmpfs        3293796      4700   3289096   1% /run
/dev/vda1  515929508 115548464 379184068  24% /
tmpfs       16468976         0  16468976   0% /dev/shm
tmpfs           5120         0      5120   0% /run/lock
tmpfs       16468976         0  16468976   0% /sys/fs/cgroup
tmpfs        3293792         4   3293788   1% /run/user/0

The disk has plenty of free space (the root filesystem /dev/vda1 is only 24% used).
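The df figures can be compared with the threshold in the disksup state from the crash log ({"Threshold",80}, i.e. 80%). GNU df computes Use% as used / (used + available), rounded up, which gives 24% for the root filesystem, far below 80%. The helpers below reproduce that arithmetic; they are a sketch, the names use_percent and above_threshold are hypothetical, and treating the logged 80% as the relevant limit is an assumption.

```python
import shutil


def use_percent(used: int, avail: int) -> int:
    """df-style Use%: used / (used + available), rounded up to a whole percent."""
    total = used + avail
    if total == 0:
        return 0
    # Integer ceiling division, matching how GNU df rounds Use% up.
    return -(-used * 100 // total)


def above_threshold(path: str = "/", threshold: int = 80) -> bool:
    """True if the filesystem holding `path` is more than `threshold` percent used."""
    usage = shutil.disk_usage(path)  # bytes; usage.free is the space available to non-root users
    return use_percent(usage.used, usage.free) > threshold


# Numbers for /dev/vda1 from the df output above (1K blocks).
print(use_percent(115548464, 379184068))  # 24
print(above_threshold("/"))
```

Integer arithmetic keeps the rounding identical to df even for multi-terabyte volumes.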

Later I ran service stop emqx.service once to terminate the process.
After that, starting it again worked normally.
I'll keep tracking this.
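The thread does not establish why the manual start hung, but the sequence suggests that the emqx process ps still showed (running, yet not listening) had to be stopped before a fresh start could succeed; the logs do not confirm this. One way to check for such leftovers before starting again is to scan /proc/<pid>/cmdline on Linux. This is a sketch: the helper name find_pids and matching on the substring emqx are assumptions, and the proc_root parameter exists only so the function can be tested against a fake directory.

```python
from pathlib import Path
from typing import List


def find_pids(needle: str, proc_root: str = "/proc") -> List[int]:
    """Return PIDs whose command line contains `needle` (Linux /proc layout)."""
    pids = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            raw = (entry / "cmdline").read_bytes()
        except OSError:
            continue  # process already exited, or cmdline is unreadable
        # cmdline arguments are NUL-separated; join them with spaces.
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace")
        if needle in cmdline:
            pids.append(int(entry.name))
    return sorted(pids)
```

For example, find_pids("emqx") run on the broker host lists the PIDs worth inspecting before running emqx start again.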