How can I view node restart records in an EMQX cluster?

Environment

  • EMQX version: 5.8.6 open source
  • OS version: Ubuntu

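Production environment, 4 nodes.
One node's uptime shows that it started 8 days ago, but nobody restarted it manually at that time. Memory, CPU, and the other monitoring metrics are continuous and stable, and a search of the EMQX logs turns up no restart-related entries.

How can I view a node's restart history? Or are there other metrics that could help?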

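Look for erlang.log in the log directory.

I can't find that log file under /var/log/emqx. Does it need special configuration to be enabled?

How did you start EMQX? No configuration is needed; the file is normally there.

Installed via apt and started with sudo systemctl start emqx.

When the service is managed by systemd, its start and stop records are in the journal.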
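
A minimal sketch of pulling just the restart events out of the journal. It assumes the unit is named emqx.service, as a default apt install would set up; `restart_events` is a hypothetical helper name for illustration, not an EMQX or systemd tool.

```shell
# Hypothetical helper: keep only systemd's own stop/start lines for the unit.
# Reads journal text in the default "short" format on stdin.
restart_events() {
    grep -E 'systemd\[1\]: (Stopping|Stopped|Starting|Started) emqx\.service'
}

# Typical use on the node (assumes the unit is called emqx.service):
#   journalctl -u emqx.service --no-pager | restart_events
```

Each Stopping/Started pair in the output marks one restart, with its timestamp.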


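I can see the restarts in the journal now.

But the restarts are baffling, and no reason is given. Below are the logs for the two restarts. Why would the service restart by itself?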

Jul 17 03:01:13 ip-10-212-75-57 bash[686660]: Listener http:dashboard on :18083 started.
Jul 17 03:01:14 ip-10-212-75-57 bash[686660]: EMQX 5.8.6 is running now!
Sep 23 06:54:17 ip-10-212-75-57 systemd[1]: Stopping emqx.service - emqx daemon...
Sep 23 06:54:17 ip-10-212-75-57 bash[686660]: Stop listener http:dashboard on :18083 successfully.
Sep 23 06:54:18 ip-10-212-75-57 bash[686660]: Listener tcp:default on 0.0.0.0:1883 stopped.
Sep 23 06:54:18 ip-10-212-75-57 bash[686660]: Listener ssl:default on 0.0.0.0:8883 stopped.
Sep 23 06:54:20 ip-10-212-75-57 bash[1076516]: ok
Sep 23 06:54:20 ip-10-212-75-57 emqx[1076756]: STOP: OK
Sep 23 06:54:20 ip-10-212-75-57 systemd[1]: emqx.service: Deactivated successfully.
Sep 23 06:54:20 ip-10-212-75-57 systemd[1]: Stopped emqx.service - emqx daemon.
Sep 23 06:54:20 ip-10-212-75-57 systemd[1]: emqx.service: Consumed 3w 5d 5h 32min 48.014s CPU time, 3.6G memory peak, 0B memory swap peak.
Sep 23 06:54:20 ip-10-212-75-57 systemd[1]: Started emqx.service - emqx daemon.
Sep 23 06:54:21 ip-10-212-75-57 emqx[1077057]: EXEC: /usr/lib/emqx/erts-14.2.5.2/bin/erlexec -enable-feature maybe_expr -noinput -noshell +Bd -boot /usr/lib/emqx/releases/5.8.6/start -boot_var RELEASE_LIB >
Sep 23 06:54:23 ip-10-212-75-57 bash[1076760]: Listener tcp:default on 0.0.0.0:1883 started.
Sep 23 06:54:23 ip-10-212-75-57 bash[1076760]: Listener ssl:default on 0.0.0.0:8883 started.
Sep 23 06:54:23 ip-10-212-75-57 bash[1076760]: Listener ws:default is NOT started due to: disabled.
Sep 23 06:54:23 ip-10-212-75-57 bash[1076760]: Listener wss:default is NOT started due to: disabled.
Sep 23 06:54:24 ip-10-212-75-57 bash[1076760]: Listener http:dashboard on :18083 started.
Sep 23 06:54:24 ip-10-212-75-57 bash[1076760]: EMQX 5.8.6 is running now!
Oct 01 06:03:30 ip-10-212-75-57 systemd[1]: Stopping emqx.service - emqx daemon...
Oct 01 06:03:30 ip-10-212-75-57 bash[1076760]: Stop listener http:dashboard on :18083 successfully.
Oct 01 06:03:31 ip-10-212-75-57 bash[1076760]: Listener tcp:default on 0.0.0.0:1883 stopped.
Oct 01 06:03:31 ip-10-212-75-57 bash[1076760]: Listener ssl:default on 0.0.0.0:8883 stopped.
Oct 01 06:03:33 ip-10-212-75-57 bash[1128008]: ok
Oct 01 06:03:33 ip-10-212-75-57 emqx[1128173]: STOP: OK
Oct 01 06:03:33 ip-10-212-75-57 systemd[1]: emqx.service: Deactivated successfully.
Oct 01 06:03:33 ip-10-212-75-57 systemd[1]: Stopped emqx.service - emqx daemon.
Oct 01 06:03:33 ip-10-212-75-57 systemd[1]: emqx.service: Consumed 4h 50min 13.319s CPU time, 301.5M memory peak, 0B memory swap peak.
Oct 01 06:03:33 ip-10-212-75-57 systemd[1]: Started emqx.service - emqx daemon.
Oct 01 06:03:34 ip-10-212-75-57 emqx[1128472]: EXEC: /usr/lib/emqx/erts-14.2.5.2/bin/erlexec -enable-feature maybe_expr -noinput -noshell +Bd -boot /usr/lib/emqx/releases/5.8.6/start -boot_var RELEASE_LIB >
Oct 01 06:03:36 ip-10-212-75-57 bash[1128177]: Listener tcp:default on 0.0.0.0:1883 started.
Oct 01 06:03:36 ip-10-212-75-57 bash[1128177]: Listener ssl:default on 0.0.0.0:8883 started.
Oct 01 06:03:36 ip-10-212-75-57 bash[1128177]: Listener ws:default is NOT started due to: disabled.
Oct 01 06:03:36 ip-10-212-75-57 bash[1128177]: Listener wss:default is NOT started due to: disabled.
Oct 01 06:03:36 ip-10-212-75-57 bash[1128177]: Listener http:dashboard on :18083 started.
Oct 01 06:03:36 ip-10-212-75-57 bash[1128177]: EMQX 5.8.6 is running now! 

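From the logs:

  • The service stops and starts again within seconds (about 3 seconds from Stopping to Started in both restarts)
  • The entries show Stopping emqx.service rather than any crash information
  • The resource consumption summary is reported normally

The most likely causes are:

  1. Someone ran systemctl restart emqx manually
  2. An automation script or configuration management tool (such as Ansible or Chef) restarts the service periodically
  3. An update or maintenance operation

I recommend checking the cron jobs and the full journalctl log to see whether a scheduled maintenance task is involved.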
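
One way to read the full journal for a restart window is to count which programs were logging at the time, so that anything unexpected (a package manager, a config tool) stands out. This is only a sketch: `who_logged` is a hypothetical helper, and it assumes journalctl's default "short" output format.

```shell
# Hypothetical helper: count journal lines per syslog identifier.
# Expects the default "short" journalctl format on stdin:
#   Mon DD HH:MM:SS host ident[pid]: message
who_logged() {
    awk '{ id = $5; sub(/(\[[0-9]+\])?:$/, "", id); n[id]++ }
         END { for (id in n) print n[id], id }' | sort -rn
}

# Typical use, with the time window set around one of the restarts:
#   journalctl --since "2024-10-01 06:00:00" --until "2024-10-01 06:10:00" --no-pager | who_logged
```

For scheduled jobs, crontab -l, the files under /etc/cron.d, and systemctl list-timers --all are the usual places to look.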

You can also check the kernel log for errors:

# View the system journal around the restart
journalctl --since "2024-10-01 06:00:00" --until "2024-10-01 06:10:00"

# Check for the OOM killer or other system events
dmesg -T | grep -i emqx

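Following your suggestion, I went through the full journalctl log and found the cause: the system's automatic update service, apt-daily-upgrade.service, ran and caused the dependent services to be restarted as well.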
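
For anyone wanting to cross-check the same thing on their own node, here is a rough sketch. It assumes the upgrade was carried out by unattended-upgrade (which apt-daily-upgrade.service normally triggers on Ubuntu) and that apt's history is at /var/log/apt/history.log; `unattended_runs` is a hypothetical helper name.

```shell
# Hypothetical helper: print the start time of every apt transaction that
# was launched by unattended-upgrade. Reads apt history text on stdin, where
# each entry has a "Start-Date:" line followed by a "Commandline:" line.
unattended_runs() {
    awk '/^Start-Date:/ { start = $2 " " $3 }
         /^Commandline:/ && /unattended-upgrade/ { print start }'
}

# Typical use on the node (path is an assumption about a stock Ubuntu setup):
#   unattended_runs < /var/log/apt/history.log
#   journalctl -u apt-daily-upgrade.service --no-pager
```

If the printed times line up with the Stopping emqx.service entries in the journal, the automatic upgrade is the likely trigger.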

Thank you!