How do I troubleshoot long_schedule warnings, and what impact do they have?

Environment

  • EMQX version: 4.4.3
  • OS and version: CentOS 7
  • Broker: 2 cores, 4 GB RAM
    two-node cluster

Problem description

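I frequently see long_schedule warnings in the log. Can they cause messages to be lost before they reach the client? How do I track down the cause? Does the EMQX configuration need to be adjusted?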

Configuration files and logs

2022-07-26T01:00:16.958372+08:00 [warning] [SYSMON] long_schedule warning: pid = <0.1770.0>, info: [{timeout,3837}, {in,{gen_server,loop,7}}, {out,{gen_server,loop,7}}], [{proc_lib_initial_call,{ekka_node_monitor,init,['Argument__1']}},{memory,12164},{total_heap_size,1399},{heap_size,987},{stack_size,12},{min_heap_size,233},{initial_call,{proc_lib,init_p,5}},{current_stacktrace,[{gen_server,loop,7,[{file,"gen_server.erl"},{line,443}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]},{registered_name,ekka_node_monitor},{status,waiting},{message_queue_len,0},{group_leader,<0.1764.0>},{priority,normal},{trap_exit,true},{reductions,1068756},{last_calls,false},{catchlevel,1},{trace,0},{suspending,[]},{sequential_trace_token,[]},{error_handler,error_handler}]
2022-07-26T01:00:17.378737+08:00 [warning] [SYSMON] long_schedule warning: pid = <0.1903.0>, info: [{timeout,384}, {in,undefined}, {out,undefined}], [{proc_lib_initial_call,{emqx_retainer,init,['Argument__1']}},{memory,1264},{total_heap_size,49},{heap_size,49},{stack_size,1},{min_heap_size,233},{initial_call,{proc_lib,init_p,5}},{current_stacktrace,[]},{registered_name,emqx_retainer},{status,waiting},{message_queue_len,0},{group_leader,<0.1900.0>},{priority,normal},{trap_exit,false},{reductions,1677672},{last_calls,false},{catchlevel,0},{trace,0},{suspending,[]},{sequential_trace_token,[]},{error_handler,error_handler}]
2022-07-26T01:00:17.379083+08:00 [warning] [SYSMON] long_schedule warning: pid = <0.1609.0>, info: [{timeout,1911}, {in,{gen,do_call,4}}, {out,{gen,do_call,4}}], [{proc_lib_initial_call,{lc_flag_man,init,['Argument__1']}},{memory,18896},{total_heap_size,2240},{heap_size,1598},{stack_size,8},{min_heap_size,233},{initial_call,{proc_lib,init_p,5}},{current_stacktrace,[{timer,sleep,1,[{file,"timer.erl"},{line,152}]},{lc_flag_man,flag_man_loop,1,[{file,"lc_flag_man.erl"},{line,71}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]},{registered_name,[]},{status,waiting},{message_queue_len,0},{group_leader,<0.1605.0>},{priority,max},{trap_exit,false},{reductions,46001184},{last_calls,false},{catchlevel,1},{trace,0},{suspending,[]},{sequential_trace_token,[]},{error_handler,error_handler}]

No, this will not prevent messages from reaching the client.

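The long_schedule warning means that a process took too long to be scheduled. The timeout value in each entry is the measured time in milliseconds (3837 ms in the first entry of the log). On a 2-core machine this most likely points to a shortage of CPU, so upgrading the CPU should make the warnings go away.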
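
To see which processes trigger the warning most often and how large the delays are, the log can be summarized with a small script. The following is only a rough sketch in Python, assuming the log lines have the same shape as the ones posted in the question. The function name summarize_long_schedule and the shortened sample lines are illustrative and not part of EMQX.

```python
import re
from collections import defaultdict

# Matches the measured delay in milliseconds, e.g. "{timeout,3837}".
TIMEOUT_RE = re.compile(r"\{timeout,\s*(\d+)\}")
# Matches the module that started the process, e.g.
# "{proc_lib_initial_call,{ekka_node_monitor,init,...".
INITIAL_CALL_RE = re.compile(r"\{proc_lib_initial_call,\{(\w+),")


def summarize_long_schedule(lines):
    """Group long_schedule warnings by the module of the affected process.

    Returns a dict of the form {module: {"count": n, "max_ms": m}}.
    """
    stats = defaultdict(lambda: {"count": 0, "max_ms": 0})
    for line in lines:
        if "long_schedule warning" not in line:
            continue  # skip unrelated log entries
        timeout = TIMEOUT_RE.search(line)
        if timeout is None:
            continue  # warning line without a timeout field
        call = INITIAL_CALL_RE.search(line)
        module = call.group(1) if call else "unknown"
        entry = stats[module]
        entry["count"] += 1
        entry["max_ms"] = max(entry["max_ms"], int(timeout.group(1)))
    return dict(stats)


# Shortened copies of the three log lines from the question (illustrative).
sample = [
    "2022-07-26T01:00:16.958372+08:00 [warning] [SYSMON] long_schedule warning: "
    "pid = <0.1770.0>, info: [{timeout,3837}, {in,{gen_server,loop,7}}], "
    "[{proc_lib_initial_call,{ekka_node_monitor,init,['Argument__1']}}]",
    "2022-07-26T01:00:17.378737+08:00 [warning] [SYSMON] long_schedule warning: "
    "pid = <0.1903.0>, info: [{timeout,384}, {in,undefined}], "
    "[{proc_lib_initial_call,{emqx_retainer,init,['Argument__1']}}]",
    "2022-07-26T01:00:17.379083+08:00 [warning] [SYSMON] long_schedule warning: "
    "pid = <0.1609.0>, info: [{timeout,1911}, {in,{gen,do_call,4}}], "
    "[{proc_lib_initial_call,{lc_flag_man,init,['Argument__1']}}]",
]

# Print the modules with the longest delays first.
summary = summarize_long_schedule(sample)
for module, s in sorted(summary.items(), key=lambda kv: -kv[1]["max_ms"]):
    print(f"{module}: {s['count']} warning(s), max {s['max_ms']} ms")
```

For a real log file, pass the open file object instead of the sample list. If the delays are spread over many unrelated processes at the same timestamps, as in the log from the question, that fits the explanation that the whole node is short of CPU rather than that one process is misbehaving.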