EMQX service fails to start after joining a cluster

Environment

  • EMQX version:
    Two machines:
    a@serverIpA: 4.3.8
    b@serverIpB: 4.4.9

  • OS and version:
    CentOS:
    serverA: 7.6
    serverB: 7.9

  • Other

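Problem description

After running cluster join on serverA, the EMQX service on node a no longer starts.
As an emergency fix, I ran cluster force-leave a@serverIpA on serverB.
However, the EMQX service on serverA still fails to start.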

Configuration files and logs

2022-10-14T12:18:13.698505+08:00 [notice] alarm_handler: {set,{system_memory_high_watermark,[]}}
2022-10-14T12:18:13.704672+08:00 [info] event=server_setup_successfully driver=tcp socket="#Port<0.11>"
2022-10-14T12:18:13.704886+08:00 [info] event=start
2022-10-14T12:18:13.842564+08:00 [info] Ekka(Membership): Node pe06b@serverIpB up
2022-10-14T12:18:13.843797+08:00 [error] Supervisor: {local,emqx_router_sup}. Context: start_error. Reason: {{badmatch,{error,{not_active_local,emqx_routing_node}}},[{emqx_router_helper,init,1,[{file,"emqx_router_helper.erl"},{line,99}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,417}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,385}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}. Offender: id=helper,pid=undefined.
2022-10-14T12:18:13.844599+08:00 [error] Supervisor: {local,emqx_sup}. Context: start_error. Reason: {shutdown,{failed_to_start_child,helper,{{badmatch,{error,{not_active_local,emqx_routing_node}}},[{emqx_router_helper,init,1,[{file,"emqx_router_helper.erl"},{line,99}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,417}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,385}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}}}. Offender: id=emqx_router_sup,pid=undefined.
2022-10-14T12:18:13.845252+08:00 [error] crasher: initial call: application_master:init/4, pid: <0.1582.0>, registered_name: [], exit: {{bad_return,{{emqx_app,start,[normal,[]]},{'EXIT',{{badmatch,{error,{shutdown,{failed_to_start_child,emqx_router_sup,{shutdown,{failed_to_start_child,helper,{{badmatch,{error,{not_active_local,emqx_routing_node}}},[{emqx_router_helper,init,1,[{file,"emqx_router_helper.erl"},{line,99}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,417}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,385}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}}}}}}},[{emqx_app,start,2,[{file,"emqx_app.erl"},{line,43}]},{application_master,start_it_old,4,[{file,"application_master.erl"},{line,277}]}]}}}},[{application_master,init,4,[{file,"application_master.erl"},{line,138}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}, ancestors: [<0.1581.0>], message_queue_len: 1, messages: [{'EXIT',<0.1583.0>,normal}], links: [<0.1581.0>,<0.1433.0>], dictionary: [], trap_exit: true, status: running, heap_size: 987, stack_size: 28, reductions: 303; neighbours:
2022-10-14T12:18:13.846252+08:00 [notice] Application: emqx. Exited: {bad_return,{{emqx_app,start,[normal,[]]},{'EXIT',{{badmatch,{error,{shutdown,{failed_to_start_child,emqx_router_sup,{shutdown,{failed_to_start_child,helper,{{badmatch,{error,{not_active_local,emqx_routing_node}}},[{emqx_router_helper,init,1,[{file,"emqx_router_helper.erl"},{line,99}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,417}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,385}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}}}}}}},[{emqx_app,start,2,[{file,"emqx_app.erl"},{line,43}]},{application_master,start_it_old,4,[{file,"application_master.erl"},{line,277}]}]}}}}. Type: permanent.
2022-10-14T12:18:13.866331+08:00 [notice] alarm_handler: {clear,system_memory_high_watermark}

On serverA:
[emqxUser ~]# emqx_ctl cluster status
RPC to pe06@serverIpA failed: {'EXIT',
  {badarg,
    [{ets,match,
       [emqx_command,{{'_',cluster},'$1','_'}],
       []},
     {emqx_ctl,lookup_command,1,
       [{file,"emqx_ctl.erl"},{line,118}]},
     {emqx_ctl,run_command,2,
       [{file,"emqx_ctl.erl"},{line,103}]},
     {emqx_ctl,run_command,1,[]}]}}

It is best not to put 4.3 and 4.4 nodes in the same cluster.

Is this a problem specific to 4.3 and 4.4, or is it generally best to keep only nodes of the same version in one cluster?

The direct cause is that the built-in database tables differ between the two versions, which produces the error {error,{not_active_local,emqx_routing_node}}.
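To check whether another node's log shows this same failure, you can scan it for the {not_active_local,<table>} term. The following is a minimal Python sketch, assuming the log is plain text in the format shown above; the function name tables_not_active_locally is hypothetical and not part of EMQX.

```python
import re

# Matches terms such as "{not_active_local,emqx_routing_node}" and captures the table name.
NOT_ACTIVE_LOCAL = re.compile(r"\{not_active_local,([a-z_][a-zA-Z0-9_@]*)\}")


def tables_not_active_locally(log_text: str) -> set:
    """Return every table name reported as not_active_local in the given log text."""
    return set(NOT_ACTIVE_LOCAL.findall(log_text))


# Excerpt of the supervisor error from the log above.
line = ("Reason: {{badmatch,{error,{not_active_local,emqx_routing_node}}},"
        "[{emqx_router_helper,init,1,[{file,\"emqx_router_helper.erl\"},{line,99}]}]}")
print(tables_not_active_locally(line))  # {'emqx_routing_node'}
```

The result is a set because the same term is repeated in several nested crash reports for a single failure.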

Our recommendation is to keep every node in a cluster on the same version.
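One way to apply this recommendation is to compare node versions before running cluster join. The following is a minimal Python sketch, assuming you have already collected each node's version string yourself. The helpers release_line and versions_compatible are hypothetical and are not part of EMQX or emqx_ctl.

```python
def release_line(version: str) -> tuple:
    """Split a version string such as '4.3.8' into (major, minor)."""
    major, minor = version.strip().split(".")[:2]
    return int(major), int(minor)


def versions_compatible(versions: list, strict: bool = True) -> bool:
    """Hypothetical pre-join check for a set of node versions.

    strict=True  -> every node must run exactly the same version
                    (the recommendation in this thread).
    strict=False -> only the major.minor release line must match
                    (a looser assumption that still catches 4.3 vs 4.4).
    """
    if strict:
        return len({v.strip() for v in versions}) <= 1
    return len({release_line(v) for v in versions}) <= 1


# Node a runs 4.3.8 and node b runs 4.4.9, as in the report above.
print(versions_compatible(["4.3.8", "4.4.9"]))                 # False
print(versions_compatible(["4.4.9", "4.4.10"], strict=False))  # True
```

With strict=False, the check compares only the release line. This is looser than the advice above, but it still rejects the 4.3.8 + 4.4.9 combination from this thread.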

Hello. The Windows open-source edition fails to start. What is causing this? The error log is as follows:
2022-10-31T17:31:40.254000+08:00 [error] event=failed_to_setup_server driver=tcp reason="eacces"
2022-10-31T17:31:40.262000+08:00 [error] crasher: initial call: gen_rpc_server:init/1, pid: <0.159.0>, registered_name: [], exit: {eacces,[{gen_statem,init_result,8,[{file,"gen_statem.erl"},{line,842}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}, ancestors: [gen_rpc_sup,<0.156.0>], message_queue_len: 0, messages: [], links: [<0.157.0>], dictionary: [], trap_exit: false, status: running, heap_size: 376, stack_size: 29, reductions: 14821; neighbours:
2022-10-31T17:31:40.261000+08:00 [error] Supervisor: {local,gen_rpc_sup}. Context: start_error. Reason: eacces. Offender: id=gen_rpc_server_tcp,pid=undefined.
2022-10-31T17:31:40.262000+08:00 [error] crasher: initial call: application_master:init/4, pid: <0.155.0>, registered_name: [], exit: {{{shutdown,{failed_to_start_child,gen_rpc_server_tcp,eacces}},{gen_rpc_app,start,[normal,[]]}},[{application_master,init,4,[{file,"application_master.erl"},{line,142}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}, ancestors: [<0.154.0>], message_queue_len: 1, messages: [{'EXIT',<0.156.0>,normal}], links: [<0.154.0>,<0.44.0>], dictionary: [], trap_exit: true, status: running, heap_size: 610, stack_size: 29, reductions: 209; neighbours:
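In this log, eacces is a permission-denied error. It is raised while gen_rpc_server_tcp, the RPC TCP listener, is being set up. A quick way to check whether the port itself can be bound on that machine, independent of EMQX, is the minimal Python sketch below. It assumes the RPC port is 5369, which is the usual EMQX 4.x rpc.tcp_server_port default. Check the value in your own emqx.conf. The helper check_bind is hypothetical.

```python
import errno
import socket


def check_bind(port: int, host: str = "0.0.0.0") -> str:
    """Try to open a TCP listener on host:port and report the outcome.

    Returns "ok" on success, otherwise the symbolic errno name reported by the OS
    (for example EACCES / WSAEACCES or EADDRINUSE).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
        return "ok"
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc))
    finally:
        sock.close()


# 5369 is an assumed default for the EMQX 4.x RPC port; adjust to match emqx.conf.
print(check_bind(5369))
```

If this also reports an access error while EMQX is stopped, the problem is with the port on that machine rather than with EMQX itself.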