EMQX on k8s: automatic clustering fails with address_type = hostname, but succeeds with address_type = ip

With k8s address_type = ip, automatic clustering works.

emqx.conf configuration:

cluster {
  discovery_strategy = k8s
  name = emqx-cluster
  k8s {
    apiserver = "https://kubernetes.default.svc:443"
    service_name = "emqx-headless"
    namespace = "emqx"
    address_type = "ip"
  }
}

The nodes start normally:

kubectl logs -n emqx emqx-0
EMQX_RPC__PORT_DISCOVERY [rpc.port_discovery]: manual
EMQX_NODE__NAME [node.name]: emqx@10.244.149.41
Listener tcp:default on 0.0.0.0:1883 started.
Listener ssl:default on 0.0.0.0:8883 started.
Listener ws:default on 0.0.0.0:8083 started.
Listener wss:default on 0.0.0.0:8084 started.
Listener http:dashboard on 0.0.0.0:18083 started.
EMQX 5.8.6 is running now!

Automatic clustering succeeds, and a node rejoins the cluster automatically after a restart:

emqx ctl cluster status
Cluster status: #{running_nodes => ['emqx@10.244.149.41','emqx@10.244.172.226', 'emqx@10.244.198.40'], stopped_nodes => []}

With k8s address_type = hostname, automatic clustering does not work. Details follow.

emqx.conf configuration:

cluster {
  discovery_strategy = k8s
  name = emqx-cluster
  k8s {
    apiserver = "https://kubernetes.default.svc:443"
    service_name = "emqx-headless"
    namespace = "emqx"
    address_type = "hostname"
  }
}

Environment variable:

- name: EMQX_NODE__NAME
  value: "emqx@$(POD_NAME).$(SERVICE_NAME).$(POD_NAMESPACE).svc.cluster.local"

The pods are created without problems and node.name is correct:

kubectl logs -n emqx emqx-2
EMQX_RPC__PORT_DISCOVERY [rpc.port_discovery]: manual
EMQX_NODE__NAME [node.name]: emqx@emqx-2.emqx-headless.emqx.svc.cluster.local
Listener tcp:default on 0.0.0.0:1883 started.
Listener ssl:default on 0.0.0.0:8883 started.
Listener ws:default on 0.0.0.0:8083 started.
Listener wss:default on 0.0.0.0:8084 started.

After creating the resources from the YAML, the nodes do not cluster automatically:

emqx ctl cluster status
Cluster status: #{running_nodes => ['emqx@emqx-1.emqx-headless.emqx.svc.cluster.local'], stopped_nodes => []}

Nodes have to be joined manually to form the cluster, and a node that goes down and restarts does not rejoin automatically:

emqx ctl cluster join emqx@emqx-0.emqx-headless.emqx.svc.cluster.local
Join the cluster successfully.
Cluster status: #{running_nodes => ['emqx@emqx-0.emqx-headless.emqx.svc.cluster.local', 'emqx@emqx-1.emqx-headless.emqx.svc.cluster.local'], stopped_nodes => []}

The RBAC checks pass, and the k8s network plugin is Calico:

kubectl auth can-i get endpoints --as=system:serviceaccount:emqx:emqx-sa -n emqx
yes
kubectl auth can-i list pods --as=system:serviceaccount:emqx:emqx-sa -n emqx
yes

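My question: apart from the differences above in address_type and the EMQX_NODE__NAME environment variable, what other configuration do these two modes need, and how do I configure hostname-based clustering correctly?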

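With address_type = hostname, EMQX does not use the Pod IP directly. It takes the Pod hostname from the Endpoints object and builds PodName.serviceName.namespace.suffix from it. The result must match the host part after the @ in node.name exactly.
Your node name currently has this form: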

emqx@emqx-2.emqx-headless.emqx.svc.cluster.local

So the k8s discovery configuration has to produce the same FQDN:

cluster {
  name = emqx-cluster
  discovery_strategy = k8s
  k8s {
    apiserver = "https://kubernetes.default.svc:443"
    service_name = "emqx-headless"
    namespace = "emqx"
    address_type = "hostname"
    suffix = "svc.cluster.local"
  }
}
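
The way these settings combine into a node name can be sketched in a few lines of Python. This is only an illustration of the PodName.serviceName.namespace.suffix rule described above, not EMQX's actual code; the function name expected_node_name and the app name default "emqx" are assumptions made for the example, and "pod.local" is just a hypothetical non-matching suffix.

```python
def expected_node_name(pod_hostname, service_name, namespace, suffix, app="emqx"):
    """Join the pieces as PodName.serviceName.namespace.suffix (illustrative only)."""
    host = ".".join([pod_hostname, service_name, namespace, suffix])
    return f"{app}@{host}"


# node.name as reported in the emqx-2 log in this thread.
actual = "emqx@emqx-2.emqx-headless.emqx.svc.cluster.local"

# Values from the configuration above, with suffix = "svc.cluster.local".
discovered = expected_node_name("emqx-2", "emqx-headless", "emqx", "svc.cluster.local")

# Same values but a hypothetical different suffix, to show a mismatch.
mismatched = expected_node_name("emqx-2", "emqx-headless", "emqx", "pod.local")

print(discovered, discovered == actual)
print(mismatched, mismatched == actual)
```

If the name built from the discovery settings differs from node.name in any component, the discovered peer name does not refer to a running node, which would be consistent with the nodes staying separate until joined manually.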

If you use environment variables, the equivalents are:

- name: EMQX_CLUSTER__K8S__SERVICE_NAME
  value: "emqx-headless"
- name: EMQX_CLUSTER__K8S__NAMESPACE
  value: "emqx"
- name: EMQX_CLUSTER__K8S__ADDRESS_TYPE
  value: "hostname"
- name: EMQX_CLUSTER__K8S__SUFFIX
  value: "svc.cluster.local"
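
The naming pattern behind these variables can be sketched as follows. The mapping is inferred from the log lines in this thread (for example EMQX_NODE__NAME [node.name]); the helper name env_to_config_key is hypothetical and the code is not part of EMQX.

```python
def env_to_config_key(env_name, prefix="EMQX_"):
    """Map an EMQX_* variable name to a dotted config path (illustrative only).

    A double underscore separates path levels; a single underscore stays
    part of the key name.
    """
    if not env_name.startswith(prefix):
        raise ValueError(f"{env_name} does not start with {prefix}")
    return env_name[len(prefix):].lower().replace("__", ".")


for name in [
    "EMQX_NODE__NAME",
    "EMQX_CLUSTER__K8S__SERVICE_NAME",
    "EMQX_CLUSTER__K8S__ADDRESS_TYPE",
    "EMQX_CLUSTER__K8S__SUFFIX",
]:
    print(name, "->", env_to_config_key(name))
```

A quick check like this helps catch a variable written with a single underscore where a double one is needed, which would point at a different key.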

Also confirm that the StatefulSet and the Headless Service belong together:

# StatefulSet
spec:
  serviceName: emqx-headless
---
# Service
spec:
  clusterIP: None
  publishNotReadyAddresses: true

Then verify the Endpoints and, from inside a Pod, DNS resolution:

kubectl get endpoints -n emqx emqx-headless -o jsonpath='{range .subsets[*].addresses[*]}{.hostname}{" "}{.ip}{"\n"}{end}'
kubectl exec -n emqx emqx-0 -- getent hosts emqx-1.emqx-headless.emqx.svc.cluster.local

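If .hostname is empty in the output of the first command, hostname mode cannot get the Pod name. Check first that the Service selector actually matches the StatefulSet's Pods, and that the Pods are really managed by the StatefulSet. ip mode works because it uses the IPs from Endpoints directly and does not depend on this hostname/FQDN construction.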
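
The empty-hostname check can also be scripted. The sketch below parses the JSON form of an Endpoints object and lists the addresses that have no hostname field. The sample document is made up for illustration (the hostname-to-IP pairing is hypothetical), and the helper names endpoint_hosts and missing_hostnames are not from any library.

```python
import json


def endpoint_hosts(endpoints_json):
    """Return (hostname, ip) pairs from an Endpoints object given as JSON text."""
    doc = json.loads(endpoints_json)
    pairs = []
    for subset in doc.get("subsets") or []:
        for addr in subset.get("addresses") or []:
            pairs.append((addr.get("hostname"), addr.get("ip")))
    return pairs


def missing_hostnames(pairs):
    """Return the IPs of addresses that carry no hostname."""
    return [ip for hostname, ip in pairs if not hostname]


# Hypothetical sample: the second address has no hostname field.
sample = """
{
  "kind": "Endpoints",
  "subsets": [
    {
      "addresses": [
        {"hostname": "emqx-0", "ip": "10.244.149.41"},
        {"ip": "10.244.172.226"},
        {"hostname": "emqx-2", "ip": "10.244.198.40"}
      ]
    }
  ]
}
"""

pairs = endpoint_hosts(sample)
print(pairs)
print("addresses without hostname:", missing_hostnames(pairs))
```

To run it against the real cluster, replace the sample with the output of kubectl get endpoints -n emqx emqx-headless -o json.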