TLS can hang and deadlock under weak network conditions

Weak network on the vehicle side. nanomq 0.24.6 (I also built the latest 0.25.1 and it has the same problem), mqtts bridge mode.

TLS hangs in tcp_recv, but you pushed a CONNECT in, and nng aborted tcp_send. The nng timeout is infinite, so it deadlocks.

Your 10 s timeout has no effect in this case.

nni_aio_set_timeout(p->negoaio, 10000); // 10 sec timeout to negotiate  // L1231

nng source (I added some logging to see how it behaves):

static void
tls_cancel(nni_aio *aio, void *arg, int rv)
{
	tls_conn *conn = arg;
	nni_mtx_lock(&conn->lock);
	if (aio == nni_list_first(&conn->recv_queue)) {
		log_warn("tls_cancel: recv head, conn=%p aio=%p rv=%d(%s), "
		    "abort tcp_recv", conn, aio, rv, nng_strerror(rv));
		nni_aio_abort(&conn->tcp_recv, rv);
	} else if (aio == nni_list_first(&conn->send_queue)) {
		log_warn("tls_cancel: send head, conn=%p aio=%p rv=%d(%s), "
		    "abort tcp_send", conn, aio, rv, nng_strerror(rv));
		nni_aio_abort(&conn->tcp_send, rv);
	} else if (nni_aio_list_active(aio)) {
		log_warn("tls_cancel: aio active (not head), conn=%p aio=%p "
		    "rv=%d(%s), remove+finish", conn, aio, rv, nng_strerror(rv));
		nni_aio_list_remove(aio);
		nni_aio_finish_error(aio, rv);
	} else {
		log_warn("tls_cancel: no branch matched, conn=%p aio=%p rv=%d(%s)",
		    conn, aio, rv, nng_strerror(rv));
	}
	nni_mtx_unlock(&conn->lock);
}
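
To make the branch selection easier to follow, here is a toy Python model of the if/else chain above. It is not nng code. The queue contents are my assumption, based on the "send head" line in the log further down: the timed-out aio (the one carrying the CONNECT) is at the head of send_queue, and recv_queue holds nothing from the caller. `tls_cancel_branch` is a made-up name for illustration.

```python
# Toy model of the branch selection in tls_cancel (illustration only, not nng code).
def tls_cancel_branch(aio, recv_queue, send_queue):
    """Return the action the tls_cancel if/else chain would take for this aio."""
    if recv_queue and aio is recv_queue[0]:
        return "abort tcp_recv"
    if send_queue and aio is send_queue[0]:
        return "abort tcp_send"
    if aio in recv_queue or aio in send_queue:
        return "remove+finish"
    return "no branch matched"


# Assumed state during the stalled handshake: the CONNECT aio is the head of
# send_queue, and the caller has nothing queued in recv_queue.
connect_aio = object()
print(tls_cancel_branch(connect_aio, recv_queue=[], send_queue=[connect_aio]))
```

With that state the cancel lands on the tcp_send branch. If the operation that is actually stuck is tcp_recv (waiting for the ServerHello), this branch never touches it.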

To reproduce: start a Python server, then bridge to this black-hole server over TLS.

# blackhole.py: accepts TCP and reads the data, but never sends back a single byte
import socket
s = socket.socket()
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind(('0.0.0.0', 58883))
s.listen(16)
print('listening on 58883 (black hole)')
conns = []
while True:
    c, a = s.accept()
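    print('accepted', a)          # at this point the client's TCP is ESTABLISHED
    conns.append(c)               # hold the connection: never close, never write back -> ServerHello never arrives
    try:
        c.recv(65536)             # read the ClientHello so the client's Send-Q is 0 (closer to what we see in the field)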
    except Exception:
        pass
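
To check that the black hole behaves as intended before pointing nanomq at it, a plain Python TLS client can be used as a probe. This is a minimal sketch using only the standard library. `probe_handshake` is a name I made up, and the 5 s timeout is arbitrary. Certificate checks are disabled because the black hole never presents a certificate anyway.

```python
import socket
import ssl


def probe_handshake(host, port, timeout_s):
    """Attempt a TLS handshake and report how it ended (bounded by timeout_s)."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False      # test only: the black hole has no certificate
    ctx.verify_mode = ssl.CERT_NONE
    try:
        raw = socket.create_connection((host, port), timeout=timeout_s)
    except OSError as exc:
        return f"tcp connect failed: {exc}"
    try:
        # wrap_socket performs the handshake and inherits the socket timeout
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return f"handshake ok: {tls.version()}"
    except socket.timeout:
        return "handshake timed out"
    except OSError as exc:
        return f"handshake failed: {exc}"
    finally:
        raw.close()


if __name__ == "__main__":
    # Port taken from blackhole.py above.
    print(probe_handshake("127.0.0.1", 58883, 5.0))
```

Against the black hole, the TCP connect succeeds and the handshake then waits for a ServerHello that never comes, so the probe should end in the timeout branch. A client with a working handshake deadline gives up here, which is the behavior the bridge is missing.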

In theory you should get the log below:

2026-06-17 02:52:09 [17984] WARN  /workspace/nanomq2/nanomq/apps/broker.c:1321 broker: NanoMQ (ver 0.25.1) Serving HTTP Server on http://(null):8081
NanoMQ Broker is started successfully!
2026-06-17 02:52:19 [18009] WARN  /workspace/nanomq2/nng/src/supplemental/tls/tls_common.c:678 tls_cancel: tls_cancel: send head, conn=0xffffb623a570 aio=0xffffb61461a0 rv=5(Timed out), abort tcp_send


^C2026-06-17 02:53:35 [17984] ERROR /workspace/nanomq2/nanomq/apps/broker.c:117 sig_handler: signal signumber: 2 received!

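Either get nng to fix this, or add a watchdog mechanism to nanomq that forcibly tears the connection down when it fails to connect within a timeout.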
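
Roughly what I mean by a watchdog, sketched in Python just to show the idea (nanomq would do this in C). `ConnectWatchdog` and `force_close` are hypothetical names. `force_close` stands for whatever tears down the whole bridge connection in nanomq, and I am not claiming any particular nng or nanomq API for it. The point is that the deadline runs on its own timer, independent of the stuck I/O path, and closes the whole connection instead of aborting one sub-operation.

```python
import threading


class ConnectWatchdog:
    """Calls force_close() unless connected() is reported within timeout_s."""

    def __init__(self, timeout_s, force_close):
        self._timer = threading.Timer(timeout_s, self._expire)
        self._timer.daemon = True
        self._force_close = force_close   # hypothetical teardown callback
        self._lock = threading.Lock()
        self._done = False
        self.fired = False

    def start(self):
        """Arm the watchdog when the connect attempt begins."""
        self._timer.start()

    def connected(self):
        """Report success; returns False if the watchdog has already fired."""
        with self._lock:
            if self.fired:
                return False
            self._done = True
        self._timer.cancel()
        return True

    def _expire(self):
        with self._lock:
            if self._done:
                return
            self.fired = True
        self._force_close()               # tear down the whole connection


if __name__ == "__main__":
    torn_down = threading.Event()         # stand-in for closing the bridge connection
    wd = ConnectWatchdog(0.2, torn_down.set)
    wd.start()                            # connected() is never called: simulated hang
    torn_down.wait(1.0)
    print("forced teardown:", wd.fired)
```

The lock makes success and expiry mutually exclusive, so a late handshake completion cannot race with the forced teardown.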