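EMQX rule engine: subscriptions fail when bulk-subscribing through the API

Bug report

Environment

  • EMQX version: 5.0.18, 5.0.19
  • OS version: CentOS 7.6

Steps to reproduce

I use the rule engine to listen for client-connected events and forward them through a webhook; the receiving HTTP service then bulk-subscribes topics for each client through the API. With emqtt connecting at 500/s, a large number of subscriptions fail on 5.0.18 and 5.0.19. The webhook sends requests asynchronously; with synchronous sending there are even more failures. Performance is much worse than on 4.4.x, where emqtt connects at 1000/s and bulk subscription through the API produces no failures.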

How is EMQX deployed? Cluster or single node? Also, could you share the machine specs?

I tried both single node and cluster, and both show many failures. Machine specs: 16 cores, 32 GB RAM, CentOS 7.6.

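Hi @1070842701, thanks for the feedback. Several points in the description are unclear, so is the problem the following?

  1. With emqtt connecting at 500/s, a Webhook configured to forward client-connected events asynchronously has many failures
  2. In the same scenario, after emqtt connects, using the HTTP API to create subscriptions for these clients results in many failed subscriptions

If so, we need the following:

  1. Please paste the logs of the failed Webhook forwarding
  2. Please provide the emqtt connection settings, the maximum number of connections started, the parameters of the HTTP API call, the error logs, and the HTTP response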

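This is how I judge it: emqtt establishes 60,000 connections, and the HTTP server bulk-subscribes each client to 10 client-related topics, so the expected total is 600,000 topics.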


The emqtt connection command:
./emqtt_bench conn -u gtja -P eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbGllbnRJZCI6Imd0amFUZXN0MiIsImV4cCI6MTY3ODI3NzAxNCwiaXRhIjoxNjc4MjQxMDE0fQ.X_evG3qFAOgU7CENYfCDYaa1v7XEsrv_52dChguHRi27MJ3yV7KTz2Jz9Qoxnivus5gTRy4_nVubt8enF6MQDA -h 10.187.4.135 -i 2 -c 60000 -k 100

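Are there any failure logs from the HTTP API calls?

In sync mode I saw error logs on the server; in async mode I saw none.

Sync calls? Do you mean calling EMQX's add-subscription API synchronously or asynchronously?

Could you paste the call parameters?

The sync/async option is the one provided by the webhook.

Hello, is there a concrete timeline for fixing this issue? Which version is expected to include the fix?

@rocky Is there a specific version that fixes this issue?

Hello, our engineers are already looking into this issue. We will post updates here as progress is made. Thank you.

OK, thanks a lot.

We already have a fix. Once it is merged, this problem will go away.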

I tried to set up something similar locally, and I got the following:

./emqtt_bench sub -h localhost -p 1883 -c 17000 -i 1 -k 100 -t some/topic

Using sync query mode on the current master (cb995e20330d89aa44270c925058318b5a9e128b + this ehttpc patch), I saw no hard timeouts, and the metrics indicate 185 retries out of the 17_000 connections. According to the emqtt_bench output, the connection rate is a bit uneven, as expected in sync mode, since there will be pushback/backpressure. But, again, all clients connected successfully.

example emqtt_bench output (sync)

1s sub total=914 rate=910.36/sec
1s connect_succ total=914 rate=910.36/sec
2s sub total=1867 rate=952.05/sec
2s connect_succ total=1867 rate=952.05/sec
3s sub total=2836 rate=969.00/sec
3s connect_succ total=2836 rate=969.00/sec
4s sub total=3790 rate=954.00/sec
4s connect_succ total=3790 rate=954.00/sec
5s sub total=4532 rate=742.00/sec
5s connect_succ total=4532 rate=742.00/sec
10s sub total=4826 rate=58.80/sec
10s connect_succ total=4827 rate=59.00/sec
11s sub total=7017 rate=2191.00/sec
11s connect_succ total=7019 rate=2192.00/sec
12s sub total=8917 rate=1900.00/sec
12s connect_succ total=8917 rate=1898.00/sec
13s sub total=10608 rate=1691.00/sec
13s connect_succ total=10609 rate=1692.00/sec
14s sub total=12789 rate=2181.00/sec
14s connect_succ total=12789 rate=2180.00/sec
15s sub total=13797 rate=1009.01/sec
15s connect_succ total=13797 rate=1009.01/sec
16s sub total=15148 rate=1349.65/sec
16s connect_succ total=15149 rate=1350.65/sec
17s sub total=16210 rate=1062.00/sec
17s connect_succ total=16211 rate=1062.00/sec
18s sub total=17000 rate=790.00/sec
18s connect_succ total=17000 rate=789.00/sec

Using async, the connection rate reported by emqtt_bench is much smoother, as expected. No timeouts either, and in this particular run I got only 98 retries out of the 17_000 connections.

example emqtt_bench output (async)

1s sub total=960 rate=956.18/sec
1s connect_succ total=960 rate=956.18/sec
2s sub total=1918 rate=957.04/sec
2s connect_succ total=1918 rate=957.04/sec
3s sub total=2853 rate=935.00/sec
3s connect_succ total=2853 rate=935.00/sec
4s sub total=3813 rate=960.00/sec
4s connect_succ total=3813 rate=960.00/sec
5s sub total=4771 rate=958.00/sec
5s connect_succ total=4771 rate=958.00/sec
6s sub total=5731 rate=960.00/sec
6s connect_succ total=5731 rate=960.00/sec
7s sub total=6691 rate=960.00/sec
7s connect_succ total=6691 rate=960.00/sec
8s sub total=7651 rate=960.00/sec
8s connect_succ total=7651 rate=960.00/sec
9s sub total=8587 rate=936.00/sec
9s connect_succ total=8587 rate=936.00/sec
10s sub total=9547 rate=960.00/sec
10s connect_succ total=9547 rate=960.00/sec
11s sub total=10507 rate=960.00/sec
11s connect_succ total=10507 rate=960.00/sec
12s sub total=11467 rate=960.00/sec
12s connect_succ total=11467 rate=960.00/sec
13s sub total=12427 rate=960.00/sec
13s connect_succ total=12427 rate=960.00/sec
14s sub total=13387 rate=960.00/sec
14s connect_succ total=13387 rate=960.00/sec
15s sub total=14347 rate=960.00/sec
15s connect_succ total=14347 rate=960.00/sec
16s sub total=15307 rate=960.00/sec
16s connect_succ total=15307 rate=960.00/sec
17s sub total=16266 rate=958.04/sec
17s connect_succ total=16266 rate=958.04/sec
18s sub total=17000 rate=734.73/sec
18s connect_succ total=17000 rate=734.73/sec
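
To put a number on "uneven" versus "smooth", the per-second rates can be pulled out of the emqtt_bench output and summarized. The following is a minimal sketch that assumes the line format shown in the outputs above; the function names `parse_bench_line` and `rate_summary` are made up for illustration and are not part of emqtt_bench.

```python
import re
from statistics import mean, pstdev

# Matches lines such as "5s connect_succ total=4532 rate=742.00/sec"
# and "1m8s connect_succ total=60000 rate=446.00/sec".
LINE_RE = re.compile(
    r"^(?:(\d+)m)?(\d+)s\s+(\w+)\s+total=(\d+)\s+rate=([\d.]+)/sec$"
)


def parse_bench_line(line):
    """Return (elapsed_seconds, metric, total, rate), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    minutes, seconds, metric, total, rate = m.groups()
    elapsed = int(minutes or 0) * 60 + int(seconds)
    return elapsed, metric, int(total), float(rate)


def rate_summary(lines, metric="connect_succ"):
    """Summarize the reported rates of one metric; None if no line matches."""
    parsed = (parse_bench_line(line) for line in lines)
    rates = [p[3] for p in parsed if p is not None and p[1] == metric]
    if not rates:
        return None
    return {
        "samples": len(rates),
        "min": min(rates),
        "max": max(rates),
        "mean": mean(rates),
        "stdev": pstdev(rates),  # larger value = less even connection rate
    }
```

Feeding the sync and async outputs through `rate_summary` separately should give a larger `stdev` for the sync run, which matches the backpressure described above.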

In both cases, the rule would be triggered by $events/client_connected and call an HTTP server that in turn subscribes the connecting clientid to 10 topics using /clients/:clientid/subscribe/bulk API.
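
For readers who want to reproduce this setup, the HTTP server side could look roughly like the sketch below. It is not the server used in the test above. It assumes that the webhook body is JSON containing a `clientid` field, that the EMQX REST API is reachable at `http://localhost:18083/api/v5`, and that an API key and secret are used with HTTP basic auth; the key, the secret, and the topic naming scheme are placeholders.

```python
import base64
import json
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

EMQX_API = "http://localhost:18083/api/v5"  # assumed API address
API_KEY = "my-api-key"                      # placeholder credential
API_SECRET = "my-api-secret"                # placeholder credential


def topics_for(clientid, count=10):
    """Build the subscription list for one client (hypothetical topic scheme)."""
    return [{"topic": f"clients/{clientid}/t{i}", "qos": 0} for i in range(count)]


def build_bulk_subscribe_request(clientid):
    """Prepare the POST to /clients/:clientid/subscribe/bulk without sending it."""
    quoted = urllib.parse.quote(clientid, safe="")
    body = json.dumps(topics_for(clientid)).encode("utf-8")
    token = base64.b64encode(f"{API_KEY}:{API_SECRET}".encode()).decode()
    return urllib.request.Request(
        f"{EMQX_API}/clients/{quoted}/subscribe/bulk",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


class ConnectedHandler(BaseHTTPRequestHandler):
    """Receives the webhook call and subscribes the connecting client."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        event = json.loads(self.rfile.read(length) or b"{}")
        clientid = event.get("clientid")
        status = 400  # returned when the payload carries no clientid
        if clientid:
            try:
                req = build_bulk_subscribe_request(clientid)
                with urllib.request.urlopen(req, timeout=5) as resp:
                    status = resp.status
            except OSError as exc:  # covers URLError, HTTPError and timeouts
                print(f"bulk subscribe failed for {clientid}: {exc}")
                status = 502
        self.send_response(status)
        self.end_headers()


def run(port=8080):
    """Serve webhook calls; one thread per request so slow API calls do not block others."""
    ThreadingHTTPServer(("0.0.0.0", port), ConnectedHandler).serve_forever()
```

Calling `run()` starts the server. Logging every failed API call, as done here, makes it possible to tell whether a missing subscription was rejected by the API or whether the webhook request never arrived.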

:+1::+1::+1: Looking forward to it

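@rocky Hello, I see that 5.0.25 merged this patch, but when I tried it there is still a problem at high connection counts. The rule engine is configured as async, and some of the configuration parameters are already set fairly high. With 20,000 to 30,000 connections the subscriptions meet expectations, but at 60,000 connections the problem remains.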





./emqtt_bench conn -u gtja -P eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbGllbnRJZCI6Imd0amFUZXN0MiIsImV4cCI6MTY4NDM2NDU1MiwiaWF0IjoxNjgwNzU0NTUyfQ.sDF3IrCOGk7CUtz4GqkV22pHgaBhImcOe4vk2hDd7HzMPQKB-1pvgJBd1cScO3J_5h70AufGxOB9ySyxLGMPEA -h 10.187.4.135 -i 0.1 -c 60000
Start with 8 workers, addrs pool size: 1 and req interval: 8 ms

1s connect_succ total=888 rate=884.46/sec
2s connect_succ total=1776 rate=888.89/sec
3s connect_succ total=2664 rate=887.11/sec
4s connect_succ total=3559 rate=895.00/sec
5s connect_succ total=4448 rate=889.00/sec
6s connect_succ total=5336 rate=888.00/sec
7s connect_succ total=6224 rate=888.00/sec
8s connect_succ total=7112 rate=888.00/sec
9s connect_succ total=8000 rate=888.00/sec
10s connect_succ total=8888 rate=888.89/sec
11s connect_succ total=9776 rate=888.00/sec
12s connect_succ total=10668 rate=891.11/sec
13s connect_succ total=11559 rate=891.00/sec
14s connect_succ total=12448 rate=889.00/sec
15s connect_succ total=13328 rate=880.00/sec
16s connect_succ total=14224 rate=896.00/sec
17s connect_succ total=15112 rate=888.00/sec
18s connect_succ total=16000 rate=888.00/sec
19s connect_succ total=16888 rate=888.00/sec
20s connect_succ total=17777 rate=889.00/sec
21s connect_succ total=18664 rate=887.00/sec
22s connect_succ total=19559 rate=895.00/sec
23s connect_succ total=20448 rate=889.00/sec
24s connect_succ total=21336 rate=888.00/sec
25s connect_succ total=22224 rate=888.00/sec
26s connect_succ total=23112 rate=888.89/sec
27s connect_succ total=24000 rate=887.11/sec
28s connect_succ total=24888 rate=888.00/sec
29s connect_succ total=25778 rate=890.00/sec
30s connect_succ total=26666 rate=888.00/sec
31s connect_succ total=27555 rate=889.00/sec
32s connect_succ total=28446 rate=891.00/sec
33s connect_succ total=29336 rate=890.00/sec
34s connect_succ total=30224 rate=888.00/sec
35s connect_succ total=31112 rate=888.00/sec
36s connect_succ total=32000 rate=888.00/sec
37s connect_succ total=32888 rate=888.89/sec
38s connect_succ total=33778 rate=889.11/sec
39s connect_succ total=34667 rate=889.00/sec
40s connect_succ total=35552 rate=885.00/sec
41s connect_succ total=36443 rate=891.00/sec
42s connect_succ total=37335 rate=892.00/sec
43s connect_succ total=38218 rate=883.00/sec
44s connect_succ total=39112 rate=894.00/sec
45s connect_succ total=40000 rate=888.00/sec
46s connect_succ total=40886 rate=886.00/sec
47s connect_succ total=41778 rate=892.00/sec
48s connect_succ total=42666 rate=888.00/sec
49s connect_succ total=43555 rate=889.00/sec
50s connect_succ total=44447 rate=892.00/sec
51s connect_succ total=45335 rate=888.00/sec
52s connect_succ total=46218 rate=883.00/sec
53s connect_succ total=47108 rate=890.00/sec
54s connect_succ total=48000 rate=892.00/sec
55s connect_succ total=48887 rate=887.00/sec
56s connect_succ total=49776 rate=889.00/sec
57s connect_succ total=50665 rate=889.89/sec
58s connect_succ total=51553 rate=887.11/sec
59s connect_succ total=52444 rate=891.00/sec
1m0s connect_succ total=53335 rate=891.00/sec
1m1s connect_succ total=54223 rate=888.00/sec
1m2s connect_succ total=55112 rate=889.00/sec
1m3s connect_succ total=56000 rate=888.00/sec
1m4s connect_succ total=56888 rate=888.00/sec
1m5s connect_succ total=57777 rate=889.00/sec
1m6s connect_succ total=58665 rate=888.00/sec
1m7s connect_succ total=59554 rate=889.00/sec
1m8s connect_succ total=60000 rate=446.00/sec

Is the problem that the TPS drops, or do the system logs show errors?

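I see no errors in the system logs. For example, I subscribe each connection to 10 topics through the subscribe/bulk API, so 60,000 connections should give 600,000 topics. With a small number of connections, the topic count is exactly 10 times the connection count; with a large number of connections, some subscriptions are missing. Or is there perhaps still something wrong in my parameter configuration that could be tuned further?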
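
The 10x check described above can be written down as a small helper. This is only a sketch of the arithmetic; the function name is made up, and the two counts are assumed to be read manually, for example from the dashboard.

```python
def subscription_shortfall(connections, subscriptions, topics_per_client=10):
    """Return (expected, missing) subscription counts for a given connection count."""
    expected = connections * topics_per_client
    missing = max(expected - subscriptions, 0)
    return expected, missing


# 60,000 connections with 10 topics each should give 600,000 subscriptions.
print(subscription_shortfall(60000, 600000))  # (600000, 0)
```

Any non-zero `missing` value means some bulk-subscribe calls did not take effect, even when no error shows up in the logs.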

@rocky When emqtt establishes 60,000 connections, the result meets expectations at a connection rate of 500/s, but at 1000/s some subscriptions are missing:
./emqtt_bench conn -u gtja -P eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbGllbnRJZCI6Imd0amFUZXN0MiIsImV4cCI6MTcyMDM3NTYwMiwiaWF0IjoxNjg0MzY1NjAyfQ.DG-9N41P7bROv5ciTV6dkrwEKBw0Y6P6elZvnBOwBajtOfswq1s_LuHzkLck2RKdZt6UHu6sYSTJNUoXHexeoQ -h 10.187.4.135 -i 2 -c 60000


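Hi, we are working on this issue, and the latest information has been passed on to the engineers investigating it.
We will share any progress here promptly.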