MQTT reconnection on JS

#1
  1. My goal is:
    Be able to force a MQTT reconnection while on a JS firmware.

  2. My actions are:
    Even with MQTT auto-reconnect enabled, I can detect situations where it does not work, usually after several days of uptime.

  3. The result I see is:
    When the anomaly occurs, MQTT.isConnected() returns false, but Mongoose OS does not reconnect to the MQTT broker.
    In our broker logs, there is no attempt whatsoever from the device to reconnect.

  4. My expectation & question is:
    Either have MQTT detect this and reconnect automatically, or use a workaround.
    By the way, what is the proper syntax to call the relevant reconnect C functions (below) from JS via FFI?

bool mgos_mqtt_global_connect(void);
void mgos_mqtt_global_disconnect(void);

I believe I could try these as a possible workaround.

Thank you!

#2

Well, that should not happen… I suggest you monitor heap usage and system logs and file a proper bug report if it is not your fault.

#3

I’ll gather more information for a bug report, but I can already say that we don’t see the heap decreasing over time. And there is nothing in the system logs (not even reconnection attempts).
Our firmware is not doing much else at the moment (just reading from the serial port occasionally).
Also, we’re using certificate-based authentication to an MQTTS broker.

While I do that (and wait for the bug analysis), can you help with the FFI syntax for the functions mentioned above? Since I can detect the issue, implementing a workaround in the meantime would be extremely helpful.

For JS, would it be something like this?
let reconnectMQTT = ffi('bool mgos_mqtt_global_connect(void)');
let reconnectNOW = reconnectMQTT();

#4

Basically an FFI call is like the C declaration. As long as you correctly handle the parameter passing and return value, you’ll be fine.
Looks OK.
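
As a minimal sketch of that rule, both functions quoted in the first post can be bound by passing their C declarations to ffi() as strings. The helper name bindMqttControl is hypothetical and not part of Mongoose OS. The bindings are created inside a function, so nothing happens until it is called from init.js.

```javascript
// Sketch for init.js (mJS). `ffi` is provided by the mJS runtime.
// bindMqttControl is a hypothetical helper name, not a Mongoose OS API.
function bindMqttControl() {
  return {
    // C declaration: bool mgos_mqtt_global_connect(void);
    connect: ffi('bool mgos_mqtt_global_connect(void)'),
    // C declaration: void mgos_mqtt_global_disconnect(void);
    disconnect: ffi('void mgos_mqtt_global_disconnect(void)')
  };
}
```

A call would then look like `let ctl = bindMqttControl(); ctl.disconnect(); ctl.connect();`. Note the straight quotes around the signature strings: typographic quotes pasted from a forum post will not parse.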

#5

Hi @scaprile

We’ve tried adding the code mentioned above to our init.js and built the firmware again.
But calling the reconnectMQTT() function has no effect whatsoever. MQTTS continues failing and rebooting is the only thing that works so far.

After building with 2.19.1, we’ve noticed the ESP32 has more available RAM (around 160k at all times, which never drops), but MQTTS still stops working completely after around 2 days.
We’ve thought about memory fragmentation and considered calling “gc(full)” to trigger garbage collection, but it is our understanding that this happens automatically and is not needed.

Again, the issue is detectable, since MQTT.isConnected() returns false each and every time. Unfortunately, auto-reconnection never works and we’re trying to fix this without rebooting the whole device.

Anything else we can log or check in order to get to the bottom of this?

#6

There is no garbage collection in C. Whether or not mOS implements some form of it for mJS escapes me.
If MQTT.isConnected() returns true, then for mOS you are connected. If it thinks it is connected, it will not reconnect. You opened this thread stating that it returned false, not true. I don’t know if there is a specific function to force a disconnect, and I didn’t check the source code for that.
What do you see in your device logs, your server logs, and your network analyzer captures? (Yes, you can sniff TLS if you have the keys, and at least you can see TCP and check for a half-open connection.) Did you try without TLS? Is your network connection stable?

#7

Oops, that was a mistake. MQTT.isConnected() returns FALSE every time, just as stated earlier.
I have since corrected this in my last post.

In our device logs, nothing shows up. We’re on log level two.
But we do have a timer that checks MQTT.isConnected() every 30 minutes. Eventually it begins returning false, and MQTT is no longer operating when that happens (no pubs and nothing coming from subscribed topics). This is where a reconnect is expected, but it never happens. We’re also printing ram_free to syslog and it stays constant around 160k.
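
A periodic check like this could be extended into a small watchdog, sketched below under several assumptions: init.js has already loaded api_mqtt.js and api_timer.js, Timer.set passes its userdata argument to the callback, and the names watchdogStep, onWatchdogTick, startMqttWatchdog and rebootFn are hypothetical. Whether a forced disconnect followed by a connect actually recovers the connection in this failure mode is not established in this thread, so the sketch escalates to a caller-supplied last resort after a number of failed attempts. State is passed through userdata rather than captured, since mJS does not support closures.

```javascript
// Pure decision logic: given the previous consecutive-failure count and the
// current connection state, return the new count and the action to take.
function watchdogStep(failures, connected, maxFailures) {
  if (connected) return {failures: 0, action: 'none'};
  let n = failures + 1;
  if (n > maxFailures) return {failures: n, action: 'escalate'};
  return {failures: n, action: 'reconnect'};
}

// Timer callback. `wd` is the userdata object handed to Timer.set.
function onWatchdogTick(wd) {
  let r = watchdogStep(wd.failures, MQTT.isConnected(), wd.maxFailures);
  wd.failures = r.failures;
  if (r.action === 'reconnect') {
    print('MQTT down, forcing reconnect, attempt', r.failures);
    wd.disconnect();  // drop any stale connection state first
    wd.connect();
  } else if (r.action === 'escalate') {
    print('MQTT still down after', wd.maxFailures, 'attempts, escalating');
    wd.escalate();    // caller-supplied last resort, e.g. a reboot
  }
}

// Binds the two C functions from the first post and starts the timer.
function startMqttWatchdog(intervalMs, maxFailures, rebootFn) {
  let wd = {
    failures: 0,
    maxFailures: maxFailures,
    connect: ffi('bool mgos_mqtt_global_connect(void)'),
    disconnect: ffi('void mgos_mqtt_global_disconnect(void)'),
    escalate: rebootFn
  };
  Timer.set(intervalMs, Timer.REPEAT, onWatchdogTick, wd);
  return wd;
}
```

With maxFailures set to 3, the first three failed checks each trigger a reconnect attempt and the fourth calls rebootFn, which might wrap whatever restart call the firmware already uses. A shorter interval than 30 minutes would shorten the outage, at the cost of more frequent checks.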

We’ve checked our server logs and there is nothing coming from the device when the anomaly happens.
We’re able to log even the initial handshake, but not even that happens. It seems there is no attempt to reconnect at all.

Testing without TLS is not an option for us, since the project requires certificate-based encryption and mutual authentication to be enabled. The Internet connection is stable; it is actively monitored and we have a dedicated testing infrastructure for our prototypes.

We’re hoping someone from the development team can help us diagnose this issue, as stable MQTT operation is the only thing preventing us from moving forward and releasing a new product.

#8

I don’t mean to lecture you on project development or analysis methodology; I’ll limit myself to saying that I would take a different path, as mentioned.

Your mileage may vary, of course, though I don’t think you’ll catch their attention without some convincing data. Many people are already using mOS without issues.
Without repeating my former disclaimer, you need to reduce your application to the bare minimum that reproduces your issue and submit that along with all the info you can get.
If the state of .isConnected() is FALSE, there should be a log entry of the disconnection in the device (it moved from TRUE to FALSE because a disconnection event was triggered by some condition), quite likely another one at the server, and your network sniffing should show (at least) what is going on at the TCP level. If you don’t see anything, raise your device debug level to 3 or even 4.
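
One way to capture that TRUE-to-FALSE transition from JS, independent of the log level, is an MQTT event handler that prints a line on every connect and close. This is a sketch under assumptions: init.js has loaded api_mqtt.js and api_sys.js, and the names MQTT.setEventHandler, MQTT.EV_CONNACK, MQTT.EV_CLOSE, Sys.uptime and Sys.free_ram are as I recall them from those files and should be checked against the installed version. The helper names mqttEventName, logMqttEvent and installMqttEventLog are hypothetical.

```javascript
// Map the event numbers of interest to a label; everything else is ignored.
function mqttEventName(ev) {
  if (ev === MQTT.EV_CONNACK) return 'CONNACK';
  if (ev === MQTT.EV_CLOSE) return 'CLOSE';
  return '';
}

// Event handler: logs connect/close with uptime and free RAM, so the moment
// of disconnection can be matched against broker logs and packet captures.
function logMqttEvent(conn, ev, edata) {
  let name = mqttEventName(ev);
  if (name !== '') {
    print('MQTT event', name, 'uptime', Sys.uptime(), 'free RAM', Sys.free_ram());
  }
}

// Call once from init.js.
function installMqttEventLog() {
  MQTT.setEventHandler(logMqttEvent, null);
}
```

If a CLOSE line never appears while MQTT.isConnected() is already returning false, that in itself is useful data for a bug report.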

If you would like to have technical support, you might as well license Mongoose-OS and ask to be serviced. (Yet another disclaimer: I’m in no way related to Cesanta)

#9

If MQTT disconnects, you should see something like this in the console log.