GCP JWT timeout without internet causes device reboot

I’m trying to improve how my project behaves when my devices lose their WiFi connection.
I’ve not yet been able to capture a core dump, but I’m seeing some connection issues in the logs after the network drops out. I’m wondering if there is anything I should be doing to monitor the MQTT channels and react when WiFi disconnects.

mgos_rpc_channel_ws:263 0x3ffc6cc0 Connecting to wss://mdash.net/api/v2/rpc, SSL? 1
mongoose.c:12068        Failed to resolve 'mdash.net', server
mgos_rpc_channel_ws:205 0x3ffc6cc0 TCP connect failed: -1
main.c:1901             Cloud disconnected (1)


mgos_mqtt_conn.c:471    MQTT0 connecting to mqtt.2030.ltsapis.goog:8883
mongoose.c:12068        Failed to resolve 'mqtt.2030.ltsapis.goog', server
mgos_mqtt_conn.c:230    MQTT0 TCP connect error (-1)
mgos_mqtt_conn.c:256    MQTT0 Disconnect
mgos_mqtt_conn.c:549    MQTT0 connecting after 63067 ms

The device usually stays up for about 30-60 minutes before crashing, but because my devices are set up in the field I don’t have easy access to core dumps, unless there is a way to get them other than over UART.

Any ideas or help would be much appreciated.

The shelly-homekit application saves the core dump in the inactive app slot. This is for ESP8266 only.

Damn, I’m using ESP32.

I’ve been able to set up UDP logging on a device on my LAN and the last message received before reboot was:

mgos_gcp.c:86           Dropping MQTT connection due to imminent token expiration

I’m wondering if there is something in mgos_disconnect, which the GCP lib calls when the JWT expires, that causes this.

What I find strangest is that the JWT timeout is set to 3600 seconds, but the device had only been up for about 40 minutes…

The JWT has an expiration time and AFAIK (this may have changed) it is set to one hour. The JWT is sent as the MQTT password when the connection is established, so once it expires the connection has to be restarted with a new token. I don’t know whether this can be avoided; AFAIK that’s just how it works. It shouldn’t reboot, though.
Your logs suggest that the DNS server is not reachable (assuming WiFi is up and the device got an IP address).

Spot on, this is what I’m seeing. I’ve watched the UDP logs a few times now, and when the JWT expires while there is no network connection, the device core dumps.

My makeshift setup right now is a device connected to a mobile hotspot, with a laptop also connected to that hotspot to view the logs via UDP. I turn off the phone’s mobile data connection so that I can still see the UDP logs but the device cannot reach the internet.

I guess I’m going to have to set up a new device and connect over UART to view the core dump; I’m not sure how else I can see what is actually causing the crash.

I’ve turned off GCP and run the same set of tests as before. My device has been up without internet for hours with no issues, so I’ve renamed the topic: I’m now sure it’s the GCP JWT timeout function that’s causing the reboot.

I guess there is something in this call that is causing my issue…

Try to get the smallest app possible that reproduces your issue in case it has to be raised to the developers.
So your claim is that the disconnect function causes a reset when it is called in an already-disconnected state, is that right? A core dump would be extremely useful.

Yep, that’s exactly what I’m seeing, but agreed, I’ll need a core dump to see where it’s actually falling over. I’ll have to wait for another dev board to arrive :disappointed_relieved:

Finally captured a core dump. Not much more info than I already knew, but it’s at least confirmed.

Loaded core dump from last snippet in  /core
mgos_disconnect (c=0x0) at /data/tmp/mos_prebuild/tmp/cesanta/mos-libs/mongoose/src/mgos_mongoose.c:144
144     /data/tmp/mos_prebuild/tmp/cesanta/mos-libs/mongoose/src/mgos_mongoose.c: No such file or directory.
#0  mgos_disconnect (c=0x0) at /data/tmp/mos_prebuild/tmp/cesanta/mos-libs/mongoose/src/mgos_mongoose.c:144

I’ve created two issues, as I’m not sure whether the actual cause is in the GCP library or in the underlying mongoose library:

Good one, let’s see how it goes

Apparently fixed in latest; let us know if the issue has been fixed for you.


… but how do I set the mongoose core lib to master?

There have been many commits to the libs and to the base system since 2.19.1, so I’d suggest either moving to mos latest without modifying your mos.yml, or keeping your current mos and modifying mos.yml:

libs_version: latest
modules_version: latest
mongoose_os_version: latest

Upgrading the mos tool to latest has bitten me in the past; luckily I didn’t brick any devices I had in the field.

Is there any way to take just the bare minimum, or do I need to accept it all? E.g. I’d be happy to try the mos core lib at latest, but not all the other libs; I need stability.

That issue has been fixed.

With mos tool 2.19.1, an empty application builds OK using:

libs_version: ${mos.version}
modules_version: latest
mongoose_os_version: latest

libs: # which need latest
  - location: https://github.com/mongoose-os-libs/mbedtls
    version: latest
  - location: https://github.com/mongoose-os-libs/mongoose
    version: latest
  - location: https://github.com/mongoose-os-libs/vfs-fs-spiffs
    version: latest

You might need to add other libraries with version: latest.


Doesn’t work for my project, I get:

Sorry, I didn’t quite understand the part where you had “libs that require latest”.
It clicked when I saw my error; it’s compiling for me now with the libs you listed set to latest.

I’ve been looking around for docs or an explanation of those mos.yml variables and how they interact with the mos tool version, but haven’t had much luck.

I haven’t been able to piece it together; I’m hoping someone can explain it.

libs_version - the default version of the libraries. It can be overridden by the version key for one or more libraries.
modules_version - the version of the modules. Modules are used by mjs, mbedtls,…
mongoose_os_version - the version of the mongoose-os repo

version can be ${mos.version}, which means the version of the mos tool, a tag, or a commit in the GitHub repo.