I’m trying to get better performance in my project when my devices lose WiFi connection.
I’ve not yet been able to capture a core dump, but I’m seeing some connection issues in the logs after the network drops out. I’m wondering if there is anything I should be doing to monitor MQTT channels and do something when WiFi disconnects.
mgos_rpc_channel_ws:263 0x3ffc6cc0 Connecting to wss://mdash.net/api/v2/rpc, SSL? 1
mongoose.c:12068 Failed to resolve 'mdash.net', server 192.168.67.177
mgos_rpc_channel_ws:205 0x3ffc6cc0 TCP connect failed: -1
main.c:1901 Cloud disconnected (1)
and
mgos_mqtt_conn.c:471 MQTT0 connecting to mqtt.2030.ltsapis.goog:8883
mongoose.c:12068 Failed to resolve 'mqtt.2030.ltsapis.goog', server 192.168.67.177
mgos_mqtt_conn.c:230 MQTT0 TCP connect error (-1)
mgos_mqtt_conn.c:256 MQTT0 Disconnect
mgos_mqtt_conn.c:549 MQTT0 connecting after 63067 ms
The device usually stays up for about 30-60 minutes before crashing, but because my devices are set up in the field I don’t have easy access to core dumps, unless there is a way to get them other than UART.
The JWT token has an expiration time and AFAIK (may have changed) it is set to one hour. The JWT is used for key derivation and so the connection is restarted. Don’t know if this can be avoided by some hocus pocus, AFAIK it is like that. It shouldn’t reboot, though.
Your logs suggest the DNS server is not reachable (assuming WiFi is operational and you got an IP address).
Spot on - this is what I’m seeing. I’ve watched the logs (UDP) a few times now and seen that during the JWT expiration with no network connection the device core dumps.
So the makeshift setup I have right now is a device connected to a mobile hotspot, and I’m also connected to that hotspot with a laptop and viewing the logs via UDP. I turn off the phone’s mobile data connection so that I can still see the logs (UDP) but the device is not able to reach the internet.
I guess I’m going to have to set up a new device and connect over UART to view the core dump. I’m not sure how else I can see what is actually causing it.
I’ve turned off GCP and done the same set of tests as before. My device has been up without internet for hours with no issues - I’ve renamed the topic since I’m sure it’s GCP JWT timeout function that’s causing the reboot.
I guess there is something in this call that is causing my issue… mgos_disconnect(mgos_mqtt_get_global_conn());
Try to get the smallest app possible that reproduces your issue in case it has to be raised to the developers.
So, your claim is that the disconnect function causes a reset when called in an already disconnected state, is that so? A core dump would be extremely useful.
Yep, that’s exactly what I’m seeing, but agreed, I’ll have to get a core dump to see where it’s actually falling over. I’ll have to wait for another dev board to arrive.
Finally captured a core dump. Not much more info than I already knew, but it’s at least confirmed.
Loaded core dump from last snippet in /core
mgos_disconnect (c=0x0) at /data/tmp/mos_prebuild/tmp/cesanta/mos-libs/mongoose/src/mgos_mongoose.c:144
144 /data/tmp/mos_prebuild/tmp/cesanta/mos-libs/mongoose/src/mgos_mongoose.c: No such file or directory.
#0 mgos_disconnect (c=0x0) at /data/tmp/mos_prebuild/tmp/cesanta/mos-libs/mongoose/src/mgos_mongoose.c:144
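The backtrace shows mgos_disconnect() being entered with c=0x0, which is consistent with mgos_mqtt_get_global_conn() returning NULL while MQTT is already disconnected. A minimal sketch of a guard for that call follows. The helper name safe_mqtt_disconnect is hypothetical, and the struct and the two mgos_* functions below are stand-ins so the snippet compiles off-device; on a device they would come from the Mongoose OS headers instead.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the real struct from mongoose.h (assumption, for off-device builds). */
struct mg_connection {
  unsigned long flags;
};

/* Hypothetical hook that simulates "connected" vs "disconnected" in this sketch. */
static struct mg_connection *s_fake_conn = NULL;

/* Stand-in for the real call; it is assumed to return NULL when there is
 * no active MQTT connection, as the c=0x0 in the backtrace suggests. */
struct mg_connection *mgos_mqtt_get_global_conn(void) {
  return s_fake_conn;
}

/* Stand-in for the real call; like the crash site, it dereferences c. */
void mgos_disconnect(struct mg_connection *c) {
  c->flags |= 1UL;
}

/* Hypothetical helper: only disconnect when a connection actually exists.
 * Returns true if a disconnect was requested, false if there was nothing to do. */
bool safe_mqtt_disconnect(void) {
  struct mg_connection *c = mgos_mqtt_get_global_conn();
  if (c == NULL) {
    return false; /* already disconnected, skip the call that crashed */
  }
  mgos_disconnect(c);
  return true;
}
```

Calling safe_mqtt_disconnect() in place of mgos_disconnect(mgos_mqtt_get_global_conn()) would avoid the NULL dereference in application code, though it does not help if the same pattern lives inside a library.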
I’ve created two issues as I’m not sure of the actual cause, whether it’s the GCP library or the underlying mongoose library:
There are many commits in the libs and in the base system since 2.19.1, so I’d suggest to move to mos latest, without modifying your mos.yml, or keep your mos and modify mos.yml:
Upgrading mos tool to latest has bitten me in the past, luckily I didn’t brick any devices I had in the field.
Is there any way just to take the bare minimum, or do I need to accept it all? EG: I’d be happy to try mos core lib at latest but not all the other libs, I need stability.
Sorry, I didn’t quite understand the part where you had “libs that require latest”.
It registered when I saw my error… it’s compiling for me now with the libs you’ve listed at latest.
I’ve been looking around for some docs or explanation on those mos.yml variables and how they interact with the mos tool version, but not having much luck.
I haven’t been able to piece it together, I’m hoping someone might be able to explain it.
libs_version - the default version of the libraries. It can be overridden by the version key for one or more libraries.
modules_version - the version of the modules. Modules are used by mjs, mbedtls,…
mongoose_os_version - the version of the mongoose-os repo
version can be ${mos.version} which means the version of the mos tool, or a tag, or a commit in the github repo.
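As a rough illustration of how these keys combine, a mos.yml fragment might pin everything to one release and override only selected libraries. The choice of the mqtt and gcp libraries here is an assumption made for the example, not the list suggested earlier in the thread, and 2.19.1 is simply the version mentioned above.

```yaml
# Default for every library that has no version key of its own
libs_version: 2.19.1
modules_version: 2.19.1
mongoose_os_version: 2.19.1

libs:
  # Illustrative overrides: only these two libraries track latest
  - origin: https://github.com/mongoose-os-libs/mqtt
    version: latest
  - origin: https://github.com/mongoose-os-libs/gcp
    version: latest
```

Replacing 2.19.1 with ${mos.version} would instead tie the defaults to whichever version of the mos tool runs the build.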