RPC OTA Update over mDash fails

If you are asking a question, please follow this template:

  1. My goal is to remotely update my devices.
  2. My actions are: I have deployed the firmware on multiple devices and am trying to update them via the OTA mechanism.
  3. The result I see is:

[Sep 1 22:33:42.980] mgos_aws_shadow.c:250 Update: {“state”: {“reported”: {“ota”:{“message”: “Waiting for data”, “status”: 1, “is_committed”: true, “commit_timeout”: 0, “partition”: 0, “progress_percent”: 98}}}, “clientToken”: “da4fa0f3”}
[Sep 1 22:33:44.985] mg_rpc.c:293 OTA.Write via WSS_out 148.251.54.236:443
[Sep 1 22:33:47.041] mg_rpc.c:293 OTA.Write via WSS_out 148.251.54.236:443
[Sep 1 22:33:49.041] mg_rpc.c:293 OTA.Write via WSS_out 148.251.54.236:443
[Sep 1 22:33:49.249] mgos_ota_core.c:506 98.81% total, fs.img 252416 of 262144
[Sep 1 22:33:49.444] UPDATE {}
[Sep 1 22:33:49.580] mgos_aws_shadow.c:250 Update: {“state”: {“reported”: {“ota”:{“message”: “Waiting for data”, “status”: 1, “is_committed”: true, “commit_timeout”: 0, “partition”: 0, “progress_percent”: 98}}}, “clientToken”: “########”}
[Sep 1 22:33:50.731] mg_ssl_if_mbedtls.c:35 0x3fff762c mbedtls_ssl_flush_output() returned -78 (-0x004e)
[Sep 1 22:33:51.005] mg_ssl_if_mbedtls.c:35 0x3fff762c mbedtls_ssl_write_record() returned -78 (-0x004e)
[Sep 1 22:33:51.019] mg_ssl_if_mbedtls.c:35 0x3fff762c mbedtls_ssl_send_alert_message() returned -78 (-0x004e)
[Sep 1 22:33:51.213] mgos_mqtt.c:167 MQTT Disconnect
[Sep 1 22:33:51.396] mgos_mqtt.c:511 MQTT connecting after 2019 ms
[Sep 1 22:33:53.625] mgos_mqtt.c:435 MQTT connecting to ghtrhrbasab-ats.iot.eu-west-2.amazonaws.com:8883
[Sep 1 22:33:54.321] mg_rpc.c:293 OTA.Write via WSS_out 148.251.54.236:443
[Sep 1 22:33:54.611] mgos_ota_core.c:506 98.93% total, fs.img 254464 of 262144
[Sep 1 22:33:54.819] UPDATE {}
[Sep 1 22:33:56.678] mg_ssl_if_mbedtls.c:35 0x3fff76e4 ciphersuite: TLS-ECDHE-RSA-WITH-AES-128-GCM-SHA256
[Sep 1 22:33:57.423] E:M 516
[Sep 1 22:33:57.516] E:M 1516
[Sep 1 22:33:57.539] Guru Meditation Error: Core 0 panic’ed (Unhandled debug exception)
[Sep 1 22:33:57.539] Debug exception reason: Stack canary watchpoint triggered (tiT)
[Sep 1 22:33:57.539] Core 0 register dump:
[Sep 1 22:33:57.539] PC : 0x40096559 PS : 0x00060836 A0 : 0x800965e1 A1 : 0x3ffbcc20
[Sep 1 22:33:57.539] A2 : 0x3ffae270 A3 : 0x3ffbcc50 A4 : 0x00060820 A5 : 0x3ffbce30
[Sep 1 22:33:57.561] A6 : 0x00000009 A7 : 0x00000001 A8 : 0x00000001 A9 : 0x3ffc95e0
[Sep 1 22:33:57.561] A10 : 0x3ff00040 A11 : 0x00000028 A12 : 0x00000001 A13 : 0x3ffbcd90
[Sep 1 22:33:57.583] A14 : 0x00000001 A15 : 0x00000000 SAR : 0x00000008 EXCCAUSE: 0x00000001
[Sep 1 22:33:57.583] EXCVADDR: 0x00000000 LBEG : 0x4000c349 LEND : 0x4000c36b LCOUNT : 0xffffffff
[Sep 1 22:33:57.583]
[Sep 1 22:33:57.583] Backtrace: 0x40096559 0x400965de 0x40096c35 0x4008284c 0x400e9947 0x400e935c 0x400edbd5 0x401df535 0x400f03d9 0x400f0ba1 0x400ed4cd 0x400e8cd9 0x400e1365 0x4000bd9f 0x40001162 0x400592fe 0x4005937a 0x40058bbf 0x4019def9 0x4019df45 0x400f7c11 0x400e27ad 0x400e1838 0x400e3de1 0x400e131d 0x4000bd83 0x4000117d 0x400592fe 0x4005937a 0x40058bbf 0x400d6345 0x400dbb2e 0x400d25cd 0x400819ae 0x400819f0 0x40081e41 0x4000beaf 0x40182629 0x401826d3 0x40182784 0x40184e8d 0x4018dc61 0x4018e0bc 0x4018e0ed 0x40189b89 0x40189bc1 0x40189be9 0x40186af9 0x40186e93 0x40183832 0x40183dbb
[Sep 1 22:33:57.628] Guru Meditation Error: Core 0 panic’ed (Cache disabled but cached memory region accessed)
[Sep 1 22:33:57.650] Core 0 register dump:
[Sep 1 22:33:57.650] PC : 0x400813f0 PS : 0x00040034 A0 : 0x00040023 A1 : 0x3ffbc9c0
[Sep 1 22:33:57.650] A2 : 0x3ffbca30 A3 : 0x00000000 A4 : 0x00000000 A5 : 0x3ffbcb00
[Sep 1 22:33:57.672] A6 : 0x3ffc121c A7 : 0x3ffc1225 A8 : 0x8008f7d0 A9 : 0x400813d9
[Sep 1 22:33:57.672] A10 : 0x383fc000 A11 : 0x383fc000 A12 : 0x00005440 A13 : 0x00000000
[Sep 1 22:33:57.672] A14 : 0x001d0000 A15 : 0x00003300 SAR : 0x00000013 EXCCAUSE: 0x00000007
[Sep 1 22:33:57.694] EXCVADDR: 0x00000000 LBEG : 0x4000c2e0 LEND : 0x4000c2f6 LCOUNT : 0x00000000
[Sep 1 22:33:57.694]
[Sep 1 22:33:57.694] Backtrace: 0x400813f0 0x00040020 0x400d0f0f 0x4008f5d0 0x40081359
[Sep 1 22:33:57.694] Re-entered core dump! Exception happened during core dump!
[Sep 1 22:33:57.696] Rebooting…
[Sep 1 22:33:57.719] ets Jun 8 2016 00:22:57
[Sep 1 22:33:57.719]

  1. My expectation & question is: OTA updates have worked flawlessly before so I don’t understand why they are failing now. Any direction would be greatly appreciated!

Get the .elf file for your firmware, and run addr-to-line to symbolise the backtrace. You’ll know exactly where the crash has happened.

See https://mdash.net/docs/tutorials/arduino-esp32-backtrace-analysis.md - that’s for arduino, but the same should work for mongoose os, esp-idf too.

I have run the tool and this is the output, let me know if there is anything in here that can help with debugging.

0x40096559: esp_rom_spiflash_read_status at /opt/Espressif/esp-idf/components/spi_flash/spi_flash_rom_patch.c:249
0x400965de: esp_rom_spiflash_wait_idle at /opt/Espressif/esp-idf/components/spi_flash/spi_flash_rom_patch.c:35
0x40096c35: esp_rom_spiflash_read_data at /opt/Espressif/esp-idf/components/spi_flash/spi_flash_rom_patch.c:214
(inlined by) esp_rom_spiflash_read at /opt/Espressif/esp-idf/components/spi_flash/spi_flash_rom_patch.c:586
0x4008284c: spi_flash_read at /opt/Espressif/esp-idf/components/spi_flash/flash_ops.c:567
0x400e9947: esp32_vfs_dev_partition_read at /home/b/git/DataLabs/deps/vfs-common/src/esp32/esp32_vfs_dev_partition.c:68
0x400e935c: mgos_vfs_dev_read at /home/b/git/DataLabs/deps/vfs-common/src/mgos_vfs_dev.c:152
0x400edbd5: mgos_spiffs_read at /home/b/git/DataLabs/deps/vfs-fs-spiffs/src/mgos_vfs_fs_spiffs.c:94
0x401df535: spiffs_phys_rd at /home/b/git/DataLabs/deps/vfs-fs-spiffs/src/spiffs/spiffs_nucleus.c:86
0x400f03d9: spiffs_stat_pix at /home/b/git/DataLabs/deps/vfs-fs-spiffs/src/spiffs/spiffs_hydrogen.c:732
0x400f0ba1: SPIFFS_fstat at /home/b/git/DataLabs/deps/vfs-fs-spiffs/src/spiffs/spiffs_hydrogen.c:793
0x400ed4cd: mgos_vfs_fs_spiffs_lseek at /home/b/git/DataLabs/deps/vfs-fs-spiffs/src/mgos_vfs_fs_spiffs.c:643
0x400e8cd9: mgos_vfs_lseek at /home/b/git/DataLabs/deps/vfs-common/src/mgos_vfs.c:534
0x400e1365: esp_vfs_lseek at /opt/Espressif/esp-idf/components/vfs/vfs.c:445 (discriminator 4)
0x4000bd9f: ?? ??:0
0x40001162: ?? ??:0
0x400592fe: ?? ??:0
0x4005937a: ?? ??:0
0x40058bbf: ?? ??:0
0x4019def9: _fwrite_r at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdio/…/…/…/…/./newlib/libc/stdio/fwrite.c:170
0x4019df45: fwrite at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdio/…/…/…/…/./newlib/libc/stdio/fwrite.c:211
0x400f7c11: debug_write_cb at /home/b/git/DataLabs/deps/file-logger/src/mgos_file_logger.c:131
0x400e27ad: mgos_event_trigger at /home/b/git/DataLabs/deps/modules/mongoose-os/src/mgos_event.c:128
0x400e1838: mgos_debug_write at /home/b/git/DataLabs/deps/modules/mongoose-os/src/mgos_debug.c:107
0x400e3de1: debug_write at /home/b/git/DataLabs/deps/modules/mongoose-os/platforms/esp32/src/esp32_debug.c:62
0x400e131d: esp_vfs_write at /opt/Espressif/esp-idf/components/vfs/vfs.c:432 (discriminator 4)
0x4000bd83: ?? ??:0
0x4000117d: ?? ??:0
0x400592fe: ?? ??:0
0x4005937a: ?? ??:0
0x40058bbf: ?? ??:0
0x400d6345: __sprint_r at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdio/…/…/…/…/./newlib/libc/stdio/vfprintf.c:437
0x400dbb2e: _vfprintf_r at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdio/…/…/…/…/./newlib/libc/stdio/vfprintf.c:1782 (discriminator 1)
0x400d25cd: printf at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdio/…/…/…/…/./newlib/libc/stdio/printf.c:58
0x400819ae: heap_caps_malloc at /opt/Espressif/esp-idf/components/heap/heap_caps.c:122
0x400819f0: heap_caps_malloc_default at /opt/Espressif/esp-idf/components/heap/heap_caps.c:154
0x40081e41: _times_r at ??:?
0x4000beaf: ?? ??:0
0x40182629: mem_malloc at /opt/Espressif/esp-idf/components/lwip/lwip/src/core/mem.c:124
0x401826d3: do_memp_malloc_pool at /opt/Espressif/esp-idf/components/lwip/lwip/src/core/memp.c:301
0x40182784: memp_malloc at /opt/Espressif/esp-idf/components/lwip/lwip/src/core/memp.c:398
0x40184e8d: netif_issue_reports at /opt/Espressif/esp-idf/components/lwip/lwip/src/core/netif.c:693
0x4018dc61: pppos_netif_output at /opt/Espressif/esp-idf/components/lwip/lwip/src/netif/ppp/pppos.c:256
0x4018e0bc: ppp_netif_output at /opt/Espressif/esp-idf/components/lwip/lwip/src/netif/ppp/ppp.c:587
0x4018e0ed: ppp_netif_output_ip4 at /opt/Espressif/esp-idf/components/lwip/lwip/src/netif/ppp/ppp.c:481
0x40189b89: ip4_output_if_opt_src at /opt/Espressif/esp-idf/components/lwip/lwip/src/core/ipv4/ip4.c:1031
0x40189bc1: ip4_output_if_opt at /opt/Espressif/esp-idf/components/lwip/lwip/src/core/ipv4/ip4.c:845
0x40189be9: ip4_output_if at /opt/Espressif/esp-idf/components/lwip/lwip/src/core/ipv4/ip4.c:822
0x40186af9: tcp_output_segment at /opt/Espressif/esp-idf/components/lwip/lwip/src/core/tcp_out.c:1322
(inlined by) tcp_output at /opt/Espressif/esp-idf/components/lwip/lwip/src/core/tcp_out.c:1127
0x40186e93: tcp_rexmit_rto at /opt/Espressif/esp-idf/components/lwip/lwip/src/core/tcp_out.c:1436
0x40183832: tcp_slowtmr at /opt/Espressif/esp-idf/components/lwip/lwip/src/core/tcp.c:1076
0x40183dbb: tcp_fasttmr at /opt/Espressif/esp-idf/components/lwip/lwip/src/core/tcp.c:1301

Are you expecting that somebody will debug that?

I don’t expect anything, just posting a trace that seems to point to a bug in MGOS’s underlying libraries.

The issue was in the size of the firmware I was pushing.

Having both a large firmware on the device and updating it with another large firmware caused the device to run out of memory.

I solved the issue by pushing a small firmware update with only the ota library in it and then pushing the actual firmware update.

Thanks for the background!