OTA intermittent

#1

I am trying to get OTA updates working via the following command:

mos --port ws://192.168.4.1/rpc call OTA.Update “{”"“url”"":""“http://foobar.com/fw/fw.zip”""}"

fw.zip is hosted at “foobar.com” (an internet provider hosted site) and I am successfully able to download fw.zip and save it. We are licensed users, and the device in question is properly licensed with “mos license”.

SOMETIMES, the OTA update works exactly as expected. However, MOST of the time, the response to the above command is as follows:

Error: context deadline exceeded
/src/go/src/github.com/mongoose-os/mos/common/mgrpc/mgrpc.go:460:
/src/go/src/github.com/mongoose-os/mos/cli/dev/dev_conn_impl.go:167:
/src/go/src/github.com/mongoose-os/mos/cli/dev/dev_conn_impl.go:180:
/src/go/src/github.com/mongoose-os/mos/cli/main.go:197: call failed

In ALL observed update attempts, (even after receiving the above error) when the device is rebooted, the anticipated firmware update has been successfully achieved.

Please advise any changes that could be made to make the update procedure 100% predictable.

Thanks for your help!

JSW

#2

There could be a bunch of reasons for the timeout - most probable is that your devices reboots/crashes. Attach a serial console and check the logs.
Alternatively, setup up a udp logger and catch the logs remotely.

mos config.set debug.udp_log_addr=IP:PORT

#3

Hi cpq,

Thanks for helping. I attached a serial port console, and here is what I get when the context deadline error message is received. Please note that the system successfully restarts with the correct firmware at 13:04:20.159

I can’t see anything that is obviously wrong. Can you point out any problem?

Thanks,

JSW

Mar  4 13:03:52.078] mgos_wifi.c:160         80:2b:f9:5c:65:77 connected
[Mar  4 13:03:52.079] dhcps: send_nak>>udp_sendto result 0
[Mar  4 13:03:52.086] e[0;32mI (17019) tcpip_adapter: softAP assign IP to station,IP is: 192.168.4.2e[0m
[Mar  4 13:03:52.088] esp32_main.c:98         event: 18
[Mar  4 13:03:58.163] mgos_http_server.c:180  0x3ffccc00 HTTP connection from 192.168.4.2:56223
[Mar  4 13:03:58.174] mg_rpc.c:293            OTA.Update via WS_in 192.168.4.2:56223
[Mar  4 13:03:58.181] mgos_ota_http_clien:270 Update URL: http://www.foobar.com/fw/fw.zip
[Mar  4 13:03:58.588] mgos_ota_core.c:253     Starting, timeout 600, commit timeout 0, mem 206008
[Mar  4 13:03:58.588] mgos_ota_core.c:489     FW: Flasher11 esp32 1.0 20200303-191424
[Mar  4 13:03:58.591] esp32_ota_backend.c:197 App: Flasher11.bin -> app_1, FS: fs.img -> fs_1
[Mar  4 13:03:58.858] esp32_ota_backend.c:265 Writing app image @ 0x3e0000
[Mar  4 13:04:00.849] dhcps: send_nak>>udp_sendto result 0
[Mar  4 13:04:03.141] mgos_mongoose.c:66      New heap free LWM: 187144
[Mar  4 13:04:03.142] mgos_ota_core.c:506     0.14% total, Flasher11.bin 512 of 1060768
[Mar  4 13:04:04.845] mgos_ota_core.c:506     15.09% total, Flasher11.bin 262656 of 1060768
[Mar  4 13:04:05.130] 24
[Mar  4 13:04:06.552] mgos_ota_core.c:506     30.04% total, Flasher11.bin 524800 of 1060768
[Mar  4 13:04:08.288] mgos_ota_core.c:506     45.00% total, Flasher11.bin 786944 of 1060768
[Mar  4 13:04:09.978] mgos_ota_core.c:506     59.95% total, Flasher11.bin 1049088 of 1060768
[Mar  4 13:04:10.063] e[0;32mI (34969) esp_image: segment 0: paddr=0x003e0020 vaddr=0x3f400020 size=0x2b420 (177184) mape[0m
[Mar  4 13:04:10.194] e[0;32mI (35059) esp_image: segment 1: paddr=0x0040b448 vaddr=0x3ffb0000 size=0x029f8 ( 10744) e[0m
[Mar  4 13:04:10.194] e[0;32mI (35079) esp_image: segment 2: paddr=0x0040de48 vaddr=0x40080000 size=0x00400 (  1024) e[0m
[Mar  4 13:04:10.195] e[0;32mI (35089) esp_image: segment 3: paddr=0x0040e250 vaddr=0x40080400 size=0x01dc0 (  7616) e[0m
[Mar  4 13:04:10.196] e[0;32mI (35099) esp_image: segment 4: paddr=0x00410018 vaddr=0x400d0018 size=0xc3f1c (802588) mape[0m
[Mar  4 13:04:10.580] e[0;32mI (35489) esp_image: segment 5: paddr=0x004d3f3c vaddr=0x400821c0 size=0x0effc ( 61436) e[0m
[Mar  4 13:04:10.625] e[0;32mI (35529) esp_image: segment 6: paddr=0x004e2f40 vaddr=0x400c0000 size=0x00034 (    52) e[0m
[Mar  4 13:04:10.912] esp32_ota_backend.c:383 app_1 verified (e86303fcef46610728ef3eb3deae7ade4954a193)
[Mar  4 13:04:10.912] mgos_ota_core.c:506     60.65% total, bootloader.bin 512 of 23152
[Mar  4 13:04:11.251] esp32_ota_backend.c:288 Writing FS image @ 0x708000
[Mar  4 13:04:13.876] 30
[Mar  4 13:04:13.878] mgos_ota_core.c:506     61.97% total, fs.img 512 of 655360
[Mar  4 13:04:15.467] mgos_ota_core.c:506     76.92% total, fs.img 262656 of 655360
[Mar  4 13:04:17.072] mgos_ota_core.c:506     91.88% total, fs.img 524800 of 655360
[Mar  4 13:04:18.075] esp32_ota_backend.c:383 fs_1 verified (52308bc18833b4ca1d8a3c0a73ec208a7800a6b3)
[Mar  4 13:04:18.076] mgos_ota_core.c:506     99.36% total, otadata.bin 512 of 8192
[Mar  4 13:04:18.140] mgos_ota_core.c:506     99.83% total, partitions_mgos.bin 512 of 3072
[Mar  4 13:04:18.140] mgos_ota_core.c:629     Reached the end of archive
[Mar  4 13:04:18.141] esp32_ota_backend.c:448 Setting boot partition to app_1
[Mar  4 13:04:18.141] e[0;32mI (43059) esp_image: segment 0: paddr=0x003e0020 vaddr=0x3f400020 size=0x2b420 (177184) mape[0m
[Mar  4 13:04:18.291] e[0;32mI (43159) esp_image: segment 1: paddr=0x0040b448 vaddr=0x3ffb0000 size=0x029f8 ( 10744) e[0m
[Mar  4 13:04:18.292] e[0;32mI (43169) esp_image: segment 2: paddr=0x0040de48 vaddr=0x40080000 size=0x00400 (  1024) e[0m
[Mar  4 13:04:18.295] e[0;32mI (43179) esp_image: segment 3: paddr=0x0040e250 vaddr=0x40080400 size=0x01dc0 (  7616) e[0m
[Mar  4 13:04:18.297] e[0;32mI (43189) esp_image: segment 4: paddr=0x00410018 vaddr=0x400d0018 size=0xc3f1c (802588) mape[0m
[Mar  4 13:04:18.656] e[0;32mI (43579) esp_image: segment 5: paddr=0x004d3f3c vaddr=0x400821c0 size=0x0effc ( 61436) e[0m
[Mar  4 13:04:18.717] e[0;32mI (43619) esp_image: segment 6: paddr=0x004e2f40 vaddr=0x400c0000 size=0x00034 (    52) e[0m
[Mar  4 13:04:18.718] e[0;32mI (43629) esp_ota_ops: New OTA data 0: seq 0x00000014, st 0x11, CRC 0x89c6a7f1e[0m
[Mar  4 13:04:18.793] mgos_ota_core.c:811     Update finished, result 1 (Update applied, rebooting)
[Mar  4 13:04:18.793] mgos_ota_core.c:852     Update requires reboot
[Mar  4 13:04:18.795] mgos_utils.c:38         Rebooting in 500 ms
[Mar  4 13:04:19.395] mgos_vfs.c:1026         Unmounting filesystems
[Mar  4 13:04:19.395] I (44219) wifi: state: run -> init (0)
[Mar  4 13:04:19.398] I (44219) wifi: pm stop, total sleep time: 0 us / 40104452 us
[Mar  4 13:04:19.400]
[Mar  4 13:04:19.403] I (44229) wifi: new:<5,2>, old:<5,2>, ap:<5,2>, sta:<5,0>, prof:5
[Mar  4 13:04:19.410] I (44239) wifi: station: 80:2b:f9:5c:65:77 leave, AID = 1, bss_flags is 134243, bss:0x3ffcc0f4
[Mar  4 13:04:19.411] I (44239) wifi: new:<5,0>, old:<5,2>, ap:<5,2>, sta:<5,0>, prof:5
[Mar  4 13:04:19.427] e[0;31mE (44249) event: system_event_ap_stop_handle_default 223 esp_wifi_internal_reg_rxcb ret=0x3014e[0m
[Mar  4 13:04:19.428] I (44269) wifi: flush txq
[Mar  4 13:04:19.449] I (44269) wifi: stop sw txq
[Mar  4 13:04:19.450] I (44269) wifi: lmac stop hw txq
[Mar  4 13:04:19.466] I (44269) wifi: Deinit lldesc rx mblock:10
[Mar  4 13:04:19.467] mgos_system.c:42        Restarting
[Mar  4 13:04:19.489] ets Jun  8 2016 00:22:57
[Mar  4 13:04:20.158] blah,blah blah.................................
[Mar  4 13:04:20.159] mgos_freertos.c:175     Flasher11 1.0 (20200303-191424)
[Mar  4 13:04:20.176] mgos_freertos.c:177     Mongoose OS 2.17.0 (20200303-191423/2.17.0-gc31a745)
[Mar  4 13:04:20.176] mgos_freertos.c:181     CPU: 160 MHz, FreeRTOS 8.2.0, heap: 289056 total, 256892 free
[Mar  4 13:04:20.199] mgos_freertos.c:183     Newlib 2.2.0
#4

My guess would be that a firmware doesn’t get a change to send a response, before it reboots. Sometimes it manages to reply, but most of the time, not. To verify that theory, try to make an OTA in a local network and see whether that changes anything.

#5

Hi Sergey,

Thanks for the help.

I installed a server in my local network and tried the update using:

C:\Mong\Flasher23>mos --port ws://192.168.4.1/rpc call OTA.Update “{”"“url”"":""“http://192.168.1.9/fw/fw.zip”""}"

This works 100% of the time.

I am still having intermittent trouble with a remotely hosted update. Sometimes the context deadline exceeded message occurs. Mostly not. It seems to depend on how long the firmware update takes to complete. If it’s less than 20 seconds from server logon to reboot, the update is always successful. I am trying to get a quantitative result for a “context deadline exceeded”, but at the moment updates are proceeding very quickly, and I can’t reproduce it. I will continue testing at a later time in an attempt to prove this theory.

Any thoughts?

Thanks,

JSW
Any