Core dump after setting wifi

#1

I’m getting repeated core dumps after flashing new code to ESP32 devices. My workflow is: flash the code, set wifi, connect to mDash, and finally set the Google IoT Core settings so I can connect there too. The core dumps have appeared at various points in this sequence of commands, so I can’t tell what the problem is. At the moment they appear once I have set the correct wifi settings (from the mos tool or in mos.yml). There is no core dump when the wifi settings are wrong.

I have tried to run mos debug-core-dump, but then get the error

the input device is not a TTY

Logs right before core dump:

[Aug 20 08:41:41.511] I (2813) wifi:new:<2,0>, old:<1,0>, ap:<255,255>, sta:<2,0>, prof:1
[Aug 20 08:40:17.111] Guru Meditation Error: Core 0 panic'ed (Unhandled debug exception)
[Aug 20 08:40:17.117] Debug exception reason: Stack canary watchpoint triggered (wifi)
[Aug 20 08:40:17.122] Core 0 register dump:
[Aug 20 08:41:41.527] PC : 0x400888c3 PS : 0x00060736 A0 : 0x80088a2c A1 : 0x3ffc4350
[Aug 20 08:41:41.539] A2 : 0x3ffbd0cc A3 : 0x00000000 A4 : 0xffffffff A5 : 0x00000000
[Aug 20 08:41:41.544] A6 : 0x3ffbd1c0 A7 : 0x00000001 A8 : 0x80089261 A9 : 0x3ffc4360
[Aug 20 08:41:41.555] A10 : 0x00000003 A11 : 0x00060723 A12 : 0x00060720 A13 : 0x00000001
[Aug 20 08:41:41.561] A14 : 0x0000cdcd A15 : 0x3ffc337c SAR : 0x00000004 EXCCAUSE: 0x00000001
[Aug 20 08:41:41.572] EXCVADDR: 0x00000000 LBEG : 0x40083d98 LEND : 0x40083db7 LCOUNT : 0x00000000
[Aug 20 08:41:41.577]
[Aug 20 08:41:41.577] ELF file SHA256: b5a34c977c1888b6
[Aug 20 08:41:41.583]
[Aug 20 08:41:41.583] Backtrace: 0x400888c3 0x40088a29 0x40084195 0x400e9d8d 0x400ee151 0x4018ecc9 0x400f1109 0x400f18d1 0x400ee175 0x400e969d 0x400e1451 0x4000bd9f 0x40001191 0x4014bbb9 0x4014bb18 0x400f7002 0x400f71fd 0x400e2955 0x400e1964 0x400e4075 0x400e1409 0x4000bd83 0x4000117d 0x400592fe 0x4005937a 0x40058bbf 0x400d6461 0x400dbc0e 0x400dbd8d 0x400e46a9 0x40082176 0x400d210d 0x4010dc9d 0x401119bd 0x4011f969 0x40120a09 0x40120ae1 0x4011c8fc 0x4011ccf1 0x4011ee3d 0x4011ee5e 0x4011ea8c 0x4011ec62 0x4008ff7e
[Aug 20 08:41:41.622]
[Aug 20 08:41:41.622]
[Aug 20 08:41:41.622] --- BEGIN CORE DUMP ---

Configs from my mos.yml:

config_schema:
  - ["app.pin", "i", 5, {title: "GPIO pin a sensor is attached to"}] #SCL, SDA
  - ["my_app", "o", {title: "My app custom settings"}]
  - ["i2c.enable", true]
  - ["i2c.debug", true]
  - ["i2c.scl_gpio", 22]
  - ["i2c.sda_gpio", 21]
  - ["debug.level", 3]
  - ["debug.event_level", "i", 2, {title: "Max level for which a MGOS_EVENT_LOG is raised"}]
  - ["wifi.ap.enable", false]
  - ["wifi.sta.enable", true]
  - ["wifi.sta.ssid", "SSID"]
  - ["wifi.sta.pass", "PASS"]
  - ["device.id", "esp32_box_test"]
  - ["mqtt.enable.", true]
  - ["mqtt.server", "mqtt.2030.ltsapis.goog:8883"]
  - ["mqtt.ssl_ca_cert", "ca.pem"]
  - ["provision.max_state", 3]
  - ["file_logger.enable", true]	#logging OTA

Very thankful for any help!

#2

When a core dump has been captured, a new file is created in your root directory with a name like: core-mongoose-esp32-20200820-133927.185275652

Try mos debug-core-dump <core-dumpfile>

#3

Thanks, but I’m afraid I still get the same error

the input device is not a TTY

#4

Ah, but I see I get a different result when running the same command in an Ubuntu terminal (as opposed to the mos tool). I’m not sure which part of the output is relevant, but here is the last part of it. It also didn’t seem to complete.

Loaded core dump from last snippet in /core
xQueueGenericReceive (xQueue=0x3ffbd0cc, pvBuffer=0x0,
xTicksToWait=4294967295, xJustPeeking=0)
at /opt/Espressif/esp-idf/components/freertos/queue.c:1436
1436 {
#0 xQueueGenericReceive (xQueue=0x3ffbd0cc, pvBuffer=0x0,
xTicksToWait=4294967295, xJustPeeking=0)
at /opt/Espressif/esp-idf/components/freertos/queue.c:1436
---Type <return> to continue, or q <return> to quit---
#1 0x40088a2c in xQueueTakeMutexRecursive (xMutex=0x3ffbd0cc,
xTicksToWait=4294967295)
at /opt/Espressif/esp-idf/components/freertos/queue.c:635
#2 0x40084198 in mgos_rlock (l=0x3ffbd0cc)
at /data/fwbuild-volumes/latest/apps/esp32_box/esp32/build_contexts/build_ctx_243277424/deps/freertos/src/mgos_freertos.c:343
#3 0x400e9d90 in dev_unlock (dev=0x3ffbd0b0)
at /data/fwbuild-volumes/latest/apps/esp32_box/esp32/build_contexts/build_ctx_243277424/deps/vfs-common/src/mgos_vfs_dev.c:92
Unmapped addr 0x28d00
#4 mgos_vfs_dev_register (dev=0x3ffbd0b0, name=0x28d00 "")
at /data/fwbuild-volumes/latest/apps/esp32_box/esp32/build_contexts/build_ctx_243277424/deps/vfs-common/src/mgos_vfs_dev.c:107
#5 0x400ee154 in mgos_vfs_fs_spiffs_gc_all (spfs=0x3ffbd098)
at /data/fwbuild-volumes/latest/apps/esp32_box/esp32/build_contexts/build_ctx_243277424/deps/vfs-fs-spiffs/src/mgos_vfs_fs_spiffs.c:803
#6 0x4018eccc in lfs_dir_close (lfs=0x3ffbd1c0, dir=0x28d00)
at /data/fwbuild-volumes/latest/apps/esp32_box/esp32/build_contexts/build_ctx_243277424/deps/vfs-fs-lfs/littlefs/lfs.c:2062
#7 0x400f110c in spiffs_gc_check (fs=0x28d00, len=653)
at /data/fwbuild-volumes/latest/apps/esp32_box/esp32/build_contexts/build_ctx_243277424/deps/vfs-fs-spiffs/src/spiffs/spiffs_gc.c:182
#8 0x400f18d4 in SPIFFS_stat (fs=0x3ffbd1c0,
path=0x3ffbd240 "\300\321\373?\001", s=0x3ffc44d0)
---Type <return> to continue, or q <return> to quit---
at /data/fwbuild-volumes/latest/apps/esp32_box/esp32/build_contexts/build_ctx_243277424/deps/vfs-fs-spiffs/src/spiffs/spiffs_hydrogen.c:769
#9 0x400ee178 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

#5

Ah yeah, I should have mentioned that. Personally I just use the commands mos build, mos console, mos flash and mos debug-core-dump from the terminal inside VS Code.

I’m no pro but something looks off in your core dump. Usually you would see a trace showing the last call that triggered the dump.

Maybe try a mos build --clean --local --platform esp32, then flash and try again.

#7

Ok, thanks! I tried the clean build and flashed. I still get core dumps after setting correct wifi settings, but when I tried mos debug-core-dump it looks like it at least completed. I’m not making much sense of it atm though.

Remote debugging using 127.0.0.1:1234
Found core at 23 - 460876
Mapping DRAM: 335872 @ 0x3ffae000
Mapping /opt/Espressif/rom/rom.bin at 0x40000000
Mapping /fw.elf .rtc.text: 52 @ 0x400c0000
Mapping /fw.elf .rtc.dummy: 52 @ 0x3ff80000
Mapping /fw.elf .iram0.vectors: 1024 @ 0x40080000
Mapping /fw.elf .iram0.text: 90437 @ 0x40080400
Mapping /fw.elf .dram0.data: 14600 @ 0x3ffb0000
Mapping /fw.elf .dram0.bss: 33808 @ 0x3ffb3908
Mapping /fw.elf .flash.rodata: 196192 @ 0x3f400020
Mapping /fw.elf .flash.text: 792115 @ 0x400d0018
Dump contains FreeRTOS task info
Loaded core dump from last snippet in /core
_WindowOverflow8 () at /opt/Espressif/esp-idf/components/freertos/xtensa_vectors.S:1869
1869 s32e a4, a0, -32 /* save a4 to call[j]'s stack frame */
Unmapped addr 0x7ffc4610
#0 _WindowOverflow8 () at /opt/Espressif/esp-idf/components/freertos/xtensa_vectors.S:1869
#1 0x7ffc4610 in ?? ()
#2 0x400e02d7 in spi_flash_op_lock () at /opt/Espressif/esp-idf/components/spi_flash/cache_utils.c:56
Unmapped addr 0xfffffff4
#3 0x40082855 in spi_flash_disable_interrupts_caches_and_other_cpu () at /opt/Espressif/esp-idf/components/spi_flash/cache_utils.c:95
(gdb)

#8

Try disabling your C code and testing again. Some task might be trashing memory, leaving pointers corrupted.

#9

Sounds plausible, but when I commented out my entire C file (and the ffi functions in my JS) and tried again, I had the same results I’m afraid (core dumps after setting wifi).

#10

You don’t seem to be using enterprise authentication, so WiFi is pretty stable. Something in your code or your config is triggering this.
mJS with no FFIs is stable too.
Are you building remotely on a stable release?
Does the device core dump right from the start after you reset it?

#11

To answer your questions: I’m building locally at the moment. I have recently run both mos update release and mos update latest to see if that helped (if that is what you meant by stable release). I haven’t managed to reset the device, I’m afraid.

But I have now tried stripping out parts of my code to find what is wrong. With empty C and JS files and only the mos.yml, things seem to work fine. But when I start adding things to my JS file, I start getting the core dumps. I can add the first lines of code without problems:

load('api_config.js');
load('api_gpio.js');
load('api_mqtt.js');
load('api_timer.js');
load('api_sys.js');

let temperature = null;
let humidity = null;
let datetime_temp = null;
let datetime_humid = null;

But then it seems I get the core dumps no matter what I add from the code below, even if it’s only the first two variables.

/******************* Variables to set ********************/

let interval_data = 60000;
let box = 40;

/**********************************************************/

// For sending and receiving data from Google Cloud IoT Core
let configTopic = '/devices/' + Cfg.get('device.id') + '/config';
let eventsTopic = '/devices/' + Cfg.get('device.id') + '/events';
let stateTopic = '/devices/' + Cfg.get('device.id') + '/state';

//Integration of C in mJS code
let return_temp = ffi('float return_temp()');
let return_humid = ffi('float return_humid()');

// Subscription to config data from Google Cloud
MQTT.sub(configTopic, function(conn, configTopic, msg) {
  print('Topic:', configTopic, ', message:', msg);
}, null);

Timer.set(interval_data, Timer.REPEAT, function() {
  if(temperature!==null) {
    let json_str = '{"box": ' + JSON.stringify(box) +
      ', "temperature": ' + JSON.stringify(temperature) +
      ', "datetime_temp": ' + JSON.stringify(datetime_temp) +
      ', "humidity": ' + JSON.stringify(humidity) +
      ', "datetime_humid": ' + JSON.stringify(datetime_humid) + '}';
    // Publish data to pub/sub
    MQTT.pub(eventsTopic, json_str, 1); 
    print('Published to Pub/Sub: ', json_str);
  }
}, null);

The same seems to go for the C code, as I can’t add even these few lines without the core dumps starting:

#include "mgos.h"
#include <stdio.h>
#include "mgos_i2c.h"	// For humid and temp sensor SHT31
#include "mgos_sht31.h"	// For humid and temp sensor SHT31
#include "mgos_adc.h"	// For pH and EX sensors (analog to digital conversion)
#include <math.h>

#define SECOND 1000
#define MINUTE 60000

I read here a tip about lowering log level, so I also tried this, but to no avail!

#12

I managed to get some old code working without core dumps. I then made a number of changes, rebuilt and flashed to the device several times. Then, after editing the code to the version below, I got a core dump again. Even if I revert to the working code from before the core dumps, the dumps keep coming. Does anyone know why this is and what can be done?

I’ve tried rebooting the device with Ctrl-u from the mos tool, but I only get the error

No response to handshake. Is /dev/ttyUSB0 the right port? Is rpc-uart enabled?

(I’m guessing this is because the device is giving core dumps continually). Disconnecting the device for a while doesn’t help either.

The code that gave the core dump:

char datetime_humid[64];

static void timer_temp(void *user_data) {
  //Getting timestamp
  time_t t = time(NULL);
  struct tm *tm = localtime(&t);
  //LOG(LL_INFO, ("now: %d-%02d-%02dT%02d:%02d:%02d\n", tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday, tm.tm_hour, tm.tm_min, tm.tm_sec));
  strftime(datetime_humid, sizeof(datetime_humid), "%Y-%m-%dT%H:%M:%S", tm);
  (void) user_data;
}

I don’t know if it’s relevant to locate the error that caused the core dump this time around (as they keep coming even if I remove the above code), but the working code before the above was as follows:

char datetime_humid[64];

static void timer_temp(void *user_data) {
  //Getting timestamp
  time_t t = time(NULL);
  struct tm tm = *localtime(&t); //EDITED HERE
  LOG(LL_INFO, ("now: %d-%02d-%02dT%02d:%02d:%02d\n", tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday, tm.tm_hour, tm.tm_min, tm.tm_sec));  //COMMENTED THIS OUT
  //strftime(datetime_humid, sizeof(datetime_humid), "%Y-%m-%dT%H:%M:%S", tm); //UNCOMMENTED THIS
  (void) user_data;
}
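
A note on the two variants above: strftime() expects a pointer to struct tm, so the first version (which keeps the pointer returned by localtime()) can pass tm directly, while the second version (which copies the struct) would have to pass &tm. Whether this has anything to do with the core dumps is not established in this thread. A minimal host-side sketch of the copy-then-format pattern follows. It assumes a hypothetical helper named format_timestamp() and uses gmtime() instead of localtime() so the result doesn't depend on the time zone:

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper (not from the thread): formats t as a
 * YYYY-MM-DDTHH:MM:SS timestamp in UTC. Returns the number of characters
 * written (excluding the terminating NUL), or 0 on failure. */
size_t format_timestamp(time_t t, char *buf, size_t len) {
  /* gmtime() returns a pointer to shared static storage, so copy the
   * struct right away in case another call overwrites it. */
  const struct tm *shared = gmtime(&t);
  if (shared == NULL) return 0;
  struct tm tm_copy = *shared;
  /* strftime() takes a pointer, hence &tm_copy rather than tm_copy.
   * It returns 0 if the result does not fit into len bytes. */
  return strftime(buf, len, "%Y-%m-%dT%H:%M:%S", &tm_copy);
}
```

Called as format_timestamp(time(NULL), datetime_humid, sizeof(datetime_humid)), this would fill the same 64-byte buffer used in the snippets above.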

#13

That looks strange. Try building remotely with 2.17, which afaik is rock solid.

#14

Right, thanks! I will try 2.17, seeing as I now have 2.18. But of course I also get an error when trying to downgrade… (I followed this method.)

$ mos update 2.17.0
Update channel: release
Error: /build/mos-agrqYm/mos-2.18.0+1ec8595~focal0/cli/update/ubuntu.go:108: no package found for 2.17.0 focal amd64
/build/mos-agrqYm/mos-2.18.0+1ec8595~focal0/cli/main.go:198: update failed

#15

That guide is outdated. After removing the ppa references use this command:

curl -fsSL https://mongoose-os.com/downloads/mos/install.sh | NO_PPA=1 /bin/bash

#16

Thanks for this updated guide!

My core dumps haven’t returned, although I’m not sure of the reason. But it could be because of using 2.17 instead of 2.18.

#17

Thanks! I’m not sure what actually solved the problem, but it might well have been using 2.17 - great!

#18

Sometimes an environment obscures existing problems, and sometimes the environment itself is buggy. I usually trust the framework over the user’s code, but since 2.18 is brand new, it might have introduced some new bugs. Either way, you should carefully inspect your code and try to reproduce the errors on 2.18 with a minimal example so they can be reported and fixed.

#19

Self-solving problems are a double-edged sword. It’s nice when they go away, but they sometimes reappear. Can you reproduce it? If so…

I noticed that your output included backtrace info just before the core dump. In my experience, this often points right to the line that triggered the issue. Finding that line is explained here:

https://mdash.net/docs/#crash-backtrace
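
As a rough illustration of the first step (pulling the addresses out of the log), here is a small host-side sketch. The helper name parse_backtrace() is made up for this example, and it only handles the plain space-separated format shown in the first post:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of mos or ESP-IDF): extracts up to max
 * hex addresses that follow "Backtrace:" in a console log line.
 * Returns the number of addresses stored in out. */
size_t parse_backtrace(const char *line, uint32_t *out, size_t max) {
  static const char tag[] = "Backtrace:";
  const char *p = strstr(line, tag);
  size_t n = 0;
  if (p == NULL) return 0;  /* not a backtrace line */
  p += sizeof(tag) - 1;     /* skip past the tag itself */
  while (n < max) {
    char *end;
    /* Base 16 makes strtoul() skip whitespace and accept the 0x prefix. */
    unsigned long v = strtoul(p, &end, 16);
    if (end == p) break;    /* nothing more to parse */
    out[n++] = (uint32_t) v;
    p = end;
  }
  return n;
}
```

Each address can then be resolved against the firmware ELF with an addr2line-style tool. I'm assuming the xtensa-esp32-elf-addr2line binary from the ESP32 toolchain here, so check what your own setup provides.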

#20

@ingrid 2.18.0 introduced dual core by default for ESP32. Try to disable it and rebuild.

build_vars:
  ESP_IDF_SDKCONFIG_OPTS: >
    ${build_vars.ESP_IDF_SDKCONFIG_OPTS} 
      CONFIG_FREERTOS_UNICORE=y

You can also pin your firmware to the version of your choice, e.g.

libs_version: 2.17.0
modules_version: 2.17.0
mongoose_os_version: 2.17.0