I’ve been spending some time recently trying to track down the source of core dumps I’ve been getting on my project, and as I resolve some of the issues I’ve started to notice a lot of these:
Dump contains FreeRTOS task info
Loaded core dump from last snippet in /core
0x4010093e in mgos_si7021_read (sensor=0xf2)
at /home/andrew/Code/mongoose/deps/si7021-i2c/src/mgos_si7021.c:82
82 if (!sensor || !sensor->i2c) {
#0 0x4010093e in mgos_si7021_read (sensor=0xf2)
at /home/andrew/Code/mongoose/deps/si7021-i2c/src/mgos_si7021.c:82
---Type <return> to continue, or q <return> to quit---
#1 0x40100b5c in mgos_si7021_getTemperature (sensor=0xf2)
at /home/andrew/Code/mongoose/deps/si7021-i2c/src/mgos_si7021.c:149
0x400ff38e in mgos_si7021_read (sensor=0xf2) at /home/andrew/Code/goldilocks-mongoose/deps/si7021-i2c/src/mgos_si7021.c:90
90 double start = mg_time();
#0 0x400ff38e in mgos_si7021_read (sensor=0xf2) at /home/andrew/Code/goldilocks-mongoose/deps/si7021-i2c/src/mgos_si7021.c:90
That doesn’t seem to make sense. The way the original is written, sensor is never dereferenced if it is NULL, because || short-circuits. That code will return false if either sensor is NULL or sensor->i2c is.
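For what it’s worth, here is a minimal sketch of what that guard does and does not protect against. demo_sensor and demo_guard_passes are made-up stand-ins, not the library’s types. The point is that a value like 0xf2 is not NULL, so !sensor is false and sensor->i2c is still read through the bad address, which is presumably where line 82 faults in the dump.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the sensor struct: only the field the guard reads. */
struct demo_sensor {
  void *i2c;
};

/* Same shape as the check at mgos_si7021.c:82 in the dump. */
bool demo_guard_passes(const struct demo_sensor *sensor) {
  if (!sensor || !sensor->i2c) {
    /* For sensor == NULL, || short-circuits and sensor->i2c is never read. */
    return false;
  }
  /* Any non-NULL value gets here only after sensor->i2c was read, so a
   * garbage pointer such as 0xf2 is dereferenced by the check itself. */
  return true;
}

void demo_guard_examples(void) {
  struct demo_sensor no_bus = { .i2c = NULL };
  struct demo_sensor ok = { .i2c = &ok };  /* any non-NULL value will do */

  assert(!demo_guard_passes(NULL));     /* safe: nothing is dereferenced */
  assert(!demo_guard_passes(&no_bus));  /* valid struct, missing bus */
  assert(demo_guard_passes(&ok));       /* valid struct, bus present */
}
```

So the check is fine for NULL, but it cannot tell a valid pointer from a corrupted one.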
What do you mean by fail? Core dump/reboot? The only way for that to happen is if sensor points somewhere that does not actually contain what it is supposed to. Any piece of code that then writes through it (for example the next line, sensor->stats.read++) will trash memory somewhere, and sooner or later that will have unintended effects somewhere else.
Some code of yours seems to be trashing memory, writing where it shouldn’t. Are you using 2.17 or anything newer? There’s someone else having similar issues, though with wifi. I always blame the user, but…
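One way to notice the corruption earlier than the crash is to wrap the global pointer in guard words and check them before every use. This is only a sketch: guarded_ptr, GUARD_MAGIC and the function names are all hypothetical, nothing here comes from Mongoose OS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GUARD_MAGIC 0xC0FFEE42u /* arbitrary pattern, hypothetical */

struct guarded_ptr {
  uint32_t head;     /* guard word before the pointer */
  void *ptr;         /* the pointer being protected */
  uintptr_t ptr_inv; /* bitwise complement of ptr, catches a hit on ptr alone */
  uint32_t tail;     /* guard word after the pointer */
};

void guarded_set(struct guarded_ptr *g, void *p) {
  g->head = GUARD_MAGIC;
  g->ptr = p;
  g->ptr_inv = (uintptr_t) ~(uintptr_t) p;
  g->tail = GUARD_MAGIC;
}

/* Returns the stored pointer, or NULL if any of the checks fails. */
void *guarded_get(const struct guarded_ptr *g) {
  bool intact = g->head == GUARD_MAGIC && g->tail == GUARD_MAGIC &&
                g->ptr_inv == (uintptr_t) ~(uintptr_t) g->ptr;
  return intact ? g->ptr : NULL;
}

void guarded_examples(void) {
  int target = 0;
  struct guarded_ptr g;

  guarded_set(&g, &target);
  assert(guarded_get(&g) == &target);

  /* Simulate a stray write that replaces the pointer with 0xf2. */
  g.ptr = (void *) (uintptr_t) 0xf2;
  assert(guarded_get(&g) == NULL);

  /* Simulate a stray write that only clips the trailing guard word. */
  guarded_set(&g, &target);
  g.tail = 0;
  assert(guarded_get(&g) == NULL);
}
```

Calling guarded_get from the periodic read and logging whenever it returns NULL would at least tell you when the pointer went bad, though not which code did it.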
The program will run for hours getting a new temp every 15 seconds then all of a sudden core-dump with the following error:
Dump contains FreeRTOS task info
Loaded core dump from last snippet in /core
0x4010093e in mgos_si7021_read (sensor=0xf2)
at /home/andrew/Code/mongoose/deps/si7021-i2c/src/mgos_si7021.c:82
82 if (!sensor || !sensor->i2c) {
#0 0x4010093e in mgos_si7021_read (sensor=0xf2)
at /home/andrew/Code/mongoose/deps/si7021-i2c/src/mgos_si7021.c:82
---Type <return> to continue, or q <return> to quit---
#1 0x40100b5c in mgos_si7021_getTemperature (sensor=0xf2)
at /home/andrew/Code/mongoose/deps/si7021-i2c/src/mgos_si7021.c:149
No idea why the var would be trashed after a number of hours. Overnight I had 2 core dumps on this, and now it’s been up for 10+ hours.
I was thinking that my program eventually runs out of memory and it’s just pointing there? Not sure how this would happen: I’m logging mgos_get_free_heap_size and mgos_get_min_free_heap_size regularly and don’t see them meet.
I would have expected that if I had an overflow somewhere in my program, writing to memory it shouldn’t, I would have found out sooner than 10h+ and would have seen a guru meditation error or something.
I’m thinking now of a way to hard code the struct values in the lib and see if that points to something else going on.
Unless there is an MPU set up or some runtime pointer checking, an incorrectly initialized pointer or incorrect dimensions for a buffer can trash someone else’s memory. Usually, a C environment does not have runtime checking.
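Since C will not check buffer dimensions for you, the check has to be written by hand wherever data is copied. A minimal sketch, assuming a hypothetical helper called copy_checked (not part of any library mentioned here):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copies n bytes into dst only if they fit.
 * Returns false and leaves dst untouched otherwise. */
bool copy_checked(void *dst, size_t dst_size, const void *src, size_t n) {
  if (dst == NULL || src == NULL || n > dst_size) {
    return false;
  }
  memcpy(dst, src, n);
  return true;
}

void copy_checked_examples(void) {
  char buf[8] = {0};
  const char msg[] = "0123456789"; /* 11 bytes including the terminator */

  /* Too large for buf: rejected instead of overwriting a neighbour. */
  assert(!copy_checked(buf, sizeof(buf), msg, sizeof(msg)));
  assert(buf[0] == 0);

  /* Fits: copied as usual. */
  assert(copy_checked(buf, sizeof(buf), msg, 4));
  assert(memcmp(buf, "0123", 4) == 0);
}
```

A plain memcpy with the wrong length in the first case would have written three bytes past the end of buf, into whatever happens to live next to it.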
I’ve been looking deeper into the values I’ve been logging and am now even more confused.
I’ve been logging mgos_get_free_heap_size() (blue) and mgos_get_min_free_heap_size() (green). I assumed I had solved my memory leak when the blue line started to look mostly flat, since I expected the current free heap to tick down as periodic functions ran and then come back up when they finished and memory was reclaimed. I would see the green line gradually tick down over time, then eventually spike, signalling that the device had rebooted. I never really looked deeply into why that was.
So I’m confused because I don’t understand how my free heap can seem to return to the same number over time while the minimum ever keeps ticking down until I run out of space. The only thing I can think of is how the remaining memory is fragmented, but the FreeRTOS doc says that no function will tell you that, so why is the minimum-ever heap size reducing over time?
The amount of available heap memory might change between sampling points and you wouldn’t notice. Since you sample every x seconds or at a fixed point in a loop, you only see what persists between two sample points.
A decrease in the minimum ever is an indicator that tasks are collectively requesting more memory.
An increase I can only explain as a reboot, provided I replace “since” with “system” in your quoted sentence.
I may be missing something, though, since I’m not fond of dynamic allocation in embedded systems.
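A toy model of the sampling effect, in case it helps. This is not FreeRTOS code: toy_heap and its functions are invented for illustration and only do the bookkeeping, tracking the current free bytes plus the lowest value ever reached.

```c
#include <assert.h>
#include <stddef.h>

/* Toy accounting only: no real memory is handed out. */
struct toy_heap {
  size_t free_now; /* what a "free heap size" call would report */
  size_t min_ever; /* what a "minimum ever free" call would report */
};

void toy_init(struct toy_heap *h, size_t total) {
  h->free_now = total;
  h->min_ever = total;
}

void toy_alloc(struct toy_heap *h, size_t n) {
  assert(n <= h->free_now);
  h->free_now -= n;
  /* The low-water mark is updated at every allocation, not at sample time. */
  if (h->free_now < h->min_ever) {
    h->min_ever = h->free_now;
  }
}

void toy_free(struct toy_heap *h, size_t n) {
  h->free_now += n;
}

/* One burst of work that allocates and releases everything
 * before the next log line is printed. */
void toy_burst(struct toy_heap *h, size_t n) {
  toy_alloc(h, n);
  toy_free(h, n);
}

void toy_examples(void) {
  struct toy_heap h;
  toy_init(&h, 100000);

  toy_burst(&h, 20000);
  assert(h.free_now == 100000); /* the sampled value looks flat */
  assert(h.min_ever == 80000);  /* but the dip was recorded */

  toy_burst(&h, 35000);
  assert(h.free_now == 100000);
  assert(h.min_ever == 65000);  /* a bigger burst lowers the mark again */

  toy_burst(&h, 10000);
  assert(h.min_ever == 65000);  /* smaller bursts leave it unchanged */
}
```

The sampled value is identical after every burst, while the minimum only ever moves down, each time a burst is larger than any before it.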
Of course… makes perfect sense! I’m logging to the terminal every second, but that’s probably not even enough resolution to catch the big dips.
I think I’ve got a workaround, though it doesn’t actually solve the problem: I’ve changed from a global mgos_si7021 instance to creating and destroying one on each read.
It’s strange that the global instance was getting corrupted, since I’m seeing the same free_ram numbers after 15h that I was seeing when the program started. No other changes.