ESP32 heap analysis

Martin_Zdila · April 10, 2019, 10:54am

Hello,

We use Mongoose OS 2.12.1 on ESP32. In the console we see following messages:

[Apr 10 12:43:01.909] mgos_mongoose.c:66      New heap free LWM: 136596
[Apr 10 12:43:02.133] mgos_mongoose.c:66      New heap free LWM: 135448
[Apr 10 12:43:04.565] mgos_mongoose.c:66      New heap free LWM: 134620
[Apr 10 12:43:04.677] mgos_mongoose.c:66      New heap free LWM: 134232

The number decreases to some value and then app crashes dumping core but gdb shows no stacktrace.

Sometimes we see message about corrupted heap.

We’ve added MGOS_ENABLE_HEAP_LOG and MGOS_ENABLE_CALL_TRACE build vars but console output is not verbose as mentioned in the doc.

How can we see the stack trace of last memory allocation attempt or get the memory dump for analysis?

Also is there a way to dump memory on demand?

And how can we disable dumping core so that device restarts as quickly as possible? This would be handy for production.

Thanks in advance.

Martin_Zdila · April 10, 2019, 10:57am

Also device sometimes doens’t restart and we see:

[Apr 10 12:56:17.357] mgos_mongoose.c:66      New heap free LWM: 95684
[Apr 10 12:56:27.917] mgos_mongoose.c:66      New heap free LWM: 95116
[Apr 10 12:56:28.589] mgos_mongoose.c:66      New heap free LWM: 62748
[Apr 10 12:56:42.397] mgos_mongoose.c:66      New heap free LWM: 20972
[Apr 10 12:56:50.688] E:M 68469
[Apr 10 12:56:50.688] Heap summary for capabilities 0x00001800:
[Apr 10 12:56:50.688]   At 0x3ffb2730 len 15448 free 0 allocated 15324 min_free 0
[Apr 10 12:56:50.688]     largest_free_block 0 alloc_blocks 23 free_blocks 0 total_blocks 23
[Apr 10 12:56:50.688]   At 0x3ffaff10 len 240 free 0 allocated 172 min_free 0
[Apr 10 12:56:50.688]     largest_free_block 0 alloc_blocks 9 free_blocks 0 total_blocks 9
[Apr 10 12:56:50.688]   At 0x3ffb6388 len 7288 free 0 allocated 7160 min_free 0
[Apr 10 12:56:50.688]     largest_free_block 0 alloc_blocks 24 free_blocks 0 total_blocks 24
[Apr 10 12:56:50.688]   At 0x3ffb9a20 len 16648 free 40 allocated 16020 min_free 0
[Apr 10 12:56:50.736]     largest_free_block 28 alloc_blocks 137 free_blocks 2 total_blocks 139
[Apr 10 12:56:50.736]   At 0x3ffd0550 len 64176 free 984 allocated 61304 min_free 420
[Apr 10 12:56:50.736]     largest_free_block 432 alloc_blocks 448 free_blocks 16 total_blocks 464
[Apr 10 12:56:50.736]   At 0x3ffe0440 len 129984 free 61860 allocated 68056 min_free 20552
[Apr 10 12:56:50.736]     largest_free_block 61200 alloc_blocks 7 free_blocks 2 total_blocks 9
[Apr 10 12:56:50.736]   Totals:
[Apr 10 12:56:50.736]     free 63576 allocated 167348 min_free 20972 largest_free_block 61200

cpq · April 10, 2019, 11:23am

Something in your firmware code goes crazy allocating huge blocks of 30, 40, 60kb.
Obviously it fails at some point.

Looks like you’re trying to send a big chunk of data via the network connection.

Martin_Zdila · April 10, 2019, 1:45pm

cpq, thanks for answer.

We are using MQTT over PPPoS. We tried various configurations: send 4 KB messages (one per couple of seconds), but the same problem was with 1 KB messages (one cca every second) or 12 bytes (in this case cca 60 per second - this case was most memory unstable).

In our code there is no calloc or malloc (ok, there is one, but used only once on BT connection).

Also the memory doesn’t free if we stop sending messages.

We also use BLE for inter-device communication with smaller hack to work around https://github.com/mongoose-os-libs/bt-common/issues/12. But if MQTT is off then we don’t have problem with memory.

Still, the bug can be anywhere (in Mongoose OS, ESP-IDF or our code) and we’d like to use some means to analyze it. That’s why we are asking

cpq · April 10, 2019, 2:00pm

Thanks for the background.
You can try to run https://github.com/cesanta/mongoose-os/tree/master/tools/heaplog_viewer - if you manage to do it, the source of the problem should be instantly obvious.

Martin_Zdila · April 10, 2019, 3:00pm

The problem is that it looks like MGOS_ENABLE_HEAP_LOG and MGOS_ENABLE_CALL_TRACE has no effect. We see no hl{...} lines as they warn us.

We built it with

mos build --platform esp32 --local --verbose --clean --build-var MGOS_ENABLE_HEAP_LOG:1 --build-var MGOS_ENABLE_CALL_TRACE:1

Is it supported on ESP32?

nliviu · April 10, 2019, 3:31pm

It looks like it’s supported only for ESP8266.
grepping the makefiles in my mongoose-os local repo for them show results only in https://github.com/cesanta/mongoose-os/blob/7e2df71335b9407c0bc53778c54bdf5100b68642/fw/platforms/esp8266/Makefile.build#L45-L49

Martin_Zdila · April 10, 2019, 3:38pm

Now we test another device (but with the same HW configuration and same software) and symptoms are little different. We see New heap free LWM: .... only at the beginning and then only like once per couple of minutes. But at the end it crashes with corrupted heap: https://pastebin.com/EvhDDKe4

Martin_Zdila · April 10, 2019, 4:02pm

We’ve analysed core dump by base64 decoding DRAM data and we see that is is “full” of the data we send over MQTT. Now the question is who is putting it to heap (our code doesn’t seems to do it - it only uses local variable which should be on the stack) but more importantly why doesn’t that thing free it.

cpq · April 10, 2019, 6:08pm

mongoose send function spools data in a heap buffer.
Then the IO task drains it to the network.
Thus, if the network is not fast enough (or is down), the data gets spooled in a heap.
You can examine the size of the output buffer by looking at:

  struct mg_connection *c = mgos_mqtt_get_global_conn();
  c->send_mbuf; //  output dynamic buffer, "size" is the allocated size.
  c->recv_mbuf; //  input dynamic buffer

Martin_Zdila · April 10, 2019, 7:45pm

Then why doesn’t it empty itself if we stop publishing new messages?

cpq · April 10, 2019, 8:04pm

Not sure I understand your question.
Please familiarise yourself with the way mongoose library works, if you intend to use it.

Martin_Zdila · April 10, 2019, 8:26pm

Thanks @cpq you were right about filling up send_mbuf! Sometimes MQTT stops with publishing the messages and stores them in the buffer. After a minute or so MQTT connecton fails, reconnects and pending messages are sent which empties the buffer.

Now we need to find out why does it happen but it is a completly different story.

Thanks again!

klimbot · August 2, 2020, 7:55am

hey @Martin_Zdila did you ever resolve this, or find a good way to debug? I’m seeing a similar issue with ESP32 based project

dilbert168 · May 4, 2021, 6:52pm

Hi Guys,
Does this problem ever resolved ?? I have identical problem.

The MQTT server is just local PC with mosquitto setup (should be stable) as not even going to Internet
Sending data through UART0 to ESP8266 (2KB/s) and out to MQTT by mgos_mqtt_pub() with 1KB per call.
I add monitor of pending data by calling mgos_mqtt_num_unsent_bytes() and it has over 2000 byte once a while (total random)
Once the “unsent” number increase, I start seeing the following
disconnect/connect, something RESET

mgos_mongoose.c:66 New heap free LWM: 24944
…
mgos_mongoose.c:66 New heap free LWM: 22712
mgos_mqtt_conn.c:256 MQTT0 Disconnect
mgos_mqtt_conn.c:549 MQTT0 connecting after 1962 ms
mgos_mqtt_conn.c:471 MQTT0 connecting to 192.168.1.4:1883
mgos_mqtt_conn.c:229 MQTT0 TCP connect ok (0)
mgos_mqtt_conn.c:273 MQTT0 CONNACK 0

scaprile · May 5, 2021, 2:46pm

The MQTT implementation tries to queue the message and returns. Return code tells you if queuing was successful; you should check and log that.
If somehow for some reason it can’t send, it will keep queuing. You should sniff your network to see what is going on, or at a bare minimum check the Mosquitto logs for published messages.
Your serial might be doing something nasty. How are you handling buffers ?

dilbert168 · May 6, 2021, 2:22pm

when the int id=mgos_mqtt_pub() is called, the “id” return non-zero value
on UART, it use the receive call back
- mgos_uart_read()
- The data is stored in a static FIFO I created (not using dynamic buffer allocation)
A timer callback will check mgos_mqtt_num_unsent_bytes() and then pull data from FIFO and call mgos_mqtt_pub()