Timer issues on ESP32-based devices

I noticed some strange behaviour with mgos_set_timer on some newer ESP32-based chips that older chips with the same model number don’t seem to have.

On the newer chips the timer is simply not being set, whereas on the older chips there are no problems. This only seems to happen once a few timers are running concurrently.

Both chips report ESP32D0WD-V3 R3 while flashing, and the flash output looks exactly the same:

Connected, chip: ESP32D0WD-V3 R3
Running flasher @ 921600...
  Flasher is running
Flash size: 4194304, params: 0x022f (dio,32m,80m)
.
.
.

To find out more about the issue, I tried mgos_set_hw_timer, and this is where it gets interesting: on the old chip it works fine, but on the new chip I get a core dump, and I can’t make anything out of it in mos debug.
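One thing worth ruling out with mgos_set_hw_timer (an assumption on my part, not a diagnosis): as I understand it, the callback runs in interrupt context, and on the ESP32 doing too much in an interrupt handler (logging, allocating memory, or calling code that lives in flash rather than IRAM) is a common cause of crashes. A usual pattern is to keep the hardware-timer callback tiny and hand the real work to normal context. Below is a minimal sketch of that pattern; `hw_tick_cb` and `take_pending_ticks` are hypothetical names, `IRAM` is the attribute mgos.h provides (defined empty here so the sketch compiles off-target), and it assumes your toolchain supports C11 `<stdatomic.h>` with lock-free 32-bit atomics. If it doesn’t, wrap the read-and-reset in mgos_ints_disable()/mgos_ints_enable() instead.

```c
#include <stdatomic.h>
#include <stdint.h>

#ifndef IRAM
#define IRAM /* provided by mgos.h on the device; empty off-target */
#endif

/* Ticks raised by the hardware timer that normal code hasn't handled yet. */
static _Atomic uint32_t s_pending_ticks = 0;

/* Hypothetical hardware-timer callback: only bumps a counter.
 * No logging, no malloc, no display updates in interrupt context. */
IRAM void hw_tick_cb(void *arg) {
  (void) arg;
  atomic_fetch_add(&s_pending_ticks, 1);
}

/* Called from normal (non-interrupt) context. Atomically reads and
 * resets the counter, returning how many ticks arrived since last call. */
uint32_t take_pending_ticks(void) {
  return atomic_exchange(&s_pending_ticks, 0);
}
```

On the device, hw_tick_cb would be the callback passed to mgos_set_hw_timer(), and take_pending_ticks() would be called from an ordinary mgos_set_timer() callback, where logging and updating the display are safe.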

I’d appreciate any thoughts or ideas on how to debug this and work out why we might be seeing it.

It seems as though whatever happens at this point is what breaks; I’ll have to try to work it out:

IRAM void mgos_ints_disable(void) {
  ENTER_CRITICAL();
}

There is a slight difference between the chip revisions, but I can’t seem to find anything about it:

LATEST BROKEN BOARDS:
    Chip is ESP32-D0WD-V3 (revision v3.1)
    Features: WiFi, BT, Dual Core, 240MHz, VRef calibration in efuse, Coding Scheme None
    Crystal is 40MHz

OLDER WORKING BOARDS:
    Chip is ESP32-D0WD-V3 (revision v3.0)
    Features: WiFi, BT, Dual Core, 240MHz, VRef calibration in efuse, Coding Scheme None
    Crystal is 40MHz

My guess is that these new chips have a bug that has probably been reported somewhere and is likely handled (worked around) by the newest IDF, which I think mos does not include. So your best bet is to find that workaround and apply it yourself, externally.


I cannot reproduce this.

I’m running an ESP32-WROOM-32UE, identification MG4 (V3.1), and my firmware has about 40 timers plus those running in libraries.

I dug into hardware timers about a year ago but never got them to work (they crashed, like you described).

I checked the errata, and at least for now there are no known bugs in V3.1 that don’t already exist in V3.

@OP I would like to look further into this; is there a way to contact you?

@inso2 Thanks for the reply. I think you might be right, and my initial test was broken.

I’ve now flashed both devices (v3.0 and v3.1) manually with the same firmware, and both fail with hardware-based timers.

I was stuffing around with Grok trying to get it working, and was able to create hardware timers directly with the ESP-IDF, without the Mongoose OS wrapper; that worked on both devices. This might be worth looking into if you really need hardware timers.

The problem I was actually trying to solve (I should probably start a new thread) is this: sometimes on v3.1 hardware my software timers just drop off or die, but I’ve never seen this on v3.0 hardware. The device boots and runs fine for a while, then my every-second clock timer suddenly stops at random.
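To catch the moment the clock timer stops, and get something into the logs when it does, one option is a heartbeat: the clock callback records when it last ran, and an independent checker flags a gap. Below is a minimal sketch in plain C; the struct and function names are hypothetical, and times are passed in as milliseconds so the logic can be built and tested off-target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical heartbeat monitor for a periodic timer. */
struct heartbeat {
  int64_t last_tick_ms; /* uptime (ms) of the most recent tick */
  int64_t max_gap_ms;   /* silence longer than this counts as a stall */
  uint32_t stalls;      /* number of stalls detected so far */
  bool stalled;         /* true while the current stall is ongoing */
};

void heartbeat_init(struct heartbeat *hb, int64_t now_ms, int64_t max_gap_ms) {
  hb->last_tick_ms = now_ms;
  hb->max_gap_ms = max_gap_ms;
  hb->stalls = 0;
  hb->stalled = false;
}

/* Call from the clock timer callback on every tick. */
void heartbeat_tick(struct heartbeat *hb, int64_t now_ms) {
  hb->last_tick_ms = now_ms;
  hb->stalled = false;
}

/* Call from an independent checker. Returns true only on the transition
 * into a stall, so each stall is reported once rather than on every check. */
bool heartbeat_check(struct heartbeat *hb, int64_t now_ms) {
  if (hb->stalled || now_ms - hb->last_tick_ms <= hb->max_gap_ms) return false;
  hb->stalled = true;
  hb->stalls++;
  return true;
}
```

On the device, now_ms could come from mgos_uptime() * 1000 (mgos_uptime() returns seconds as a double), and the check could run from a second mgos_set_timer() timer that logs with LOG(LL_ERROR, ...) whenever heartbeat_check() returns true. If that second timer keeps firing while the clock stalls, only the one timer has died; if both go quiet, the whole timer list or event loop has stopped, which narrows things down considerably.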

@klimbot How long do your devices run before this happens? I have v3.1 devices running in the field with uptimes of half a year.

@inso2 If it doesn’t happen immediately, it happens after a few days. The timer that seems to die most often is the clock timer (I have a screen that shows seconds), so I can tell something has gone wrong when the clock stops updating. There’s nothing in the logs either, which is annoying.