gettimeofday() causing reboots?

Justin2020
Posts: 11
Joined: Sun Aug 23, 2020 12:20 am

gettimeofday() causing reboots?

Postby Justin2020 » Thu Sep 03, 2020 8:56 pm

I moved to the master branch two months ago and everything was smooth as butter. A few days ago I updated to the latest version and now I'm having problems with what appears to be gettimeofday(). I won't go through everything I tried to pinpoint the source of the problem and I'm not even sure I found it, but I have had many clues pointing to that function. I should also point that it is happening only when the OTA process is ongoing, but then again, not much is happening the rest of the time.

I'm using the Exception Decoder for the Arduino IDE and every time I checked the backtrace, the gettimeofday function was there. I thought it was just part of the abort process so I wasn't paying much attention. But the backtrace pointed at the MQTT client component when it was calling platform_tick_get_ms() (line 1285 of mqtt_client.c) which in turn calls gettimeofday(), the httpd server and a few others also had a call to that function when crashing. Every time I was disabling something, it would eventually happen with something else. Then the decoder pointed at a ESP_LOGI line so that couldn't possibly be bugged. I had the new timestamp feature to print system time instead of milliseconds turned on, so I turned that feature off. After many more OTA updates than anyone would consider enough, it eventually crashed again. After an OTA update I set a timer for the next automatic update check and this time the decoder pointed at that function and the line in question was where I call time() which calls gettimeofday().

I removed every function I had with IRAM_ATTR, I changed all the malloc to heap_caps_malloc() with the MALLOC_CAP_DMA, I tried everything I could think of. I'm out of ideas. The 64 bits timer option is not enabled in sdkconfig.

I checked the esp-idf source tree and saw that newlib, and especially time.c, has been updated between the time I first installed the master branch and the update from a few days ago. Was there something changed that could be causing this?

This is the last backtrace I got, I can provide more if needed. :)

0x40082e59: _lock_acquire_recursive at D:/Work/Electronic/Espressif/esp-idf/components/newlib/locks.c line 170
0x40167531: _vfiprintf_r at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/newlib/newlib/libc/stdio/vfprintf.c line 853
0x4015f9b1: fiprintf at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/newlib/newlib/libc/stdio/fiprintf.c line 48
0x4015f925: __assert_func at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/newlib/newlib/libc/stdlib/assert.c line 58
0x400d80f9: esp_time_impl_get_boot_time at D:/Work/Electronic/Espressif/esp-idf/components/soc/include/soc/spinlock.h line 145
0x400d7ae7: adjust_boot_time at D:/Work/Electronic/Espressif/esp-idf/components/newlib/time.c line 63
0x40082f61: _gettimeofday_r at D:/Work/Electronic/Espressif/esp-idf/components/newlib/time.c line 104
0x40082ff3: _gettimeofday_r at D:/Work/Electronic/Espressif/esp-idf/components/newlib/time.c line 177
0x400dc0ce: hub_ota_set_check_timer at ../main/hub_ota.c line 341
0x400dc5fe: hub_ota_task at ../main/hub_ota.c line 208

This is the lines before and after line 341 in hub_ota.c:

time_t now = 0;
struct tm timeinfo = { 0 };
time(&now); // <-- Line 341
localtime_r(&now, &timeinfo);

And this is the for the line 208:

esp_wifi_set_ps(WIFI_PS_MIN_MODEM);
hub_ota_set_check_timer(); // <-- Line 208. It's the end of the whole OTA process.
xSemaphoreGive(xSemOTA);
vTaskDelete(NULL);

Justin2020
Posts: 11
Joined: Sun Aug 23, 2020 12:20 am

Re: gettimeofday() causing reboots?

Postby Justin2020 » Thu Sep 03, 2020 9:42 pm

This is a backtrace of the mqtt client crashing when I create the OTA task. I don't know if it's related to the OTA task itself (or the task api (doubt it)) but this is how I'm creating it:

The task:
xTaskCreate(&hub_ota_task, "OTA Task", 8192, &theArg, 7, NULL); // theArg is a global variable.

Decoding stack results
0x40087f83: panic_abort at D:/Work/Electronic/Espressif/esp-idf/components/esp_system/panic.c line 349
0x400886e1: esp_system_abort at D:/Work/Electronic/Espressif/esp-idf/components/esp_system/system_api.c line 104
0x4008f3f6: abort at D:/Work/Electronic/Espressif/esp-idf/components/newlib/abort.c line 46
0x40082da6: lock_acquire_generic at D:/Work/Electronic/Espressif/esp-idf/components/newlib/locks.c line 142
0x40082ec9: _lock_acquire_recursive at D:/Work/Electronic/Espressif/esp-idf/components/newlib/locks.c line 170
0x4016cf31: _vfiprintf_r at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/newlib/newlib/libc/stdio/vfprintf.c line 853
0x401653b1: fiprintf at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/newlib/newlib/libc/stdio/fiprintf.c line 48
0x40165325: __assert_func at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/newlib/newlib/libc/stdlib/assert.c line 58
0x400d7ed2: get_adjusted_boot_time at D:/Work/Electronic/Espressif/esp-idf/components/soc/include/soc/spinlock.h line 145
0x40082f50: _gettimeofday_r at D:/Work/Electronic/Espressif/esp-idf/components/newlib/time.c line 178
0x4016c921: gettimeofday at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/newlib/newlib/libc/syscalls/sysgettod.c line 11
0x400dff4a: platform_tick_get_ms at D:/Work/Electronic/Espressif/esp-idf/components/mqtt/esp-mqtt/lib/platform_esp32_idf.c line 30
0x400de4ed: esp_mqtt_task at D:/Work/Electronic/Espressif/esp-idf/components/mqtt/esp-mqtt/mqtt_client.c line 1285
0x400886e9: vPortTaskWrapper at D:/Work/Electronic/Espressif/esp-idf/components/freertos/xtensa/port.c line 169

Justin2020
Posts: 11
Joined: Sun Aug 23, 2020 12:20 am

Re: gettimeofday() causing reboots?

Postby Justin2020 » Thu Sep 03, 2020 11:27 pm

Last update since it's solved, sort of. I'd like some feedback about whether I did it right or wrong with git. I managed to get a list of commits using 'git log --oneline > git.log, and in the log there's this near the top:

<snip>
5785e4dfb newlib: move some functions to soc, esp32, esp32s2
a395a00d2 newlib: separate low-level code in time.c implementation
cda9c595d Merge branch 'feature/mcpwm_capture_on_both_edge' into 'master'
</snip>

Looking at the repository you can see at 'https://github.com/espressif/esp-idf/tr ... nts/newlib' that 'separate low-level code in time.c implementation' line. So I grabbed the hash right after that and did:

git revert --hard cda9c595d

I cleaned my project and recompiled it. I spammed the crap out of the OTA update and after at least a hundred attempts with no update (same version) and at least twenty full updates (same firmware with diff version), it didn't crash one single time. NOT ONE! I even moved all my 'critical' functions back in IRAM. So now I'm pretty convinced there's something wrong with that update to newlib. Or it's impacting something somewhere beyond my ability to find it. Unless someone tells me otherwise, I'm not going to update the tree until I see something related to time.c, whether it points to a bug in my code or in Espressif's.

----
Normally at this point I hit Submit and on the very next OTA attempt it would crash so I preemptively did another twenty or so checks and still no crash.

axellin
Posts: 200
Joined: Mon Sep 17, 2018 9:09 am

Re: gettimeofday() causing reboots?

Postby axellin » Fri Sep 04, 2020 12:47 am


Justin2020
Posts: 11
Joined: Sun Aug 23, 2020 12:20 am

Re: gettimeofday() causing reboots?

Postby Justin2020 » Fri Sep 04, 2020 1:19 am

Wow.. 18 hours ago. What are the odds. Thanks for pointing this. I didn't think of checking the issues, or even posting one. I'm not sure I would have found it since it doesn't seem to be related to mqtt. I'll check that fix and see if I can merge it myself.

Justin2020
Posts: 11
Joined: Sun Aug 23, 2020 12:20 am

Re: gettimeofday() causing reboots?

Postby Justin2020 » Fri Sep 04, 2020 1:53 am

That git command I used above blew in my face so I had to clone the repository again and after a scary first ota update that took twice as long as usual and stalled a few times, all the other updates and the version check spam all went without a hitch. Not a single crash.

Thanks again for pointing to the solution. Now it's really solved.

Who is online

Users browsing this forum: jcolebaker and 135 guests