Questions about Core Dump Analysis - Crashes in WiFi and TCP/IP Tasks
Posted: Wed Jun 07, 2023 5:48 pm
Hello there,
I'm working on a project with many remote units in the field, and we've just set up the project to stream the coredump ELF to our cloud systems after a crash. I could use some help understanding the meaning of some of the coredumps we've received.
tiT task (InstFetchProhibitedCause):
The program counter is 0x0, so the CPU apparently tried to fetch an instruction from address 0. In addition, none of the addresses in the crashed thread's stack trace look like instruction addresses (none fall in the 0x3fxxxxxx - 0x6xxxxxxx range). What would cause the stack trace to be corrupted in this way for this task? Any suggestions on where to look next?
===============================================================
==================== ESP32 CORE DUMP START ====================
Crashed task handle: 0x3ffdd3e4, name: 'tiT', GDB name: 'process 1073599460'
================== CURRENT THREAD REGISTERS ===================
exccause 0x14 (InstFetchProhibitedCause)
excvaddr 0x0
epc1 0x4011182b
epc2 0x0
epc3 0x0
epc4 0x0
epc5 0x0
epc6 0x0
eps2 0x0
eps3 0x0
eps4 0x0
eps5 0x0
eps6 0x0
pc 0x0 0x0
lbeg 0x40127d54 1074953556
lend 0x40127d6b 1074953579
lcount 0x0 0
sar 0x16 22
ps 0x60b20 396064
threadptr <unavailable>
br <unavailable>
scompare1 <unavailable>
acclo <unavailable>
acchi <unavailable>
m0 <unavailable>
m1 <unavailable>
m2 <unavailable>
m3 <unavailable>
expstate <unavailable>
f64r_lo <unavailable>
f64r_hi <unavailable>
f64s <unavailable>
fcr <unavailable>
fsr <unavailable>
a0 0x80171e8b -2145968501
a1 0x3ffdd260 1073599072
a2 0x3ffde580 1073603968
a3 0x3ffe49ac 1073629612
a4 0x0 0
a5 0x3ffdc344 1073595204
a6 0x1 1
a7 0x60023 393251
a8 0x801921dc -2145836580
a9 0x3ffdd220 1073599008
a10 0x0 0
a11 0x3ffe49ac 1073629612
a12 0x0 0
a13 0x0 0
a14 0x0 0
a15 0x0 0
==================== CURRENT THREAD STACK =====================
#0 0x00000000 in ?? ()
#1 0x00171e8b in ?? ()
#2 0x0011a182 in ?? ()
#3 0x00123906 in ?? ()
#4 0x00123a2f in ?? ()
#5 0x0011f738 in ?? ()
#6 0x0011f891 in ?? ()
#7 0x0011822f in ?? ()
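One observation from the dump above: frame #1 (0x00171e8b) equals a0 (0x80171e8b) with the upper bits cleared. In the Xtensa windowed ABI the top two bits of a return address encode the call window size, so these frames may be return addresses with their upper bits stripped rather than random data. The sketch below, in Python, restores the upper bits and checks the result against the ESP32 instruction address range. The range bounds and the helper names (recover_return_pc, looks_like_code) are assumptions for illustration, not part of ESP-IDF or its core dump tools.

```python
# Sketch only. Range bounds are assumptions taken from the ESP32 memory map
# (instruction bus addresses start at 0x40000000).
ESP32_CODE_START = 0x40000000
ESP32_CODE_END = 0x40C00000  # assumed exclusive upper bound

def recover_return_pc(raw):
    """Turn a windowed-ABI return address into a candidate code address.

    Clears the top two bits (call window size) and puts the address back
    into the 0x4xxxxxxx region.
    """
    return (raw & 0x3FFFFFFF) | 0x40000000

def looks_like_code(addr):
    """True if addr falls inside the assumed instruction address range."""
    return ESP32_CODE_START <= addr < ESP32_CODE_END

# Frames #1 to #7 exactly as printed in the stack trace above.
frames = [0x00171e8b, 0x0011a182, 0x00123906, 0x00123a2f,
          0x0011f738, 0x0011f891, 0x0011822f]

candidates = [recover_return_pc(f) for f in frames]
for raw, pc in zip(frames, candidates):
    print(f"{raw:#010x} -> {pc:#010x} code={looks_like_code(pc)}")
```

If the recovered values resolve to sensible functions in the application ELF, the stack is probably intact and the real fault is a jump through a NULL function pointer (pc 0x0) from the function at frame #1.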
wifi task (StoreProhibitedCause):
The stack trace ends at the panic handler, which seems strange because it hides whatever invoked the panic. excvaddr is 0x0, which together with StoreProhibitedCause could mean that the application attempted to write through a NULL pointer. Is this still relevant given that the program appears to have stopped in the panic handler rather than at some other line of code?
===============================================================
==================== ESP32 CORE DUMP START ====================
Crashed task handle: 0x3ffdf310, name: 'wifi', GDB name: 'process 1073607440'
================== CURRENT THREAD REGISTERS ===================
exccause 0x1d (StoreProhibitedCause)
excvaddr 0x0
epc1 0x4011182b
epc2 0x0
epc3 0x0
epc4 0x0
epc5 0x0
epc6 0x0
eps2 0x0
eps3 0x0
eps4 0x0
eps5 0x0
eps6 0x0
pc 0x40081eed 0x40081eed <panic_abort+21>
lbeg 0x4000c349 1073791817
lend 0x4000c36b 1073791851
lcount 0x0 0
sar 0x10 16
ps 0x60021 393249
threadptr <unavailable>
br <unavailable>
scompare1 <unavailable>
acclo <unavailable>
acchi <unavailable>
m0 <unavailable>
m1 <unavailable>
m2 <unavailable>
m3 <unavailable>
expstate <unavailable>
f64r_lo <unavailable>
f64r_hi <unavailable>
f64s <unavailable>
fcr <unavailable>
fsr <unavailable>
a0 0x80092610 -2146884080
a1 0x3ffbea80 1073474176
a2 0x3ffbeac0 1073474240
a3 0x3ffbeaed 1073474285
a4 0xa 10
a5 0x3ffbead0 1073474256
a6 0x3 3
a7 0x60023 393251
a8 0x0 0
a9 0x1 1
a10 0x3ffbeb0d 1073474317
a11 0x3ffbeb0d 1073474317
a12 0x0 0
a13 0x3ffbeaa0 1073474208
a14 0x3ff60000 1073086464
a15 0xe7ffffff -402653185
==================== CURRENT THREAD STACK =====================
#0 0x40081eed in panic_abort (details=0x3ffbeac0 "abort() was called at PC 0x40112150 on core 0") at C:/Espressif/frameworks/esp-idf-v4.4.1/components/esp_system/panic.c:402
#1 0x40092610 in esp_system_abort (details=0x3ffbeac0 "abort() was called at PC 0x40112150 on core 0") at C:/Espressif/frameworks/esp-idf-v4.4.1/components/esp_system/esp_system.c:128
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
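Although the backtrace stops inside the panic handler, the details string in frames #0 and #1 still records the address that called abort(). A minimal Python sketch for pulling that address out of the string so it can be looked up in the application ELF follows. The function name parse_abort_details is hypothetical, and the regular expression assumes the message format is exactly as it appears in this dump.

```python
import re

# Matches the message format seen in the dump above (assumed to be stable).
ABORT_RE = re.compile(r"abort\(\) was called at PC (0x[0-9a-fA-F]+) on core (\d+)")

def parse_abort_details(details):
    """Return (pc, core) from a panic details string, or None if no match."""
    match = ABORT_RE.search(details)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2))

details = "abort() was called at PC 0x40112150 on core 0"
pc, core = parse_abort_details(details)
print(f"abort() caller: {pc:#x} on core {core}")
```

The resulting address can then be passed to the toolchain's addr2line together with the ELF from the same build, for example `xtensa-esp32-elf-addr2line -pfiaC -e app.elf 0x40112150`, where app.elf is a placeholder for the project's actual ELF file.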
The project currently uses ESP-IDF v4.4.1. I'm considering upgrading to v4.4.4 to see whether that addresses these problems.
Please let me know if anyone can point me in the right direction here.
Thanks