uxTaskGetSystemState leads to Guru Meditation Error

BramPeeters
Posts: 2
Joined: Sat Feb 24, 2024 3:27 am

Postby BramPeeters » Sat Feb 24, 2024 3:55 am

I am trying to get some basic framework code running on an ESP32 (the code comes from an STM32, where it runs fine).

One of the things I do is periodically check and report the task statuses using the following code:

Code: Select all

static TaskStatus_t m_aTaskStatusArray[ SHM_MAX_NB_TASKS ] ;
static UBaseType_t  m_nNbTasks = 0;    


<...>

#if configGENERATE_RUN_TIME_STATS
    uint32_t l_nTotalRunTime;
#endif

            /* Check the stacks */
            m_nNbTasks = uxTaskGetNumberOfTasks();
            if ( m_nNbTasks > SHM_MAX_NB_TASKS )
            {
                /* UBaseType_t is unsigned, so print it with %u */
                LOGGER_LOG_ERROR( LOG_CLASS, "%s: Too many tasks in the system: %u", __FUNCTION__, (unsigned) m_nNbTasks );
                l_bStartRecovery = 1;
            }
            else
            {
                l_nResult = uxTaskGetSystemState( m_aTaskStatusArray, m_nNbTasks,
    #if configGENERATE_RUN_TIME_STATS
                                                  &l_nTotalRunTime
    #else
                                                  NULL
    #endif
                                                );
           <...>
However, this call seems to be buggy on the ESP32? The first time the function is called, it immediately runs into an error.
(Note: configUSE_TRACE_FACILITY is set, but I have left configGENERATE_RUN_TIME_STATS disabled for now.)
Initially I tried with IDF 5.0.1, then I upgraded to the latest IDF 5.2 hoping it would be solved there, but it has the same problem.

Guru Meditation Error: Core 0 panic'ed (LoadProhibited). Exception was unhandled.

Core 0 register dump:
PC : 0x400895ae PS : 0x00060933 A0 : 0x800882cb A1 : 0x3ffc10d0
0x400895ae: prvTaskCheckFreeStackSpace at C:/bram/projects/fridgeController_esp32/esp-idf/esp-idf-v5.2/components/freertos/FreeRTOS-Kernel/tasks.c:4728

A2 : 0x00000000 A3 : 0xffffffff A4 : 0x8008669e A5 : 0x00060923
A6 : 0x00000003 A7 : 0x0000cdcd A8 : 0x800e0442 A9 : 0x00000002
A10 : 0x00000000 A11 : 0x000000fd A12 : 0x00000000 A13 : 0x3f402f80
A14 : 0x0000001a A15 : 0x00000001 SAR : 0x00000000 EXCCAUSE: 0x0000001c
EXCVADDR: 0x800e0442 LBEG : 0x00000000 LEND : 0x00000000 LCOUNT : 0x00000000


Backtrace: 0x400895ab:0x3ffc10d0 0x400882c8:0x3ffc10f0 0x4008835e:0x3ffc1110 0x400883e5:0x3ffc1140 0x400d60cb:0x3ffc1160 0x400862f9:0x3ffc1190
0x400895ab: prvTaskCheckFreeStackSpace at C:/bram/projects/fridgeController_esp32/esp-idf/esp-idf-v5.2/components/freertos/FreeRTOS-Kernel/tasks.c:4730
0x400882c8: vTaskGetInfo at C:/bram/projects/fridgeController_esp32/esp-idf/esp-idf-v5.2/components/freertos/FreeRTOS-Kernel/tasks.c:4670
0x4008835e: prvListTasksWithinSingleList at C:/bram/projects/fridgeController_esp32/esp-idf/esp-idf-v5.2/components/freertos/FreeRTOS-Kernel/tasks.c:4707
0x400883e5: uxTaskGetSystemState at C:/bram/projects/fridgeController_esp32/esp-idf/esp-idf-v5.2/components/freertos/FreeRTOS-Kernel/tasks.c:2963
0x400d60cb: lclMonitorTask at C:/bram/projects/fridgeController_esp32/workspace/VanController/components/systemhealthmonitor/systemhealthmonitor.c:560
0x400862f9: vPortTaskWrapper at C:/bram/projects/fridgeController_esp32/esp-idf/esp-idf-v5.2/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:134

Note that the line numbers in tasks.c might be a bit off from the released version: I already tried adding some code to see what is going on (but I don't have a debugger working yet, and printf calls from inside that path seem to run into lock problems, so I was not very successful so far).

Code: Select all

[4728]        while( *pucStackByte == ( uint8_t ) tskSTACK_FILL_BYTE )
[4729]        {
[4730]            pucStackByte -= portSTACK_GROWTH;
[4731]            ulCount++;
[4732]        }
So pucStackByte somehow contains an incorrect address.
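A note on why a bad start address faults exactly here: the loop reads through the pointer before doing anything else, so a corrupted stack pointer in the TCB traps on the very first load. The following is a minimal host-side sketch of the same scan, not the kernel code. `count_free_stack_bytes`, `STACK_FILL_BYTE`, `STACK_GROWTH` and the `limit` parameter are names made up for this illustration; the value 0xa5 matches FreeRTOS's `tskSTACK_FILL_BYTE`, and a downward-growing stack (as on the ESP32) is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL_BYTE 0xa5U  /* same value as tskSTACK_FILL_BYTE */
#define STACK_GROWTH    (-1)   /* assumption: stack grows downward */

/* Count untouched fill bytes, starting at the lowest stack address.
 * `limit` is a hypothetical safety bound added for this sketch; the
 * kernel loop quoted above has no such bound and trusts its pointer. */
static size_t count_free_stack_bytes(const uint8_t *stack_low, size_t limit)
{
    assert(stack_low != NULL);

    const uint8_t *p = stack_low;
    size_t count = 0;

    while (count < limit && *p == (uint8_t) STACK_FILL_BYTE) {
        p -= STACK_GROWTH;  /* growth is -1, so this steps upward in memory */
        count++;
    }
    return count;
}
```

For a buffer whose first three bytes still hold the fill pattern, `count_free_stack_bytes(buf, sizeof buf)` returns 3. Without a bound, the result is only as trustworthy as the pointer the scan starts from.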


My startup log/cpu info (it is an esp-wroom-32):
--- esp-idf-monitor 1.4.0 on \\.\COM6 115200 ---
--- Quit: Ctrl+] | Menu: Ctrl+T | Help: Ctrl+T followed by Ctrl+H ---
x1 (POWERON�esp32: SPI Mode : DIO
I (47) boot.esp32: SPI Flash Size : 2MB
I (51) boot: Enabling RNG early entropy source...
I (57) boot: Partition Table:
I (60) boot: ## Label Usage Type ST Offset Length
I (68) boot: 0 nvs WiFi data 01 02 00009000 00006000
I (75) boot: 1 phy_init RF data 01 01 0000f000 00001000
I (83) boot: 2 factory factory app 00 00 00010000 00100000
I (90) boot: End of partition table
I (94) esp_image: segment 0: paddr=00010020 vaddr=3f400020 size=09fdch ( 40924) map
I (117) esp_image: segment 1: paddr=0001a004 vaddr=3ffb0000 size=0224ch ( 8780) load
I (120) esp_image: segment 2: paddr=0001c258 vaddr=40080000 size=03dc0h ( 15808) load
I (129) esp_image: segment 3: paddr=00020020 vaddr=400d0020 size=17a14h ( 96788) map
I (164) esp_image: segment 4: paddr=00037a3c vaddr=40083dc0 size=09468h ( 37992) load
I (186) boot: Loaded app from partition at offset 0x10000
I (186) boot: Disabling RNG early entropy source...
I (198) cpu_start: Multicore app
I (206) cpu_start: Pro cpu start user code
I (207) cpu_start: cpu freq: 160000000 Hz
I (207) cpu_start: Application information:
I (210) cpu_start: Project name: app-template
I (215) cpu_start: App version: 1
I (219) cpu_start: Compile time: Feb 24 2024 03:56:36
I (225) cpu_start: ELF file SHA256: f9e023c03...
I (231) cpu_start: ESP-IDF: v5.2-dirty
I (236) cpu_start: Min chip rev: v0.0
I (241) cpu_start: Max chip rev: v3.99
I (246) cpu_start: Chip rev: v3.0
I (251) heap_init: Initializing. RAM available for dynamic allocation:
I (258) heap_init: At 3FFAE6E0 len 00001920 (6 KiB): DRAM
I (264) heap_init: At 3FFBCE30 len 000231D0 (140 KiB): DRAM
I (270) heap_init: At 3FFE0440 len 00003AE0 (14 KiB): D/IRAM
I (276) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (283) heap_init: At 4008D228 len 00012DD8 (75 KiB): IRAM
I (290) spi_flash: detected chip: generic
I (293) spi_flash: flash io: dio
W (297) spi_flash: Detected size(4096k) larger than the size in the binary image header(2048k). Using the size in the binary image header.
I (311) main_task: Started on CPU0
I (321) main_task: Calling app_main()
This is esp32 chip with 2 CPU core(s), model ESP32, WiFi/BT/BLE, silicon revision v3.0, 2MB external flash

ESP_Dazz
Posts: 308
Joined: Fri Jun 02, 2017 6:50 am

Re: uxTaskGetSystemState leads to Guru Meditation Error

Postby ESP_Dazz » Sat Feb 24, 2024 4:46 am

Smells like a stack overflow. Could you enable `CONFIG_ESP_SYSTEM_PANIC_GDBSTUB` via menuconfig? This will automatically launch GDB on a crash. Then, from the point of the crash, please navigate up the call stack and print out the value of `pxTCB->pxStack` to see if it's a sane value. This should be a pointer to the start of the task's stack (which is data memory), and all data memory addresses on the ESP32 should start with `0x3`, so something like `0x3XXXXXXX`.

But from the register dump, it looks like `pxStack` contains some address like `0x800e0442`.
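The sanity check described above can be sketched on a host machine using the numbers from the register dump. This is a rough illustration under stated assumptions, not an ESP-IDF API: the three helper names are hypothetical, the "starts with `0x3`" rule is the rule of thumb from this post, and the return-address decoding assumes the Xtensa windowed ABI, where the top two bits of a saved `a0` hold the window increment rather than address bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rule of thumb from the post: ESP32 data addresses look like 0x3XXXXXXX. */
static bool looks_like_data_address(uint32_t addr)
{
    return (addr >> 28) == 0x3u;
}

/* Assumption (windowed ABI): a saved return address carries the window
 * increment in its top two bits; map it back into the 0x4XXXXXXX code space. */
static uint32_t a0_to_code_address(uint32_t a0)
{
    if (a0 & 0x80000000u) {
        return (a0 & 0x3fffffffu) | 0x40000000u;
    }
    return a0;
}

/* True if addr lies inside [base, base + size). */
static bool in_range(uint32_t addr, uint32_t base, uint32_t size)
{
    return addr >= base && (addr - base) < size;
}
```

With the values from this thread, `looks_like_data_address(0x800e0442)` is false, and `a0_to_code_address(0x800e0442)` gives `0x400e0442`, which falls inside the flash code segment from the boot log (`vaddr=400d0020`, `size=17a14h`). That may hint that the TCB field was overwritten with a saved return address, which would fit memory corruption from a neighbouring stack or buffer, but treat it as a guess rather than a diagnosis.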

BramPeeters
Posts: 2
Joined: Sat Feb 24, 2024 3:27 am

Re: uxTaskGetSystemState leads to Guru Meditation Error

Postby BramPeeters » Sun Feb 25, 2024 5:12 am

Thanks for the feedback.

My initial thought was a stack overflow too, but I already doubled the stack sizes of the tasks that run and the problem remains the same.
(Ironically enough, I am using this function to monitor my stacks :) )

It does indeed contain the value 0x800e0442, but the task name contains semi-random hex values too... the entire TCB seems invalid.

I tried looking for where it goes wrong and noticed that uxCurrentNumberOfTasks is 14, which I don't understand.
I create 7 tasks in my code, there is one created by the IDF framework that calls app_main, and there is the idle task, so that is 9 tasks I know about... still 5 unaccounted for / too many? (Unless the IDF framework creates more than one task?)
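One possible accounting for the 14, sketched as a small tally. This assumes a default dual-core ESP-IDF v5.x configuration; the task names below are recalled from typical default builds and are not confirmed anywhere in this thread, so they may differ with other menuconfig settings. `idf_default_tasks` and `expected_task_count` are hypothetical names for this illustration.

```c
#include <stddef.h>

/* Assumption: tasks a default dual-core ESP-IDF v5.x build creates by itself.
 * One idle task and one IPC task per core, plus the FreeRTOS timer service
 * task, the esp_timer task, and the task that calls app_main(). */
static const char *const idf_default_tasks[] = {
    "main", "IDLE0", "IDLE1", "Tmr Svc", "ipc0", "ipc1", "esp_timer",
};

/* Expected total = application tasks + framework tasks listed above. */
static size_t expected_task_count(size_t app_tasks)
{
    return app_tasks + sizeof idf_default_tasks / sizeof idf_default_tasks[0];
}
```

Under that assumption, `expected_task_count(7)` gives 14, so the count by itself would not indicate corruption.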

I could examine the functions on the stack trace after enabling the CONFIG_ESP_SYSTEM_PANIC_GDBSTUB option, but I could not get anything running with a breakpoint set on uxCurrentNumberOfTasks++ in prvAddNewTaskToReadyList.
E.g. clicking 'Restart a process or debug target without terminating and re-launching' did not do much, except that the button became grayed out after pressing it once. (I am using Eclipse with the ESP-IDF plugin.)

I also tried enabling CONFIG_ESP_SYSTEM_GDBSTUB_RUNTIME in menuconfig to get debugging working before the crash, but that does not seem to work at all. It runs into an internal error when parsing my ELF file (see attachment), and if I try to ignore that, I don't hit any of the breakpoints I have set, and the relaunch button still does not work.

Maybe I should give the VSCode approach a try; sadly, Eclipse has been nothing but a struggle so far, every step of the way.
Attachments
internalerror.jpg (47.36 KiB)

haircuts4men
Posts: 3
Joined: Fri Jan 12, 2024 2:39 pm

Re: uxTaskGetSystemState leads to Guru Meditation Error

Postby haircuts4men » Wed Jul 10, 2024 9:23 pm

I am facing an eerily similar issue, where a task that looks something like this (pseudocode):

Code: Select all

while (true) {
  log_stack(); // this gathers the task name and stack watermark, which eventually calls into `prvTaskCheckFreeStackSpace`, causing the same exact panic
  auto event = receive_event_from_freertos_queue();
  process_event(event);
  send_reply_to_another_queue_in_another_task();
}
Repeatedly pumping events into this task (something like 30-40 iterations) eventually causes the TCB data to become corrupted, producing a stack trace *very* similar to the one shown in this post. Here's a screenshot:

Image

There are no heap allocations being performed anywhere, and there is plenty of stack space left on all tasks in the system before the corruption happens.

The two tasks that interact with each other (through FreeRTOS queues) are statically allocated, pinned to core 0 (with xTaskCreateStaticPinnedToCore) and have plenty of stack space left.

The two event queues in question are not dynamically allocated, and neither are the elements being passed around between them.

Here's what log_stack() shown above prints before and after the corruption (formatted like printf("[%s] -> %u", pcTaskName, usStackHighWaterMark)):

Image
Image

haircuts4men
Posts: 3
Joined: Fri Jan 12, 2024 2:39 pm

Re: uxTaskGetSystemState leads to Guru Meditation Error

Postby haircuts4men » Thu Jul 11, 2024 1:47 pm

Figured out the issue. Every task's entire stack space and "personal information" were laid out sequentially in memory, in one big 45 KB block of statically allocated RAM. An out-of-bounds array access in another task (which sat in memory right before the task that panicked) poked into the panicked task's memory region, causing the corruption.
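For anyone chasing a similar bug, here is a minimal host-side sketch of one way to catch this kind of neighbour corruption early: put a guard pattern between adjacent static regions and check it periodically. It is an illustration under assumptions, not the code from this project. The layout, sizes, `GUARD_BYTE` value and all function names are hypothetical, and the overrun is simulated inside a single array so the sketch itself stays within bounds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_A_SIZE 16u    /* task A's array (hypothetical size) */
#define GUARD_SIZE 4u     /* guard bytes placed between the two regions */
#define TCB_B_SIZE 16u    /* stand-in for task B's TCB/stack data */
#define GUARD_BYTE 0xC5u  /* arbitrary pattern chosen for this sketch */

/* One contiguous arena standing in for the big static block:
 * [ task A buffer | guard | task B data ] */
static uint8_t arena[BUF_A_SIZE + GUARD_SIZE + TCB_B_SIZE];

static void arena_init(void)
{
    memset(arena, 0, sizeof arena);
    memset(arena + BUF_A_SIZE, GUARD_BYTE, GUARD_SIZE);
}

/* Simulates task A writing `len` bytes with no bounds check of its own. */
static void task_a_write(size_t len)
{
    assert(len <= sizeof arena);  /* keep the simulation itself in bounds */
    memset(arena, 0xEE, len);
}

/* Returns false as soon as any guard byte has been overwritten. */
static bool guard_intact(void)
{
    for (size_t i = 0; i < GUARD_SIZE; i++) {
        if (arena[BUF_A_SIZE + i] != GUARD_BYTE) {
            return false;
        }
    }
    return true;
}
```

After `arena_init()`, a write of 16 bytes leaves `guard_intact()` true, while a write of 18 bytes makes it return false before task B's data is touched. On the target, a periodic check like this would only narrow down when the overrun happens; the out-of-bounds access itself still has to be fixed at its source.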
