Random boot failure depending on .flash.text size

sgiasm
Posts: 4
Joined: Mon Jul 01, 2024 11:25 am

Random boot failure depending on .flash.text size

Postby sgiasm » Mon Dec 09, 2024 3:58 pm

We've encountered a strange problem: completely randomly, after change to the source code (the actual change is irrelevant), firmware becomes "damaged" or "incorrect": it starts rebooting due to LoadProhibited or IllegalInstruction somewhere around esp_psram_init (usually it's movi.n a3, 0), so this is even before our code.
Rebuilding doesn't help, it reproduces on different devices (so, for example if someone pushes the code in crashing state - all developers, even with different host OS, will have bootloop on their devices)
Instructions in disassembly look completely correct and are the same in normal and bootlooping firmwares.
After investigation we found out that the size of the .flash.text block of the firmware matters. So, to fix this bootloop usually is enough to add a string to the source code, for example LOGI("crashfix", "crashfix"); line. This .flash.text block lies before instructions, so it moves them.

Here is some statistics of movi.n a3, 0 address:
0x4200b589, 0x4200b669, 0x4200b689 - LoadProhibited crash
0x4200b665, 0x4200b591 - works OK

Currently it's just an annoying ritual - to add/remove log line if firmware crashes, but last time it became worse – simple one-line log didn't fix it and only removing a bunch of logs helped. I will post here more info, once it reproduces again. And we haven't encountered this problem on ESP-IDF 4.*.* but that may be also due to the small firmware size at that moment.

So, currently I don't know if it's something wrong with our setup (we are not doing anything fancy), or toolchain or hardware bug. Any ideas and information are really appreciated.


Example of disassembly:
  1. ...
  2. Disassembly of section .flash.text:
  3.  
  4. 42000020 <_stext>:
  5. 42000020:   3fca1a24    
  6. 42000024:   3c0e0120
  7.  
  8. ...
  9.  
  10.     s_psram_ctx.is_initialised = true;
  11. 4200b661:   d4e831          l32r    a3, 42000a04 <_stext+0x9e4> (3fca22c4 <s_psram_ctx>)
  12. 4200b664:   180c        movi.n  a8, 1
  13. 4200b666:   004382          s8i a8, a3, 0
  14.     uint32_t psram_physical_size = 0;
  15. 4200b669:   030c        movi.n  a3, 0           <--------------- PC
  16. 4200b66b:   0139        s32i.n  a3, a1, 0
Example of logs:

Code: Select all

...
I (490) cpu_start: Multicore app
I (490) octal_psram: vendor id    : 0x0d (AP)
I (490) octal_psram: dev id       : 0x02 (generation 3)
I (493) octal_psram: density      : 0x03 (64 Mbit)
I (499) octal_psram: good-die     : 0x01 (Pass)
I (504) octal_psram: Latency      : 0x01 (Fixed)
I (509) octal_psram: VCC          : 0x01 (3V)
I (514) octal_psram: SRF          : 0x01 (Fast Refresh)
I (520) octal_psram: BurstType    : 0x01 (Hybrid Wrap)
I (526) octal_psram: BurstLen     : 0x01 (32 Byte)
I (531) octal_psram: Readlatency  : 0x02 (10 cycles@Fixed)
I (538) octal_psram: DriveStrength: 0x00 (1/1)
Guru Meditation Error: Core  0 panic'ed (LoadProhibited). Exception was unhandled.

Core  0 register dump:
0x4200b669: esp_psram_init at C:/Users/.../esp/v5.1.4/esp-idf/components/esp_psram/esp_psram.c:135
PC      : 0x4200b669  PS      : 0x00060f30  A0      : 0x80376109  A1      : 0x3fceb2a0
A2      : 0x00000000  A3      : 0x3fca22c4  A4      : 0x00000001  A5      : 0x50000000
A6      : 0x00000000  A7      : 0x00000000  A8      : 0x00000001  A9      : 0x3fceb280
A10     : 0x00000000  A11     : 0x00000000  A12     : 0x00000028  A13     : 0x3fca8294
A14     : 0x00000000  A15     : 0x00000000  SAR     : 0x00000004  EXCCAUSE: 0x0000001c
EXCVADDR: 0x80376441  LBEG    : 0x40056f08  LEND    : 0x40056f12  LCOUNT  : 0x00000000
0x40056f08: __memcpy_aux in ROM
0x40056f12: __memcpy_aux in ROM

Backtrace: 0x4200b666:0x3fceb2a0 0x40376106:0x3fceb2e0 0x403cd9b4:0x3fceb340 0x403cddf9:0x3fceb380 0x403c9971:0x3fceb4b0 0x40045c01:0x3fceb570 0x40043ab6:0x3fceb6f0 0x40034c45:0x3fceb710
0x4200b666: esp_psram_init at C:/Users/.../esp/v5.1.4/esp-idf/components/esp_psram/esp_psram.c:133
0x40376106: call_start_cpu0 at C:/Users/.../esp/v5.1.4/esp-idf/components/esp_system/port/cpu_start.c:473
0x40045c01: ets_run_flash_bootloader in ROM
0x40043ab6: main in ROM
0x40034c45: .stack_ok in ROM

Chip: ESP32-S3, rev. 0.2
Board: ESP32-S3-WROOM-1
RAM: 8MB, Octal
Flash: 16MB
ESP-IDF: 5.1.4, 5.3.1
Framework: ESP-IDF
Language: C++
Host OS: Linux, Windows

Code: Select all

Total sizes:
Used static IRAM:   99994 bytes ( 262246 remain, 27.6% used)
      .text size:   98967 bytes
   .vectors size:    1027 bytes
Used stat D/IRAM:   62772 bytes ( 283084 remain, 18.1% used)
      .data size:   21276 bytes
      .bss  size:   41496 bytes
Used Flash size : 1147203 bytes
           .text:  889375 bytes
         .rodata:  257572 bytes
Total image size: 1268473 bytes (.bin may be padded larger)

ESP_Sprite
Posts: 9883
Joined: Thu Nov 26, 2015 4:08 am

Re: Random boot failure depending on .flash.text size

Postby ESP_Sprite » Tue Dec 10, 2024 4:12 am

That is odd. What hardware are you trying this on?

sgiasm
Posts: 4
Joined: Mon Jul 01, 2024 11:25 am

Re: Random boot failure depending on .flash.text size

Postby sgiasm » Thu Dec 12, 2024 9:43 am

ESP_Sprite wrote:That is odd. What hardware are you trying this on?

Code: Select all

ESP32-S3-WROOM-1
N16R8 board
FCCID: 2AC7Z-ESPS3WROOM1
IC: 21086-ESPS3WROOM1
CMIITID: 2022DP2892
ESP32-S3, rev. 0.2
and

Code: Select all

ESP32-S3-WROOM-1
M0N8R8 board
IC: 21098-ESPS3WROOM1
CMIITID: 2022DP2892
ESP32-S3, rev. 0.1

ESP_Sprite
Posts: 9883
Joined: Thu Nov 26, 2015 4:08 am

Re: Random boot failure depending on .flash.text size

Postby ESP_Sprite » Fri Dec 13, 2024 2:51 am

That is on a custom board, I assume? Anything connected to GPIO35, 36, 37 or 45?

sgiasm
Posts: 4
Joined: Mon Jul 01, 2024 11:25 am

Re: Random boot failure depending on .flash.text size

Postby sgiasm » Fri Dec 13, 2024 9:11 am

Developers are working on ESP32-S3-WROOM-1 developer boards (like this https://www.amazon.de/-/en/gp/product/B0BMQ8F7FN). We have Waveshare E-Ink display connected to 38, 39, 40, 15, 16, 18, all other pins are free.
We have custom boards, but I didn't check if crash reproduces on them (at current stage they are only remotely available, so we don't use them often). Will check next time it appears.
If this can be pins-related, next time I will also disconnect everything.

User avatar
TheRealRevK
Posts: 10
Joined: Mon Aug 26, 2019 7:32 pm

Re: Random boot failure depending on .flash.text size

Postby TheRealRevK » Thu Dec 19, 2024 9:21 am

I have started seeing the same. It went away (probably because I was adding log lines).

I can confirm, just adding a log line fixed it. i.e. many builds, fullclean, checking sdkconfig, etc, no help, add one log line, and fine.

Code: Select all

SPIWP:0xee
mode:DIO, clock div:1
load:0x3fce2810,len:0xdc
load:0x403c8700,len:0x4
load:0x403c8704,len:0xa08
load:0x403cb700,len:0x2464
entry 0x403c8854
Guru Meditation Error: Core  0 panic'ed (IllegalInstruction). Exception was unhandled.
Memory dump at 0x42006bd8: 0000dfb1 00106403 00a0b2cc
--- 0x42006bd8: esp_chip_info at /Users/adrian/esp/5.3.1/components/esp_hw_support/port/esp32s3/chip_info.c:13

Core  0 register dump:
PC      : 0x42006bdc  PS      : 0x00060d30  A0      : 0x803cc521  A1      : 0x3fceb2e0  
--- 0x42006bdc: esp_chip_info at /Users/adrian/esp/5.3.1/components/esp_hw_support/port/esp32s3/chip_info.c:13

A2      : 0x00000001  A3      : 0x3c0c0020  A4      : 0x3fc9d347  A5      : 0x00000004  
A6      : 0x0000000c  A7      : 0x0000000c  A8      : 0x8037610f  A9      : 0x3fceb2c0  
A10     : 0x3fceb2e0  A11     : 0x00000009  A12     : 0x00100000  A13     : 0xfc000000  
A14     : 0x0000003f  A15     : 0x0000000c  SAR     : 0x00000006  EXCCAUSE: 0x00000000  
EXCVADDR: 0x00000000  LBEG    : 0x40056f5c  LEND    : 0x40056f72  LCOUNT  : 0x00000000  
--- 0x40056f5c: memcpy in ROM
0x40056f72: memcpy in ROM



Backtrace: 0x42006bd9:0x3fceb2e0 0x403cc51e:0x3fceb340 0x403cc75a:0x3fceb380 0x403c88a9:0x3fceb4b0 0x40045c01:0x3fceb570 0x40043ab6:0x3fceb6f0 0x40034c45:0x3fceb710
--- 0x42006bd9: esp_chip_info at /Users/adrian/esp/5.3.1/components/esp_hw_support/port/esp32s3/chip_info.c:13
0x40045c01: ets_run_flash_bootloader in ROM
0x40043ab6: main in ROM
0x40034c45: .stack_ok in ROM

sgiasm
Posts: 4
Joined: Mon Jul 01, 2024 11:25 am

Re: Random boot failure depending on .flash.text size

Postby sgiasm » Wed Jan 15, 2025 7:00 pm

Crash happened again.
I disconnected display, it didn't help. So it's just a dev-board with nothing connected.
I pushed my code and used it remotely on custom board and it crashed also. But! The firmware on custom board is different (due to some flags). And it crashed in different line, but still crash/no-crash state is shared across all our devices/hosts.

Here is brief statistics of current crash on two devices (S3 dev-board on Windows and custom S3 on MacOS, both on ESP-IDF 5.3.1):
Developer board:

Code: Select all

Reason           | IllegalInstruction
Flash .text size | 856090
PC               | 0x420080fc
PC asm           | entry a1, 32
C++ line         | void esp_chip_info(esp_chip_info_t *out_info)
Custom board:

Code: Select all

Reason           | StoreProhibited
Flash .text size | 950330
PC               | 0x42008be2
PC asm           | s32i.n a8, a2, 4
C++ line         | out_info->features = CHIP_FEATURE_WIFI_BGN | CHIP_FEATURE_BLE;
Here is statistics I've got so far:

Code: Select all

+--------------+------------+------------------+----------------------------------------------------------------+--------------------+--------------------+---------+
|              | PC         | PC asm           | PC C++                                                         | Reason             |   Flash .text size | ESP-IDF |
+--------------+------------+------------------+----------------------------------------------------------------+--------------------+--------------------+---------+
| crash        | 0x4200b5d2 | callx8 a8        | ret = esp_psram_impl_get_physical_size(&psram_physical_size);  | LoadProhibited     |             889023 | 5.1.4   |
| crash        | 0x4200b669 | movi.n a3, 0     | uint32_t psram_physical_size = 0;                              | LoadProhibited     |             889375 | 5.1.4   |
| crash        | 0x4200b689 | movi.n a3, 0     | uint32_t psram_physical_size = 0;                              | LoadProhibited     |             889883 | 5.1.4   |
| OK           | 0x4200b665 | movi.n a3, 0     | uint32_t psram_physical_size = 0;                              | OK                 |             889363 | 5.1.4   |
| OK           | 0x4200b591 | movi.n a3, 0     | uint32_t psram_physical_size = 0;                              | OK                 |                nan | 5.1.4   |
| crash        | 0x420080fc | entry a1, 32     | void esp_chip_info(esp_chip_info_t *out_info)                  | IllegalInstruction |             856090 | 5.3.1   |
| crash        | 0x42008be2 | s32i.n a8, a2, 4 | out_info->features = CHIP_FEATURE_WIFI_BGN | CHIP_FEATURE_BLE; | StoreProhibited    |             950330 | 5.3.1   |
| crash        | 0x4200813c | entry a1, 32     | void esp_chip_info(esp_chip_info_t *out_info)                  | IllegalInstruction |             857154 | 5.3.1   |
| OK           | 0x42008138 | entry a1, 32     | void esp_chip_info(esp_chip_info_t *out_info)                  | OK                 |             857110 | 5.3.1   |
+--------------+------------+------------------+----------------------------------------------------------------+--------------------+--------------------+---------+

chegewara
Posts: 2423
Joined: Wed Jun 14, 2017 9:00 pm

Re: Random boot failure depending on .flash.text size

Postby chegewara » Thu Jan 16, 2025 2:02 am

I have the same or similar issue from time to time.
When adding code even before that code have chance to start there is boot loop, but adding some extra log code is solving the issue.
It is very annoying, but i didnt try to debug or report it.

Who is online

Users browsing this forum: No registered users and 9 guests