How to debug a "just after bootloader" hang?
Posted: Thu Sep 03, 2020 10:00 pm
Hi,
I have an unusual case: a device we shipped to a client worked flawlessly for several months and then went dead out of the blue. It's stuck just after the bootloader; when I connect to the serial port, I see this:
```
ets Jun 8 2016 00:22:57
rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0018,len:4
load:0x3fff001c,len:152
load:0x40078000,len:10012
load:0x40080400,len:5984
entry 0x4008064c
ets Jun 8 2016 00:22:57
rst:0x10 (RTCWDT_RTC_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0018,len:4
load:0x3fff001c,len:152
load:0x40078000,len:10012
load:0x40080400,len:5984
entry 0x4008064c
```
There are 9 seconds between the two resets, which matches the 9000 ms RTC watchdog timeout I configured in the bootloader.
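To confirm the reset interval from a capture rather than by eye, one option is to record the serial output with per-line timestamps and measure the time between the ROM `rst:` lines. Below is a minimal Python sketch. It assumes the terminal prefixes each line with a hypothetical `HH:MM:SS.mmm ` timestamp (adjust `LINE_RE` to whatever your terminal actually emits). The helper names are made up for illustration, and the sample capture's timestamps are invented, not measured on this device.

```python
import re
from datetime import datetime

# Hypothetical capture format: "HH:MM:SS.mmm <serial line>" (adjust to your terminal).
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) (.*)$")
# ROM reset banner, e.g. "rst:0x10 (RTCWDT_RTC_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)"
RST_RE = re.compile(r"rst:(0x[0-9a-fA-F]+) \((\w+)\)")


def reset_events(lines):
    """Return (seconds_since_midnight, reset_code, reset_name) for each 'rst:' line."""
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # lines without a timestamp prefix are ignored
        stamp, text = m.groups()
        r = RST_RE.search(text)
        if r is None:
            continue
        t = datetime.strptime(stamp, "%H:%M:%S.%f")
        secs = t.hour * 3600 + t.minute * 60 + t.second + t.microsecond / 1e6
        events.append((secs, int(r.group(1), 16), r.group(2)))
    return events


def reset_intervals(events):
    """Seconds elapsed between consecutive resets."""
    return [later[0] - earlier[0] for earlier, later in zip(events, events[1:])]


if __name__ == "__main__":
    # Invented sample shaped like the log in this post; timestamps are illustrative.
    capture = [
        "10:00:00.000 ets Jun 8 2016 00:22:57",
        "10:00:00.001 rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)",
        "10:00:00.120 entry 0x4008064c",
        "10:00:09.150 ets Jun 8 2016 00:22:57",
        "10:00:09.151 rst:0x10 (RTCWDT_RTC_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)",
    ]
    events = reset_events(capture)
    for (secs, code, name), gap in zip(events[1:], reset_intervals(events)):
        verdict = "matches" if abs(gap - 9.0) < 0.5 else "differs from"
        print(f"{name} (0x{code:x}) after {gap:.2f} s, {verdict} a 9000 ms watchdog")
```

The 0.5 s tolerance is an arbitrary choice. As far as I know, the RTC watchdog is clocked from the RTC slow clock, which by default is an internal RC oscillator with some frequency spread, so the measured gap need not be exactly 9.000 s.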
So it seems the main program is loaded and then hangs. I have many devices identical to the bricked one; on the working ones the boot messages are exactly the same, but the main program runs.
My question is: is it possible to get more debug info? What does it try to do after the "entry 0x4008064c" line? What region of the flash does it execute?
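A partial answer to "what region does it execute" can come from the addresses in the ROM log itself. The sketch below maps each `load:` and `entry` address onto the ESP32 memory map. The region bounds come from my reading of the ESP32 Technical Reference Manual and should be double-checked there; the helper names are hypothetical. For the log in this post, all five addresses land in internal SRAM rather than in the cache-mapped flash windows. As I understand the ESP-IDF boot flow, this means the ROM copied a small image from flash into RAM and jumped to it, and in ESP-IDF that image is the second-stage bootloader (at flash offset 0x1000), not the application. Code only starts executing from the flash windows once something maps them through the cache.

```python
import re

# Address windows from the ESP32 memory map (ESP32 Technical Reference Manual).
# Treat these bounds as assumptions and verify them against the TRM you use.
REGIONS = [
    (0x3F400000, 0x3F7FFFFF, "external flash via cache, data bus (DROM)"),
    (0x3FFAE000, 0x3FFDFFFF, "internal SRAM2, data bus (DRAM)"),
    (0x3FFE0000, 0x3FFFFFFF, "internal SRAM1, data bus (DRAM)"),
    (0x40070000, 0x4009FFFF, "internal SRAM0, instruction bus (IRAM)"),
    (0x400C2000, 0x40BFFFFF, "external flash via cache, instruction bus (IROM)"),
]

LOAD_RE = re.compile(r"load:(0x[0-9a-fA-F]+),len:(\d+)")
ENTRY_RE = re.compile(r"entry (0x[0-9a-fA-F]+)")


def region_of(addr):
    """Name the memory window containing addr, or 'unknown'."""
    for lo, hi, name in REGIONS:
        if lo <= addr <= hi:
            return name
    return "unknown"


def classify_rom_log(lines):
    """Map each ROM 'load:' and 'entry' line to (kind, addr, length, region)."""
    result = []
    for line in lines:
        m = LOAD_RE.search(line)
        if m:
            addr = int(m.group(1), 16)
            result.append(("load", addr, int(m.group(2)), region_of(addr)))
            continue
        m = ENTRY_RE.search(line)
        if m:
            addr = int(m.group(1), 16)
            result.append(("entry", addr, None, region_of(addr)))
    return result


if __name__ == "__main__":
    # The load/entry lines exactly as printed in this post.
    rom_log = [
        "load:0x3fff0018,len:4",
        "load:0x3fff001c,len:152",
        "load:0x40078000,len:10012",
        "load:0x40080400,len:5984",
        "entry 0x4008064c",
    ]
    for kind, addr, length, region in classify_rom_log(rom_log):
        size = f" ({length} bytes)" if length is not None else ""
        print(f"{kind:5} 0x{addr:08x}{size}: {region}")
```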
System info:
- ESP-IDF v3.2
- Custom ESP32 PCB, but the design has worked solidly on hundreds of devices over two years
- Flash is encrypted. The bootloader, factory app and OTA_1/OTA_2 look fine (they read back as ciphertext-like gibberish with entropy near 1)
- Same device type as the one in the thread "Device bricked ("csum err") after two months of service", but this unit is the newer, stable revision. Hence I'm quite curious what went wrong this time.
- It's just one device, and we're fine with throwing it out if it can't be debugged. I'm pursuing this mostly out of curiosity and for possible reliability improvements.
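Since the partitions were judged by their entropy, here is a minimal Python sketch of that check. It assumes a raw dump read back with something like `esptool.py read_flash <offset> <size> dump.bin` (the offsets depend on your partition table). Besides the whole-image value, it scans 4096-byte blocks (the SPI flash sector size), because a whole-partition entropy near 1 can still hide a single erased or corrupted sector. The 0.9 threshold and the function names are arbitrary choices for illustration. Note that high entropy only shows the data looks random, not that it decrypts to a valid image.

```python
import math
import sys
from collections import Counter


def normalized_entropy(data):
    """Shannon entropy of the byte distribution, scaled to 0..1 (8 bits/byte = 1)."""
    if not data:
        return 0.0
    n = len(data)
    bits = sum(-(c / n) * math.log2(c / n) for c in Counter(data).values())
    return bits / 8


def block_entropy(data, block_size=4096):
    """Normalized entropy of each block_size chunk; the last chunk may be shorter."""
    return [normalized_entropy(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


def low_entropy_blocks(data, block_size=4096, threshold=0.9):
    """Offsets (relative to the start of data) of blocks below the threshold."""
    return [i * block_size
            for i, h in enumerate(block_entropy(data, block_size)) if h < threshold]


def scan_file(path, block_size=4096, threshold=0.9):
    """Print whole-file entropy and the offsets of low-entropy blocks in a raw dump."""
    with open(path, "rb") as f:
        dump = f.read()
    print(f"whole image: {normalized_entropy(dump):.4f}")
    for off in low_entropy_blocks(dump, block_size, threshold):
        print(f"low-entropy block at +0x{off:x}")


if __name__ == "__main__":
    for path in sys.argv[1:]:  # e.g. python entropy_scan.py app_dump.bin
        scan_file(path)
```

Blocks past the end of the app image are typically left erased (all 0xFF) and will be flagged as well, so only dips inside the image itself are interesting.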