Device bricked ("csum err") after two months of service
Re: Device bricked ("csum err") after two months of service
Hmm, yes, it has to make it through initial programming, self-encryption, and months of operation, so that is unusual. I guess there could be some kind of chip defect where erasing a certain block address triggers an unintended, out-of-spec erase in your problem block. Can you correlate the failures with flash erase events such as OTA updates, NVS writes, or filesystem use?
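If the firmware or the backend records when each kind of flash erase happens, this question can be checked mechanically. For every bricked unit, look at how long before the failure the most recent erase of each kind occurred. A minimal Python sketch, assuming a hypothetical event log of `(timestamp, source)` pairs with made-up source labels such as `"ota"`, `"nvs"` and `"fs"`:

```python
from datetime import datetime


def time_since_last_erase(failure, events):
    """For each erase source, return how long before `failure` its latest erase happened.

    `events` is an iterable of (timestamp, source) pairs; this log format is a
    hypothetical one for illustration. Sources with no erase before the failure
    are left out of the result.
    """
    latest = {}
    for ts, source in events:
        # Only erases at or before the failure time can be candidates.
        if ts <= failure and (source not in latest or ts > latest[source]):
            latest[source] = ts
    return {source: failure - ts for source, ts in latest.items()}


if __name__ == "__main__":
    # Hypothetical log for one device.
    log = [
        (datetime(2019, 4, 1), "ota"),
        (datetime(2019, 5, 3), "ota"),
        (datetime(2019, 5, 20), "ota"),  # after the failure, so ignored
    ]
    for source, gap in time_since_last_erase(datetime(2019, 5, 8), log).items():
        print(f"{source}: last erase {gap.days} days before failure")
```

Repeating this over all bricked units and looking for a consistently short gap to one particular kind of erase is one way to test the chip-defect idea; long, scattered gaps would argue against it.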
-
- Posts: 45
- Joined: Sun Jan 06, 2019 12:42 pm
Re: Device bricked ("csum err") after two months of service
@WiFive, no, I cannot correlate it with any write operations. I don't use NVS or the filesystem. OTA is used regularly, but the failure does not happen during the OTA. For example:
- Device X works happily all April
- An OTA update is applied on May 3rd
- The device continues to work happily, reporting the new version
- Then it suddenly gets bricked on May 8th
Re: Device bricked ("csum err") after two months of service
I assume you are using 3.3V flash, since your boot mode is 0x13. There was an issue with undervoltage on 1.8V flash in the past.
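The step from the boot mode to the flash voltage can be spelled out in code. This is a hedged sketch: it assumes the value the ROM prints as `boot:0x..` is the raw strapping value, with GPIO0 in bit 4 and MTDI (GPIO12) in bit 5, which is my understanding of the ESP32 strapping register; check the Technical Reference Manual before relying on it. It also ignores the efuse override of the flash voltage, and the function names are illustrative.

```python
# Assumed bit positions in the ESP32 strapping value (see the note above).
GPIO0_BIT = 4
MTDI_BIT = 5  # MTDI is GPIO12


def pin_high(boot_mode: int, bit: int) -> bool:
    """True if the given strapping pin was sampled high at reset."""
    return bool((boot_mode >> bit) & 1)


def strapped_flash_voltage(boot_mode: int) -> float:
    """Flash (VDD_SDIO) voltage chosen by the MTDI strap alone, ignoring efuses."""
    # MTDI low: VDD_SDIO follows the 3.3 V supply. MTDI high: internal 1.8 V LDO.
    return 1.8 if pin_high(boot_mode, MTDI_BIT) else 3.3


if __name__ == "__main__":
    for mode in (0x13, 0x33):
        print(hex(mode), "GPIO0 high:", pin_high(mode, GPIO0_BIT),
              "flash voltage:", strapped_flash_voltage(mode))
```

Under this assumed layout, bit 5 of 0x13 is clear, which is why 3.3V flash is the natural reading; a unit that sampled GPIO12 high at reset would report 0x33 and power its flash from the 1.8V regulator.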
Are you trying to reproduce the failure with a torture test (reboots and OTA updates, high ambient temperature)?
Re: Device bricked ("csum err") after two months of service
PanicanWhyasker wrote (Fri May 10, 2019 6:32 am):
"@ESP_Angus, interesting. So in this case, is it possible that (due to signal integrity issues) when the ESP reboots in the field, it tries to re-encrypt itself (or, for the device which isn't encrypted, tries to initiate self-encryption)?"
I don't think so, because the FLASH_CRYPT_CNT efuse controls both whether the bootloader detects that the device needs encrypting and whether the cache loads the bootloader from encrypted flash. It also doesn't explain the "different bytes read each time" symptom or the low entropy in your broken flash chip.
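For readers unfamiliar with this efuse: on the ESP32, FLASH_CRYPT_CNT is a 7-bit field, and flash encryption counts as enabled when an odd number of its bits are set. Because the same field drives both the bootloader's "do I still need to encrypt?" check and the cache's decryption, the two decisions cannot disagree as long as the efuse reads correctly. A minimal Python sketch of that parity rule; the function names and the simplified first-boot check are illustrative, not ESP-IDF API:

```python
FLASH_CRYPT_CNT_MASK = 0x7F  # 7-bit efuse field


def flash_encryption_enabled(flash_crypt_cnt: int) -> bool:
    """Encryption is on when an odd number of FLASH_CRYPT_CNT bits are set."""
    return bin(flash_crypt_cnt & FLASH_CRYPT_CNT_MASK).count("1") % 2 == 1


def bootloader_would_encrypt(flash_crypt_cnt: int, encryption_configured: bool) -> bool:
    """Simplified first-boot check: encrypt only if configured and not yet enabled."""
    return encryption_configured and not flash_encryption_enabled(flash_crypt_cnt)


if __name__ == "__main__":
    for cnt in (0b0000000, 0b0000001, 0b0000011):
        print(bin(cnt), flash_encryption_enabled(cnt), bootloader_would_encrypt(cnt, True))
```

An already-encrypted unit (odd bit count) therefore never re-enters the encryption path on a later reboot.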
WiFive makes a good point about the flash chip possibly strapping to 1.8V by accident sometimes. Is the MTDI pin (GPIO12) always disconnected or driven low on reset?
Re: Device bricked ("csum err") after two months of service
Yes, pin 12 is pulled down with 360k to GND. Nothing else drives this pin (it's just the ESP pin, the 360k resistor, and a MOSFET gate). Basically, this pin is always logic 0; the only time it's driven high is during lab testing.
Re: Device bricked ("csum err") after two months of service
Could it be that 360k isn't an aggressive enough pull-down?
One thing worth trying might be to make the flash voltage independent of the strapping pin by forcing it through the efuse, and then see whether you still get failures. I don't know how long you have to run these before they typically fail.
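Forcing the voltage by efuse means the MTDI strap is no longer consulted at all. The sketch below models that selection as I understand it for the ESP32. The efuse names XPD_SDIO_FORCE, XPD_SDIO_REG and XPD_SDIO_TIEH are the ones espefuse.py reports and its `set_flash_voltage` command burns, but the function and its parameter names are illustrative assumptions, and MTDI is assumed to be bit 5 of the strap value. Efuse burns are irreversible, so check the ESP-IDF documentation before trying this on hardware.

```python
from typing import Optional

MTDI_BIT = 5  # assumed position of MTDI (GPIO12) in the strap value


def flash_voltage(xpd_sdio_force: bool, xpd_sdio_reg: bool, xpd_sdio_tieh: bool,
                  boot_mode: int) -> Optional[float]:
    """Simplified model of how the ESP32 selects the flash (VDD_SDIO) voltage.

    Returns None when the efuses force the internal VDD_SDIO regulator off.
    """
    if xpd_sdio_force:
        # Efuse override: the MTDI strap is ignored, so a noisy GPIO12 at reset
        # can no longer switch the flash supply.
        if not xpd_sdio_reg:
            return None
        return 3.3 if xpd_sdio_tieh else 1.8
    # No override: MTDI sampled high at reset selects 1.8 V, low selects 3.3 V.
    return 1.8 if (boot_mode >> MTDI_BIT) & 1 else 3.3


if __name__ == "__main__":
    # A glitch on GPIO12 (strap 0x33 instead of 0x13) without and with the override.
    print(flash_voltage(False, False, False, 0x33))  # strap decides
    print(flash_voltage(True, True, True, 0x33))     # efuse decides
```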
You mentioned that these run near an internal combustion engine; can you share more details about the power supply or the general device configuration?
Sam
Re: Device bricked ("csum err") after two months of service
Hello Sam,
Sorry for the late reply; I wasn't near a PC last week.
Sam wrote:
"Could it be that 360k isn't aggressive enough of a pull-down?"
Hmm, the datasheet states that pin 12 has an internal pull-down during bootstrapping, so there are two pull-downs in parallel. I think this should be enough, as there is really nothing pulling that pin up in any case.
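Whether two pull-downs in parallel are "enough" depends on how much current noise can couple onto the pin during reset, since that current times the equivalent resistance is the voltage the strap sees. A quick Python sketch; the 45 kΩ internal pull-down and the 50 µA of coupled current are assumed round numbers for illustration, not measured or datasheet-verified values:

```python
def parallel(*resistances_ohm: float) -> float:
    """Equivalent resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances_ohm)


INTERNAL_PULLDOWN_OHM = 45e3   # ASSUMED internal pull-down value; check the datasheet
NOISE_CURRENT_A = 50e-6        # ASSUMED current coupled onto GPIO12 by a transient

for external_ohm in (360e3, 10e3):
    r_eq = parallel(external_ohm, INTERNAL_PULLDOWN_OHM)
    # Ohm's law: voltage the coupled current develops across the pull-downs.
    v_pin = NOISE_CURRENT_A * r_eq
    print(f"external {external_ohm / 1e3:.0f}k -> {r_eq / 1e3:.1f}k, {v_pin:.2f} V")
```

The stiffer the pull-down, the less voltage a given coupled current produces, which is the reasoning behind lowering the resistor value when power consumption is not a concern.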
Sam wrote:
"You mentioned that these run near an internal combustion engine, can you shed any more details about power supply, or general device configuration?"
The device has an internal battery; it's not powered by the ICE. The signal integrity issue is on the first batch of devices, where the device and the engine's ignition system share grounds. The signal from the ignition system, after proper conditioning, is used to compute engine RPM.
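For context on the RPM measurement: engine speed follows from the interval between ignition pulses, and a minimum-interval check is one common way to reject spurious edges from a noisy shared ground. A minimal sketch; the pulses-per-revolution value, the glitch threshold and the pulse timestamps are assumptions that depend on the engine:

```python
def rpm_from_interval(interval_s: float, pulses_per_rev: float) -> float:
    """Engine speed in RPM from the time between consecutive ignition pulses."""
    return 60.0 / (interval_s * pulses_per_rev)


def reject_glitches(timestamps_s, min_interval_s):
    """Drop pulses that arrive implausibly soon after the last accepted pulse.

    `min_interval_s` would be chosen from the engine's maximum possible RPM.
    """
    accepted = []
    for t in timestamps_s:
        if not accepted or t - accepted[-1] >= min_interval_s:
            accepted.append(t)
    return accepted


if __name__ == "__main__":
    pulses = [0.000, 0.001, 0.020, 0.040]  # hypothetical; 0.001 s is a noise spike
    clean = reject_glitches(pulses, min_interval_s=0.005)
    intervals = [b - a for a, b in zip(clean, clean[1:])]
    # Assumes one ignition pulse per crankshaft revolution.
    print([round(rpm_from_interval(dt, pulses_per_rev=1)) for dt in intervals])
```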
On the later batches the signal from the ignition is coupled differently, and we have no problems there. Yet if there's anything software-wise that I can do to reduce the likelihood of the ESP overwriting its bootloader, I'd gladly apply it to both batches; the devices are expensive and the bricking issue is a major roadblock for us.
Re: Device bricked ("csum err") after two months of service
No worries.
Yeah, it does seem far-fetched, but weirder things have happened. If power consumption isn't an issue, you could easily lower the pull-down resistor value.
Okay, that makes sense. To clarify, the second batch of devices never overwrite their bootloader?
Or they don't have any noise issues?
Have you put the device through any sort of EMC/FCC testing? Could it be a weird radio immunity quirk?
Sam
Re: Device bricked ("csum err") after two months of service
Power consumption isn't an issue; I'll lower it.
Both: no bootloader issues as of yet, and the noise issue is gone.
Not yet. It's possible, because the devices fail while they are in use and communicating over WiFi.
Re: Device bricked ("csum err") after two months of service
Okay. Sounds like it may be fixed, although it would be nice to know the root mechanism as to why devices in noisy environments get bricked.
The immunity thing may be a red herring, but worth looking out for.
Can you share what you did differently between revisions in regards to GND coupling?
Sam