Problems after updating to latest master (reproduced in IDF example code)

FulminatingGhost
Posts: 2
Joined: Wed Oct 11, 2017 10:35 am

Re: Problems after updating to latest master (reproduced in IDF example code)

Postby FulminatingGhost » Wed Oct 11, 2017 10:58 am

Greetings there,

I was using a "NodeMCU" branded devkitC clone yesterday with nothing else attached to the pins.

First, i compiled the example using the defconfig and the stack overflow seemed not to occur. I got the first overflows after I enabled debug output by means of setting the log level to verbose but the crashes seem to only occur sporadic. The stack overflow occurs after setting up the connection by means of the sta <ssid> <passwd> or the ap <ssid> <passwd> command.

permal
Posts: 384
Joined: Sun May 14, 2017 5:36 pm

Re: Problems after updating to latest master (reproduced in IDF example code)

Postby permal » Wed Oct 11, 2017 5:37 pm

Ok, here is another go with settings as requested, with the addition of more logging enabled.

As can be seen in the log, it crashed on the second time I ran the "sta" command this time.

https://pastebin.com/6VNnJtrc

sdkconfig: https://pastebin.com/2ihKuPVV

I ran it once more, this time it crashed on the first execution of the "sta"-command.
https://pastebin.com/hGw1tWZd

permal
Posts: 384
Joined: Sun May 14, 2017 5:36 pm

Re: Problems after updating to latest master (reproduced in IDF example code)

Postby permal » Wed Oct 11, 2017 5:51 pm

permal wrote:Ok, here is another go with settings as requested, with the addition of more logging enabled.

As can be seen in the log, it crashed on the second time I ran the "sta" command this time.

https://pastebin.com/6VNnJtrc

sdkconfig: https://pastebin.com/2ihKuPVV

I ran it once more, this time it crashed on the first execution of the "sta"-command.
https://pastebin.com/hGw1tWZd
Also, FulminatingGhost has posted a reply in this thread, but it is apparently in the moderation queue.

ESP_Angus
Posts: 2344
Joined: Sun May 08, 2016 4:11 am

Re: Problems after updating to latest master (reproduced in IDF example code)

Postby ESP_Angus » Thu Oct 12, 2017 12:00 am

Thanks for all this information.

I did some more testing, and via more or less total chance (my WiFi AP was a bit flaky this morning) I triggered the stack overflow bug.

Ths stack overflow issue is that if tcpip_adapter is logging at Debug level or higher, some combinations of interface up/down will stack overflow while printf() is formatting log lines from tcpip_adapter (these run in the TCP/IP Task). On the other hand, a "clean" AP connection with no problems doesn't appear to stack overflow (which is why I didn't see this yesterday.)

To resolve this we need to increase the default TCP/IP stack size, at least when logging is set to a higher level. Or you can work around it by enabling "nano" formatting in newlib libc, which will reduce the stack overhead of formatting routines.

I can't explain why you get the weird garbled stack traces when this crash happens. I don't get that, I get a clean error dump (with a stack trace, if the Watchpoint on Stack End feature is enabled). This may be something related to the 26MHz crystal, or it may be related to the timing of the stack overflow (ie for you it's happening at some critical point, but due to some other external timing factor I don't get that).

Is this likely the root cause of all the problems you're seeing? (ie were you at logging at Debug when you first encountered this problem, or did you increase the log level while trying to debug the original crash?)
FulminatingGhost wrote: I was using a "NodeMCU" branded devkitC clone yesterday with nothing else attached to the pins.
Thanks, FulminatingGhost! Is your clone using a 40MHz crystal? Do you get the same garbled post-crash output as permal, or just a crash?

WiFive
Posts: 3529
Joined: Tue Dec 01, 2015 7:35 am

Re: Problems after updating to latest master (reproduced in IDF example code)

Postby WiFive » Thu Oct 12, 2017 1:39 am

ESP_Angus wrote:Is your clone using a 40MHz crystal?
Probably using wroom32, so yes.

permal
Posts: 384
Joined: Sun May 14, 2017 5:36 pm

Re: Problems after updating to latest master (reproduced in IDF example code)

Postby permal » Thu Oct 12, 2017 5:35 am

I always run with verbose loggning enabled so that if something happens, I have all information available without having to reproduce.

However, when I first started to use the iperf example to reproduce this I am fairly certain that I didn't enable verbose logging until later, but still experienced the issue.

What makes me a bit afraid this might be something else in addition to just a too small stack is that I did (or at least think I did) see this also with a whooping stack size of 10k - what size do you intend to set the default to?

FulminatingGhost
Posts: 2
Joined: Wed Oct 11, 2017 10:35 am

Re: Problems after updating to latest master (reproduced in IDF example code)

Postby FulminatingGhost » Thu Oct 12, 2017 7:26 pm

ESP_Angus wrote:Thanks, FulminatingGhost! Is your clone using a 40MHz crystal? Do you get the same garbled post-crash output as permal, or just a crash?
WiFive wrote:Probably using wroom32, so yes.
Indeed using 40MHz and a WROOM32 module on the devkit.

I did get a mix of garbled crashes and coredumps of which one backtrace actually pointed towards the tcpip task and a printf call but sadly I didn't save that.

ESP_Angus
Posts: 2344
Joined: Sun May 08, 2016 4:11 am

Re: Problems after updating to latest master (reproduced in IDF example code)

Postby ESP_Angus » Fri Oct 13, 2017 4:53 am

What makes me a bit afraid this might be something else in addition to just a too small stack is that I did (or at least think I did) see this also with a whooping stack size of 10k - what size do you intend to set the default to?
I found the bug went away with 3072, so I'm going to add some logic to set this minimum if logging level is raised.

If you get similar looking crashes with a larger stack, please keep posting traces and steps to reproduce and we can keep bug hunting. :)

permal
Posts: 384
Joined: Sun May 14, 2017 5:36 pm

Re: Problems after updating to latest master (reproduced in IDF example code)

Postby permal » Fri Oct 13, 2017 5:16 am

That's a very specific number. How did you conclude that it is enough? Any margin included in that number?

ESP_Angus
Posts: 2344
Joined: Sun May 08, 2016 4:11 am

Re: Problems after updating to latest master (reproduced in IDF example code)

Postby ESP_Angus » Fri Oct 13, 2017 6:19 am

permal wrote:That's a very specific number. How did you conclude that it is enough? Any margin included in that number?
It's 3KB, I added 512 bytes to the previous default stack size.

You can patch tcpip_adapter_api_cb() in tcpip_adapter_lwip.c to print the TCP/IP task stack high water mark with your configuration:

Code: Select all

--- a/components/tcpip_adapter/tcpip_adapter_lwip.c
+++ b/components/tcpip_adapter/tcpip_adapter_lwip.c
@@ -81,6 +81,8 @@ static void tcpip_adapter_api_cb(void* api_msg)
     ESP_LOGD(TAG, "call api in lwip: ret=0x%x, give sem", msg->ret);
     sys_sem_signal(&api_sync_sem);
 
+    ESP_LOGD(TAG, "stack headroom %d", uxTaskGetStackHighWaterMark(NULL));
+
     return;
 }
In the iperf example with 3056 byte stack, debug log level & nano formatting disabled this hovers around 520 bytes if you're lucky when it associates but after a few "tricky" re-associates (retries, timer expiry, etc.) I've seen it go as low as 436 bytes. So the absolute minimum viable stack (based on what I've seen) may be 2620 bytes, and the rest is headroom.

You can add the debug log line and verify this for yourself, though.

(As an additional datapoint, with nano 'printf' formatting enabled the free headroom never goes below 1KB - nano formatting makes a big difference to the stack overhead of printf().)

Who is online

Users browsing this forum: No registered users and 104 guests