Problems after updating to latest master (reproduced in IDF example code)
-
- Posts: 2
- Joined: Wed Oct 11, 2017 10:35 am
Re: Problems after updating to latest master (reproduced in IDF example code)
Greetings there,
I was using a "NodeMCU" branded devkitC clone yesterday with nothing else attached to the pins.
First, i compiled the example using the defconfig and the stack overflow seemed not to occur. I got the first overflows after I enabled debug output by means of setting the log level to verbose but the crashes seem to only occur sporadic. The stack overflow occurs after setting up the connection by means of the sta <ssid> <passwd> or the ap <ssid> <passwd> command.
I was using a "NodeMCU" branded devkitC clone yesterday with nothing else attached to the pins.
First, i compiled the example using the defconfig and the stack overflow seemed not to occur. I got the first overflows after I enabled debug output by means of setting the log level to verbose but the crashes seem to only occur sporadic. The stack overflow occurs after setting up the connection by means of the sta <ssid> <passwd> or the ap <ssid> <passwd> command.
Re: Problems after updating to latest master (reproduced in IDF example code)
Ok, here is another go with settings as requested, with the addition of more logging enabled.
As can be seen in the log, it crashed on the second time I ran the "sta" command this time.
https://pastebin.com/6VNnJtrc
sdkconfig: https://pastebin.com/2ihKuPVV
I ran it once more, this time it crashed on the first execution of the "sta"-command.
https://pastebin.com/hGw1tWZd
As can be seen in the log, it crashed on the second time I ran the "sta" command this time.
https://pastebin.com/6VNnJtrc
sdkconfig: https://pastebin.com/2ihKuPVV
I ran it once more, this time it crashed on the first execution of the "sta"-command.
https://pastebin.com/hGw1tWZd
Re: Problems after updating to latest master (reproduced in IDF example code)
Also, FulminatingGhost has posted a reply in this thread, but it is apparently in the moderation queue.permal wrote:Ok, here is another go with settings as requested, with the addition of more logging enabled.
As can be seen in the log, it crashed on the second time I ran the "sta" command this time.
https://pastebin.com/6VNnJtrc
sdkconfig: https://pastebin.com/2ihKuPVV
I ran it once more, this time it crashed on the first execution of the "sta"-command.
https://pastebin.com/hGw1tWZd
Re: Problems after updating to latest master (reproduced in IDF example code)
Thanks for all this information.
I did some more testing, and via more or less total chance (my WiFi AP was a bit flaky this morning) I triggered the stack overflow bug.
Ths stack overflow issue is that if tcpip_adapter is logging at Debug level or higher, some combinations of interface up/down will stack overflow while printf() is formatting log lines from tcpip_adapter (these run in the TCP/IP Task). On the other hand, a "clean" AP connection with no problems doesn't appear to stack overflow (which is why I didn't see this yesterday.)
To resolve this we need to increase the default TCP/IP stack size, at least when logging is set to a higher level. Or you can work around it by enabling "nano" formatting in newlib libc, which will reduce the stack overhead of formatting routines.
I can't explain why you get the weird garbled stack traces when this crash happens. I don't get that, I get a clean error dump (with a stack trace, if the Watchpoint on Stack End feature is enabled). This may be something related to the 26MHz crystal, or it may be related to the timing of the stack overflow (ie for you it's happening at some critical point, but due to some other external timing factor I don't get that).
Is this likely the root cause of all the problems you're seeing? (ie were you at logging at Debug when you first encountered this problem, or did you increase the log level while trying to debug the original crash?)
I did some more testing, and via more or less total chance (my WiFi AP was a bit flaky this morning) I triggered the stack overflow bug.
Ths stack overflow issue is that if tcpip_adapter is logging at Debug level or higher, some combinations of interface up/down will stack overflow while printf() is formatting log lines from tcpip_adapter (these run in the TCP/IP Task). On the other hand, a "clean" AP connection with no problems doesn't appear to stack overflow (which is why I didn't see this yesterday.)
To resolve this we need to increase the default TCP/IP stack size, at least when logging is set to a higher level. Or you can work around it by enabling "nano" formatting in newlib libc, which will reduce the stack overhead of formatting routines.
I can't explain why you get the weird garbled stack traces when this crash happens. I don't get that, I get a clean error dump (with a stack trace, if the Watchpoint on Stack End feature is enabled). This may be something related to the 26MHz crystal, or it may be related to the timing of the stack overflow (ie for you it's happening at some critical point, but due to some other external timing factor I don't get that).
Is this likely the root cause of all the problems you're seeing? (ie were you at logging at Debug when you first encountered this problem, or did you increase the log level while trying to debug the original crash?)
Thanks, FulminatingGhost! Is your clone using a 40MHz crystal? Do you get the same garbled post-crash output as permal, or just a crash?FulminatingGhost wrote: I was using a "NodeMCU" branded devkitC clone yesterday with nothing else attached to the pins.
Re: Problems after updating to latest master (reproduced in IDF example code)
Probably using wroom32, so yes.ESP_Angus wrote:Is your clone using a 40MHz crystal?
Re: Problems after updating to latest master (reproduced in IDF example code)
I always run with verbose loggning enabled so that if something happens, I have all information available without having to reproduce.
However, when I first started to use the iperf example to reproduce this I am fairly certain that I didn't enable verbose logging until later, but still experienced the issue.
What makes me a bit afraid this might be something else in addition to just a too small stack is that I did (or at least think I did) see this also with a whooping stack size of 10k - what size do you intend to set the default to?
However, when I first started to use the iperf example to reproduce this I am fairly certain that I didn't enable verbose logging until later, but still experienced the issue.
What makes me a bit afraid this might be something else in addition to just a too small stack is that I did (or at least think I did) see this also with a whooping stack size of 10k - what size do you intend to set the default to?
-
- Posts: 2
- Joined: Wed Oct 11, 2017 10:35 am
Re: Problems after updating to latest master (reproduced in IDF example code)
ESP_Angus wrote:Thanks, FulminatingGhost! Is your clone using a 40MHz crystal? Do you get the same garbled post-crash output as permal, or just a crash?
Indeed using 40MHz and a WROOM32 module on the devkit.WiFive wrote:Probably using wroom32, so yes.
I did get a mix of garbled crashes and coredumps of which one backtrace actually pointed towards the tcpip task and a printf call but sadly I didn't save that.
Re: Problems after updating to latest master (reproduced in IDF example code)
I found the bug went away with 3072, so I'm going to add some logic to set this minimum if logging level is raised.What makes me a bit afraid this might be something else in addition to just a too small stack is that I did (or at least think I did) see this also with a whooping stack size of 10k - what size do you intend to set the default to?
If you get similar looking crashes with a larger stack, please keep posting traces and steps to reproduce and we can keep bug hunting.
Re: Problems after updating to latest master (reproduced in IDF example code)
That's a very specific number. How did you conclude that it is enough? Any margin included in that number?
Re: Problems after updating to latest master (reproduced in IDF example code)
It's 3KB, I added 512 bytes to the previous default stack size.permal wrote:That's a very specific number. How did you conclude that it is enough? Any margin included in that number?
You can patch tcpip_adapter_api_cb() in tcpip_adapter_lwip.c to print the TCP/IP task stack high water mark with your configuration:
Code: Select all
--- a/components/tcpip_adapter/tcpip_adapter_lwip.c
+++ b/components/tcpip_adapter/tcpip_adapter_lwip.c
@@ -81,6 +81,8 @@ static void tcpip_adapter_api_cb(void* api_msg)
ESP_LOGD(TAG, "call api in lwip: ret=0x%x, give sem", msg->ret);
sys_sem_signal(&api_sync_sem);
+ ESP_LOGD(TAG, "stack headroom %d", uxTaskGetStackHighWaterMark(NULL));
+
return;
}
You can add the debug log line and verify this for yourself, though.
(As an additional datapoint, with nano 'printf' formatting enabled the free headroom never goes below 1KB - nano formatting makes a big difference to the stack overhead of printf().)
Who is online
Users browsing this forum: No registered users and 78 guests