I am working on an ESP32-based IoT node that runs a simple web interface. Over the last few months it has often become unavailable: the web server stops responding, but the device still answers pings.
After some experimentation I finally found an easy way to reproduce the issue. It happens with both the built-in httpd and Mongoose, which suggests the problem lies lower in the TCP/IP stack, probably in lwIP. I am not sure whether it's a bug or working as intended (WAI) and I am simply missing a flag.
Here is the repro:
- 0. Setup: Connect the ESP32 to Wi-Fi and start either httpd or Mongoose. Have a phone on the same Wi-Fi network ready (I used a Pixel phone with Chrome).
- 1. Make a request to the web server running on the ESP32.
- 2. Disconnect the phone from the network by disabling Wi-Fi. This is crucial: the client loses its connection to the network before it can close the socket properly, so the socket remains open on the ESP32.
- 3. Re-enable Wi-Fi on the phone.
- 4. GOTO 1 and repeat
The problem is that browsers keep the connection alive by default to make subsequent requests faster. If the phone drops off the network before the browser has closed the connection, the ESP32 never receives a FIN, so the socket on the ESP32 stays open and is never closed. When the phone rejoins the network and makes a new request, it opens a new connection. The dead connections pile up until the maximum socket count is reached, at which point the server can no longer accept new connections.
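A toy model of the pile-up makes the arithmetic concrete. It is a sketch, not ESP-IDF or Mongoose code: it assumes a fixed table of `MAX_SOCKETS` slots (the value 7 is hypothetical, not read from any real config) and treats every abandoned keep-alive connection as a slot that is never freed. The names `accept_conn` and `cycles_until_full` are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical socket limit for illustration only; the real cap comes
 * from the server and lwIP configuration. */
#define MAX_SOCKETS 7

/* Take the first free slot. Returns its index, or -1 if all are in use. */
static int accept_conn(bool open[MAX_SOCKETS]) {
    for (int i = 0; i < MAX_SOCKETS; i++) {
        if (!open[i]) {
            open[i] = true;
            return i;
        }
    }
    return -1;
}

/* Run n repro cycles. In each cycle the phone opens a keep-alive
 * connection and then leaves Wi-Fi without closing it, so the server
 * never frees the slot. Returns the 1-based cycle at which accepting
 * first fails, or 0 if all n cycles succeeded. */
static int cycles_until_full(int n) {
    assert(n >= 0);
    bool open[MAX_SOCKETS] = {false};
    for (int cycle = 1; cycle <= n; cycle++) {
        if (accept_conn(open) < 0) {
            return cycle;
        }
        /* No close() here: the half-open connection keeps its slot. */
    }
    return 0;
}
```

With no idle timeout, `cycles_until_full(20)` returns `MAX_SOCKETS + 1`: the first request after every slot has been taken by a dead connection is the one that fails, no matter how long the device has been idle.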
My workaround for now is to send a "Connection: close" header in the HTTP response so that Chrome and other browsers close the connection immediately. But it bugs me that I depend on the client behaving well, which makes the device very easy to DoS.
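On the wire the workaround is a single extra response header. The sketch below formats such a response by hand to show what the client receives; the helper `build_close_response` is hypothetical. With the built-in esp_http_server the same header would be added from a URI handler with `httpd_resp_set_hdr(req, "Connection", "close")` before sending the response.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a minimal HTTP/1.1 200 response whose headers ask the client
 * to close the connection after reading the body.
 * Returns the number of bytes written (excluding the NUL), or -1 if
 * buf is too small to hold the whole response. */
static int build_close_response(char *buf, size_t buflen, const char *body) {
    assert(buf != NULL && body != NULL);
    int n = snprintf(buf, buflen,
                     "HTTP/1.1 200 OK\r\n"
                     "Content-Type: text/html\r\n"
                     "Content-Length: %zu\r\n"
                     "Connection: close\r\n"
                     "\r\n"
                     "%s",
                     strlen(body), body);
    if (n < 0 || (size_t)n >= buflen) {
        return -1; /* output error or truncated */
    }
    return n;
}
```

This still relies on the client acting on the header before it disappears, which is exactly the dependency described above.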
Am I missing a timeout setting that force-closes open sockets after they have been idle for a while?
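The standard socket-level mechanism for detecting a peer that vanished without sending FIN is TCP keepalive: after the connection has been idle for `TCP_KEEPIDLE` seconds the stack sends probes every `TCP_KEEPINTVL` seconds and drops the connection after `TCP_KEEPCNT` unanswered probes. The sketch below enables it on a socket with the BSD socket API and compiles on Linux. It assumes lwIP exposes the same option names through its socket layer when `LWIP_TCP_KEEPALIVE` is enabled, and that esp_http_server's `open_fn` hook in `httpd_config_t` is a place to call it; both are assumptions to verify for your ESP-IDF version. The function name `enable_keepalive` and the timing values are hypothetical.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Enable TCP keepalive on a socket so that a peer that disappeared
 * without closing is detected after roughly
 * idle_s + interval_s * count seconds of silence.
 * Returns 0 on success, -1 if any setsockopt call fails. */
static int enable_keepalive(int fd, int idle_s, int interval_s, int count) {
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
    /* Seconds of idleness before the first probe is sent. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) < 0)
        return -1;
    /* Seconds between unanswered probes. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof interval_s) < 0)
        return -1;
    /* Unanswered probes before the connection is dropped. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof count) < 0)
        return -1;
    return 0;
}
```

Called on the descriptor returned by `accept()`, e.g. `enable_keepalive(client_fd, 30, 5, 3)`, a dead peer would be dropped after roughly 30 + 5 × 3 = 45 seconds of silence, freeing its slot without any cooperation from the client.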
Thank you kindly
Sascha
PS: This might be the root cause of https://github.com/espressif/esp-idf/issues/3851