ESP MESH invalid state seems to cause lockup
Posted: Thu Mar 18, 2021 7:46 am
Hi,
I've been using ESP MESH for some time, but recently a strange problem has surfaced.
Sometimes my devices get hung up with the following messages:
I (19:45:13.501) mesh_net: <GOT ROOT ADDRESS> addr:f0:08:d1:85:ea:ed
I (12983903) mesh: 5080<assoc>parent layer:1, channel:1, rssi:-78, assoc:1, rssi threshold<-78,-82,-85>
W (12985112) mesh: [mesh_schedule.c,3072] [WND-RX]1200 ms timeout, seqno:0, xseqno:1, no_wnd_count:0, timeout_count:0
W (12986313) mesh: [mesh_schedule.c,3072] [WND-RX]1200 ms timeout, seqno:0, xseqno:1, no_wnd_count:0, timeout_count:1
I (12986453) wifi<1,0>, old:<1,0>, ap:<1,0>, sta:<1,0>, prof:1
I (12986454) wifi bc:dd:c2:f7:f3:1d join, AID=1, lr, 20
I (12986608) mesh: 1307[recv]invalid child bc:dd:c2:f7:f3:1d
I (12986609) mesh: 1307[recv]invalid child bc:dd:c2:f7:f3:1d
I (12986610) mesh: 1804[XON]async, from child bc:dd:c2:f7:f3:1d
I (12986610) mesh: 1307[recv]invalid child bc:dd:c2:f7:f3:1d
I (12986619) mesh: 1307[recv]invalid child bc:dd:c2:f7:f3:1d
I (12986681) mesh: 1307[recv]invalid child bc:dd:c2:f7:f3:1d
I (12986681) mesh: 1307[recv]invalid child bc:dd:c2:f7:f3:1d
I (12986682) mesh: 1804[XON]async, from child bc:dd:c2:f7:f3:1d
Once this starts, it seems to continue indefinitely. I waited about a day, by which point the timeout_count in the mesh_schedule warnings had climbed into the millions.
It appears to block other tasks, and it also seems to be somewhat "infectious": once one node has failed, more nodes in the network start to fail.
I don't have access to the mesh_schedule.c source, so I can't see what is going on. Is there any way to detect this state from application code?
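For discussion, here is a minimal sketch of one possible detection approach: count the repeating "[recv]invalid child" and "[WND-RX]" messages per time window and treat a sustained burst as the stuck state. Everything named here (stall_detector_t, stall_detector_init, stall_detector_feed, and both thresholds) is hypothetical and not part of ESP-IDF. On a device, the lines could perhaps be captured with a hook installed through esp_log_set_vprintf(), assuming the mesh library's messages actually pass through that hook, which I have not verified.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical thresholds; tune them for the actual deployment. */
#define STALL_WINDOW_MS 5000u /* length of one observation window */
#define STALL_MAX_HITS  20u   /* matching lines per window that count as "stuck" */

/* Hypothetical helper type, not an ESP-IDF structure. */
typedef struct {
    uint32_t window_start_ms; /* timestamp at which the current window began */
    uint32_t hits;            /* matching log lines seen in the current window */
} stall_detector_t;

static void stall_detector_init(stall_detector_t *d, uint32_t now_ms)
{
    d->window_start_ms = now_ms;
    d->hits = 0;
}

/*
 * Feed one formatted log line together with the current time in ms.
 * Returns true once the number of suspicious lines in the current
 * window has reached STALL_MAX_HITS.
 */
static bool stall_detector_feed(stall_detector_t *d, const char *line,
                                uint32_t now_ms)
{
    /* Unsigned subtraction stays correct if the ms counter wraps around. */
    if ((uint32_t)(now_ms - d->window_start_ms) >= STALL_WINDOW_MS) {
        d->window_start_ms = now_ms;
        d->hits = 0;
    }

    /* Substrings taken from the log excerpt in this post. */
    if (strstr(line, "[recv]invalid child") != NULL ||
        strstr(line, "[WND-RX]") != NULL) {
        d->hits++;
    }

    return d->hits >= STALL_MAX_HITS;
}
```

When stall_detector_feed() returns true, the application could log the event and attempt a recovery, for example by restarting the node with esp_restart(). Whether a restart actually clears the state, or stops it from spreading to other nodes, is an open question.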