Unstable partition API flashing functions

Posted: Mon Mar 20, 2017 2:22 am
by imtiaz
Hi Espressif experts, @ESP_Angus,

I have been struggling to get reliable performance from the SPI flash functions. I am using the partition API both for OTA updates of the ESP32 and for storing a binary file for another processor. I am transferring the file over a WiFi TCP socket, with the ESP32 acting as the AP and the socket server.

If I just download the file without writing it to flash, there are no issues. However, as soon as I enable writing the file (which means I erase the partition, write to it, and then read it back to verify), the program becomes unstable.
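For readers following along, the erase / write / read-back cycle described above can be sketched on a host machine. This is only a simulation under assumptions, not ESP-IDF code: `sim_flash`, `sim_erase`, `sim_write` and `sim_verify` are hypothetical names, and the model captures just the usual NOR flash rule that an erase sets every byte to 0xFF and a write can only clear bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_SECTOR_SIZE 4096u
#define SIM_FLASH_SIZE  (4u * SIM_SECTOR_SIZE)

/* Hypothetical in-memory stand-in for a flash partition. */
uint8_t sim_flash[SIM_FLASH_SIZE];

/* Erase whole sectors: every byte in the range becomes 0xFF.
 * Returns 0 on success, -1 if the range is unaligned or out of bounds. */
int sim_erase(size_t offset, size_t size)
{
    if (offset % SIM_SECTOR_SIZE != 0 || size % SIM_SECTOR_SIZE != 0) return -1;
    if (offset > SIM_FLASH_SIZE || size > SIM_FLASH_SIZE - offset) return -1;
    memset(sim_flash + offset, 0xFF, size);
    return 0;
}

/* Write: flash can only flip bits from 1 to 0, modelled here with AND.
 * Returns 0 on success, -1 if the range is out of bounds. */
int sim_write(size_t offset, const void *src, size_t size)
{
    assert(src != NULL);
    if (offset > SIM_FLASH_SIZE || size > SIM_FLASH_SIZE - offset) return -1;
    const uint8_t *bytes = src;
    for (size_t i = 0; i < size; i++) {
        sim_flash[offset + i] &= bytes[i];
    }
    return 0;
}

/* Read back and compare. Returns 1 on a match, 0 otherwise. */
int sim_verify(size_t offset, const void *expected, size_t size)
{
    assert(expected != NULL);
    if (offset > SIM_FLASH_SIZE || size > SIM_FLASH_SIZE - offset) return 0;
    return memcmp(sim_flash + offset, expected, size) == 0;
}
```

In this model, writing over stale data without an erase fails verification, while erase followed by write verifies cleanly. On real hardware the same three steps would map to `esp_partition_erase_range`, `esp_partition_write` and `esp_partition_read`.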

While doing OTA, the socket receiving thread sometimes just hangs after OTA_Init() with no error messages, sometimes hangs halfway through receiving the file, and other times crashes as follows:

Code:

erasing partition
I (27748) flashops:   1
Guru Meditation Error of type IllegalInstruction occurred on core  0. Exception was unhandled.
Register dump:
PC      : 0x4011b49e  PS      : 0x00060f33  A0      : 0x80046686  A1      : 0x3ffc0500
A2      : 0x00000000  A3      : 0x4011b49c  A4      : 0x00000000  A5      : 0x0000000c
A6      : 0x3ffb93cc  A7      : 0x3ffb8360  A8      : 0x80019fb8  A9      : 0x000044c4
A10     : 0x00000000  A11     : 0x00000000  A12     : 0x00060d21  A13     : 0x00000022
A14     : 0x000023ec  A15     : 0x3ffc0640  SAR     : 0x00000017  EXCCAUSE: 0x00000000
EXCVADDR: 0x00000000  LBEG    : 0x400014fd  LEND    : 0x4000150d  LCOUNT  : 0xfffffffc

Backtrace: 0x4011b49e:0x3ffc0500 0x40046686:0x3ffc0530 0x40047518:0x3ffc0550 0x40048536:0x3ffc0570 0x40048675:0x3ffc0590 0x40054ed9:0x3ffc05b0 0x40082482:0x3ffc05d0 0x400810e8:0x3ffc0600

Entering gdb stub now.
Guru Meditation Error of type IllegalInstruction occurred on core  0. Exception was unhandled.
Register dump:
PC      : 0x400d416a  PS      : 0x00060033  A0      : 0x80085060  A1      : 0x3ffc03a0
A2      : 0x3ffc0440  A3      : 0x3ff5f000  A4      : 0x383fc000  A5      : 0x3ff60000
A6      : 0x00000000  A7      : 0x383fc000  A8      : 0x80084e8f  A9      : 0x3ffc0380
A10     : 0x3ffc0600  A11     : 0x3ffc0600  A12     : 0x00000009  A13     : 0x3ffc0400
A14     : 0x3ffc1953  A15     : 0x3ffc195c  SAR     : 0x00000017  EXCCAUSE: 0x00000000
EXCVADDR: 0x00000000  LBEG    : 0x4000c2e0  LEND    : 0x4000c2f6  LCOUNT  : 0x00000000

Backtrace: 0x400d416a:0x3ffc03a0 0x40085060:0x3ffc0420 0x40080e8d:0x3ffc0440 0x4011b49e:0x3ffc0500 0x4011b49e:0x3ffc0530 0x40047518:0x3ffc0550 0x40048536:0x3ffc0570 0x40048675:0x3ffc0590 0x40054ed9:0x3ffc05b0 0x40082482:0x3ffc05d0 0x400810e8:0x3ffc0600

I can give a sample of the code if you like. I am using IDF from a few days ago but I have had the same issues from previous versions as well.

Also, I believe there is an issue with partition erase if you call it with a "NULL" size, which is supposed to erase the whole partition.
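One way to avoid relying on any special size value is to pass an explicit, sector-aligned length computed from the known file size. The following is a minimal host-side sketch, assuming the 4 KB erase granularity that `esp_partition_erase_range` documents for its offset and size arguments; `FLASH_SECTOR_BYTES` and `erase_len_for` are made-up names for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define FLASH_SECTOR_BYTES 4096u  /* erase granularity assumed in this sketch */

/*
 * Length to erase for a file of file_size bytes stored at offset 0 of a
 * partition of partition_size bytes: file_size rounded up to a whole
 * number of sectors. Returns 0 if the file is empty, does not fit, or
 * the partition size is not a multiple of the sector size.
 */
uint32_t erase_len_for(uint32_t file_size, uint32_t partition_size)
{
    if (partition_size % FLASH_SECTOR_BYTES != 0u) return 0u;
    if (file_size == 0u || file_size > partition_size) return 0u;

    /* Round up without risking overflow: count sectors first. */
    uint32_t sectors = file_size / FLASH_SECTOR_BYTES
                     + (file_size % FLASH_SECTOR_BYTES != 0u ? 1u : 0u);
    return sectors * FLASH_SECTOR_BYTES;
}
```

On the target, a non-zero result would be passed as the size argument, for example `esp_partition_erase_range(p, 0, len)`. This also erases only as much flash as the incoming file needs.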

Thank you
Imtiaz

Re: Unstable partition API flashing functions

Posted: Mon Mar 20, 2017 2:08 pm
by aaquilina
Something I found when reading from flash is that unless you pin the associated task to a single core, the processor tends to crash (at a non-repeatable location). I believe there's a bug in the spi_flash API.

Re: Unstable partition API flashing functions

Posted: Mon Mar 20, 2017 4:59 pm
by ESP_igrr
Hi imtiaz,
If you could share a code sample which exhibits the issue, that would be great.

Re: Unstable partition API flashing functions

Posted: Mon Mar 20, 2017 8:04 pm
by imtiaz

Code:

static void fwUpdate_thread(void *arg)
{
    FWDWNLD_THREAD_ARGS* MyArgs = (FWDWNLD_THREAD_ARGS*) arg;
    struct sockaddr_in clientAddress;
	struct sockaddr_in serverAddress;
	TRACE_D("Firmware Update Server Socket Starting .......");
	TRACE_D(" Port = %d", MyArgs->PortNumber);
	// Create a socket that we will listen upon.
	int sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	if (sock < 0)
	{
		TRACE_E("socket: %d %s \n", sock, strerror(errno));
		while(1);
	}
	// Bind our server socket to a port.
	serverAddress.sin_family = AF_INET;
	serverAddress.sin_addr.s_addr = htonl(INADDR_ANY);
	serverAddress.sin_port = htons(MyArgs->PortNumber);
	int rc  = bind(sock, (struct sockaddr *)&serverAddress, sizeof(serverAddress));
	if (rc < 0)
	{
		TRACE_E("bind: %d %s \n", rc, strerror(errno));
		while(1);
	}

	// Flag the socket as listening for new connections
	rc = listen(sock, 1);
	if (rc < 0)
	{
		TRACE_E("listen: %d %s \n", rc, strerror(errno));
		while(1);
	}
	BOOL Done = False;
	while(!Done)
	{
	    socklen_t clientAddressLength = sizeof(clientAddress);
	    TRACE_D("Waiting for new Connection \n");
		int clientSock = accept(sock, (struct sockaddr *)&clientAddress, &clientAddressLength); //blocks until new connection available
		if (clientSock < 0)
		{
			TRACE_E("accept error: %d %s", clientSock, strerror(errno));
		}
		else
		{
		    TRACE_D("new accept: %d %s\n", clientSock, strerror(errno));
		    uint8_t* RxBuf  = malloc(1024); //our local Rx Buffer on the heap
		    const esp_partition_t *p = esp_partition_find_first(ESP_PARTITION_TYPE_DATA, 0x80, NULL);
		    if(p == NULL)
		    {
		    	TRACE_E("ST Code Partition not found\n");
		    	Done = True;
		    	while(1);
		    }
		    else
		    {
		    	TRACE_D("ST Partition Found : %d bytes\n",p->size);
		    	//vTaskDelay(1000 / portTICK_RATE_MS);
		    }
		    uint32_t FileSizeRecived = 0;
		    while(1)
		    {
		    	ssize_t sizeRead = recv(clientSock, RxBuf, 1024, 0);
		    	if(sizeRead > 0)
		    	{
		    		if(esp_partition_write(p, FileSizeRecived , RxBuf, sizeRead) == ESP_OK)
		    		{
		    			 TRACE_D("%d : %d\n", sizeRead,FileSizeRecived);
		    		}
		    		else
		    		{
		    			TRACE_E("ST partition write error : %d\n",sizeRead);
		    			while(1);
		    		}
		    		FileSizeRecived+=sizeRead;
		    	}
		    	else if(sizeRead == 0)
		    	{
		    		TRACE_D("File Size : %d\n", FileSizeRecived);
		    		break;
		    	}
		    	else
		    	{
		    		TRACE_E("Socket return error\n");
		    		break;
		    	}
		    }
		    free(RxBuf);
		    if(VerifyBinFileFromFlash(p,FileSizeRecived ,(unsigned char*)&MyArgs->FR.md5Hash ))
		    {
		    	memcpy(&dwnldFileRequestCopy , &MyArgs->FR , sizeof(ID_DWNLD_FILE_REQUEST_TYPE));
		    }
		    else
		    {
		    	memset(&dwnldFileRequestCopy,0, sizeof(ID_DWNLD_FILE_REQUEST_TYPE));
		    }

		    close(clientSock);
		    close(sock);
		    Done = True;
		}
	}
	TRACE_D("Exiting update thread\n");

	vTaskDelete(NULL);
}
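/************************************************************
@Func:    EraseSTPartition - erase the ST code data partition (subtype 0x80), all but its last 4096 bytes
@Inputs:  none
@Outputs: ESP_OK on success, ESP_FAIL if the partition is missing or the erase fails
*************************************************************/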
static esp_err_t EraseSTPartition(void)
{
	const esp_partition_t *p = esp_partition_find_first(ESP_PARTITION_TYPE_DATA, 0x80, NULL);

	if(p)
	{
		if(esp_partition_erase_range(p, 0, p->size-4096) == ESP_OK)
		{
				TRACE_D("erase OK : %d kbytes\n",p->size/1024);
				return ESP_OK;
		}
	}
	else
	{
		TRACE_E("ST Code Partition not found\n");
	}

	return ESP_FAIL;
}
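/************************************************************
@Func:    syrp_dwnld_bin_Start - check the file size, erase the partition and start the download task
@Inputs:  portNumber - TCP port for the download server, pFR - download file request
@Outputs: none (sends an error or a reply message to the requester)
*************************************************************/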
void syrp_dwnld_bin_Start(uint16_t portNumber , ID_DWNLD_FILE_REQUEST_TYPE* pFR)
{
	static FWDWNLD_THREAD_ARGS args;
	args.PortNumber = portNumber;
	memcpy(&args.FR , pFR , sizeof(ID_DWNLD_FILE_REQUEST_TYPE));
	uint8_t error = 0;

	if(pFR->fileSize < GetMaxFileSize(pFR->hwID)) // step 1 : check the file size does not exceed max file size for that h/w module
	{
		// step 2 : erase the memory where the file will be temporarily stored if core module / if esp module then start OTA process
		TRACE_D("erasing partition\n");
		if(EraseSTPartition() == ESP_OK)
		{
			//sys_thread_new("fwUpdate_thread", fwUpdate_thread, &args, 2048*4, 5);
			xTaskCreate(fwUpdate_thread, "fwUpdate_thread", 8192, &args, 6, NULL);
		}
		else
		{
			TRACE_E("Fatal error with flash erase\n");
			error = 2;
		}
	}
	else
	{
		TRACE_E("File size too large\n");
		error = 1;
	}
	if(error)
	{
		Send_ID_ERROR(ID_DWNLD_FILE_REQUEST , error, 0 , 0 ); //fixme the port and client are 0
	}
	else
	{
		Send_ID_DWNLD_FILE_REQUEST_REPLY(0 , 0 , pFR->hwID ,portNumber);
	}
}

Re: Unstable partition API flashing functions

Posted: Mon Mar 20, 2017 8:19 pm
by imtiaz
I can see on a WiFi analyser that very often the access point drops off soon after a flash erase. But I am not 100 percent sure whether it's to do with the flash erase or some other event happening at the same time, like starting a TCP server.

Re: Unstable partition API flashing functions

Posted: Mon Mar 20, 2017 10:06 pm
by imtiaz
I can confirm that I have isolated the problem. Calling

Code:

esp_partition_erase_range(p, 0, p->size)

or

Code:

esp_ota_begin(partition, OTA_SIZE_UNKNOWN, &out_handle);

causes the WiFi access point to drop off.
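One idea that could be tried for the `esp_partition_erase_range` case is to split the erase into smaller pieces and pause between them so that other tasks get some CPU time. The sketch below is an untested assumption, not an official fix: `erase_in_chunks`, `erase_fn_t`, `pause_fn_t` and the counting stubs are hypothetical names. On the ESP32 the erase callback would wrap `esp_partition_erase_range` and the pause callback would call `vTaskDelay`; here they are plain callbacks so the loop can be checked on a host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_BYTES 4096u  /* erase granularity assumed in this sketch */

/* Hypothetical callbacks: erase one aligned range (0 on success) and
 * pause between chunks. */
typedef int  (*erase_fn_t)(void *ctx, uint32_t offset, uint32_t size);
typedef void (*pause_fn_t)(void *ctx);

/*
 * Erase [0, total) in pieces of at most chunk bytes, calling pause after
 * every piece except the last. Returns 0 on success, -1 on bad arguments,
 * or the first non-zero code returned by erase.
 */
int erase_in_chunks(uint32_t total, uint32_t chunk,
                    erase_fn_t erase, pause_fn_t pause, void *ctx)
{
    assert(erase != NULL);
    if (chunk == 0u || chunk % SECTOR_BYTES != 0u) return -1;
    if (total % SECTOR_BYTES != 0u) return -1;

    uint32_t offset = 0u;
    while (offset < total) {
        uint32_t remaining = total - offset;
        uint32_t len = remaining < chunk ? remaining : chunk;
        int rc = erase(ctx, offset, len);
        if (rc != 0) return rc;
        offset += len;
        if (offset < total && pause != NULL) pause(ctx);
    }
    return 0;
}

/* Host-side stubs that only count calls, used to exercise the loop. */
struct erase_log { uint32_t calls; uint32_t bytes; uint32_t pauses; };

int count_erase(void *ctx, uint32_t offset, uint32_t size)
{
    struct erase_log *log = ctx;
    (void)offset;
    log->calls++;
    log->bytes += size;
    return 0;
}

void count_pause(void *ctx)
{
    struct erase_log *log = ctx;
    log->pauses++;
}
```

With a 1 MByte partition and 64 KB chunks this makes 16 erase calls with 15 pauses in between. It does not help with `esp_ota_begin`, which performs its erase internally.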

This has caused a lot of delay, confusion and frustration - please look at it with high priority.

Thanks
Imtiaz

Re: Unstable partition API flashing functions

Posted: Tue Mar 21, 2017 11:40 pm
by imtiaz
@ESP_igrr @ESP_Sprite @ESP_Angus

Hi guys,

I know you are busy, but some response would be appreciated :)

Thanks
Imtiaz

Re: Unstable partition API flashing functions

Posted: Wed Mar 22, 2017 2:28 am
by ESP_Angus
Hi Imtiaz,

We had a quick discussion about how to solve this yesterday. I'm going to try and reproduce & solve today (it may be that reproducing is easier on certain APs or ambient WiFi traffic loads).

Will keep you in the loop.

Angus

Re: Unstable partition API flashing functions

Posted: Wed Mar 22, 2017 6:39 am
by ESP_Angus
Hi Imtiaz,

I haven't managed to reproduce this issue; even under high network load the ESP32 stays associated. I'm guessing that either the partition you're erasing is particularly large, the SPI flash is slow, or the AP you have is more sensitive to timeouts (or maybe some combination of these factors).

However, there is a probable fix in this branch:
https://github.com/espressif/esp-idf/tr ... ock_period

Can you please try it out and let us know if it fixes the problem?

Angus

Re: Unstable partition API flashing functions

Posted: Wed Mar 22, 2017 7:48 pm
by imtiaz
Hi Angus @ESP_Angus,

From your wording, it seems you are testing with the ESP32 as a station. I am talking about the ESP32 as a WiFi access point. As you call the SPI flash functions, the AP just drops off, and the ESP32 doesn't seem to be resetting.
Also, the partition I am erasing is either 1 MByte or 1.8 MByte, and it is on your standard dev kit and module. It's not slow, because when it does work I can send and write the whole 1 MByte partition in a few seconds.

Also, please enable Bluetooth as well in your testing.