undocumented UART Hardware problem?

kbaud1
Posts: 71
Joined: Wed Jan 17, 2018 11:55 pm

undocumented UART Hardware problem?

Postby kbaud1 » Mon May 04, 2020 11:43 pm

We ran into a problem with the UART hardware that doesn't seem documented. In our testing this problem only appears when we also use wifi, but we think it might just be triggered by multitasking in general, not necessarily wifi specifically.

The problem is that sometimes when we read the FIFO, its hardware read index does not increment as it normally would, even though the hardware FIFO counter still decrements. As a result, the next read of the FIFO returns the same byte as the previous read, and from then on every byte read lags one position behind. Here is an example sequence of what might be received in the UART and how it might be read:

Byte Received | Counter (before read) | Read Index (before read) | Byte Read | Counter (after read) | Read Index (after read)
1             | 1                     | 50                       | 1         | 0                    | 51
2             | 1                     | 51                       | 2         | 0                    | 51
3             | 1                     | 51                       | 2         | 0                    | 52
4             | 1                     | 52                       | 3         | 0                    | 53
5             | 1                     | 53                       | 4         | 0                    | 54

The read index after the second read (still 51) is the one that failed to update, causing a permanent one-byte delay/misalignment in the bytes read. If the malfunction happens again the shift becomes 2, and so on.
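To make the shift concrete, here is a minimal host-side sketch that replays the sequence in the table. It is only a model of the described symptom, not the real register layout: `sim_fifo_t`, `sim_receive`, `sim_read` and `sim_replay` are hypothetical names, and the 128-slot size and starting index of 50 are assumptions taken from the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define SIM_FIFO_SIZE 128  /* assumed size, for illustration only */

/* Hypothetical model of the RX FIFO state described in the post. */
typedef struct {
    uint8_t  mem[SIM_FIFO_SIZE];
    unsigned wr_addr;  /* next slot the receiver writes */
    unsigned rd_addr;  /* slot the next read returns */
    unsigned cnt;      /* models status.rxfifo_cnt */
} sim_fifo_t;

/* A byte arrives on the line: store it and advance the write index. */
static void sim_receive(sim_fifo_t *f, uint8_t byte)
{
    f->mem[f->wr_addr] = byte;
    f->wr_addr = (f->wr_addr + 1) % SIM_FIFO_SIZE;
    f->cnt++;
}

/* One FIFO read. When skip_increment is set, the counter still
 * decrements but the read index stays where it was (the malfunction). */
static uint8_t sim_read(sim_fifo_t *f, bool skip_increment)
{
    uint8_t byte = f->mem[f->rd_addr];
    if (f->cnt > 0) {
        f->cnt--;
    }
    if (!skip_increment) {
        f->rd_addr = (f->rd_addr + 1) % SIM_FIFO_SIZE;
    }
    return byte;
}

/* Bytes 1..n arrive one at a time and are read immediately;
 * the read numbered fault_at (1-based) fails to advance the index. */
static void sim_replay(uint8_t *out, unsigned n, unsigned fault_at)
{
    sim_fifo_t f = { .wr_addr = 50, .rd_addr = 50, .cnt = 0 };
    for (unsigned i = 1; i <= n; i++) {
        sim_receive(&f, (uint8_t)i);
        out[i - 1] = sim_read(&f, i == fault_at);
    }
}
```

Calling `sim_replay(out, 5, 2)` fills `out` with the "Byte Read" column of the table: the second byte is returned twice and every later byte arrives one read late.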

Here is example code that could observe the malfunction above:

while (1)
{
    if (UART [1]->status.rxfifo_cnt)
    {
        // Print the counter and read index on either side of a single FIFO read.
        printf ("Counter (before read): %d, Read Index (before read): %d\n", UART [1]->status.rxfifo_cnt, UART [1]->mem_rx_status.rd_addr);
        printf ("Byte Read: %d\n", READ_PERI_REG (UART_FIFO_AHB_REG (1)));
        printf ("Counter (after read): %d, Read Index (after read): %d\n", UART [1]->status.rxfifo_cnt, UART [1]->mem_rx_status.rd_addr);
    }
}

Simply doing extra reads of the FIFO when the counter is zero is not a solution, because that disturbs the read index as well. Looking through the ESP-IDF source code, we found the following function, which seems to work around this hardware malfunction by reading the FIFO until it is really empty, continuing for as long as the counter is nonzero or the read and write indices differ:

static esp_err_t uart_reset_rx_fifo(uart_port_t uart_num)
{
    UART_CHECK((uart_num < UART_NUM_MAX), "uart_num error", ESP_FAIL);
    //Due to hardware issue, we can not use fifo_rst to reset uart fifo.
    //See description about UART_TXFIFO_RST and UART_RXFIFO_RST in <<esp32_technical_reference_manual>> v2.6 or later.

    // we read the data out and make `fifo_len == 0 && rd_addr == wr_addr`.
    while(UART[uart_num]->status.rxfifo_cnt != 0 || (UART[uart_num]->mem_rx_status.wr_addr != UART[uart_num]->mem_rx_status.rd_addr)) {
        READ_PERI_REG(UART_FIFO_REG(uart_num));
    }
    return ESP_OK;
}

However, the "hardware issue" referenced in the manual says nothing about the read index failing to increment. It describes a failure of the hardware FIFO reset function on UART1 or UART2, which is a separate problem. Yet the code does clear our malfunction; it did so when we tested it. This isn't really a solution, though, because you can unknowingly end up with any number of bad reads before you finally call this code to reset the FIFO.

Is there any reference to this particular failure to increment the UART index anywhere in the documentation? What would be the best recommended software fix so the FIFO can be read correctly without incident?

WiFive
Posts: 3529
Joined: Tue Dec 01, 2015 7:35 am

Re: undocumented UART Hardware problem?

Postby WiFive » Tue May 05, 2020 2:54 am

There are a few GitHub issues about this too

ESP_igrr
Posts: 2072
Joined: Tue Dec 01, 2015 8:37 am

Re: undocumented UART Hardware problem?

Postby ESP_igrr » Tue May 05, 2020 9:38 pm

Hi kbaud1,

Which address does "UART [1]" refer to in your code? Is it in the 0x3ffxxxxx range or in the 0x600xxxxx range?
The errata document is being updated to explain this behavior, but the short version is that you need to:
* Use the DPORT address of the FIFO (i.e. in the 0x3ffxxxxx range) for reads, making sure you have memory barriers (memw instruction) between reads.
* Use the AHB address of the FIFO (i.e. in the 0x600xxxxx range) for writes.

kbaud1
Posts: 71
Joined: Wed Jan 17, 2018 11:55 pm

Re: undocumented UART Hardware problem?

Postby kbaud1 » Sat May 09, 2020 12:36 am

The problem has been observed on both UART1 and UART2. It has not been tested on UART0.


How can I be sure to have memory barriers between reads when writing in C code?

ESP_igrr
Posts: 2072
Joined: Tue Dec 01, 2015 8:37 am

Re: undocumented UART Hardware problem?

Postby ESP_igrr » Sun May 10, 2020 10:03 am

kbaud1 wrote: How can I be sure to have memory barriers between reads when writing in C code?

Usually the "volatile" qualifier is enough to make the compiler add a memory barrier before accessing a variable.

(There is a pending issue that Xtensa GCC does not insert memory barriers when accessing 8-bit and 16-bit variables when compiling at the -O2 optimization level. It has been fixed upstream, but we haven't yet released new toolchain binaries with the fix. For now the workaround is to keep using the -Os or -Og optimization level. For peripheral register access this issue usually doesn't matter, since peripheral registers are normally accessed using 32-bit loads and stores.)
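The advice so far (read the FIFO through its 0x3ffxxxxx address, through a volatile pointer, with a memw before each read) can be sketched in C roughly as follows. This is an illustration under assumptions, not the ESP-IDF implementation: the helper names are hypothetical, and the base addresses and per-port offsets are our reading of ESP-IDF's soc/uart_reg.h for the original ESP32, so check them against your IDF version. On a non-Xtensa host the barrier degrades to a compiler-only barrier so the sketch can still be compiled and tried.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed ESP32 base addresses; verify against soc/uart_reg.h. */
#define UART_DPORT_BASE 0x3FF40000u  /* 0x3ffxxxxx range, suggested for reads  */
#define UART_AHB_BASE   0x60000000u  /* 0x600xxxxx range, suggested for writes */

/* Per-port offset; the FIFO register sits at offset 0 of each UART block. */
static uint32_t uart_port_offset(unsigned uart_num)
{
    return uart_num * 0x10000u + (uart_num > 1 ? 0xE000u : 0u);
}

static uint32_t uart_fifo_dport_addr(unsigned uart_num)
{
    return UART_DPORT_BASE + uart_port_offset(uart_num);
}

static uint32_t uart_fifo_ahb_addr(unsigned uart_num)
{
    return UART_AHB_BASE + uart_port_offset(uart_num);
}

/* True if addr lies in the 0x3ffxxxxx range asked about in the thread. */
static bool is_dport_range(uint32_t addr)
{
    return (addr & 0xFFF00000u) == 0x3FF00000u;
}

/* Memory barrier: memw on Xtensa, compiler barrier only elsewhere. */
static inline void mem_barrier(void)
{
#if defined(__XTENSA__)
    __asm__ __volatile__("memw" ::: "memory");
#else
    __asm__ __volatile__("" ::: "memory");
#endif
}

/* 32-bit register read through a volatile pointer, preceded by a barrier. */
static inline uint32_t reg_read32(const volatile uint32_t *reg)
{
    mem_barrier();
    return *reg;
}

/* The received byte is the low 8 bits of the 32-bit FIFO register. */
static inline uint8_t fifo_read_byte(const volatile uint32_t *fifo_reg)
{
    return (uint8_t)(reg_read32(fifo_reg) & 0xFFu);
}
```

On target, a read of UART1 would then look like `fifo_read_byte((const volatile uint32_t *)uart_fifo_dport_addr(1))`; on a host the same helper can be pointed at an ordinary `volatile uint32_t` variable for testing.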

samc77
Posts: 7
Joined: Sun Feb 16, 2020 8:08 pm

Re: undocumented UART Hardware problem?

Postby samc77 » Sun May 24, 2020 10:02 am

Hi,

I am also seeing this issue.
Currently on the master branch.
Using the idf library functions to read the uart, but still this problem occurs.

Adding memw into the uart functions does not help. What is the best way to work around this?

Thanks!


kbaud1
Posts: 71
Joined: Wed Jan 17, 2018 11:55 pm

Re: undocumented UART Hardware problem?

Postby kbaud1 » Mon Jun 08, 2020 3:00 pm

We see that the GitHub discussion of this problem mentioned above is extensive and ongoing. Is there going to be a hardware change to correct it? (A software change probably won't help us since we're directly accessing memory.)

We confirmed that our results are similar to those reported by rojer: The problems do not stop with code like:

__asm__ __volatile__("memw");
rd_byte = READ_PERI_REG(UART_FIFO_REG(uart_num));

The problems do stop when we read the AHB port and disable other thread(s), but this is neither practical nor recommended. For us the solution is to keep reading the FIFO until the count reaches zero and the indices agree, then accept the result only if it is the expected number of bytes. For example:

for (dres = dcnt = 0; UART1.status.rxfifo_cnt != 0 || UART1.mem_rx_status.wr_addr != UART1.mem_rx_status.rd_addr; ++dcnt)
    dres = dres << 8 | READ_PERI_REG (UART_FIFO_REG (UART_NUM_1));
if (dcnt == (dtype ? 2 : 1))
    data [dindex] = dres;

This seems to work reliably for us when we are scanning for updated readings because if a failure occurs, we will just rely on the old reading until the next refresh cycle. It is not a good solution if you don't know the size of the data you expect and/or cannot afford to randomly lose some of it.
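The drain-and-validate approach above can be tried on a host with a small stand-in for the FIFO. This is a sketch under assumptions: `fake_fifo_t`, `fake_receive`, `fake_read` and `drain_and_accept` are hypothetical names, and the fault model (counter decrements, read index does not advance) is only the behavior described in this thread, not a verified description of the silicon.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side stand-in for the UART RX FIFO registers. */
typedef struct {
    uint8_t  mem[128];
    unsigned wr_addr;
    unsigned rd_addr;
    unsigned cnt;            /* models status.rxfifo_cnt */
    unsigned fault_on_read;  /* 1-based read that fails to advance; 0 = never */
    unsigned reads_done;
} fake_fifo_t;

/* A byte arrives on the line. */
static void fake_receive(fake_fifo_t *f, uint8_t byte)
{
    f->mem[f->wr_addr] = byte;
    f->wr_addr = (f->wr_addr + 1) % 128;
    f->cnt++;
}

/* One FIFO read; on the faulty read the index stays put. */
static uint8_t fake_read(fake_fifo_t *f)
{
    uint8_t byte = f->mem[f->rd_addr];
    f->reads_done++;
    if (f->cnt > 0) {
        f->cnt--;
    }
    if (f->reads_done != f->fault_on_read) {
        f->rd_addr = (f->rd_addr + 1) % 128;
    }
    return byte;
}

/* Drain until the count is zero and the indices agree. Update *reading
 * only if exactly `expected` bytes were read; otherwise keep the old value. */
static bool drain_and_accept(fake_fifo_t *f, unsigned expected, uint32_t *reading)
{
    uint32_t value = 0;
    unsigned n = 0;
    while (f->cnt != 0 || f->wr_addr != f->rd_addr) {
        value = (value << 8) | fake_read(f);
        n++;
    }
    if (n != expected) {
        return false;  /* too many or too few reads: discard this sample */
    }
    *reading = value;
    return true;
}
```

With two bytes queued and no fault, the drain takes two reads and the value is accepted. If the second read fails to advance the index, the loop needs a third read to bring the indices back together, the length check fails, and the previous reading is kept, which is the trade-off described above.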
