Some assembly required... seeing trap when using ee.vld.128.ip instruction

Post by aaronw » Mon Oct 17, 2022 11:09 am

I am seeing a trap when I use the ee.vld.128.ip instruction on an ESP32-S3. As far as I can tell, my usage should be fine: before the instruction executes, a12 points to a valid memory location, 0x40378d40. The trap reports that the address being accessed is invalid, yet the preceding l32i instruction, which I added to verify I hadn't made a mistake, reads the same address without trouble. I have extensive assembly language experience, but it's mostly 64-bit MIPS with some ARM, so I'm new to Xtensa.

My goal for this is to speed up the FastLED code by using a lookup table for each nibble rather than doing a comparison for each bit. My benchmarking of my current method shows over a 25% speedup, but I want to see how far I can take it. The ee.vld.128.ip instruction looks ideal since I need to load 16 bytes of data then perform 4 word writes to the RMT memory buffer.
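To make the nibble-table idea concrete, here is a minimal portable C sketch of it. It is not the FastLED code: ITEM_ZERO, ITEM_ONE, nibble_table, build_nibble_table and encode_byte are hypothetical names, the item values are placeholders rather than real RMT timings, and plain uint32_t stands in for the 32-bit RMT items.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder encodings for a 0 bit and a 1 bit (assumption: real values
 * depend on the LED timing and the RMT clock). */
#define ITEM_ZERO 0x00000000u
#define ITEM_ONE  0x00000001u

/* 16 nibble values x 4 items per row, so each row is exactly 16 bytes. */
static uint32_t nibble_table[16][4];

/* Fill each row with the items for that nibble, most significant bit first. */
static void build_nibble_table(void)
{
    for (unsigned n = 0; n < 16; n++) {
        for (unsigned bit = 0; bit < 4; bit++) {
            nibble_table[n][bit] = (n & (8u >> bit)) ? ITEM_ONE : ITEM_ZERO;
        }
    }
}

/* Expand one byte into 8 items: the high-nibble row, then the low-nibble row.
 * No per-bit comparison happens here, only two row lookups and 8 copies. */
static void encode_byte(uint8_t value, uint32_t out[8])
{
    assert(out != NULL);
    const uint32_t *hi = nibble_table[value >> 4];
    const uint32_t *lo = nibble_table[value & 0x0F];
    for (unsigned i = 0; i < 4; i++) {
        out[i]     = hi[i];
        out[i + 4] = lo[i];
    }
}
```

Call build_nibble_table() once at startup, then encode_byte() per pixel byte. Because each row is 16 bytes, a single 128-bit load per nibble could replace the four word reads, which is the appeal of ee.vld.128.ip here.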

What am I doing wrong here? The l32i instruction accessing the exact same address works fine and loads the expected value.

I have also added code to zero the SAR_BYTE and ACCX registers (using wur.sar_byte and wur.accx_x), but this makes no difference, and I don't see why it would be necessary.

Here is the trap output:

    entry 0x403c98d8
    [    83][D][esp32-hal-cpu.c:244] setCpuFrequencyMhz(): PLL: 480 / 6 = 80 Mhz, APB: 80000000 Hz
    Guru Meditation Error: Core  1 panic'ed (LoadStoreError). Exception was unhandled.

    Core  1 register dump:
    PC      : 0x403752b8  PS      : 0x00060830  A0      : 0x80375558  A1      : 0x3fce2bc0
    A2      : 0x3fc94088  A3      : 0x40378d40  A4      : 0x00000030  A5      : 0x00000000
    A6      : 0x02ce3644  A7      : 0x00ffffff  A8      : 0x60016800  A9      : 0x00000120
    A10     : 0x00000000  A11     : 0x00000000  A12     : 0x40378d40  A13     : 0x00000000
    A14     : 0x0028800a  A15     : 0x40378d40  SAR     : 0x00000010  EXCCAUSE: 0x00000003
    EXCVADDR: 0x40378d40  LBEG    : 0x400570e8  LEND    : 0x400570f3  LCOUNT  : 0xffffffff

    Backtrace: 0x403752b5:0x3fce2bc0 0x40375555:0x3fce2be0 0x4037558f:0x3fce2c00 0x40375a05:0x3fce2c20 0x42001486:0x3fce2c40 0x42001729:0x3fce2c60 0x42001849:0x3fce2cb0 0x4200166c:0x3fce2cf0 0x4200260d:0x3fce2d20

My code up until the crash looks like:
    "   srli            %[tmp], %[p], 4             \n"
    "   slli            %[tmp], %[tmp], 4           \n"
    "   add.n           %[tmp], %[tmp], %[bitTable] \n"
    "   mov.n           a15, %[tmp]                 \n"
    "   l32i            a14, %[tmp], 0              \n"
    "   ee.vld.128.ip   q0,%[tmp],0                 \n"
And the compiled code according to objdump is:
    403752a2:       72b8            l32i.n  a11, a2, 28
    403752a4:       fc6131          l32r    a3, 40374428 <_iram_text_start+0x8>
    403752a7:       bbaa            add.n   a11, a11, a10
    403752a9:       000bb2          l8ui    a11, a11, 0
    403752ac:       41c4b0          srli    a12, a11, 4
    403752af:       11ccc0          slli    a12, a12, 4
    403752b2:       cc3a            add.n   a12, a12, a3
    403752b4:       0cfd            mov.n   a15, a12
    403752b6:       0ce8            l32i.n  a14, a12, 0
    403752b8:       8300c4          ee.vld.128.ip   q0, a12, 0
My complete inline code looks like:
    register uint8_t pData = mPixelData[mCur];
    register rmt_item32_t *bitTablePtr = &bitTable[0][0];
    __asm__ __volatile__(
        // tmp = bitTable + (high nibble of p) * 16; each table row is 16 bytes
        "   srli            %[tmp], %[p], 4             \n"
        "   slli            %[tmp], %[tmp], 4           \n"
        "   add.n           %[tmp], %[tmp], %[bitTable] \n"
        "   mov.n           a15, %[tmp]                 \n"  // debug: copy of the address
        "   l32i            a14, %[tmp], 0              \n"  // debug: 32-bit load from the same address
        "   ee.vld.128.ip   q0,%[tmp],0                 \n"  // 128-bit load of the row; this is where the trap occurs
        // tmp = bitTable + (low nibble of p) * 16
        "   extui           %[tmp], %[p], 0, 4          \n"
        "   slli            %[tmp], %[tmp], 4           \n"
        "   add.n           %[tmp], %[tmp], %[bitTable] \n"
        "   ee.vld.128.ip   q1,%[tmp],0                 \n"
        // move the four words of q0, then of q1, to RMT memory one at a time
        "   ee.movi.32.a    q0, %[tmp], 3               \n"
        "   s32i            %[tmp], %[pRmtMem], 0x0     \n"
        "   ee.movi.32.a    q0, %[tmp], 2               \n"
        "   s32i            %[tmp], %[pRmtMem], 0x4     \n"
        "   ee.movi.32.a    q0, %[tmp], 1               \n"
        "   s32i            %[tmp], %[pRmtMem], 0x8     \n"
        "   ee.movi.32.a    q0, %[tmp], 0               \n"
        "   s32i            %[tmp], %[pRmtMem], 0xc     \n"
        "   ee.movi.32.a    q1, %[tmp], 3               \n"
        "   s32i            %[tmp], %[pRmtMem], 0x10    \n"
        "   ee.movi.32.a    q1, %[tmp], 2               \n"
        "   s32i            %[tmp], %[pRmtMem], 0x14    \n"
        "   ee.movi.32.a    q1, %[tmp], 1               \n"
        "   s32i            %[tmp], %[pRmtMem], 0x18    \n"
        "   ee.movi.32.a    q1, %[tmp], 0               \n"
        "   s32i            %[tmp], %[pRmtMem], 0x1c    \n"
        // advance the RMT pointer past the 8 items (32 bytes) just written
        "   addi            %[pRmtMem],%[pRmtMem], 0x20 \n"
        "   memw                                        \n"
        : [tmp] "=&r"(tmp), [pRmtMem] "+r"(pItem)
        : [bitTable] "r"(bitTablePtr), [p] "r"(pData)
        : "a14", "a15");
    mCur++;
Here pItem points to RMT memory; a14 and a15 are currently used only for debugging.

Any help would be appreciated.

-Aaron

Post by ESP_igrr » Mon Oct 17, 2022 7:54 pm

Hi Aaron,
I'll check this with the hardware designers, but very likely the issue you are seeing occurs because the source address (0x40378d40) is mapped to the instruction bus of the CPU, not to the data bus. The ESP32-S3 has a 128-bit data bus, which is what the instruction extensions use. The CPU has no hardware for performing a 128-bit access over the instruction bus, so the load traps even though a 32-bit l32i from the same address succeeds.
Please try moving the array into data memory: either RAM (.data) or Flash (.rodata).

(A similar issue occurs with FPU instructions: loads and stores to or from the FPU registers can't use pointers that fall in the instruction bus range.)
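
As a rough way to check which bus an address belongs to, here is a small C sketch. The window boundaries are my assumption of the ESP32-S3 internal SRAM ranges and should be verified against the Technical Reference Manual; classify_address and require_data_bus are hypothetical helpers, and the sketch covers internal SRAM only (flash-mapped .rodata would show up as BUS_OTHER).

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED internal SRAM windows on ESP32-S3, as half-open ranges.
 * Check these against the TRM memory map before relying on them. */
#define S3_IRAM_LOW  0x40370000u  /* internal SRAM seen through the instruction bus */
#define S3_IRAM_HIGH 0x403E0000u
#define S3_DRAM_LOW  0x3FC88000u  /* internal SRAM seen through the data bus */
#define S3_DRAM_HIGH 0x3FD00000u

typedef enum { BUS_DATA, BUS_INSTRUCTION, BUS_OTHER } bus_kind_t;

/* Report which internal SRAM window (if any) an address falls in. */
static bus_kind_t classify_address(uint32_t addr)
{
    if (addr >= S3_DRAM_LOW && addr < S3_DRAM_HIGH) {
        return BUS_DATA;
    }
    if (addr >= S3_IRAM_LOW && addr < S3_IRAM_HIGH) {
        return BUS_INSTRUCTION;
    }
    return BUS_OTHER;
}

/* Debug-time guard one might call before a 128-bit or FPU load. */
static void require_data_bus(uint32_t addr)
{
    assert(classify_address(addr) != BUS_INSTRUCTION);
}
```

Under these assumed ranges, the faulting address 0x40378d40 from the register dump classifies as BUS_INSTRUCTION, while the stack pointer 0x3fce2bc0 classifies as BUS_DATA.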

Post by aaronw » Wed Oct 19, 2022 9:18 am

That may well be the case. It looks like the statically allocated table is being placed in instruction memory.
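
One possible way to keep such a table out of instruction memory is sketched below. It is untested on hardware and rests on assumptions: DRAM_ATTR is the ESP-IDF attribute from esp_attr.h that places a variable in internal data RAM, the 16-byte alignment is my own precaution for 128-bit loads, and bit_table, row_for_nibble and is_16_byte_aligned are hypothetical names. Plain uint32_t stands in for the 32-bit RMT items, and the table contents are omitted.

```c
#include <assert.h>
#include <stdint.h>

#ifdef ESP_PLATFORM
#include "esp_attr.h"   /* provides DRAM_ATTR when building with ESP-IDF */
#else
#define DRAM_ATTR       /* host fallback so the sketch still compiles */
#endif

/* 16 rows of 4 items each; every row is 16 bytes and starts on a 16-byte
 * boundary because the array itself is 16-byte aligned. */
static DRAM_ATTR uint32_t bit_table[16][4] __attribute__((aligned(16)));

/* Return the start of the row for one nibble value (0..15). */
static const uint32_t *row_for_nibble(unsigned nibble)
{
    assert(nibble < 16);
    return &bit_table[nibble][0];
}

/* True when the pointer sits on a 16-byte boundary. */
static int is_16_byte_aligned(const void *p)
{
    return ((uintptr_t)p & 0xFu) == 0;
}
```

After building, the linker map file or a printed pointer value can confirm whether the table really ended up in a data bus address range.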
