RSA peripheral 50% slower on ESP32-S3/C3 than S2?

jonsmirl
Posts: 19
Joined: Tue Dec 08, 2015 10:59 pm

Re: RSA peripheral 50% slower on ESP32-S3/C3 than S2?

Postby jonsmirl » Sun Jan 29, 2023 2:04 pm

Did this code ever get finished? Can ECC use the hidden eFuses with the DS peripheral to protect the key?

EmilenL
Posts: 15
Joined: Sun Oct 17, 2021 5:54 pm

Re: RSA peripheral 50% slower on ESP32-S3/C3 than S2?

Postby EmilenL » Fri Sep 29, 2023 8:49 pm

jonsmirl wrote:
Sun Jan 29, 2023 2:04 pm
Did this code ever get finished? Can ECC use the hidden eFuses with the DS peripheral to protect the key?
I have a working prototype. The eFuses cannot be used to protect the key, since the RSA peripheral is only used as a generic bignum multiplication accelerator. I have also been working on a separate ESP32-S3-only solution that uses PIE (the 128-bit SIMD assembler instructions) instead of the RSA peripheral, which gives a speedup of ~2.5x for each ECC operation compared to the RSA solution. For example, a P-256 ECDSA signature can then be verified in 5 ms at 240 MHz, which is ~4-5x faster than the current hardware ECC solution for ESP32-C6 using the API in the current esp-idf.

In any case, I recently got an ESP32-C6 devkit. I had hoped that the performance would be similar to ESP32-C3, since this newer chip should just be an upgraded version of the C3. Unfortunately, after doing some basic performance tests of the RSA peripheral, it seems the memory transfer speed has been significantly reduced even further, probably due to a lower APB speed. Before, each 256-bit word took around 0.475 us to transfer; now each 256-bit word takes around 1.25 us. The time for the actual operation has also degraded to around 2.05 us instead of 1.6 us (using the modexp hack with Y=0 described earlier to perform only one modular multiplication). One modmul operation including memory transfers thus takes around 3*1.25+2.05=5.8 us instead of 3*0.475+1.6=3.025 us, which means it is ~1.92x slower. The RSA peripheral is then actually active only around 35% of the time (2.05/5.8); the rest of the time is just "wasted" waiting for the memory bus.
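For anyone who wants to plug in their own measurements, the arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the timing model used in this post (three 256-bit transfers plus one compute step per modmul); the function names and the `EARLIER`/`C6` labels are made up for illustration and are not part of any esp-idf API.

```python
# Timing model from the post: one modmul = 3 word transfers + 1 compute step.
# All names here are hypothetical helpers, not esp-idf functions.

def modmul_time_us(transfer_us, compute_us, transfers=3):
    """Total time in microseconds for one modular multiplication."""
    return transfers * transfer_us + compute_us


def utilization(transfer_us, compute_us, transfers=3):
    """Fraction of the total time the RSA peripheral is actually computing."""
    return compute_us / modmul_time_us(transfer_us, compute_us, transfers)


# Measured values quoted in the post (microseconds per 256-bit word / per operation).
EARLIER = dict(transfer_us=0.475, compute_us=1.6)   # figures measured before the C6
C6 = dict(transfer_us=1.25, compute_us=2.05)        # ESP32-C6 devkit

if __name__ == "__main__":
    for name, t in (("earlier", EARLIER), ("ESP32-C6", C6)):
        print(f"{name}: {modmul_time_us(**t):.3f} us per modmul, "
              f"peripheral busy {utilization(**t):.0%}")
    slowdown = modmul_time_us(**C6) / modmul_time_us(**EARLIER)
    print(f"slowdown: {slowdown:.2f}x")
```

Changing `transfer_us` alone shows how much of the slowdown comes from the bus rather than from the multiplier itself.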

Is there any way to speed up the memory transfers on the ESP32-C6 or is this just the way it is? Has it been slowed down on the ESP32-H2, ESP32-C5 and ESP32-P4 as well?
