My curiosity increased about the basic arithmetic performance with different types on ESP32 and ESP32-S2 (on IDF v4.2) increased. I did replace the timer_u32 function with the esp_timer_get_time to reduce the effort to replicate the measurements on different platforms. I did add the measurements for double calculations.
With this code
Code: Select all
#include <stdio.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "soc/frc_timer_reg.h"
#ifdef CONFIG_ESP_TIMER_IMPL_FRC2
#define timer_u32() (REG_READ(0x3ff47004)) // FRC2
#define TICKS_PER_US (80)
#elif CONFIG_ESP_TIMER_IMPL_TG0_LAC
#define timer_u32() (REG_READ(0x3ff5F078)) // TG0_LAC
#define TICKS_PER_US (2)
#else
// SYSTIMER
__attribute__((always_inline)) static inline uint32_t timer_u32(void) {
REG_WRITE(0x3f423038,1UL << 31); // Start synch
while (!REG_GET_BIT(0x3f423038,1UL << 30)); // Wait synch
return REG_READ(0x3f423040); // Read SYSTIMER_VALUE_LO_REG
}
#define TICKS_PER_US (80)
#endif
#define F47 (2971215073)
#define FIBCOUNT (47)
uint32_t u32Loop(uint32_t n) {
uint32_t a,b,f;
uint32_t i,j;
f = 0;
for (i = 0; i < n; i++ ) {
a = 0;
b = 1;
for (j = 0; j < (FIBCOUNT - 1); j++) {
f = a + b;
a = b;
b = f;
}
}
return f;
}
uint32_t u64Loop(uint32_t n) {
uint64_t a,b,f;
uint32_t i,j;
f = 0;
for (i = 0; i < n; i++ ) {
a = 0;
b = 1;
for (j = 0; j < (FIBCOUNT - 1); j++) {
f = a + b;
a = b;
b = f;
}
}
return f;
}
uint32_t f32Loop(uint32_t n) {
float a,b,f;
uint32_t i,j;
f = 0.0;
for (i = 0; i < n; i++ ) {
a = 0.0;
b = 1.0;
for (j = 0; j < (FIBCOUNT - 1); j++) {
f = a + b;
a = b;
b = f;
}
}
return f;
}
uint32_t f64Loop(uint32_t n) {
double a,b,f;
uint32_t i,j;
f = 0.0;
for (i = 0; i < n; i++ ) {
a = 0.0;
b = 1.0;
for (j = 0; j < (FIBCOUNT - 1); j++) {
f = a + b;
a = b;
b = f;
}
}
return f;
}
void runTest(uint32_t n, uint32_t loop(uint32_t), char* typeId) {
uint32_t t0,t1;
uint64_t lastF;
float delta;
t0 = esp_timer_get_time();
lastF = loop(n);
t1 = esp_timer_get_time();
delta = t1 - t0;
printf("%11d times %d fibonaccies with %s in %8.0f us (%7.3f).",n,FIBCOUNT,typeId,delta,delta/n);
if (lastF == F47) {
printf(" F47 is OK.\n");
} else {
printf(" F47 = %jd, expected %jd.\n", lastF,F47);
}
vTaskDelay(100 / portTICK_PERIOD_MS);
}
void app_main(void) {
printf("Starting time comparisons on ESP32 Pico D4/v4.2\n");
runTest(10*1000,u32Loop,"u32");
runTest(100*1000,u32Loop,"u32");
runTest(10*1000,u64Loop,"u64");
runTest(100*1000,u64Loop,"u64");
runTest(10*1000,f32Loop,"f32");
runTest(100*1000,f32Loop,"f32");
runTest(10*1000,f64Loop,"f64");
runTest(100*1000,f64Loop,"f64");
for (int i = 2; i<14; i++) {
runTest(i*100*1000,f32Loop,"f32");
}
}
The ESP32 Pico D4 gave the following results
Code: Select all
Starting time comparisons on ESP32 Pico D4/v4.2
10000 times 47 fibonaccies with u32 in 40950 us ( 4.095). F47 is OK.
100000 times 47 fibonaccies with u32 in 409044 us ( 4.090). F47 is OK.
10000 times 47 fibonaccies with u64 in 61178 us ( 6.118). F47 is OK.
100000 times 47 fibonaccies with u64 in 611698 us ( 6.117). F47 is OK.
10000 times 47 fibonaccies with f32 in 41355 us ( 4.135). F47 = 2971214848, expected 2971215073.
100000 times 47 fibonaccies with f32 in 413426 us ( 4.134). F47 = 2971214848, expected 2971215073.
10000 times 47 fibonaccies with f64 in 208154 us ( 20.815). F47 is OK.
100000 times 47 fibonaccies with f64 in 2081548 us ( 20.815). F47 is OK.
200000 times 47 fibonaccies with f32 in 826852 us ( 4.134). F47 = 2971214848, expected 2971215073.
300000 times 47 fibonaccies with f32 in 1240284 us ( 4.134). F47 = 2971214848, expected 2971215073.
400000 times 47 fibonaccies with f32 in 1653710 us ( 4.134). F47 = 2971214848, expected 2971215073.
500000 times 47 fibonaccies with f32 in 2067135 us ( 4.134). F47 = 2971214848, expected 2971215073.
600000 times 47 fibonaccies with f32 in 2480567 us ( 4.134). F47 = 2971214848, expected 2971215073.
700000 times 47 fibonaccies with f32 in 2893993 us ( 4.134). F47 = 2971214848, expected 2971215073.
800000 times 47 fibonaccies with f32 in 3307418 us ( 4.134). F47 = 2971214848, expected 2971215073.
900000 times 47 fibonaccies with f32 in 3720851 us ( 4.134). F47 = 2971214848, expected 2971215073.
1000000 times 47 fibonaccies with f32 in 4134276 us ( 4.134). F47 = 2971214848, expected 2971215073.
1100000 times 47 fibonaccies with f32 in 4547701 us ( 4.134). F47 = 2971214848, expected 2971215073.
1200000 times 47 fibonaccies with f32 in 4961134 us ( 4.134). F47 = 2971214848, expected 2971215073.
E (58506) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (58506) task_wdt: - IDLE0 (CPU 0)
E (58506) task_wdt: Tasks currently running:
E (58506) task_wdt: CPU 0: main
E (58506) task_wdt: CPU 1: IDLE1
E (58506) task_wdt: Print CPU 0 (current core) backtrace
Backtrace:0x400D21B0:0x3FFB0760 0x40082479:0x3FFB0780 0x400D2B05:0x3FFB4690 0x400D2B72:0x3FFB46B0 0x400D2C6C:0x3FFB46E0 0x400D17EE:0x3FFB4700 0x4008459D:0x3FFB4730
0x400d21b0: task_wdt_isr at C:/Users/Olavi/esp/esp-idf/components/esp_common/src/task_wdt.c:187
0x40082479: _xt_lowint1 at C:/Users/Olavi/esp/esp-idf/components/freertos/xtensa/xtensa_vectors.S:1105
0x400d2b05: f32Loop at C:\Users\Olavi\Documents\VS_ESP_IDF\SpeedDemo\Speed_v0_3\build/../main/SpeedDemo.c:62
0x400d2b72: runTest at C:\Users\Olavi\Documents\VS_ESP_IDF\SpeedDemo\Speed_v0_3\build/../main/SpeedDemo.c:94
0x400d2c6c: app_main at C:\Users\Olavi\Documents\VS_ESP_IDF\SpeedDemo\Speed_v0_3\build/../main/SpeedDemo.c:119 (discriminator 3)
0x400d17ee: main_task at C:/Users/Olavi/esp/esp-idf/components/esp32/cpu_start.c:600 (discriminator 2)
0x4008459d: vPortTaskWrapper at C:/Users/Olavi/esp/esp-idf/components/freertos/xtensa/port.c:143
E (58506) task_wdt: Print CPU 1 backtrace
Backtrace:0x40081863:0x3FFB0D60 0x40082479:0x3FFB0D80 0x4000BFED:0x3FFB5540 0x400847F5:0x3FFB5550 0x400D23F4:0x3FFB5570 0x400D23FF:0x3FFB55A0 0x400D249D:0x3FFB55C0 0x40084F1A:0x3FFB55E0 0x4008459D:0x3FFB5600
0x40081863: esp_crosscore_isr at C:/Users/Olavi/esp/esp-idf/components/esp32/crosscore_int.c:80
0x40082479: _xt_lowint1 at C:/Users/Olavi/esp/esp-idf/components/freertos/xtensa/xtensa_vectors.S:1105
0x400847f5: vPortExitCritical at C:/Users/Olavi/esp/esp-idf/components/freertos/xtensa/port.c:419
0x400d23f4: esp_task_wdt_reset at C:/Users/Olavi/esp/esp-idf/components/esp_common/src/task_wdt.c:333
0x400d23ff: idle_hook_cb at C:/Users/Olavi/esp/esp-idf/components/esp_common/src/task_wdt.c:88
0x400d249d: esp_vApplicationIdleHook at C:/Users/Olavi/esp/esp-idf/components/esp_common/src/freertos_hooks.c:51 (discriminator 1)
0x40084f1a: prvIdleTask at C:/Users/Olavi/esp/esp-idf/components/freertos/tasks.c:3386 (discriminator 1)
0x4008459d: vPortTaskWrapper at C:/Users/Olavi/esp/esp-idf/components/freertos/xtensa/port.c:143
1300000 times 47 fibonaccies with f32 in 5416045 us ( 4.166). F47 = 2971214848, expected 2971215073.
The ESP32-S2 gave these results
Code: Select all
Starting time comparisons on ESP32-S2/v4.2
10000 times 47 fibonaccies with u32 in 40919 us ( 4.092). F47 is OK.
100000 times 47 fibonaccies with u32 in 408985 us ( 4.090). F47 is OK.
10000 times 47 fibonaccies with u64 in 61167 us ( 6.117). F47 is OK.
100000 times 47 fibonaccies with u64 in 611609 us ( 6.116). F47 is OK.
10000 times 47 fibonaccies with f32 in 178052 us ( 17.805). F47 = 2971214848, expected 2971215073.
100000 times 47 fibonaccies with f32 in 1780458 us ( 17.805). F47 = 2971214848, expected 2971215073.
10000 times 47 fibonaccies with f64 in 208129 us ( 20.813). F47 is OK.
100000 times 47 fibonaccies with f64 in 2081268 us ( 20.813). F47 is OK.
200000 times 47 fibonaccies with f32 in 3560910 us ( 17.805). F47 = 2971214848, expected 2971215073.
E (29903) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (29903) task_wdt: - IDLE0 (CPU 0)
E (29903) task_wdt: Tasks currently running:
E (29903) task_wdt: CPU 0: main
E (29903) task_wdt: Print CPU 0 (current core) backtrace
Backtrace:0x40082956:0x3FFBD4B0 0x40025C9E:0x3FFBD4D0 0x400081CA:0x3FFC17A0 0x4008320F:0x3FFC17B0 0x40083290:0x3FFC17D0 0x40083394:0x3FFC1800 0x40081D33:0x3FFC1820 0x40027851:0x3FFC1850
0x40082956: task_wdt_isr at C:/Users/Olavi/esp/esp-idf/components/esp_common/src/task_wdt.c:187
0x40025c9e: _xt_lowint1 at C:/Users/Olavi/esp/esp-idf/components/freertos/xtensa/xtensa_vectors.S:1105
0x4008320f: f32Loop at C:\Users\Olavi\Documents\VS_ESP_IDF\SpeedDemo\Speed_v0_3\build/../main/SpeedDemo.c:65 (discriminator 3)
0x40083290: runTest at C:\Users\Olavi\Documents\VS_ESP_IDF\SpeedDemo\Speed_v0_3\build/../main/SpeedDemo.c:94
0x40083394: app_main at C:\Users\Olavi\Documents\VS_ESP_IDF\SpeedDemo\Speed_v0_3\build/../main/SpeedDemo.c:119 (discriminator 3)
0x40081d33: main_task at C:/Users/Olavi/esp/esp-idf/components/esp32s2/cpu_start.c:453 (discriminator 2)
0x40027851: vPortTaskWrapper at C:/Users/Olavi/esp/esp-idf/components/freertos/xtensa/port.c:143
300000 times 47 fibonaccies with f32 in 5376149 us ( 17.920). F47 = 2971214848, expected 2971215073.
Compared to my previous tests, I am buzzled why the single floating point calculations gave the right result after 10000 loops when that should have not been possible with 24 bits. After 100000 loops the result was more wrong as it should have been.
With the new setup, the single floating point is failing in a consistent way. The double floating point calculations are giving right result as expected.
The main problem is still with the WDT. In ESP32 the real failure is detected when over 5 seconds is used in the test loops. In ESP32-S2, it takes the same time to fail.
Here is a summary of the relative performance with different types on ESP32 and ESP32-S2
Code: Select all
ESP32 ESP32-S2
uint32_t 4.090 4.090
uint64_t 6.117 6.116
float 4.134 17.805
double 20.815 20.813
The conclusion is that ESP32-S2 has some issues in the FPU.