Fast timer for microsecond level duration measurements

OllieK
Posts: 58
Joined: Mon Jan 18, 2016 8:08 am
Location: USA, PA
Contact:

Fast timer for microsecond level duration measurements

Postby OllieK » Tue Feb 02, 2021 10:27 pm

I am spoiled and/or biased based on my past experience in the STM32 world. It seemed more robust and functional than the ESP32 framework.

For all timing measurements, I did use the following macros to get a CPU clock frequency counter reading. This was in range from 72 to 480 MHz. There was a system variable indicating the CPU frequency to allow the translation of the counter reading into nano seconds or microseconds.

Code: Select all

#define initCycleCounter() \
    CoreDebug->DEMCR   |= CoreDebug_DEMCR_TRCENA_Msk; \
    DWT->CYCCNT         = 0; \
    DWT->CTRL          |= DWT_CTRL_CYCCNTENA_Msk


#define readCycleCounter()  (DWT->CYCCNT)
I was missing similar functionality in ESP32 and created the following macros for ESP-IDF VSCE.

Code: Select all

#include "soc/frc_timer_reg.h"

#ifdef CONFIG_ESP_TIMER_IMPL_FRC2
#define timer_u32() (REG_READ(0x3ff47004));     // FRC2
#elif CONFIG_ESP_TIMER_IMPL_TG0_LAC
#define timer_u32() (REG_READ(0x3ff5F078));     // TG0_LAC
#else
#define timer_u32() (0);                        // SYSTIMER
#endif
The SYSTIMER is too slow (50 us) for the fast timing measurements. The FRC2 values are based on the 80 MHz clock. The TG0_LAC values are based on 2 MHz clock.

In practice, the FRC2 should be used in IDF v4.1 and older. The TG0_LAC should be used in IDF 4.2 and newer. These selections are done by default when installing new versions. The selection can be changed with the SDK Configuration editor in the High resolution timer group.

At the application level, the usage is very easy

Code: Select all

uint32_t t0,t1,dt;

t0 = timer_u32();
<code to be measured>
t1 = timer_u32();
dt = t1 - t0;		// Delta time to execute the <code to be measured>
The timer_u32 is much faster than the esp_timer_get_time function. This is important when measuring very short durations.

There are no problems in the counter overflow. The basic uint32_t arithmetic works in all conditions all the time. The problems arrive when doing 64 bit level calculations (such as in esp_timer_get_time) in 32 bit architecture.

OllieK
Posts: 58
Joined: Mon Jan 18, 2016 8:08 am
Location: USA, PA
Contact:

Re: Fast timer for microsecond level duration measurements

Postby OllieK » Wed Feb 03, 2021 12:24 am

I am now getting my ESP32-S2 boards to work with v4.2. Based on those tests, I would like to update the timer_u32 macro/function.

Code: Select all

#include "soc/frc_timer_reg.h"

#ifdef CONFIG_ESP_TIMER_IMPL_FRC2
#define timer_u32() (REG_READ(0x3ff47004));     // FRC2
#elif CONFIG_ESP_TIMER_IMPL_TG0_LAC
#define timer_u32() (REG_READ(0x3ff5F078));     // TG0_LAC
#else
                                                // SYSTIMER
__attribute__((always_inline)) static inline uint32_t timer_u32(void) {
   REG_WRITE(0x3f423038,1UL << 31);             // Start synch
   while (!REG_GET_BIT(0x3f423038,1UL << 30));  // Wait synch
   return REG_READ(0x3f423040);                 // Read SYSTIMER_VALUE_LO_REG
}
#endif
The change is required because with ESP32-S2 the FRC2 and TG0_LAC options are not available. The SYSTIMER option is updated to give the same value as was available with FRC2. That means that with ESP32-S2 the function timer_u32 returns the current value of a 80 MHz counter.

Note that with ESP32-S2, the timer_u32 resolution is 80 times better than the resolution with esp_timer_get_time .

OllieK
Posts: 58
Joined: Mon Jan 18, 2016 8:08 am
Location: USA, PA
Contact:

Re: Fast timer for microsecond level duration measurements

Postby OllieK » Thu Feb 04, 2021 1:04 am

Just to have some fun with timer_u32 function, I measured some loop times with different number representations (uint32_t, uint64_t, and float). As a basic load, I did use the Fibonacci numbers up to F(47) = 2971215073, which happens to be the largest Fibonacci number fitting in a 32 bit integer.

Here is part of the code

Code: Select all

    t0 = timer_u32();
    n = 10*1000;
    for (i = 0; i < n; i++ ) {
        a_32 = 0;
        b_32 = 1;
        for (j = 0; j < 46; j++) {
            f_32 = a_32 + b_32;
            a_32 = b_32;
            b_32 = f_32;
        }
    }
    t1 = timer_u32();
    dt = (t1 - t0) / 80;
    lastF = f_32;
    delta = (t1 - t0) / 80.0;
    printf("%11d times 47 fibonaccies with %s in %8d us (%7.3f). Last = %jd (29712150[Codebox][/Codebox]73)\n", n,"u32",dt,delta/n,lastF);
    vTaskDelay(100 / portTICK_PERIOD_MS);
The TaskDelay at the end is required to complete the printing before the following measurements.

Here are the results

Code: Select all

 10000 times 47 fibonaccies with u32 in    38292 us (  3.829). Last = 2971215073 (2971215073)
100000 times 47 fibonaccies with u32 in   382725 us (  3.827). Last = 2971215073 (2971215073)
 10000 times 47 fibonaccies with u64 in    64290 us (  6.429). Last = 2971215073 (2971215073)
100000 times 47 fibonaccies with u64 in   642880 us (  6.429). Last = 2971215073 (2971215073)
 10000 times 47 fibonaccies with f32 in   175421 us ( 17.542). Last = 2971214848 (2971215073)
100000 times 47 fibonaccies with f32 in  1783571 us ( 17.836). Last = 3268336896 (2971215073)
The interesting revelation is that u32 calculations are around twice as fast as u64 calculations in ESP32-S2 and u32 calculations are around 5 times faster than floating point calculations.

The WORST REVELATION was that the floating point calculations can give wrong results under heavy load.

When the floating point looping was increased from 100000 to 500000 then there was the following error message before the calculation was completed.

Code: Select all

E (13883) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (13883) task_wdt:  - IDLE0 (CPU 0)
E (13883) task_wdt: Tasks currently running:
E (13883) task_wdt: CPU 0: main
E (13883) task_wdt: Print CPU 0 (current core) backtrace
To me, it is unacceptable that the calculation results are failing due to operating system assumptions or similar reasons. In real world
3268336896 is not 2971215073.

ESP_Sprite
Posts: 9719
Joined: Thu Nov 26, 2015 4:08 am

Re: Fast timer for microsecond level duration measurements

Postby ESP_Sprite » Thu Feb 04, 2021 3:41 am

That is odd and if this is indeed caused by the WDT timeout a pretty serious bug. Can you upload the entire project somewhere so we can download it and try to reproduce the issue?

OllieK
Posts: 58
Joined: Mon Jan 18, 2016 8:08 am
Location: USA, PA
Contact:

Re: Fast timer for microsecond level duration measurements

Postby OllieK » Thu Feb 04, 2021 8:06 pm

@ESP_Sprite, Here is the whole code

Code: Select all

#include <stdio.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "soc/frc_timer_reg.h"

#ifdef CONFIG_ESP_TIMER_IMPL_FRC2
#define timer_u32() (REG_READ(0x3ff47004));     // FRC2
#elif CONFIG_ESP_TIMER_IMPL_TG0_LAC
#define timer_u32() (REG_READ(0x3ff5F078));     // TG0_LAC
#else
                                                // SYSTIMER
__attribute__((always_inline)) static inline uint32_t timer_u32(void) {
   REG_WRITE(0x3f423038,1UL << 31);             // Start synch
   while (!REG_GET_BIT(0x3f423038,1UL << 30));  // Wait synch
   return REG_READ(0x3f423040);                 // Read SYSTIMER_VALUE_LO_REG
}
#endif

void app_main(void) {
    uint32_t t0,t1,dt;
    uint32_t i,j,n;
    uint32_t a_32,b_32,f_32;
    uint64_t a_64,b_64,f_64;
    float a_f32,b_f32,f_f32;

    uint64_t lastF;
    float delta;
    f_32 = 0;
    f_64 = 0;
    f_f32 = 0.0;

    printf("Starting time comparisons\n");

    t0 = timer_u32();
    n = 10*1000;
    for (i = 0; i < n; i++ ) {
        a_32 = 0;
        b_32 = 1;
        for (j = 0; j < 46; j++) {
            f_32 = a_32 + b_32;
            a_32 = b_32;
            b_32 = f_32;
        }
    }
    t1 = timer_u32();
    dt = (t1 - t0) / 80;
    lastF = f_32;
    delta = (t1 - t0) / 80.0;
    printf("%11d times 47 fibonaccies with %s in %8d us (%7.3f). Last = %jd (2971215073)\n", n,"u32",dt,delta/n,lastF);
    vTaskDelay(100 / portTICK_PERIOD_MS);

    t0 = timer_u32();
    n = 100*1000;
    for (i = 0; i < n; i++ ) {
        a_32 = 0;
        b_32 = 1;
        for (j = 0; j < 46; j++) {
            f_32 = a_32 + b_32;
            a_32 = b_32;
            b_32 = f_32;
        }
    }
    t1 = timer_u32();
    dt = (t1 - t0) / 80;
    lastF = f_32;
    delta = (t1 - t0) / 80.0;
    printf("%11d times 47 fibonaccies with %s in %8d us (%7.3f). Last = %jd (2971215073)\n", n,"u32",dt,delta/n,lastF);
    vTaskDelay(100 / portTICK_PERIOD_MS);
    
    t0 = timer_u32();
    n = 10*1000;
    for (i = 0; i < n; i++ ) {
        a_64 = 0;
        b_64 = 1;
        for (j = 0; j < 46; j++) {
            f_64 = a_64 + b_64;
            a_64 = b_64;
            b_64 = f_64;
        }
    }
    t1 = timer_u32();
    dt = (t1 - t0) / 80;
    lastF = f_64;
    delta = (t1 - t0) / 80.0;
    printf("%11d times 47 fibonaccies with %s in %8d us (%7.3f). Last = %jd (2971215073)\n", n,"u64",dt,delta/n,lastF);
    vTaskDelay(100 / portTICK_PERIOD_MS);

    t0 = timer_u32();
    n = 100*1000;
    for (i = 0; i < n; i++ ) {
        a_64 = 0;
        b_64 = 1;
        for (j = 0; j < 46; j++) {
            f_64 = a_64 + b_64;
            a_64 = b_64;
            b_64 = f_64;
        }
    }
    t1 = timer_u32();
    dt = (t1 - t0) / 80;
    lastF = f_64;
    delta = (t1 - t0) / 80.0;
    printf("%11d times 47 fibonaccies with %s in %8d us (%7.3f). Last = %jd (2971215073)\n", n,"u64",dt,delta/n,lastF);
    vTaskDelay(100 / portTICK_PERIOD_MS);
    
    t0 = timer_u32();
    n = 10*1000;
    for (i = 0; i < n; i++ ) {
        a_f32 = 0;
        b_f32 = 1;
        for (j = 0; j < 46; j++) {
            f_f32 = a_f32 + b_f32;
            a_f32 = b_f32;
            b_f32 = f_f32;
        }
    }
    t1 = timer_u32();
    dt = (t1 - t0) / 80;
    lastF = f_f32;
    delta = (t1 - t0) / 80.0;
    printf("%11d times 47 fibonaccies with %s in %8d us (%7.3f). Last = %jd (2971215073)\n", n,"f32",dt,delta/n,lastF);
    vTaskDelay(100 / portTICK_PERIOD_MS);

    t0 = timer_u32();
    n = 500*1000;
    for (i = 0; i < n; i++ ) {
        a_f32 = 0.0;
        b_f32 = 1.1;
        for (j = 0; j < 46; j++) {
            f_f32 = a_f32 + b_f32;
            a_f32 = b_f32;
            b_f32 = f_f32;
        }
    }
    t1 = timer_u32();
    dt = (t1 - t0) / 80;
    lastF = f_f32;
    delta = (t1 - t0) / 80.0;
    printf("%11d times 47 fibonaccies with %s in %8d us (%7.3f). Last = %jd (2971215073)\n", n,"f32",dt,delta/n,lastF);
    vTaskDelay(100 / portTICK_PERIOD_MS);
    
    vTaskDelay(10000 / portTICK_PERIOD_MS);
}
In addition of the WDT problem, do you have any comments about the wrong floating point calculation result.

OllieK
Posts: 58
Joined: Mon Jan 18, 2016 8:08 am
Location: USA, PA
Contact:

Re: Fast timer for microsecond level duration measurements

Postby OllieK » Fri Feb 05, 2021 3:31 am

My curiosity increased about the basic arithmetic performance with different types on ESP32 and ESP32-S2 (on IDF v4.2) increased. I did replace the timer_u32 function with the esp_timer_get_time to reduce the effort to replicate the measurements on different platforms. I did add the measurements for double calculations.

With this code

Code: Select all

#include <stdio.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "soc/frc_timer_reg.h"

#ifdef CONFIG_ESP_TIMER_IMPL_FRC2
#define timer_u32() (REG_READ(0x3ff47004))      // FRC2
#define TICKS_PER_US (80)
#elif CONFIG_ESP_TIMER_IMPL_TG0_LAC
#define timer_u32() (REG_READ(0x3ff5F078))      // TG0_LAC
#define TICKS_PER_US (2)
#else
                                                // SYSTIMER
__attribute__((always_inline)) static inline uint32_t timer_u32(void) {
   REG_WRITE(0x3f423038,1UL << 31);             // Start synch
   while (!REG_GET_BIT(0x3f423038,1UL << 30));  // Wait synch
   return REG_READ(0x3f423040);                 // Read SYSTIMER_VALUE_LO_REG
}
#define TICKS_PER_US (80)
#endif

#define F47 (2971215073)
#define FIBCOUNT (47)

uint32_t u32Loop(uint32_t n) {
    uint32_t a,b,f;
    uint32_t i,j;
    f = 0;
    for (i = 0; i < n; i++ ) {
        a = 0;
        b = 1;
        for (j = 0; j < (FIBCOUNT - 1); j++) {
            f = a + b;
            a = b;
            b = f;
        }
    }
    return f;
}

uint32_t u64Loop(uint32_t n) {
    uint64_t a,b,f;
    uint32_t i,j;
    f = 0;
    for (i = 0; i < n; i++ ) {
        a = 0;
        b = 1;
        for (j = 0; j < (FIBCOUNT - 1); j++) {
            f = a + b;
            a = b;
            b = f;
        }
    }
    return f;
}

uint32_t f32Loop(uint32_t n) {
    float a,b,f;
    uint32_t i,j;
    f = 0.0;
    for (i = 0; i < n; i++ ) {
        a = 0.0;
        b = 1.0;
        for (j = 0; j < (FIBCOUNT - 1); j++) {
            f = a + b;
            a = b;
            b = f;
        }
    }
    return f;
}

uint32_t f64Loop(uint32_t n) {
    double a,b,f;
    uint32_t i,j;
    f = 0.0;
    for (i = 0; i < n; i++ ) {
        a = 0.0;
        b = 1.0;
        for (j = 0; j < (FIBCOUNT - 1); j++) {
            f = a + b;
            a = b;
            b = f;
        }
    }
    return f;
}

void runTest(uint32_t n, uint32_t loop(uint32_t), char* typeId) {
    uint32_t t0,t1;
    uint64_t lastF;
    float delta;
    t0 = esp_timer_get_time();
    lastF = loop(n);
    t1 = esp_timer_get_time();
    delta = t1 - t0;
    printf("%11d times %d fibonaccies with %s in %8.0f us (%7.3f).",n,FIBCOUNT,typeId,delta,delta/n);
    if (lastF == F47) {
        printf(" F47 is OK.\n");
    } else {
        printf(" F47 = %jd, expected %jd.\n", lastF,F47);
    }  
    vTaskDelay(100 / portTICK_PERIOD_MS);
}

void app_main(void) {
    printf("Starting time comparisons on ESP32 Pico D4/v4.2\n");

    runTest(10*1000,u32Loop,"u32");
    runTest(100*1000,u32Loop,"u32");
    runTest(10*1000,u64Loop,"u64");
    runTest(100*1000,u64Loop,"u64");
    runTest(10*1000,f32Loop,"f32");
    runTest(100*1000,f32Loop,"f32");
    runTest(10*1000,f64Loop,"f64");
    runTest(100*1000,f64Loop,"f64");
    
    for (int i = 2; i<14; i++) {
        runTest(i*100*1000,f32Loop,"f32");
    }
}
The ESP32 Pico D4 gave the following results

Code: Select all

Starting time comparisons on ESP32 Pico D4/v4.2
      10000 times 47 fibonaccies with u32 in    40950 us (  4.095). F47 is OK.
     100000 times 47 fibonaccies with u32 in   409044 us (  4.090). F47 is OK.
      10000 times 47 fibonaccies with u64 in    61178 us (  6.118). F47 is OK.
     100000 times 47 fibonaccies with u64 in   611698 us (  6.117). F47 is OK.
      10000 times 47 fibonaccies with f32 in    41355 us (  4.135). F47 = 2971214848, expected 2971215073.
     100000 times 47 fibonaccies with f32 in   413426 us (  4.134). F47 = 2971214848, expected 2971215073.
      10000 times 47 fibonaccies with f64 in   208154 us ( 20.815). F47 is OK.
     100000 times 47 fibonaccies with f64 in  2081548 us ( 20.815). F47 is OK.
     200000 times 47 fibonaccies with f32 in   826852 us (  4.134). F47 = 2971214848, expected 2971215073.
     300000 times 47 fibonaccies with f32 in  1240284 us (  4.134). F47 = 2971214848, expected 2971215073.
     400000 times 47 fibonaccies with f32 in  1653710 us (  4.134). F47 = 2971214848, expected 2971215073.
     500000 times 47 fibonaccies with f32 in  2067135 us (  4.134). F47 = 2971214848, expected 2971215073.
     600000 times 47 fibonaccies with f32 in  2480567 us (  4.134). F47 = 2971214848, expected 2971215073.
     700000 times 47 fibonaccies with f32 in  2893993 us (  4.134). F47 = 2971214848, expected 2971215073.
     800000 times 47 fibonaccies with f32 in  3307418 us (  4.134). F47 = 2971214848, expected 2971215073.
     900000 times 47 fibonaccies with f32 in  3720851 us (  4.134). F47 = 2971214848, expected 2971215073.
    1000000 times 47 fibonaccies with f32 in  4134276 us (  4.134). F47 = 2971214848, expected 2971215073.
    1100000 times 47 fibonaccies with f32 in  4547701 us (  4.134). F47 = 2971214848, expected 2971215073.
    1200000 times 47 fibonaccies with f32 in  4961134 us (  4.134). F47 = 2971214848, expected 2971215073.
E (58506) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (58506) task_wdt:  - IDLE0 (CPU 0)
E (58506) task_wdt: Tasks currently running:
E (58506) task_wdt: CPU 0: main
E (58506) task_wdt: CPU 1: IDLE1
E (58506) task_wdt: Print CPU 0 (current core) backtrace


Backtrace:0x400D21B0:0x3FFB0760 0x40082479:0x3FFB0780 0x400D2B05:0x3FFB4690 0x400D2B72:0x3FFB46B0 0x400D2C6C:0x3FFB46E0 0x400D17EE:0x3FFB4700 0x4008459D:0x3FFB4730
0x400d21b0: task_wdt_isr at C:/Users/Olavi/esp/esp-idf/components/esp_common/src/task_wdt.c:187

0x40082479: _xt_lowint1 at C:/Users/Olavi/esp/esp-idf/components/freertos/xtensa/xtensa_vectors.S:1105

0x400d2b05: f32Loop at C:\Users\Olavi\Documents\VS_ESP_IDF\SpeedDemo\Speed_v0_3\build/../main/SpeedDemo.c:62

0x400d2b72: runTest at C:\Users\Olavi\Documents\VS_ESP_IDF\SpeedDemo\Speed_v0_3\build/../main/SpeedDemo.c:94

0x400d2c6c: app_main at C:\Users\Olavi\Documents\VS_ESP_IDF\SpeedDemo\Speed_v0_3\build/../main/SpeedDemo.c:119 (discriminator 3)

0x400d17ee: main_task at C:/Users/Olavi/esp/esp-idf/components/esp32/cpu_start.c:600 (discriminator 2)

0x4008459d: vPortTaskWrapper at C:/Users/Olavi/esp/esp-idf/components/freertos/xtensa/port.c:143


E (58506) task_wdt: Print CPU 1 backtrace


Backtrace:0x40081863:0x3FFB0D60 0x40082479:0x3FFB0D80 0x4000BFED:0x3FFB5540 0x400847F5:0x3FFB5550 0x400D23F4:0x3FFB5570 0x400D23FF:0x3FFB55A0 0x400D249D:0x3FFB55C0 0x40084F1A:0x3FFB55E0 0x4008459D:0x3FFB5600
0x40081863: esp_crosscore_isr at C:/Users/Olavi/esp/esp-idf/components/esp32/crosscore_int.c:80

0x40082479: _xt_lowint1 at C:/Users/Olavi/esp/esp-idf/components/freertos/xtensa/xtensa_vectors.S:1105

0x400847f5: vPortExitCritical at C:/Users/Olavi/esp/esp-idf/components/freertos/xtensa/port.c:419

0x400d23f4: esp_task_wdt_reset at C:/Users/Olavi/esp/esp-idf/components/esp_common/src/task_wdt.c:333

0x400d23ff: idle_hook_cb at C:/Users/Olavi/esp/esp-idf/components/esp_common/src/task_wdt.c:88

0x400d249d: esp_vApplicationIdleHook at C:/Users/Olavi/esp/esp-idf/components/esp_common/src/freertos_hooks.c:51 (discriminator 1)

0x40084f1a: prvIdleTask at C:/Users/Olavi/esp/esp-idf/components/freertos/tasks.c:3386 (discriminator 1)

0x4008459d: vPortTaskWrapper at C:/Users/Olavi/esp/esp-idf/components/freertos/xtensa/port.c:143


    1300000 times 47 fibonaccies with f32 in  5416045 us (  4.166). F47 = 2971214848, expected 2971215073.
The ESP32-S2 gave these results

Code: Select all

Starting time comparisons on ESP32-S2/v4.2
      10000 times 47 fibonaccies with u32 in    40919 us (  4.092). F47 is OK.
     100000 times 47 fibonaccies with u32 in   408985 us (  4.090). F47 is OK.
      10000 times 47 fibonaccies with u64 in    61167 us (  6.117). F47 is OK.
     100000 times 47 fibonaccies with u64 in   611609 us (  6.116). F47 is OK.
      10000 times 47 fibonaccies with f32 in   178052 us ( 17.805). F47 = 2971214848, expected 2971215073.
     100000 times 47 fibonaccies with f32 in  1780458 us ( 17.805). F47 = 2971214848, expected 2971215073.
      10000 times 47 fibonaccies with f64 in   208129 us ( 20.813). F47 is OK.
     100000 times 47 fibonaccies with f64 in  2081268 us ( 20.813). F47 is OK.
     200000 times 47 fibonaccies with f32 in  3560910 us ( 17.805). F47 = 2971214848, expected 2971215073.
E (29903) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (29903) task_wdt:  - IDLE0 (CPU 0)
E (29903) task_wdt: Tasks currently running:
E (29903) task_wdt: CPU 0: main
E (29903) task_wdt: Print CPU 0 (current core) backtrace


Backtrace:0x40082956:0x3FFBD4B0 0x40025C9E:0x3FFBD4D0 0x400081CA:0x3FFC17A0 0x4008320F:0x3FFC17B0 0x40083290:0x3FFC17D0 0x40083394:0x3FFC1800 0x40081D33:0x3FFC1820 0x40027851:0x3FFC1850
0x40082956: task_wdt_isr at C:/Users/Olavi/esp/esp-idf/components/esp_common/src/task_wdt.c:187

0x40025c9e: _xt_lowint1 at C:/Users/Olavi/esp/esp-idf/components/freertos/xtensa/xtensa_vectors.S:1105

0x4008320f: f32Loop at C:\Users\Olavi\Documents\VS_ESP_IDF\SpeedDemo\Speed_v0_3\build/../main/SpeedDemo.c:65 (discriminator 3)

0x40083290: runTest at C:\Users\Olavi\Documents\VS_ESP_IDF\SpeedDemo\Speed_v0_3\build/../main/SpeedDemo.c:94

0x40083394: app_main at C:\Users\Olavi\Documents\VS_ESP_IDF\SpeedDemo\Speed_v0_3\build/../main/SpeedDemo.c:119 (discriminator 3)

0x40081d33: main_task at C:/Users/Olavi/esp/esp-idf/components/esp32s2/cpu_start.c:453 (discriminator 2)

0x40027851: vPortTaskWrapper at C:/Users/Olavi/esp/esp-idf/components/freertos/xtensa/port.c:143


     300000 times 47 fibonaccies with f32 in  5376149 us ( 17.920). F47 = 2971214848, expected 2971215073.
Compared to my previous tests, I am buzzled why the single floating point calculations gave the right result after 10000 loops when that should have not been possible with 24 bits. After 100000 loops the result was more wrong as it should have been.

With the new setup, the single floating point is failing in a consistent way. The double floating point calculations are giving right result as expected.

The main problem is still with the WDT. In ESP32 the real failure is detected when over 5 seconds is used in the test loops. In ESP32-S2, it takes the same time to fail.

Here is a summary of the relative performance with different types on ESP32 and ESP32-S2

Code: Select all

                 ESP32       ESP32-S2
 uint32_t         4.090       4.090
 uint64_t         6.117       6.116
 float            4.134      17.805
 double          20.815      20.813
The conclusion is that ESP32-S2 has some issues in the FPU.

ESP_Sprite
Posts: 9719
Joined: Thu Nov 26, 2015 4:08 am

Re: Fast timer for microsecond level duration measurements

Postby ESP_Sprite » Fri Feb 05, 2021 8:08 am

OllieK wrote:
Fri Feb 05, 2021 3:31 am
The conclusion is that ESP32-S2 has some issues in the FPU.
Well, I wouldn't call it 'issues with the FPU', given the fact that the ESP32S2 does not have a FPU... but yeah, if your findings are correct, there certainly are some issues, possibly with register saving and loading.

ESP_Sprite
Posts: 9719
Joined: Thu Nov 26, 2015 4:08 am

Re: Fast timer for microsecond level duration measurements

Postby ESP_Sprite » Fri Feb 05, 2021 8:44 am

I tried replicating your testcase, both on v4.2 as well as latest master on esp-idf, but I couldn't, I nicely got 2971214848 every time (aside from a bunch of wdt timeouts). Anything else specific you may have set in your config?

Also, I don't see the 'correct results after 10000 loops' you're pointing out in your printout. Are you sure you're not looking at the F64 result?

OllieK
Posts: 58
Joined: Mon Jan 18, 2016 8:08 am
Location: USA, PA
Contact:

Re: Fast timer for microsecond level duration measurements

Postby OllieK » Fri Feb 05, 2021 11:00 pm

Thanks for addressing this issue. After refactoring the test case into more generic case, the wrong calculation results went away. In the new code the test results are as expected. That is that all the f32 results are not accurate
runTest(10*1000,f32Loop,"f32");
runTest(100*1000,f32Loop,"f32");
runTest(10*1000,f64Loop,"f64");
runTest(100*1000,f64Loop,"f64");

for (int i = 2; i<14; i++) {
runTest(i*100*1000,f32Loop,"f32");
}

On other hand, the two results for f64 are correct - as they should be.

Now I have to eat the humble pie :oops: and admit that I had a typo in my original code for the 500,000 f32 loops. The test code initialized the Fibonacci series with 0.0 and 1.1 instead of 0.0 and 1.0. I should have noticed that in the first place. Sorry for the confusion I did cause.

ESP_Sprite
Posts: 9719
Joined: Thu Nov 26, 2015 4:08 am

Re: Fast timer for microsecond level duration measurements

Postby ESP_Sprite » Sat Feb 06, 2021 8:59 am

No worries, thanks for clarifying what went wrong.

Who is online

Users browsing this forum: No registered users and 55 guests