Use of dual core

Postby **ESP_Sprite** » Fri Jan 27, 2017 9:32 am

Actually, a critical section would be a pretty nice way to achieve this. Basically, this is how it should work:

static portMUX_TYPE my_spinlock = portMUX_INITIALIZER_UNLOCKED;

void my_task(void *arg) {
    while(1) {
        if (start_communication) {
            portENTER_CRITICAL(&my_spinlock);
            // Do all your timing-critical stuff here
            portEXIT_CRITICAL(&my_spinlock);
        } else {
            vTaskDelay(1); // Wait for something to happen.
        }
    }
}

Postby **BakerMan** » Fri Jan 27, 2017 10:28 am

Hello again,

For all who may be interested: I've found a solution for my use case. As I described before, I start 2 tasks and pin one to each CPU core. The first instruction in the task running on cpu1 is:

Code:

 vTaskEndScheduler ();

That's it! I can see clean waveforms without interruptions (except for the known erratum if you switch GPIOs too fast).

On cpu0 I created 2 tasks that switch different GPIO pins. Both are working fine, so the scheduler is still running on cpu0!
That's what I wanted to have: an OS environment on cpu0, and pure "undisturbed" power on cpu1.

I am still investigating whether deactivating the scheduler has side effects when apps running on cpu0 disable the cache, or whether I have to use IRAM_ATTR or DRAM_ATTR. But at the moment it's working fine! What do you think? I'll report my results soon.

Kind regards!
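
For anyone who wants to reproduce the setup described above (two tasks, one pinned to each core), here is a minimal sketch. It assumes ESP-IDF's xTaskCreatePinnedToCore(), xPortGetCoreID() and portNUM_PROCESSORS; the names core_for_task(), report_core() and start_pinned_tasks() are hypothetical and only there for illustration. The ESP-specific part is compiled only when ESP_PLATFORM is defined, so the helper can also be tried on a PC.

```c
#include <assert.h>

/* Hypothetical helper: spread task number `index` across `num_cores` cores. */
int core_for_task(int index, int num_cores) {
    assert(index >= 0 && num_cores > 0);
    return index % num_cores;
}

#ifdef ESP_PLATFORM
#include <esp_log.h>
#include <freertos/FreeRTOS.h>
#include <freertos/task.h>

static const char *TAG = "pinned";

/* Log which core this task was started on, then delete the task. */
static void report_core(void *param) {
    ESP_LOGI(TAG, "%s runs on core %d", (const char *)param, (int)xPortGetCoreID());
    vTaskDelete(NULL);
}

/* Create two tasks; with two cores available, one lands on each core. */
void start_pinned_tasks(void) {
    static const char *names[] = {"Task1", "Task2"};
    for (int i = 0; i < 2; i++) {
        xTaskCreatePinnedToCore(report_core, names[i], 2048, (void *)names[i],
                                5, NULL, core_for_task(i, portNUM_PROCESSORS));
    }
}
#endif
```

This only shows the pinning itself. It does not call vTaskEndScheduler(), and it says nothing about whether doing so is safe.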

Postby **ESP_Sprite** » Sat Jan 28, 2017 2:11 am

Seems vTaskEndScheduler actually is a stub; it just disables all interrupts, marks the scheduler on the core as not-running and returns. This should work, unless you decide to enable interrupts again somewhere later in your routine: that would also re-enable the tick interrupt, and with the scheduler disabled, I have no idea what the effect of that would be.

Can I ask what you're trying to achieve that cannot handle a 5 us delay? There may be better ways to do it than to spinloop a CPU on it.

Postby **WiFive** » Sat Jan 28, 2017 2:30 am

What happens if there is a cache miss and the cache is disabled by the other CPU? If your entire cpu1 code/data fits in the cache window, I guess this is not an issue, but I would think it would be preferable to disable the cpu1 cache, reclaim that memory and mark all cpu1 functions with IRAM_ATTR?

Postby **ESP_igrr** » Sat Jan 28, 2017 3:17 am

If the cache is disabled to perform a flash operation, it doesn't matter whether there would be a hit or a miss: a read will always return 0 for the icache and some pattern like baadfood or similar for the dcache. The point about reclaiming memory is a good one, though; we'll consider this.

Postby **kolban** » Sat Jan 28, 2017 4:33 am

I decided to run some tests ... I wrote a task that simply eats CPU in a loop and tells me how long it takes to process that loop. I ran 1 instance of the task and took a measurement, then 2 instances and then 3 instances and so on. The resulting graph is shown here:

What I am not understanding are the first two entries. It takes 338 msecs to run a loop instance with 1 task running and 678 msecs to run a loop with 2 tasks running. What I am being a dummy on ... is why doesn't it "also" take 338 msecs to complete BOTH loops with 2 tasks running when we have two cores? Naively I am imagining one task running on one core and a second task running on the second core and BOTH tasks each taking 338 msecs to complete the loop.

Here is my sample code ... and I am running with unmodified from defaults "make menuconfig":

Code:

#include <esp_log.h>
#include <freertos/FreeRTOS.h>
#include <freertos/task.h>
#include <sys/time.h> // gettimeofday(), struct timeval

#include "c_timeutils.h"
#include "sdkconfig.h"

static char tag[] = "tasks";

static void test1(void *param) {
	char *id = (char *)param;
	ESP_LOGD(tag, ">> %s", id);
	int i;
	while(1) {
		struct timeval start;
		gettimeofday(&start, NULL);
		int j=0;
		for (i=0; i<9000000; i++) {
			j=j+1;
		}
		ESP_LOGD(tag, "%s - tick: %d", id, timeval_durationBeforeNow(&start));
	}
	vTaskDelete(NULL);
}

void task_tests(void *ignore) {
	xTaskCreate(&test1, "task1", 2048, "Task1", 5, NULL);
	xTaskCreate(&test1, "task2", 2048, "Task2", 5, NULL);
	xTaskCreate(&test1, "task3", 2048, "Task3", 5, NULL);
	xTaskCreate(&test1, "task4", 2048, "Task4", 5, NULL);
	xTaskCreate(&test1, "task5", 2048, "Task5", 5, NULL);
	xTaskCreate(&test1, "task6", 2048, "Task6", 5, NULL);
	vTaskDelete(NULL);
}

Postby **WiFive** » Sat Jan 28, 2017 5:37 am

Edit: read ahead for answer

Postby **ESP_Sprite** » Sat Jan 28, 2017 5:47 am

I think the compiler sees you're doing nothing with the result of i and j, and happily optimizes away the entire loop; the time will then mostly come down to the printf() which is bound for all processes by the serial port speed. Make one or both variables volatile, and you should have a more effective delay loop. (Fyi, "for (volatile int i=0; i<(1<<20); i++) ;" already gives quite a reasonable delay.)
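
To make the volatile suggestion concrete, here is a minimal sketch of a delay loop that the compiler cannot drop, together with a timing wrapper. It uses only standard C plus gettimeofday(); the names busy_loop(), elapsed_ms() and time_busy_loop() are hypothetical and not taken from the code posted earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Spin for `iterations` passes. Because `count` is volatile, every
 * increment must really be performed, even with optimization enabled. */
uint32_t busy_loop(uint32_t iterations) {
    volatile uint32_t count = 0;
    for (uint32_t i = 0; i < iterations; i++) {
        count = count + 1;
    }
    return count;
}

/* Milliseconds elapsed since `start`. */
long elapsed_ms(const struct timeval *start) {
    struct timeval now;
    gettimeofday(&now, NULL);
    long long usec = (long long)(now.tv_sec - start->tv_sec) * 1000000LL
                   + (now.tv_usec - start->tv_usec);
    return (long)(usec / 1000);
}

/* Time one run of the loop and return the duration in milliseconds. */
long time_busy_loop(uint32_t iterations) {
    struct timeval start;
    gettimeofday(&start, NULL);
    uint32_t done = busy_loop(iterations);
    assert(done == iterations); /* the loop really ran to completion */
    return elapsed_ms(&start);
}
```

If the measured time scales with `iterations`, the loop is doing real work and the printout is not what dominates the measurement.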

Postby **WiFive** » Sat Jan 28, 2017 5:58 am

ESP_igrr wrote: If the cache is disabled to perform a flash operation, it doesn't matter whether there would be a hit or a miss: a read will always return 0 for the icache and some pattern like baadfood or similar for the dcache. The point about reclaiming memory is a good one, though; we'll consider this.

Oh, I see. So in the case where cpu1 starts up under FreeRTOS and then the scheduler is disabled (or regardless of this?), cpu0 can still disable cpu1 when it disables the cache? That still has the potential to affect timing-critical operations on cpu1, so it is not the best approach?

Actually, it seems that if cpu0 tries to disable the cache while the cpu1 scheduler is not running, cpu0 will get stuck in a wait loop.

Postby **kolban** » Sun Jan 29, 2017 4:47 am

Howdy Mr Sprite,
I tried changing the ints to be defined as volatile. But the result was exactly the same ... running 1 task measured an elapsed time for 9 million iterations of 700msecs and running two tasks measured an elapsed time for 9 million iterations of 1400msecs each loop. I.e. no parallelism.

Before changing the values to be volatiles ... I measured changing the loop iterations and the time to execute a loop varied exactly proportionally with the number of loop iterations ... so I have good confidence that the loop is burning CPU ... i.e. X loop iterations took t(X) and changing X resulted in a proportional change in t(X). As such, I'd like to again puzzle this through as to what I may have misunderstood about leveraging the second core for parallel tasks.
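
As a rough sanity check on the numbers quoted in this thread, here is a small sketch of the arithmetic. It assumes an idealized scheduler that slices equal-priority, CPU-bound tasks evenly across the available cores; expected_loop_ms() is a hypothetical helper, and the real FreeRTOS scheduler may behave differently.

```c
#include <assert.h>

/* Expected wall-clock time (ms) for one loop when `tasks` equal-priority,
 * CPU-bound tasks share `cores` cores evenly (idealized model). */
long expected_loop_ms(long single_task_ms, int tasks, int cores) {
    assert(single_task_ms >= 0 && tasks > 0 && cores > 0);
    if (tasks <= cores) {
        return single_task_ms; /* every task gets a core to itself */
    }
    return single_task_ms * tasks / cores;
}
```

With a 338 msec single-task time, the model gives 676 msecs for 2 tasks on 1 core and 338 msecs for 2 tasks on 2 cores. The measured 678 msecs (and likewise 1400 msecs against 700 msecs) matches the one-core case, so the measurements are consistent with both tasks sharing a single core. The model does not say why that happens.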
