Building and running a module beyond the original image file

kolban · Postby **kolban** » Tue Feb 20, 2018 5:36 am

When I compile a C application for the ESP32 this results in an ELF file which we then process with "esptool elf2image" that produces a binary. I then flash that binary to the flash storage of my ESP32. I then reboot my ESP32 and the program runs. Excellent.

Now imagine that I have a source file. Let's call it "module.c". Let's assume that it is a pure algorithmic C source file that calls no external functions and exposes nothing but an entry point.

Is there any possible way I can prepare that source file (for example module.c -> module.o), create an ELF file from it, and then explicitly load that code from a flash partition and have it run?

Think along the lines of dynamically loaded code but without the notion of dynamic linking. The module is 100% self contained and is (I believe) comprised of:

* code
* static RO Data
* Initialized data
* BSS

Is there some recipe that I can follow or technique hints someone might have for getting such code running?

The back-story ...

I want to compile C source code (and eventually C++ source code) to some form of executable (not necessarily leveraging any external symbols) and load that into flash and then have it executed through a call/return paradigm from a regular ESP32 ESP-IDF app.

WiFive · Postby **WiFive** » Tue Feb 20, 2018 6:19 am

https://stackoverflow.com/questions/219 ... m-freertos

Just to show it can be possible
http://www.nuttx.org/Documentation/NuttXBinfmt.html
https://github.com/embedded2014/elf-loader

p-rimes · Postby **p-rimes** » Tue Feb 20, 2018 12:00 pm

These may help: (I can tell you that what you propose *does* work)

https://esp32.com/viewtopic.php?f=13&t=4293
https://github.com/espressif/esp-idf/issues/1554

Some things that I have learned since:

I chose to use regular dynamic library flags (-nostdlib -shared) when linking the ELF binary, and in the runtime loading process just avoid the initial PLT stubs by setting up the GOT and all relocations before executing.
You will need to load the right sections into the right RAM arenas (MALLOC_CAP_EXEC vs MALLOC_CAP_8BIT), otherwise you will get InstrFetch exceptions and/or LoadStoreExceptions. You will not be able to write arbitrary relocs into the instruction RAM, so I allocate a temporary buffer for .text, do the relocs, then memcpy that into the real buffer.
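The relocate-then-copy step described above can be sketched as follows. This is a host-portable illustration, not my actual loader: `copy_text_words` is a hypothetical helper, and on a real ESP32 the destination would come from `heap_caps_malloc()` with `MALLOC_CAP_EXEC` (mentioned only in the comment). It assumes instruction RAM tolerates only aligned 32-bit accesses, so the final copy is done one whole word at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy an already-relocated .text image from a scratch
 * buffer in ordinary RAM to its final home using only aligned 32-bit stores.
 * On an ESP32, `dst` would be a MALLOC_CAP_EXEC allocation; here it is any
 * word-aligned buffer with room for `len` rounded up to a multiple of 4. */
static void copy_text_words(uint32_t *dst, const uint8_t *src, size_t len)
{
    assert(dst != NULL && src != NULL);

    size_t full = len / 4;
    for (size_t i = 0; i < full; i++) {
        uint32_t w;
        memcpy(&w, src + 4 * i, sizeof w);  /* the scratch buffer may be unaligned */
        dst[i] = w;                         /* single aligned 32-bit store */
    }

    size_t rest = len % 4;
    if (rest != 0) {
        uint32_t w = 0;                     /* zero-pad the final partial word */
        memcpy(&w, src + 4 * full, rest);
        dst[full] = w;
    }
}
```

The relocations are applied to the scratch buffer first, so the executable buffer only ever sees whole-word writes.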

p-rimes · Postby **p-rimes** » Tue Feb 20, 2018 12:08 pm

Basically at runtime you end up parsing the ELF (using some library -- I used libelfin), figuring out which sections you need, taking advice from the flags (R/O vs R/W vs Exec), doing the relocs in regular RAM, and copying the relevant (reloc'd) sections into the right RAM type.
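As a rough sketch of "taking advice from the flags": the section-header flag bits below are the standard ELF values, but the `arena_t` enum and `arena_for_section` are hypothetical names invented for this example, and the mapping to ESP-IDF heap capabilities lives only in the comments.

```c
#include <stdint.h>

/* Standard ELF section-header flag bits (sh_flags). */
enum {
    ELF_SHF_WRITE     = 0x1, /* section is writable at run time */
    ELF_SHF_ALLOC     = 0x2, /* section occupies memory at run time */
    ELF_SHF_EXECINSTR = 0x4  /* section contains executable instructions */
};

/* Hypothetical classification of where a section should be loaded. */
typedef enum {
    ARENA_SKIP,    /* not needed at run time (symbol tables, comments, ...) */
    ARENA_EXEC,    /* instruction RAM, e.g. a MALLOC_CAP_EXEC allocation */
    ARENA_DATA_RW, /* writable data RAM (.data, .bss) */
    ARENA_DATA_RO  /* read-only data (.rodata) */
} arena_t;

/* Decide the target arena from the section's flags alone. */
static arena_t arena_for_section(uint32_t sh_flags)
{
    if (!(sh_flags & ELF_SHF_ALLOC))
        return ARENA_SKIP;
    if (sh_flags & ELF_SHF_EXECINSTR)
        return ARENA_EXEC;
    if (sh_flags & ELF_SHF_WRITE)
        return ARENA_DATA_RW;
    return ARENA_DATA_RO;
}
```

A loader would call this once per section header and skip anything that comes back as `ARENA_SKIP`.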

You can also strip out useless sections from the binary in advance, so that the ELF segments contain only interesting sections, and then at runtime copy the entire segments into RAM (many sections in one segment, so there's fewer buffers to maintain).

PS Initially I used non-shared linking with -pie (not -shared), and that was a good stepping stone. I re-implemented the loader for shared libs because it is easier to export dynamic symbols (as a .so file would). This was going against the advice of jcmvbkbc in the github thread above, but I found it more clear and it fits my mental model of what is happening (dynamically loaded code without a single _start point)

p-rimes · Postby **p-rimes** » Tue Feb 20, 2018 12:41 pm

Also, two security notes:

It is completely insecure to jump to dynamic RAM that is still R/W. If you can't trust the code completely, then you will need to use the ESP32 MMUs, with your loader running as PID 0 and the loaded code running as PID 2~7.
You mentioned not needing this, but eventually you will probably want to call other symbols (printf, sin()/cos(), memcpy) already in the full image, within your module. For that, I run a script that generates a .h/.c file from running nm on the final ELF image, that makes a map of symbol name to address. (I then re-compile and flash with this generated symbol table). You can/should make this a whitelist with allowed/safe symbols that can be called from within the module.
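The generated symbol table might look roughly like this. It is only a sketch: the struct and function names are hypothetical, and the addresses are placeholders, not real values from any image (the real ones would come from the nm output).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One whitelisted symbol: its name and its address in the flashed image. */
typedef struct {
    const char *name;
    uintptr_t   addr;
} exported_symbol_t;

/* In practice this table is generated by a script from nm output.
 * The addresses here are made-up placeholders for illustration. */
static const exported_symbol_t k_exports[] = {
    { "memcpy", 0x40001000u },
    { "printf", 0x40002000u },
    { "sin",    0x40003000u },
};

/* Return the address of a whitelisted symbol, or 0 if the module asked for
 * something that is not on the list. */
static uintptr_t lookup_export(const char *name)
{
    for (size_t i = 0; i < sizeof k_exports / sizeof k_exports[0]; i++) {
        if (strcmp(k_exports[i].name, name) == 0)
            return k_exports[i].addr;
    }
    return 0;
}
```

Because anything missing from the table resolves to 0, the loader can refuse to run a module that references a symbol outside the whitelist.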

kolban · Postby **kolban** » Sat Feb 24, 2018 6:45 pm

@p-rimes,
I'm wondering if you would be available to discuss (person to person) some thoughts? I can be reached at kolban1@kolban.com or via IRC at #ESP32.

My puzzles right now are understanding the memory mappings and address spaces.

Keeping it simple, let's just think for just now about executable code (.text).

From my loose understanding, if I compile C code to generate binary and place that in flash, then that binary appears in the ESP32 address range 0x400C 2000 to 0x40BF FFFF. If I now imagine that I have my "constant" code compiled ... it may result in binary that thinks of itself as living at address range:

0x400C 2000 - 0x4014 2000 (assuming a 512K code executable)

My thinking is that I can now compile my "piggyback" code and compile it to think it exists at 0x4020 2000.

If in flash I then have:

base code - 0x0000 0000 - 0x0008 0000 (512K)
piggyback code - 0x0014 0000 to 0x001C 0000 (512K)

Then the base code will "appear" at address 0x400C 2000 and the piggyback code will "appear" at 0x4020 2000 ... both of these in ESP32 address space. I can then make a call from base-code into piggyback code somewhere north of 0x4020 2000.
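To make the arithmetic concrete, here is a small sketch of the mapping I have in mind. It assumes a simple linear relationship between flash offset and ESP32 address, which is my working model and may not match how the hardware actually maps flash; the macro and function names are just for illustration.

```c
#include <stdint.h>

/* Address range quoted above for code executed from flash. */
#define FLASH_CODE_VADDR_BASE 0x400C2000u
#define FLASH_CODE_VADDR_END  0x40BFFFFFu

/* Assumed linear model: address = base + flash offset.
 * Returns 0 if the offset would fall outside the quoted range. */
static uint32_t flash_offset_to_vaddr(uint32_t flash_offset)
{
    uint32_t span = FLASH_CODE_VADDR_END - FLASH_CODE_VADDR_BASE;
    if (flash_offset > span)
        return 0;
    return FLASH_CODE_VADDR_BASE + flash_offset;
}
```

Under that model, flash offset 0x0014 0000 lands at 0x4020 2000, which is the address the piggyback code would be linked for.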

Is this thinking along the right lines?

p-rimes · Postby **p-rimes** » Sat Feb 24, 2018 8:56 pm

That approach would also work. If I understand correctly, you mean to use a custom linker script and "partition" available memory into distinct chunks (a bit like how OTA works). If so, then I believe that is termed the "overlay" approach (e.g. eventually, with more than one module, you would have one dedicated "overlay manager" that copies the right modules from flash -> RAM at the target location):

https://en.wikipedia.org/wiki/Overlay_(programming)

If you are careful/explicit with your linker script then you can locate your individual symbols at specific memory addresses within a specific overlay segment (with ample padding for future versions which may be larger in size). So future modules (that you might OTA) would keep their entrypoints at the same well-known address that you setup -- that is, new segments "overlay" onto previous ones in a compatible way. The idea is that you can have many more modules stored on flash than you have space in RAM, and only load the ones you need at runtime to save on RAM.

The overlay approach is sound, and battle-tested, but a bit old. Nowadays I would say virtual memory (+ dynamic linking) is the preferred way to handle the issue where you have more flash space than RAM, since with overlays you have to be pretty careful and it can be very brittle. It also leads to fragmentation problems (e.g. you left too much padding in the partition size, wasting space), and/or issues where a new overlay is slightly too big to replace the old one (e.g. you didn't leave enough padding for future fixes). Sometimes updates will have to span over multiple OTA cycles, when you are resizing overlays, etc. However I have worked with this on production devices, and it did work, and it did prove indispensable a couple times for fixing bugs in the field.

One nice thing is that you do not have to modify the binaries (e.g. runtime load them), and in fact you can (and should) store a checksum at a specific address in the overlay, and then checksum against the overlay's memory space before executing it (I think you need to do this in two places to prevent spoofing, but I can't recall 100%). Also, when you modify the linker script, then you really do know exactly where all your symbols are, even in the factory image. In some ways, it just makes more sense to do overlays, but the limitations are clear.
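A minimal sketch of the checksum check, under assumptions of my own: the checksum is a standard CRC-32 and it sits in the first 4 bytes of the overlay, covering everything after it. The layout and the function names are hypothetical, and a CRC only catches corruption, not deliberate tampering.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), no lookup table. */
static uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical layout: bytes 0..3 hold the CRC of bytes 4..len-1.
 * Returns 1 if the stored value matches, 0 otherwise. */
static int overlay_is_intact(const uint8_t *overlay, size_t len)
{
    if (len < 4)
        return 0;
    uint32_t stored;
    memcpy(&stored, overlay, sizeof stored);
    return stored == crc32_ieee(overlay + 4, len - 4);
}
```

The overlay manager would run `overlay_is_intact` over the overlay's memory range and only jump to the entrypoint if it returns 1.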

You might want to take a look at the ESP32 MMUs/MPUs in the reference manual, since those could provide user-mode protection at known overlay locations.

Personally, I am pursuing the dynamic loading approach, where you effectively store the modules in the heap alongside other heap memory, and so you have to modify the ELF binary each time after copying it from flash -> RAM. However, this is actually surprisingly easy: if you have a way to read the ELF format then it tells you exactly what to do, and where (i.e. relocations). And in my experience the relocation math has always been the same one (I got lucky, I suppose): read the address from the .text section, add some known offset from the .rel section, and write that value back to the .text section in the same spot. The benefits are that I will still be able to use the MMUs, but also that with dynamic linking, I will be able to provide new symbol names at arbitrary new addresses. So new modules will be able to communicate with each other in new ways, rather than having to send out a whole new system image if you need to add features outside your overlays/linker script.
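The read/add/write-back step can be sketched like this. It is a simplification with assumptions: each relocation is reduced to just an offset into the image, the "known offset" is modelled as a single `load_base` value, and the function name is hypothetical. Real relocation entries also carry a type (and possibly an addend) that a full loader has to honor.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* For each relocation offset: read the 32-bit word stored there, add the
 * load base, and write the result back in the same spot.
 * Returns 0 on success, -1 if an offset points outside the image. */
static int apply_relative_relocs(uint8_t *image, size_t image_len,
                                 const uint32_t *reloc_offsets, size_t n_relocs,
                                 uint32_t load_base)
{
    for (size_t i = 0; i < n_relocs; i++) {
        uint32_t off = reloc_offsets[i];
        if (image_len < 4 || off > image_len - 4)
            return -1;

        uint32_t word;
        memcpy(&word, image + off, sizeof word);  /* read the stored address */
        word += load_base;                        /* rebase it */
        memcpy(image + off, &word, sizeof word);  /* write it back */
    }
    return 0;
}
```

This runs on the copy of the section held in regular RAM, before that copy is moved into its final buffer.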

jumjum123 · Postby **jumjum123** » Tue May 08, 2018 2:05 pm

@kolban,
did you follow this idea ?
Generally it could be a big help, to create something like "user extended firmware"

kolban · Postby **kolban** » Tue May 08, 2018 2:13 pm

Howdy @jumjum,

I'm working on a project for a 3rd party that needs a fashion of dynamic loading. Think loosely of compiling an application that has a world of dependencies on libraries but what we don't want to do is flash a monolithic 1 MByte file. Imagine a user coded:

main() {
digitalWrite(D0, HIGH);
}

(this is a trivial example).

Instead of compiling this source file and then linking it with all the ESP-IDF libraries and other libraries, imagine we just compiled this code such that it had unresolveds (in this case "digitalWrite").

The size of the compiled C source files is a few Kbytes. What we can now do is "push" this to the ESP32, which already has the ESP-IDF and other libraries in flash and running. At runtime, the unresolved symbols are mapped to the functions already present and the program runs.

It's working very well and the solution is quite elegant.
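Since I can't share the real code yet, here is only a toy sketch of the general idea of resolving a module's unresolved names against functions the firmware already contains. Every name in it is hypothetical, including the stand-in for digitalWrite, and it is not how the actual project does it.

```c
#include <stddef.h>
#include <string.h>

typedef void (*generic_fn)(void);

/* A function the running firmware offers to modules. */
typedef struct { const char *name; generic_fn fn; } host_export_t;

/* A name the module left unresolved, and the slot to patch with its address. */
typedef struct { const char *name; generic_fn *slot; } module_import_t;

/* Stand-in for a firmware function such as digitalWrite. */
static int g_stub_calls;
static void stub_digital_write(void) { g_stub_calls++; }

/* Patch every import that has a matching export.
 * Returns the number of imports that could not be resolved. */
static size_t resolve_imports(const host_export_t *exports, size_t n_exports,
                              module_import_t *imports, size_t n_imports)
{
    size_t missing = 0;
    for (size_t i = 0; i < n_imports; i++) {
        generic_fn found = NULL;
        for (size_t j = 0; j < n_exports; j++) {
            if (strcmp(imports[i].name, exports[j].name) == 0) {
                found = exports[j].fn;
                break;
            }
        }
        if (found != NULL)
            *imports[i].slot = found;
        else
            missing++;
    }
    return missing;
}
```

A loader built this way would refuse to start the module if `resolve_imports` reports anything missing.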

At present, I can't give you the code but the project I am working on will be an open source solution and at that time, I must imagine that anyone who was really interested in doing similar could then look through the source code (eventually) and figure out how it was done. Hint ... Xtensa assembler was involved but it turned out not to be horrible.

The bottom line ... yes ... it is do-able.

jumjum123 · Postby **jumjum123** » Wed May 09, 2018 11:24 am

@kolban,
wow, that sounds very promising.
And the best, it will be open source

One more time, you made my day !!
