Overlay Code with GCC

macca · 2016-04-08 07:48

Programs that use overlay code were very popular in the era of early home computers, when memory was scarce: loading portions of code only when needed made it possible to write programs much larger than the available memory.

With microcontrollers we are in a similar situation: programs can be very complex while memory is rather limited. Think, for example, of SD card file system or internet access libraries, which can severely limit the memory left for the program itself, especially when they are used together.

The Propeller has 32K of internal RAM and an external EEPROM from which the code is loaded at power-up. Since only the first 32K of the EEPROM are used for the program, the extra space of a larger EEPROM can be dedicated to data storage. Fortunately, with the GCC compiler and the standard tools it is also possible to store portions of the program code there, to be loaded when necessary.

The program

As an example we build a small program that writes two sentences to the serial port using two functions stored in two separate source files, then we manage them as overlays.

The two source files might look like the following:

overlay0.c:
	void overlay_function1()
	{
	    uart_print("Written by overlay_function1 !\r\n");
	}

overlay1.c:
	void overlay_function2()
	{
	    uart_print("Written by overlay_function2 !\r\n");
	}

The program could simply initialize the serial port of the microcontroller, write a header to indicate that it is working and call the two functions above. We also add the source to manage the EEPROM using the I2C bus on the standard pins.

Throughout this post I refer to the source code without reproducing it in full, to keep the post from getting too long. The attached package contains the complete sample files for reference.

The linker script

To use overlays we must tell the linker which object files contain the overlay code and where to place them in the microcontroller memory. To do this we write a custom linker script.

Start from the standard script that can be found in the propeller-elf/lib/ldscripts directory with the name propeller.x. Copy the file to your working directory and rename it propeller_ovl.ld, then open it in a text editor.

At the beginning of the script we find the MEMORY section that defines the memory areas in which to place the code and data. We need to define a new area in which to place the overlay code so add a line, like the one shown, just below the line that starts with hub:

MEMORY
{
  hub     : ORIGIN = 0, LENGTH = 32K
  ovl     : ORIGIN = 28K, LENGTH = 4K
  cog     : ORIGIN = 0, LENGTH = 1984 /* 496*4 */

This line defines a memory area called ovl that begins at location 28672 (28K), at the end of the internal RAM, with a size of 4096 (4K) bytes. All object files defined as overlays will be loaded into this area.

Because the area is located in the microcontroller’s RAM, we must reduce the memory available for the normal program code accordingly, so change the length on the hub line to 32K-4K=28K. The first lines of the script now look like this:

MEMORY
{
  hub     : ORIGIN = 0, LENGTH = 28K
  ovl     : ORIGIN = 28K, LENGTH = 4K
  cog     : ORIGIN = 0, LENGTH = 1984 /* 496*4 */

At this point we must tell the linker which object files make up the overlay code, so let’s move down the script until we get to the SECTIONS section. The first lines of this section in the standard script should be similar to the following:

SECTIONS
{
  /* if we are not relocating (-r flag given) then discard the boot and bootpasm sections; otherwise keep them */
  /* the initial spin boot code, if any */
   .boot : { KEEP(*(.boot)) } >hub
   .bootpasm : { KEEP(*(.bootpasm)) } >bootpasm AT>hub

So we are going to add an OVERLAY section immediately after the opening brace so that it is read before any other instruction:

SECTIONS
{
  /* overlays */
  OVERLAY : NOCROSSREFS
  {
    .ovly0  { overlay0.o(.text .data) }
    .ovly1  { overlay1.o(.text .data) }
  } >ovl AT>drivers

  /* if we are not relocating (-r flag given) then discard the boot and bootpasm sections; otherwise keep them */
  /* the initial spin boot code, if any */
   .boot : { KEEP(*(.boot)) } >hub
   .bootpasm : { KEEP(*(.bootpasm)) } >bootpasm AT>hub

The OVERLAY : NOCROSSREFS line marks the beginning of the section, informing the linker that we are defining overlay code and that an overlay cannot call code in other overlays. Between the braces we list the object files containing the overlay code and data, one per line. In the example above we have two object files, overlay0.o and overlay1.o.

The last line with the closing brace defines the memory area in which the code is executed, the ovl area defined at the beginning of the script, and where each object is physically stored, in the drivers area which corresponds to the EEPROM memory area above 32K.

At this point the script can already be used, as the linker automatically produces the symbols required to load the objects. However, to simplify things further we add a table that lets us load the code with a simple index.

Go down a bit more in the script until the .data section and modify it as follows:

.data	  :
  {
    . = ALIGN(4);
    __ovly_table = .; 
        LONG(ABSOLUTE(ADDR(.ovly0))); LONG(SIZEOF(.ovly0)); LONG(LOADADDR(.ovly0));
        LONG(ABSOLUTE(ADDR(.ovly1))); LONG(SIZEOF(.ovly1)); LONG(LOADADDR(.ovly1));
  }  >hub AT>hub

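The added rows define a 2×3 table of longs called __ovly_table in the linker script. Because the compiler adds a leading underscore to C names, in C the same symbol is written _ovly_table, and we can use it directly with a declaration like the following: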

extern uint32_t _ovly_table[2][3];

The first element of each row contains the address of the internal ram memory to load the code to, the second element contains the size in bytes of the code and the third element the address where it is stored on the EEPROM.
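The row layout described above can be sketched in C as follows. This is only an illustration: the names overlay_info_t, overlay_info and ovly_table_demo are hypothetical, and the demo table is a stand-in filled with the values from the sample run shown later in this post, so that the sketch also builds on a PC. On the target you would pass the linker-generated _ovly_table instead.

```c
#include <assert.h>
#include <stdint.h>

#define OVERLAY_COUNT 2   /* .ovly0 and .ovly1 */

/* Column meaning of each table row, as described in the post. */
enum { OVLY_RUN_ADDR = 0, OVLY_SIZE = 1, OVLY_LOAD_ADDR = 2 };

/* Hypothetical stand-in for the linker-generated table (values from the
 * sample run). On the target use: extern uint32_t _ovly_table[2][3]; */
static const uint32_t ovly_table_demo[OVERLAY_COUNT][3] = {
    { 0x7000, 52, 0xC0000000u },
    { 0x7000, 52, 0xC0000034u },
};

/* Hypothetical named view of one row. */
typedef struct {
    uint32_t run_addr;   /* hub RAM address the code is copied to */
    uint32_t size;       /* size of the overlay in bytes          */
    uint32_t load_addr;  /* address where the linker stored it    */
} overlay_info_t;

/* Returns row n of the table as a struct with named fields. */
static overlay_info_t overlay_info(const uint32_t table[][3], int n)
{
    assert(n >= 0 && n < OVERLAY_COUNT);
    overlay_info_t info = {
        table[n][OVLY_RUN_ADDR],
        table[n][OVLY_SIZE],
        table[n][OVLY_LOAD_ADDR]
    };
    return info;
}
```

Both rows share the same run address because the overlays occupy the same ovl area, one at a time, while their load addresses follow each other in storage.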

The last step is to redefine the stack pointer: with the standard script it is placed at address 0x8000, at the end of RAM, in the same area that will now be occupied by the overlay code (remember that the stack grows downward).

The last line of the script contains the stack pointer definition, so we change it to point to the beginning of the overlay area, at address 28*1024=28672 (7000 hex):

/* default initial stack pointer */
  PROVIDE(__stack_end = 0x7000) ;
}

This way the overlay code and the stack do not interfere with each other.
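The arithmetic behind the script changes can be double-checked with a few lines of C. This is a minimal sketch: the constants are copied by hand from the modified script, and the helper names (layout_is_consistent, in_overlay_area, check_layout) are made up for this illustration and are not part of the demo package.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied by hand from the modified linker script. */
enum {
    HUB_ORIGIN = 0,
    HUB_LENGTH = 28 * 1024,   /* hub : ORIGIN = 0,   LENGTH = 28K */
    OVL_ORIGIN = 28 * 1024,   /* ovl : ORIGIN = 28K, LENGTH = 4K  */
    OVL_LENGTH = 4 * 1024,
    STACK_END  = 0x7000       /* PROVIDE(__stack_end = 0x7000)    */
};

/* Returns 1 when hub and ovl are adjacent, together fill the 32K of RAM,
 * and the stack starts right below the overlay area. */
static int layout_is_consistent(void)
{
    return HUB_ORIGIN + HUB_LENGTH == OVL_ORIGIN
        && OVL_ORIGIN + OVL_LENGTH == 32 * 1024
        && STACK_END == OVL_ORIGIN;
}

/* Returns 1 when [addr, addr + size) lies entirely inside the ovl area. */
static int in_overlay_area(uint32_t addr, uint32_t size)
{
    return addr >= (uint32_t)OVL_ORIGIN
        && size <= (uint32_t)OVL_LENGTH
        && addr - (uint32_t)OVL_ORIGIN <= (uint32_t)OVL_LENGTH - size;
}

/* Optional startup check: aborts if the constants above disagree. */
static void check_layout(void)
{
    assert(layout_is_consistent());
}
```

A check like in_overlay_area could be applied to each table row before copying an overlay, so that an oversized overlay is caught instead of silently overwriting memory past the end of RAM.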

The script is now complete and we can use it with the linker by adding the -T directive on the command line. In a typical makefile we will have a line like this:

CFLAGS := -Os -Wall
CXXFLAGS := -Os -Wall
LDFLAGS := -s -T propeller_ovl.ld -Wl,-Map=$(NAME).map -fno-exceptions

The -T directive tells the linker to use the propeller_ovl.ld file as the customized linker script.

Overlay load

Now we have a program that compiles correctly, but when it runs it doesn’t show the output we are expecting because the two functions have not yet been loaded into memory. To load the code from the EEPROM we add the definition of the array generated by the linker and a function that loads the actual code:

extern uint32_t _ovly_table[2][3];

void eeprom_load_overlay(int n)
{
    uart_print("--- loading overlay ");
    uart_print_dec(n);
    uart_print(" from ");
    uart_print_number(_ovly_table[n][2], 16, 0);
    uart_print(" to ");
    uart_print_number(_ovly_table[n][0], 16, 0);
    uart_print(" size ");
    uart_print_dec(_ovly_table[n][1]);
    uart_print("\r\n");

    eeprom_read(HIGH_EEPROM_OFFSET(_ovly_table[n][2]), (uint8_t *)_ovly_table[n][0], _ovly_table[n][1]);
}
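The load addresses in the table live in the linker's drivers area, while eeprom_read needs an offset inside the EEPROM chip, which is what HIGH_EEPROM_OFFSET takes care of. Its definition is in the attached package and is not reproduced here; the sketch below only shows one plausible mapping, assuming the drivers area starts at 0xC0000000 (the address seen in the sample output) and is stored right after the first 32K of the EEPROM. The names assumed_eeprom_offset and image_fits_eeprom are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DRIVERS_BASE   0xC0000000u  /* load address seen in the sample output */
#define EEPROM_PROGRAM 0x8000u      /* first 32K of EEPROM hold the program   */
#define EEPROM_SIZE    0x10000u     /* 64K part, the minimum the post asks for */

/* ASSUMPTION: drivers-area addresses map linearly to the EEPROM space
 * above 32K. The real macro in the demo package may differ. */
static uint32_t assumed_eeprom_offset(uint32_t load_addr)
{
    assert(load_addr >= DRIVERS_BASE);
    return load_addr - DRIVERS_BASE + EEPROM_PROGRAM;
}

/* Returns 1 when the whole image lies in the upper half of a 64K EEPROM. */
static int image_fits_eeprom(uint32_t load_addr, uint32_t size)
{
    if (load_addr < DRIVERS_BASE)
        return 0;
    uint32_t off = assumed_eeprom_offset(load_addr);
    return off < EEPROM_SIZE && size <= EEPROM_SIZE - off;
}
```

Under this assumption the two overlays of the demo would sit at EEPROM offsets 0x8000 and 0x8034.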

Then, in the program’s main function, add a call to the overlay load function just before calling each overlay function:

    uart_print("\r\n*** OVERLAY DEMO ***\r\n\r\n");

    eeprom_load_overlay(0);
    overlay_function1();

    eeprom_load_overlay(1);
    overlay_function2();

Program upload

To upload the program to the microcontroller, use the propeller-load program that comes with the gcc toolchain. No special options are required: it detects that a portion of the program must be written to the upper part of the EEPROM because of the use of the drivers memory area. Make sure you have at least a 64K EEPROM installed:

marco@bridge:~/parallax/overlay$ propeller-load -t -p /dev/ttyACM0 -r demo.elf 
Propeller Version 1 on /dev/ttyACM0
Loading the serial helper to hub memory
10392 bytes sent                  
Verifying RAM ... OK
Loading cache driver 'eeprom_cache.dat'
1540 bytes sent                  
Writing cog images to eeprom
104 bytes sent                  
Loading demo.elf to hub memory
4792 bytes sent                  
Verifying RAM ... OK
[ Entering terminal mode. Type ESC or Control-C to exit. ]

*** OVERLAY DEMO ***

--- loading overlay 0 from C0000000 to 7000 size 52
Written by overlay_function1 !
--- loading overlay 1 from C0000034 to 7000 size 52
Written by overlay_function2 !

The sample output shows the data of the overlays loaded from the EEPROM.

Conclusions

Overlay code opens up many possibilities. The attached example uses simple functions, but nothing prevents you from using much more complex code; just be aware of the current limitations.

The memory reserved for the overlays cannot be used by the main program, so complex overlay code occupying several kilobytes reduces the space available to it. Code in one overlay cannot call code in another overlay, because that would require a call tracking system to load the correct code, greatly increasing the complexity. Loading the code always takes some time, so it is advisable to group functions together to minimize loads.

Even with these limitations, we can now write programs larger than the 32K of available memory.

Enjoy!
Marco.

DavidZemon · 2016-04-08 12:27

This is great! I don't know if you read my C++ & Tachyon thread or just happened to be thinking the same thing at the same time, but I was wanting exactly this yesterday!

But now that you've done this and documented it so very thoroughly, I'm brainstorming how best I can implement this in PropWare.

Can overlay object files be extracted from a static archive? When the overlay object file is dumped into EEPROM, does the linker do anything special or is it just a byte-for-byte copy between your local filesystem and the Propeller's EEPROM? I suppose the object files have to get addresses injected at link time still don't they, so you can just execute random object files?

I can see this having its own use. It's a lot more lean than PropGCC's XMM models, though also more complicated and requires a much more in-depth knowledge by the user of what is going on. Still, it might be exactly what the C++ & Tachyon system needs to implement a proper "import" statement.

macca · 2016-04-08 13:06

DavidZemon wrote: »

This is great! I don't know if you read my C++ & Tachyon thread or just happened to be thinking the same thing at the same time, but I was wanting exactly this yesterday!

It is a thing I was working with for a long time and finally found the time to document. Glad it is what you were looking for.

Can overlay object files be extracted from a static archive? When the overlay object file is dumped into EEPROM, does the linker do anything special or is it just a byte-for-byte copy between your local filesystem and the Propeller's EEPROM? I suppose the object files have to get addresses injected at link time still don't they, so you can just execute random object files?

The overlay code is extracted byte-for-byte from the final elf executable. If you look at the propeller-load output you see that it writes 104 bytes as 'cog images' to the EEPROM: these are the overlay code. propeller-load sees them as cog images because I'm using the same memory area used by the .ecog images, but they are effectively the overlay functions. At the lower level the linker simply resets the code origin for each source listed as overlay to the memory location defined by the ovl area, then stores the resulting code in the drivers memory area (which is the same used by .ecog images). The final elf executable has everything correctly linked and the program can extract the required overlay sections at any time. Nothing else is done at runtime. I believe nothing prevents you from storing the overlay code somewhere else; you need to extract it from the elf file, which is not very difficult as it is the same operation propeller-load does.

I can see this having its own use. It's a lot more lean than PropGCC's XMM models, though also more complicated and requires a much more in-depth knowledge by the user of what is going on. Still, it might be exactly what the C++ & Tachyon system needs to implement a proper "import" statement.

The whole process could be automated to some degree. In a project I'm working on, the linker script is generated automatically by the build system, including the calculation of the memory used by the overlays and the stack adjustment. All the information can be read from the .o object files, and with a bit of calculation you can write a linker script with the proper settings.

Electrodude · 2016-04-08 17:31

Wow, I had heard of linker scripts but never had any idea how powerful they were! So the linker isn't completely magic and can be understood by mere mortals... I never liked propgcc because it didn't give me enough control over what went into the final binary, but linker scripts give you more power than Spin does over placement!

Now all that's left is to figure out how to do COGC overlays from hubram into cogram.

JasonDorie · 2016-04-08 18:04

I've been seriously considering making a version of the Elev8 firmware that moves all the PASM drivers into the eeprom so I can reclaim that space for user code. This might be equally worthwhile - there are functions that aren't used often (or in flight) so they'd be good candidates for this approach, though I suspect I couldn't use SimpleIDE to build it any more.

DavidZemon · 2016-04-08 18:11

JasonDorie wrote: »

though I suspect I couldn't use SimpleIDE to build it any more.

I have just the solution for you!

JasonDorie wrote: »

I've been seriously considering making a version of the Elev8 firmware that moves all the PASM drivers into the eeprom

You mean in the same fashion as ecogc right? I guess SimpleIDE only supports that for .ecogc files, not .S files (I assume the PASM drivers you're referring to reside in .S source files). PropWare doesn't currently support that either, but it most certainly should. I'd be happy to implement that in PropWare if it is something you would use.

JasonDorie · 2016-04-08 20:56

All the driver code is PASM inside Spin files. My plan is to create two firmwares - One that simply uploads the driver DAT sections into the eeprom along with some kind of table of contents for sizes / offsets. The other will be the actual flight firmware, and it'll just pull the PASM drivers out of the eeprom and launch them. It won't be hard to do, it just means that flashing the device will need two steps, which is a bit of a pain in the butt. If I can figure out how to make the initial loader push a full 64Kb to the eeprom it'd be easier.

Propware isn't an option here because this is ultimately for the Learn program for Parallax, and SimpleIDE is geared toward beginners. Anything that requires you to learn makefile syntax or jump through too many hoops is kind of a non-starter for this project.

DavidZemon · 2016-04-08 21:49

JasonDorie wrote: »

All the driver code is PASM inside Spin files. My plan is to create two firmwares - One that simply uploads the driver DAT sections into the eeprom along with some kind of table of contents for sizes / offsets. The other will be the actual flight firmware, and it'll just pull the PASM drivers out of the eeprom and launch them. It won't be hard to do, it just means that flashing the device will need two steps, which is a bit of a pain in the butt. If I can figure out how to make the initial loader push a full 64Kb to the eeprom it'd be easier.

Does SimpleIDE not support .S files? I'm just trying to understand why the assembly portions are written in Spin files when it is a C++ program.

JasonDorie wrote: »

Propware isn't an option here because this is ultimately for the Learn program for Parallax, and SimpleIDE is geared toward beginners.

I only mentioned it because I got the impression you were entertaining the idea of not using SimpleIDE anymore. If you're stuck with SimpleIDE, then it sounds like you are indeed stuck with a two-stage loader... unless you know of someone interested in updating SimpleIDE.

JasonDorie wrote: »

Anything that requires you to learn makefile syntax or jump through too many hoops is kind of a non-starter for this project.

Here's the complicated CMake syntax. No Makefiles need be written by the end user.

cmake_minimum_required(VERSION 3.3)
find_package(PropWare 2.1 REQUIRED)

project(Flight-Controller C CXX SPIN2DAT)

set(MODEL cmm)

create_executable(Elev8
    Beep.cpp
    Eeprom.cpp
    Elev8-Main.cpp
    F32.cpp
    F32_driver.spin
    IntPID.cpp
    QuatIMU.cpp
    pst.spin
    RC.cpp
    RC_driver.spin
    SBUS.cpp
    SBUS_driver.spin
    Sensors.cpp
    Sensors_driver.spin
    Servo32_HighRes.cpp
    Servo32_HighRes_driver.spin
    Settings.cpp
)

The first two lines are boilerplate - they'll never change. The third line defines the name of your project and what languages are being used. And the rest should be pretty self-explanatory.

JasonDorie · 2016-04-08 22:59

The files are Spin because the whole thing was a Spin project to begin with. I had a fully functioning system when I decided to see if it was possible to do in C/C++, because I knew that C would mean an optimizer, and the CMM would likely run faster.

I didn't want to change all the PASM code to the GCC assembler format because all the label / hub indices are bytes instead of longs, and the format is fairly different in general. It meant spending a bunch of time for really no benefit.

CMake looks like it'd be pretty easy to use, but it would still mean path settings and command-line compilation setup, which itself isn't as user-friendly as "install, then click the go button" for a newb. I'd be fine with it, but I'm not the client here.

I actually did try using .S files at one point, but there was some weirdness with them, possibly because of case insensitivity on Windows files. If I built from a .s (lower) source it didn't build, but building from a .S source (upper) it built but didn't link. It might even be related to SimpleIDE itself not liking them. It was something along those lines, but it was long enough ago that I don't remember the exact problem.

macca · 2016-04-09 06:12

JasonDorie wrote: »

I actually did try using .S files at one point, but there was some weirdness with them, possibly because of case insensitivity on Windows files. If I built from a .s (lower) source it didn't build, but building from a .S source (upper) it built but didn't link. It might even be related to SimpleIDE itself not liking them. It was something along those lines, but it was long enough ago that I don't remember the exact problem.

Maybe I can help with that. I have converted a number of drivers from spin to .s assembler and it isn't that difficult: most of the source can be used verbatim, and you can see an example in the demo package with uart_driver.s, converted from FullDuplexSerial.spin. There are a couple of quirks to take into account. First, you need a recent gcc build; if I'm not wrong Parallax didn't update the "official" package for quite some time and early releases don't work well with pasm sources. Some syntax won't work: for example the %%1234 numbers don't throw any error but all values are translated to 0, so you need to convert them to hex or decimal (binary values like %01010101 work well). Local labels have a different syntax; luckily they throw an error at compile time so you can fix them. Another thing: you can't reset the org address, so if you have overlay code (initialization code that is reused as variable space) it won't work and needs a rewrite.

If you need hub addresses just use the @_hub_variable_name syntax (make sure to use the correct C naming which adds an underscore to the variable name), like:

varname       long  @_hub_varname

Moving the drivers to EEPROM then is very easy, take the uart_driver.s example, at the top of the file we have:

.pasm
                        .compress off

                        .section .cog_uart_driver, "ax"

Change the section line so it defines a .ecog section, like this:

.pasm
                        .compress off

                        .section .ecog_uart_driver, "ax"

Done! The driver will be put on the EEPROM and the compiler generates the __load_start_ecog_uart_driver symbols with the appropriate addresses, you just need to read them from the EEPROM in a temporary buffer and start the cog (I think the gcc library has a function for that already but never really used it so I'm not sure how it works). propeller-load does the upload in one step.

David Betz · 2016-10-06 13:54

Hi Marco,

I saw your post quite a while ago but never had a chance to play with it until just now. Thanks for writing up a great example of how to use overlays! I have a question though. What is this code for?

    OUTA = (1 << I2C_SCL);
    DIRA = (1 << I2C_SCL);

    DIRA &= ~(1 << I2C_SDA);                       // Set SDA as input
    for (i = 0; i < 9; i++) {
        OUTA &= ~(1 << I2C_SCL);                   // Put out up to 9 clock pulses
        OUTA |= (1 << I2C_SCL);
        if ((INA & (1 << I2C_SDA)) != 0)           // Repeat if SDA not driven high by the EEPROM
            break;
    }

I assume it's initialization code for the i2c bus but why isn't it in eeprom.c as eeprom_start() or something like that?

macca · 2016-10-06 15:18

That code should reinitialize a device that is in an invalid state (found in the i2c source written by Michael Green). There is no particular reason why it was left in main, probably I just forgot to move it to a more appropriate place.
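A minimal sketch of how that fragment could be wrapped as the eeprom_start() suggested in the question. It assumes the standard boot EEPROM pins (SCL on P28, SDA on P29). The USE_PROPELLER_REGISTERS switch is hypothetical: define it when building for the target, otherwise plain variables stand in for the registers so the sketch also builds on a PC. The return value is an addition made for this sketch and is not in the original code.

```c
#include <assert.h>
#include <stdint.h>

#ifdef USE_PROPELLER_REGISTERS      /* hypothetical build switch */
#include <propeller.h>              /* real OUTA, DIRA and INA */
#else
/* Stand-ins for the cog registers when building on a PC. */
static uint32_t OUTA, DIRA;
static uint32_t INA = 1u << 29;     /* pretend SDA idles high */
#endif

#define I2C_SCL 28                  /* assumed: standard boot EEPROM pins */
#define I2C_SDA 29

/* Clocks SCL up to 9 times until the EEPROM releases SDA.
 * Returns the number of pulses sent, or -1 if SDA is still low. */
int eeprom_start(void)
{
    int i;

    assert(I2C_SCL < 32 && I2C_SDA < 32);  /* pins must fit the 32-bit port */

    OUTA = (1u << I2C_SCL);                /* SCL high, as in the original */
    DIRA = (1u << I2C_SCL);                /* SCL driven as output */

    DIRA &= ~(1u << I2C_SDA);              /* SDA as input */
    for (i = 0; i < 9; i++) {
        OUTA &= ~(1u << I2C_SCL);          /* one clock pulse on SCL */
        OUTA |= (1u << I2C_SCL);
        if ((INA & (1u << I2C_SDA)) != 0)  /* stop once SDA reads high */
            return i + 1;
    }
    return -1;
}
```

Like the fragment in main, this writes the whole OUTA and DIRA registers, so it is meant to run before any other pin is configured.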

David Betz · 2016-10-06 15:55

Okay, I'll probably move it if I use your code in a project. Thanks for your work on this!
