input requested: relocating loader for large model code

Bill Henning · 2007-04-15 02:38

Hi guys,

I've been doing some work on my large model and the assembler I'm building for it; and it has become obvious that I need the ability to relocate large model programs as I load them.

The simplest scheme I can think of is to have each large model executable have a header defined as follows:

word magic
word iptr
word dptr
word size
long reserved1, reserved2 (reserved for future expansion)

Where:

magic - identifies the type of executable (small model driver, large model code, etc)
iptr - word offset to the first large model code address in the file which is a long (for FJMP and FCALL)
dptr - word offset to the first large model data (for FLDI and FPUSHI that need relocation)
size - number of longs of memory required by executable + stack

For large addresses embedded in large model code, they will be stored as follows:

High word: offset to next pointer to be modified
Low word: offset from load address, is added to a base by the relocating loader, which will also clear the high word

I can't think of anything simpler, but I welcome suggestions.

Why am I incorporating a relocating loader?

because I don't want to take the runtime performance hit of always adding a base address, and I want to support multiple large model programs running at once.

This scheme should still work with future propellers with 128KB / 256KB of hub memory, but it will limit large model programs to 64KB code / 64KB data until I define a new format for larger future chips.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com - a new blog about microcontrollers

Mike Green · 2007-04-15 02:57

Bill,
At the very least, I suggest defining "size" to be the actual loaded size of the executable. Keep the expected stack size separate. You might separate the size of the data area as well. I strongly suggest that you have all instructions and large addresses stored on long word boundaries and the offsets in iptr, dptr, and the high and low words of large model addresses be in terms of long words. That way, you'll support 256KB already.

Bill Henning · 2007-04-15 03:10

Hi Mike,

Hmm... maybe I was trying to be too frugal. I was going to compute the code segment size from file size - header size, storing the initialized data in the code segment as well; and having the data segment size be size (in header) - code segment size.

But, I think I will follow your suggestion and split them out for the sakes of clarity.

All instructions and long addresses were already going to be on long word bondaries precisely for the reason you mention [noparse]:)[/noparse]

The heap pointer chain also uses word pointers, long aligned, to allow for 256KB as well.

Revised header:

byte magic - executable type
byte hdrs - header size in bytes
word isize - code segment size, in longs
word dsize - data segment size, in longs
word ssize - requested stack size, in longs
word iptr - pointer to first long code address that will need relocation (which chains to following addresses)
word dptr - pointer to first long data address that will need relocation (which chains to following addresses)

This also allows for an external utility to modify the stack allocation for an executable without recompiling it, and theoretically allows for 256K code, 256K data.

Thanks Mike. I was being too miserly with header bytes.

Mike Green said...
Bill,
At the very least, I suggest defining "size" to be the actual loaded size of the executable. Keep the expected stack size separate. You might separate the size of the data area as well. I strongly suggest that you have all instructions and large addresses stored on long word boundaries and the offsets in iptr, dptr, and the high and low words of large model addresses be in terms of long words. That way, you'll support 256KB already.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com - a new blog about microcontrollers

Post Edited (Bill Henning) : 4/15/2007 3:17:38 AM GMT

Mike Green · 2007-04-15 03:22

Do document that iptr and dptr (and the chains) are long numbers (to be shifted left 2 bits before use as an address). You might indicate that zero is the nil pointer for these.

Bill Henning · 2007-04-15 03:35

Will do.

Will also document isize, dsize, ssize as being size in longs, not bytes.

Mike Green said...
Do document that iptr and dptr (and the chains) are long numbers (to be shifted left 2 bits before use as an address). You might indicate that zero is the nil pointer for these.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com - a new blog about microcontrollers

ImageCraft · 2007-04-17 07:34

Hmmm... so instead of using a symbol table, you are embedding the symbol list directly in the program, using 2 words for each address. I am not sure the value of that. I would prefer to put a symbol table after the header, before the actual code and data. .. This requires a proper assembler and linker that understand relocation, but we can have that easily with the C compiler tool chain.

// richard

Mike Green · 2007-04-17 14:32

Richard,
First of all, the address field is a long for this type of code, so packing two words there works nicely.

Second, this is a relocating loader, not a linker. At this point, there are no symbols involved. Since the
Propeller's instructions (and the Large Model extension to it) have only absolute addressing, the
code needs relocation on loading if you want to do any kind of dynamic loading

Presumably, the kind of file Bill is talking about would be the output of a proper linker for "overlay" segments.

Bill Henning · 2007-04-18 07:04

Got it in one Mike!

The reason for a relocating loader on top of a regular linker is I want to run *multiple* large model programs at once, with overlay support.

Since there is no MMU, and since I did not want to take the performance hit of adding an offset to every large model address at runtime... I wanted to design a relocating loader.

It would be used for full apps as well, not just overlay segments - think of it as a 'p.out' format for propeller relocatable executables [noparse]:)[/noparse]

Mike Green said...
Richard,
First of all, the address field is a long for this type of code, so packing two words there works nicely.

Second, this is a relocating loader, not a linker. At this point, there are no symbols involved. Since the
Propeller's instructions (and the Large Model extension to it) have only absolute addressing, the
code needs relocation on loading if you want to do any kind of dynamic loading

Presumably, the kind of file Bill is talking about would be the output of a proper linker for "overlay" segments.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com - a new blog about microcontrollers

Post Edited (Bill Henning) : 4/18/2007 7:18:16 AM GMT

Bill Henning · 2007-04-18 07:10

Think of this format as an alternative to 'elf' or 'a.out' binaries; Mike was correct, this is post linking; this is a workaround for not having an MMU, and still supporting loading fully compiled executables at different addresses.

This is for an executable format, not an object file format; the object files would ofcourse have embedded symbol tables for the linker.

I think I'll call this format 'p.out'

to distinguish it from a.out or elf or ...

ImageCraft said...
Hmmm... so instead of using a symbol table, you are embedding the symbol list directly in the program, using 2 words for each address. I am not sure the value of that. I would prefer to put a symbol table after the header, before the actual code and data. .. This requires a proper assembler and linker that understand relocation, but we can have that easily with the C compiler tool chain.

// richard

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com - a new blog about microcontrollers

ImageCraft · 2007-04-18 07:19

I understand that this is a loader, not a linker. My point is (if I understand correctly), there are multiple addresses that need adjusting (e.g. any time an "absolute" address is used, it will need the base offset added to it) and from what I understand what you wrote, the entries are chained by the address fields themselves. The other approach is to have a table of the offset after the header, but before the code.

// richard

Bill Henning · 2007-04-18 07:40

You understand correctly.

I preferred the linked list approach as it keeps the header small, and the code for fixing up the addresses actually ends up a bit shorter.

Otherwise, a variable sized table would have to be loaded, consuming significantly more precious critical hub resources (memory)

Say the p.out excutable would have had an assembled "FLDI $0004" in it, referring to the second long in the data segment for this executable (or overlay). Encoded for the relocating loader, it would actually be encoded as [noparse][[/noparse]long offset to next address to be fixed][noparse][[/noparse]$0001]

Assuming that the data segment gets loaded to hub address $2040, the loader would change the long value to $00002044, and would know where the next data segment address that required fixing would be; chaining along the data addresses and fixing them until it finds a "null" offset for the next address to be fixed - which tells it that it has finished relocating data addresses.

A similar chain would be used to fix up static addresses in the code segment.

The beauty of this approach is that it will handle Chip increasing hub ram up to 256KB without breaking binary compatibility while using limited ammount of hub memory for the "overhead" of relocatability.

Dynamic link libraries will be invoked with an indirect jump, thus do not require relocation.

ImageCraft said...
I understand that this is a loader, not a linker. My point is (if I understand correctly), there are multiple addresses that need adjusting (e.g. any time an "absolute" address is used, it will need the base offset added to it) and from what I understand what you wrote, the entries are chained by the address fields themselves. The other approach is to have a table of the offset after the header, but before the code.

// richard

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com - a new blog about microcontrollers

ImageCraft · 2007-04-18 07:48

Bill Henning said...
You understand correctly.

I preferred the linked list approach as it keeps the header small, and the code for fixing up the addresses actually ends up a bit shorter.

Otherwise, a variable sized table would have to be loaded, consuming significantly more precious critical hub resources (memory)
...

OK, I will need to think about it some more. At this point, I can't commit our compiler to adhere to this model. It may or may not, so don't hold up the proposal on my account.
// richard

Bill Henning · 2007-04-18 07:54

Well, I really hope you add support for the 'p.out' format for large model executables [noparse]:)[/noparse]

It should actually be VERY easy for you to add this to your linker, and it would allow executables generated by your compiler to run on my future pico-os for propellers - running multiple C executables concurrently [noparse]:)[/noparse]

The DLL format I am thinking of uses the same basic header, along with a "DLLInit" required function to initialize the jump vector table, so by supporting this format, you will also automatically support the FLIB and FSVC interfaces [noparse]:)[/noparse] [noparse]:)[/noparse] [noparse]:)[/noparse]

ImageCraft said...

Bill Henning said...
You understand correctly.

I preferred the linked list approach as it keeps the header small, and the code for fixing up the addresses actually ends up a bit shorter.

Otherwise, a variable sized table would have to be loaded, consuming significantly more precious critical hub resources (memory)
...

OK, I will need to think about it some more. At this point, I can't commit our compiler to adhere to this model. It may or may not, so don't hold up the proposal on my account.
// richard

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com - a new blog about microcontrollers

Bill Henning · 2007-04-18 08:03

If you don't have time to add this specific mode, please then have an option for your linker to emit a file with the offset (within the file) of every large model constant address (and weather it is a data or instruction address); that could be used by an external utility to re-format your executables into the p.out format

I will be pushing p.out as the standard large model executable format; my assembler and linker (and other future projects) will use it - although I may just adopt the newly released cross assembler if it fits my needs in order to save time.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com - a new blog about microcontrollers

Tom Bampton · 2007-04-18 09:06

I have been thinking a lot about how to deal with such things lately...

I would probably put the header and the relocation information before the actual code/data. My reasoning behind this is after you have loaded and relocated the executable, all you need is the entry point. The header and relocation info can be overwritten. If you split them, then you end up with two fragments ... a tiny one and a potentially big one. Keeping them together, you just have one potentially big chunk that you can reuse in it's entirity.

In the extreme case, that tiny bit of fragmentation can be the difference between being able to load another executable or not. Presumably if you are dynamically loading executables, you'd also be dynamically allocating memory. In such a situation it seems that the fragmentation could be even more of an issue depending on the allocation strategy.

It's also worth pointing out that if fragmentation actually is an issue, and not just me being pedantic, then every time you load an executable it's going to make it worse. Dealing with that can easily offset any code savings you may make in other ways.

T.

Bill Henning · 2007-04-21 05:59

I totally agree, that's one of the reasons it is a header [noparse]:)[/noparse]

I also agree about controlling fragmentation; that's why I intend to allocate one large block for code, data and stack for each process.

Whenever a block is freed, it will check to see if the block before it or after it in the malloc'd list has also been freed; if so, it will merge them.

Generally, the executable layout I'm thinking of is:

header
code
initialized data

Ofcourse initialized pointers in the initialized data section can also be relocated.

- My intent is to load the header into a temporary shared buffer.

- Using the information from the header, allocate a large enough chunk of hub memory to hold the code, data, and stack.

that chunk would look like

code
initialized data
uninitialized data
stack

- Load the code and initialized data into the newly allocated memory block.

- Fix up the addresses in the chains pointed to by iptr+dptr

- Create a TCB with the address information for start of code, start of data, top of stack.

- start a copy of a large model kernel in an available cog, passing the TCB as par

Tom Bampton said...
I have been thinking a lot about how to deal with such things lately...

I would probably put the header and the relocation information before the actual code/data. My reasoning behind this is after you have loaded and relocated the executable, all you need is the entry point. The header and relocation info can be overwritten. If you split them, then you end up with two fragments ... a tiny one and a potentially big one. Keeping them together, you just have one potentially big chunk that you can reuse in it's entirity.

In the extreme case, that tiny bit of fragmentation can be the difference between being able to load another executable or not. Presumably if you are dynamically loading executables, you'd also be dynamically allocating memory. In such a situation it seems that the fragmentation could be even more of an issue depending on the allocation strategy.

It's also worth pointing out that if fragmentation actually is an issue, and not just me being pedantic, then every time you load an executable it's going to make it worse. Dealing with that can easily offset any code savings you may make in other ways.

T.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com - a new blog about microcontrollers

Bill Henning · 2007-04-21 07:51

While writing the documentation on the loader format it has become obvious that the data segment has to be limited to 64KB - which is not an issue for the current propeller, and is not really an issue for the future propeller as malloc'ed memory blocks are outside of the global data segement, thus large model code will NOT be limited to 64KB data.

This change is neccessary to support relocation of byte pointers generated by compilers - for example, consider the following:

char buff[noparse][[/noparse]100], token[noparse][[/noparse]20];

strcpy(token,&buff);

if all data pointers were long aligned, the above would require emitting significant additional code, whereas if the data segment pointers are byte aligned, merely three large model instructions would be emitted for the above:

PUSHI token
PUSHI buff+1
FCALL strcpy

Code pointers however will be long aligned, therefore a word can point to 64K longs (256KB)

word sized byte pointers are called BPTR

word sized long pointers are called LPTR

for the sake of completeness, I will also define WPTR as a word sized pointer to one of 64K words, therefore potentially addressing 128KB

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com - a new blog about microcontrollers

Bill Henning · 2007-04-21 08:13

Relocating Loader

The loader must be able to distinguish between the different types of executable programs in order to be able to properly load and execute them.

In order to be able to distinguish between the different file types, every type of executable file will be prefixed with a small header.

The Header

MAGIC A byte containing a magic number that identifies the type of executable.

1. Large Model Application
2. Large Model Shared Library
3. Driver
4. Shell script

SIZE A byte containing the number of bytes in the current header.

ISIZE A word containing the number of longs in the code segment
DSIZE A word containing the number of longs in the data segment
SSIZE A word containing the number of longs in the stack segment

IPTR LPTR to the first long in the code segment that needs relocation
DPTR LPTR to the first long in the data segment that needs relocation

SUDO A byte, set if must run as root

LIBS A byte containing the number of shared libraries this executable depends on

LIBREC Eight byte library records, one per number in LIBS

There are no explicit device records, as devices are opened at run-time.

The long pointed to by IPTR is the start of a linked list of relocation records, where the high word is an LPTR to the next record in the list, and the low word consists of an LPTR to an instruction.

The long pointed to by DPTR is the start of a linked list of relocation records where the high word is an LPTR to the next record in the list, and the low word consists of a BPTR to a byte of data.

A LIBREC consists of the following

LIBNO A byte consisting of the library handle
LIBVER A byte consisting of minimum library version
LIBNAME Up to six characters for library name, prefixed by ‘lib’ before loading

---

1) LPTR is a word that contains a long address that has been shifted right two bits, this way a word can be used to point at any long in a 256KB address space.

2) BPTR is a word that contains a byte address; BPTR’s are needed to allow C code pointer initialization to specific bytes. Please note that this only limits compiler allocated global data to 64K, malloc’d data can return valid pointers above 64K.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com - a new blog about microcontrollers

Post Edited (Bill Henning) : 4/21/2007 8:17:46 AM GMT

input requested: relocating loader for large model code

Comments