input requested: relocating loader for large model code
Bill Henning
Posts: 6,445
Hi guys,
I've been doing some work on my large model and the assembler I'm building for it; and it has become obvious that I need the ability to relocate large model programs as I load them.
The simplest scheme I can think of is to have each large model executable have a header defined as follows:
word magic
word iptr
word dptr
word size
long reserved1, reserved2 (reserved for future expansion)
Where:
magic - identifies the type of executable (small model driver, large model code, etc)
iptr - word offset to the first large model code address in the file which is a long (for FJMP and FCALL)
dptr - word offset to the first large model data (for FLDI and FPUSHI that need relocation)
size - number of longs of memory required by executable + stack
For large addresses embedded in large model code, they will be stored as follows:
High word: offset to next pointer to be modified
Low word: offset from load address, is added to a base by the relocating loader, which will also clear the high word
I can't think of anything simpler, but I welcome suggestions.
Why am I incorporating a relocating loader?
because I don't want to take the runtime performance hit of always adding a base address, and I want to support multiple large model programs running at once.
This scheme should still work with future propellers with 128KB / 256KB of hub memory, but it will limit large model programs to 64KB code / 64KB data until I define a new format for larger future chips.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com - a new blog about microcontrollers
I've been doing some work on my large model and the assembler I'm building for it; and it has become obvious that I need the ability to relocate large model programs as I load them.
The simplest scheme I can think of is to have each large model executable have a header defined as follows:
word magic
word iptr
word dptr
word size
long reserved1, reserved2 (reserved for future expansion)
Where:
magic - identifies the type of executable (small model driver, large model code, etc)
iptr - word offset to the first large model code address in the file which is a long (for FJMP and FCALL)
dptr - word offset to the first large model data (for FLDI and FPUSHI that need relocation)
size - number of longs of memory required by executable + stack
For large addresses embedded in large model code, they will be stored as follows:
High word: offset to next pointer to be modified
Low word: offset from load address, is added to a base by the relocating loader, which will also clear the high word
I can't think of anything simpler, but I welcome suggestions.
Why am I incorporating a relocating loader?
because I don't want to take the runtime performance hit of always adding a base address, and I want to support multiple large model programs running at once.
This scheme should still work with future propellers with 128KB / 256KB of hub memory, but it will limit large model programs to 64KB code / 64KB data until I define a new format for larger future chips.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com - a new blog about microcontrollers
Comments
At the very least, I suggest defining "size" to be the actual loaded size of the executable. Keep the expected stack size separate. You might separate the size of the data area as well. I strongly suggest that you have all instructions and large addresses stored on long word boundaries and the offsets in iptr, dptr, and the high and low words of large model addresses be in terms of long words. That way, you'll support 256KB already.
Hmm... maybe I was trying to be too frugal. I was going to compute the code segment size from file size - header size, storing the initialized data in the code segment as well; and having the data segment size be size (in header) - code segment size.
But, I think I will follow your suggestion and split them out for the sakes of clarity.
All instructions and long addresses were already going to be on long word bondaries precisely for the reason you mention [noparse]:)[/noparse]
The heap pointer chain also uses word pointers, long aligned, to allow for 256KB as well.
Revised header:
byte magic - executable type
byte hdrs - header size in bytes
word isize - code segment size, in longs
word dsize - data segment size, in longs
word ssize - requested stack size, in longs
word iptr - pointer to first long code address that will need relocation (which chains to following addresses)
word dptr - pointer to first long data address that will need relocation (which chains to following addresses)
This also allows for an external utility to modify the stack allocation for an executable without recompiling it, and theoretically allows for 256K code, 256K data.
Thanks Mike. I was being too miserly with header bytes.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com - a new blog about microcontrollers
Post Edited (Bill Henning) : 4/15/2007 3:17:38 AM GMT
Will also document isize, dsize, ssize as being size in longs, not bytes.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com - a new blog about microcontrollers
// richard
First of all, the address field is a long for this type of code, so packing two words there works nicely.
Second, this is a relocating loader, not a linker. At this point, there are no symbols involved. Since the
Propeller's instructions (and the Large Model extension to it) have only absolute addressing, the
code needs relocation on loading if you want to do any kind of dynamic loading
Presumably, the kind of file Bill is talking about would be the output of a proper linker for "overlay" segments.
The reason for a relocating loader on top of a regular linker is I want to run *multiple* large model programs at once, with overlay support.
Since there is no MMU, and since I did not want to take the performance hit of adding an offset to every large model address at runtime... I wanted to design a relocating loader.
It would be used for full apps as well, not just overlay segments - think of it as a 'p.out' format for propeller relocatable executables [noparse]:)[/noparse]
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com - a new blog about microcontrollers
Post Edited (Bill Henning) : 4/18/2007 7:18:16 AM GMT
This is for an executable format, not an object file format; the object files would ofcourse have embedded symbol tables for the linker.
I think I'll call this format 'p.out' to distinguish it from a.out or elf or ...
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com - a new blog about microcontrollers
// richard
I preferred the linked list approach as it keeps the header small, and the code for fixing up the addresses actually ends up a bit shorter.
Otherwise, a variable sized table would have to be loaded, consuming significantly more precious critical hub resources (memory)
Say the p.out excutable would have had an assembled "FLDI $0004" in it, referring to the second long in the data segment for this executable (or overlay). Encoded for the relocating loader, it would actually be encoded as [noparse][[/noparse]long offset to next address to be fixed][noparse][[/noparse]$0001]
Assuming that the data segment gets loaded to hub address $2040, the loader would change the long value to $00002044, and would know where the next data segment address that required fixing would be; chaining along the data addresses and fixing them until it finds a "null" offset for the next address to be fixed - which tells it that it has finished relocating data addresses.
A similar chain would be used to fix up static addresses in the code segment.
The beauty of this approach is that it will handle Chip increasing hub ram up to 256KB without breaking binary compatibility while using limited ammount of hub memory for the "overhead" of relocatability.
Dynamic link libraries will be invoked with an indirect jump, thus do not require relocation.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com - a new blog about microcontrollers
OK, I will need to think about it some more. At this point, I can't commit our compiler to adhere to this model. It may or may not, so don't hold up the proposal on my account.
// richard
It should actually be VERY easy for you to add this to your linker, and it would allow executables generated by your compiler to run on my future pico-os for propellers - running multiple C executables concurrently [noparse]:)[/noparse]
The DLL format I am thinking of uses the same basic header, along with a "DLLInit" required function to initialize the jump vector table, so by supporting this format, you will also automatically support the FLIB and FSVC interfaces [noparse]:)[/noparse] [noparse]:)[/noparse] [noparse]:)[/noparse]
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com - a new blog about microcontrollers
I will be pushing p.out as the standard large model executable format; my assembler and linker (and other future projects) will use it - although I may just adopt the newly released cross assembler if it fits my needs in order to save time.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com - a new blog about microcontrollers
I would probably put the header and the relocation information before the actual code/data. My reasoning behind this is after you have loaded and relocated the executable, all you need is the entry point. The header and relocation info can be overwritten. If you split them, then you end up with two fragments ... a tiny one and a potentially big one. Keeping them together, you just have one potentially big chunk that you can reuse in it's entirity.
In the extreme case, that tiny bit of fragmentation can be the difference between being able to load another executable or not. Presumably if you are dynamically loading executables, you'd also be dynamically allocating memory. In such a situation it seems that the fragmentation could be even more of an issue depending on the allocation strategy.
It's also worth pointing out that if fragmentation actually is an issue, and not just me being pedantic, then every time you load an executable it's going to make it worse. Dealing with that can easily offset any code savings you may make in other ways.
T.
I also agree about controlling fragmentation; that's why I intend to allocate one large block for code, data and stack for each process.
Whenever a block is freed, it will check to see if the block before it or after it in the malloc'd list has also been freed; if so, it will merge them.
Generally, the executable layout I'm thinking of is:
header
code
initialized data
Ofcourse initialized pointers in the initialized data section can also be relocated.
- My intent is to load the header into a temporary shared buffer.
- Using the information from the header, allocate a large enough chunk of hub memory to hold the code, data, and stack.
that chunk would look like
code
initialized data
uninitialized data
stack
- Load the code and initialized data into the newly allocated memory block.
- Fix up the addresses in the chains pointed to by iptr+dptr
- Create a TCB with the address information for start of code, start of data, top of stack.
- start a copy of a large model kernel in an available cog, passing the TCB as par
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com - a new blog about microcontrollers
This change is neccessary to support relocation of byte pointers generated by compilers - for example, consider the following:
char buff[noparse][[/noparse]100], token[noparse][[/noparse]20];
strcpy(token,&buff);
if all data pointers were long aligned, the above would require emitting significant additional code, whereas if the data segment pointers are byte aligned, merely three large model instructions would be emitted for the above:
PUSHI token
PUSHI buff+1
FCALL strcpy
Code pointers however will be long aligned, therefore a word can point to 64K longs (256KB)
word sized byte pointers are called BPTR
word sized long pointers are called LPTR
for the sake of completeness, I will also define WPTR as a word sized pointer to one of 64K words, therefore potentially addressing 128KB
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com - a new blog about microcontrollers
The loader must be able to distinguish between the different types of executable programs in order to be able to properly load and execute them.
In order to be able to distinguish between the different file types, every type of executable file will be prefixed with a small header.
The Header
MAGIC A byte containing a magic number that identifies the type of executable.
1. Large Model Application
2. Large Model Shared Library
3. Driver
4. Shell script
SIZE A byte containing the number of bytes in the current header.
ISIZE A word containing the number of longs in the code segment
DSIZE A word containing the number of longs in the data segment
SSIZE A word containing the number of longs in the stack segment
IPTR LPTR to the first long in the code segment that needs relocation
DPTR LPTR to the first long in the data segment that needs relocation
SUDO A byte, set if must run as root
LIBS A byte containing the number of shared libraries this executable depends on
LIBREC Eight byte library records, one per number in LIBS
There are no explicit device records, as devices are opened at run-time.
The long pointed to by IPTR is the start of a linked list of relocation records, where the high word is an LPTR to the next record in the list, and the low word consists of an LPTR to an instruction.
The long pointed to by DPTR is the start of a linked list of relocation records where the high word is an LPTR to the next record in the list, and the low word consists of a BPTR to a byte of data.
A LIBREC consists of the following
LIBNO A byte consisting of the library handle
LIBVER A byte consisting of minimum library version
LIBNAME Up to six characters for library name, prefixed by ‘lib’ before loading
---
1) LPTR is a word that contains a long address that has been shifted right two bits, this way a word can be used to point at any long in a 256KB address space.
2) BPTR is a word that contains a byte address; BPTR’s are needed to allow C code pointer initialization to specific bytes. Please note that this only limits compiler allocated global data to 64K, malloc’d data can return valid pointers above 64K.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com - a new blog about microcontrollers
Post Edited (Bill Henning) : 4/21/2007 8:17:46 AM GMT