Standard object file format
OwenS
Posts: 173
With multiple people working on their own DIY propeller tools, I thought we would benefit greatly from a standardized object file format for storing compiled (but not linked) code.
One question I'd raise: should this object format support Spin objects, and if so, how? I don't have any reasonable ideas here; all of them would be slower than the current single-pass compilation. Suggestions?
My current idea was that each object file would contain:
- An identification and info header
- Zero or more imported symbols
- Zero or more exported symbols
- Zero or more internal symbols
- Zero or more fragments
Each fragment contains:
- The code and data to be stored in it
- A list of relocations to be performed on the contained code
The following is a preliminary overview of the format. It's in a mix of Spin and C, with datatype sizes following those used on the Propeller. All values are unsigned.
struct Header {
    long magic = "POBJ";
    long version = 0;
    long importedSymbols;
    long exportedSymbols;
    long privateSymbols;
    long fragments;
    ImportedSymbol imported[importedSymbols];
    ExportedSymbol exported[exportedSymbols];
    PrivateSymbol private[privateSymbols];
    Fragment frags[fragments];
};

struct ImportedSymbol {
    long num;        // this is the number by which this symbol will be referred to by the relocation sections
    word len;        // length of text, bytes
    byte text[len];  // text is not null terminated
};

struct ExportedSymbol {
    long num;        // this is the number by which this symbol will be referred to by the relocation sections
    long frag;       // fragment in this object file which this symbol refers to
    long off;        // offset in the fragment
    word len;        // length of text, bytes
    byte text[len];  // text is not null terminated
};

struct PrivateSymbol {
    long num;        // this is the number by which this symbol will be referred to by the relocation sections
    long frag;       // fragment in the object file which this symbol refers to
    long off;        // offset in the fragment
};

struct Fragment {
    long bytes;                    // length of fragment in bytes
    long relocations;              // number of relocations
    byte data[bytes];              // data
    Relocation relocs[relocations]; // relocations
};

struct Relocation {
    long symbol;  // Symbol to relocate against
    long offset;  // Offset in fragment to address to relocate
    byte size;    // Size of symbol
    // Performing relocations:
    // switch(size) {
    //     case 1: byte[frag.data + reloc.offset] = symbol.address + reloc.adj; break;
    //     case 2: word[frag.data + reloc.offset] = symbol.address + reloc.adj; break;
    //     case 4: long[frag.data + reloc.offset] = symbol.address + reloc.adj; break;
    // }
}
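To make the "Performing relocations" comment concrete, here is a minimal sketch in plain C of how a linker might apply one relocation. Note that `reloc.adj` is used in the pseudocode but never declared in the Relocation struct, so the `adj` field below is an assumption (treated as a 32-bit addend). The names `Reloc` and `apply_reloc` are made up for illustration, and the patched value is written little-endian to match the Propeller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of one Relocation record. */
typedef struct {
    uint32_t symbol;  /* symbol number to relocate against */
    uint32_t offset;  /* byte offset into the fragment data */
    uint32_t adj;     /* addend; ASSUMED field, only implied by "reloc.adj" */
    uint8_t  size;    /* width of the patched value: 1, 2 or 4 bytes */
} Reloc;

/* Patch r->size bytes at data + r->offset with (symbol_address + r->adj),
 * stored little-endian. Returns 0 on success, -1 if the size is not
 * 1, 2 or 4 or if the patch would run past the end of the fragment. */
static int apply_reloc(uint8_t *data, size_t data_len,
                       const Reloc *r, uint32_t symbol_address)
{
    assert(data != NULL && r != NULL);

    if (r->size != 1 && r->size != 2 && r->size != 4)
        return -1;
    if (r->offset > data_len || data_len - r->offset < r->size)
        return -1;

    uint32_t value = symbol_address + r->adj;
    for (uint8_t i = 0; i < r->size; i++)
        data[r->offset + i] = (uint8_t)(value >> (8 * i));
    return 0;
}
```

A linker would look up the final address of symbol number `r->symbol` first and pass it in as `symbol_address`; that lookup is left out here.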
Questions:
* Should we support Spin? How?
* Is the Relocation structure capable enough?
* Do we need anything else?
Comments
Spin objects work with addresses much like asm code, so relocation can be taken care of in a similar manner.
On that note, almost a year ago I suggested a simple object file format for LMM code, it's in the forums somewhere... I am thinking of something a bit better now, but still simpler than the proposal above.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com - a new blog about microcontrollers
Please do submit your object format. It would be very interesting to hear how you would do it, and it would certainly count as a suggestion.
If you want to ultimately compile on the Propeller itself... it could be a problem, but for PCs?... with GBs of memory and disk space... not a problem(tm).
I had to do this in a project where I was using compact block text files and another developer insisted on using XML. It was far easier than I expected.
Binary data could be in pairs of hex digits, left lower address, right higher address as in a hex file. Addresses in big endian (so human readable).
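As a rough sketch of that text encoding (the function names `bytes_to_hex` and `address_to_hex` are made up here, and the 8-digit address width is an assumption), the data bytes are written lowest address first, while an address is written most significant digit first so it reads naturally:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Write len bytes as pairs of hex digits, lowest address first.
 * out must have room for 2 * len + 1 characters. */
static void bytes_to_hex(const uint8_t *bytes, size_t len, char *out)
{
    static const char digits[] = "0123456789ABCDEF";

    assert(bytes != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        out[2 * i]     = digits[bytes[i] >> 4];
        out[2 * i + 1] = digits[bytes[i] & 0x0F];
    }
    out[2 * len] = '\0';
}

/* Write a 32-bit address as 8 hex digits, most significant digit first
 * (big endian, so it is human readable). */
static void address_to_hex(uint32_t addr, char out[9])
{
    snprintf(out, 9, "%08lX", (unsigned long)addr);
}
```

So a long holding $12345678 stored at some address would show up in the data as 78563412, while the address itself would print in the usual reading order.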
One option would be to change to an IFF-style format, that is each datatype has the following header:
Conventionally, that is, following the IFF specification, the numbers would be stored in big endian. Since we're targeting a little endian processor, I propose storing them in little endian instead.
From that, I propose the following layout:
Any type which can contain other predefined types can contain custom types. Readers should ignore types they've never seen before. You may make subtypes of your own types.
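The "ignore types you've never seen" rule is what makes this style of format extensible, and it is cheap to implement. Below is a minimal sketch, assuming each chunk starts with a 4-byte type tag followed by a 4-byte little-endian length and then that many bytes of payload (no padding). That header shape is an assumption based on the usual IFF convention, not necessarily the exact header proposed here, and `count_chunks` and the tags in the example are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit little-endian value from 4 bytes. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Walk a flat sequence of chunks and count those whose tag equals wanted.
 * Chunks with any other tag are skipped using their length field, so the
 * reader never needs to understand them. Returns -1 on a truncated chunk. */
static int count_chunks(const uint8_t *buf, size_t len, const char wanted[4])
{
    size_t pos = 0;
    int found = 0;

    assert(buf != NULL && wanted != NULL);
    while (pos < len) {
        if (len - pos < 8)
            return -1;                      /* header does not fit */
        uint32_t size = read_le32(buf + pos + 4);
        if (size > len - pos - 8)
            return -1;                      /* payload does not fit */
        if (memcmp(buf + pos, wanted, 4) == 0)
            found++;
        pos += 8 + (size_t)size;            /* skip header and payload */
    }
    return found;
}
```

A real reader would dispatch on the tag instead of counting, but the skipping logic stays the same.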
Actually, no. Generating XML is by no means difficult. You just use printf and format as you wish. Redundant information? The parser does not know that it is redundant, nor does it care (as long as two attributes have different names...).
There is a convention for transferring special chars (like '>') between tags: you just write &lt; or &gt; like in HTML.
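For what it's worth, the escaping step is only a few lines. This is a minimal sketch with a made-up helper name, `xml_escape`; it handles just `&`, `<` and `>`, which is enough for text between tags (attribute values would also need quotes escaped).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy in to out, replacing &, < and > with their XML entities.
 * Returns the number of characters written (not counting the NUL),
 * or -1 if out is too small. */
static int xml_escape(const char *in, char *out, size_t out_size)
{
    size_t pos = 0;

    assert(in != NULL && out != NULL);
    if (out_size == 0)
        return -1;

    for (; *in != '\0'; in++) {
        char one[2] = { *in, '\0' };   /* default: the character itself */
        const char *rep = one;

        switch (*in) {
        case '&': rep = "&amp;"; break;
        case '<': rep = "&lt;";  break;
        case '>': rep = "&gt;";  break;
        default: break;
        }

        size_t n = strlen(rep);
        if (pos + n >= out_size)       /* keep room for the final NUL */
            return -1;
        memcpy(out + pos, rep, n);
        pos += n;
    }
    out[pos] = '\0';
    return (int)pos;
}
```

The escaped string can then go straight into a printf between the opening and closing tags.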
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
--Steve
Converting > is easy ( and I bet many are confused by the forum converting it back automatically ), no different to converting plain text to HTML.
Attributes I don't see as a problem either. Arguably, for my 'KISS' approach I should have added an extra <count>2</count> entity rather than use an attribute, but it doesn't really matter. Any XML interfacing engine should be able to extract any specific entity out of the XML just like picking any <pre>...</pre> out of HTML etc. The end-user programmer still has to know about the meta-format and meta-data to deal with what is just 'data' even though it's XML ( although there might be some 'content aware' processing tools available, not my field ).
Perhaps the significant difference is that in both cases, XML and my text format, there is an indicator of count rather than the case where a parser would scan a line at a time and work out which section it was now in or whether there was more data or sub-data, or data blocks being terminated by blank line etc.
IMO there's only one rule for a portable and convertible data representation -
Every entity must have a count of how many sub-entities there are plus those sub-entities.
Whether the end-user needs to know what those sub-entities are by pre-definition or the sub-entities describe themselves is icing on the cake. It doesn't matter; it just affects how easy it is to use a parsing engine library to get at that data.
I got the syntax of my example wrong, but with a couple of simple tweaks it seems to be valid enough XML. If I could remember how to, it would be easy enough to turn that into a page view in IE/Firefox using the JavaScript XML engine.
XML is not so bad if done simply, but the data representation *is* bigger than necessary. If you use it for IPC, marshalling etc. is required. You don't get marshalling with "tv.out" or whatever :)
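To show why the count rule is enough on its own, here is a minimal sketch of a walker for a toy binary layout. The layout is purely an assumption for illustration: each entity is one tag byte, one count byte, then that many sub-entities. The walker can traverse the whole tree without knowing what any tag means; `walk_entity` is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Walk the entity starting at buf[0]. Each entity is assumed to be
 * <tag byte> <count byte> followed by <count> nested entities.
 * Adds one to *total per entity visited. Returns the number of bytes
 * the entity occupies, or 0 if the data is truncated. */
static size_t walk_entity(const uint8_t *buf, size_t len, int *total)
{
    assert(buf != NULL && total != NULL);

    if (len < 2)
        return 0;                       /* tag and count do not fit */

    uint8_t count = buf[1];
    size_t pos = 2;
    (*total)++;

    for (uint8_t i = 0; i < count; i++) {
        size_t used = walk_entity(buf + pos, len - pos, total);
        if (used == 0)
            return 0;                   /* a sub-entity was truncated */
        pos += used;
    }
    return pos;
}
```

There are no blank-line terminators and no guessing which section you are in; the count says exactly how much follows.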
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
--Steve
I have two issues with libBFD. The first is that it's licensed under the GPL, and while I intend to open source my assembler, I have pragmatic issues with libraries being licensed under the GPL (Namely, I'm of the opinion that source access doesn't matter; what does is the data). Secondly, documentation for porting BFD to another target architecture is lacking.
If I were to use a pre-existing object format, it would be COFF, since ELF's additional features (mainly shared libraries and their support, with monstrosities like the GOT) are unlikely to be used on the Prop (and if someone wishes to use shared libraries then the ELF format's overhead would be unwarranted anyway and a better method would be using a custom format).
However, even COFF has stuff we don't need: do we really need text, data and bss sections? On a system with a flat memory map and no support for memory protection, they are superfluous, and add complexity.
The format I have laid out can be considered to be a "simplified COFF". It reduces the features to only those useful on the Prop, excluding sections and such. It could be said that this is a COFF with the unneeded fat trimmed off and extensibility added.
As for XML, I am all for it as a data interchange method for textual data which humans may need to read. But is there really any point for storing assembled instructions in an XML file? I can't, personally, see many users wanting to hand assemble/disassemble their object files!
The sections you mention for text, bss, etc. are very useful for C programs, embedded or otherwise. I'm working on the concept of using external memory to store "text" executable instructions and internal Propeller memory for data (this requires some fixes in ICC so I'm on hold for now). If there was no way to partition the sections, the code would not work because the tool would not generate proper references.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
--Steve
That is a good point, but it only applies to code.
Would storing whether or not a fragment contains code or data in the object file not work equally well, however? Then your custom linker could just put the code at addresses above the prop's memory. Of course the code would need to not access data in code fragments, but the same applies for segments.
Alternatively, the object file format could contain whether a segment was text (=0), data (=1), or bss (=2), like COFF does (COFF also stores names, but they're just for display).
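A minimal sketch of how a custom linker could use such a per-fragment kind flag: text fragments are laid out from an external base address, everything else from a hub base. The struct, the function name `place_fragments`, the long alignment and the base addresses are all assumptions for illustration; only the 0/1/2 numbering comes from the post above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { FRAG_TEXT = 0, FRAG_DATA = 1, FRAG_BSS = 2 } FragKind;

/* Hypothetical linker-side view of a fragment. */
typedef struct {
    FragKind kind;     /* text, data or bss */
    uint32_t bytes;    /* length of the fragment in bytes */
    uint32_t address;  /* filled in by place_fragments() */
} Frag;

/* Round n up to the next multiple of 4 (long alignment, assumed). */
static uint32_t align4(uint32_t n)
{
    return (n + 3u) & ~3u;
}

/* Assign addresses: text goes to external memory starting at ext_base,
 * data and bss go to internal memory starting at hub_base. */
static void place_fragments(Frag *frags, size_t n,
                            uint32_t hub_base, uint32_t ext_base)
{
    uint32_t hub = hub_base;
    uint32_t ext = ext_base;

    assert(frags != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (frags[i].kind == FRAG_TEXT) {
            frags[i].address = ext;
            ext += align4(frags[i].bytes);
        } else {
            frags[i].address = hub;
            hub += align4(frags[i].bytes);
        }
    }
}
```

Once every fragment has an address, symbol addresses follow as fragment address plus offset, and the relocations can be applied as usual.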