Standard object file format

OwenS · 2008-09-09 19:35

With multiple people working on their own DIY propeller tools, I thought we would benefit greatly from a standardized object file format for storing compiled (but not linked) code.

One question I raise is whether this object format should support Spin objects, and if so, how? I don't have any reasonable ideas here; all of them would be slower than the current single-pass compilation. Suggestions?

My current idea was that each object file would contain

An identification and info header
Zero or more imported symbols
Zero or more exported symbols
Zero or more internal symbols
Zero or more fragments

Each fragment contains

The code and data to be stored in it
A list of relocations to be performed on the contained code

Fragments may be output in any order, except the first fragment of the first input object will always be output first

The following is a preliminary overview of the format. It's in a mix of Spin and C, with datatype sizes following those used on the Propeller. All values are unsigned.

struct Header {
long magic = "POBJ";
long version = 0;
long importedSymbols;
long exportedSymbols;
long privateSymbols;
long fragments;
ImportedSymbol imported[importedSymbols];
ExportedSymbol exported[exportedSymbols];
PrivateSymbol private[privateSymbols];
Fragment frags[fragments];
};

struct ImportedSymbol {
long num; // this is the number by which this symbol will be referred to by the relocation sections
word len; // length of text, bytes
byte text[len]; // text is not null terminated
};

struct ExportedSymbol {
long num; // this is the number by which this symbol will be referred to by the relocation sections
long frag; // fragment in this object file which this symbol refers to
long off; // offset in the fragment
word len; // length of text, bytes
byte text[len]; // text is not null terminated
};

struct PrivateSymbol {
long num; // this is the number by which this symbol will be referred to by the relocation sections
long frag; // fragment in the object file which this symbol refers to
long off; // offset in the fragment
};

struct Fragment {
long bytes; // length of fragment in bytes
long relocations; // number of relocations
byte data[bytes]; // data
Relocation relocs[relocations]; // relocations
};

struct Relocation {
long symbol; // Symbol to relocate against
long offset; // Offset in fragment to address to relocate
byte size; // Size of symbol
// Performing relocations:
// switch(size) {
//    case 1: byte[frag.data + reloc.offset] = symbol.address + reloc.adj; break;
//    case 2: word[frag.data + reloc.offset] = symbol.address + reloc.adj; break;
//    case 4: long[frag.data + reloc.offset] = symbol.address + reloc.adj; break;
// }
};
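
As an illustration of the relocation step described in the comment above, here is a minimal Python sketch. It rests on several assumptions that are not part of the proposal: values are written little-endian, the result is truncated to the field width, and `adj` is treated as an addend (the pseudocode uses `reloc.adj`, but the Relocation struct does not declare it). The function name `apply_relocation` is hypothetical.

```python
import struct

# struct format codes for each supported relocation width (little-endian, unsigned).
_FORMATS = {1: "<B", 2: "<H", 4: "<I"}


def apply_relocation(data: bytearray, offset: int, size: int,
                     symbol_address: int, adj: int = 0) -> None:
    """Patch `size` bytes at `offset` in a fragment's data with symbol_address + adj.

    `adj` is an assumed addend; truncating to the field width is also an assumption.
    """
    if size not in _FORMATS:
        raise ValueError(f"unsupported relocation size: {size}")
    if offset < 0 or offset + size > len(data):
        raise ValueError("relocation falls outside the fragment")
    # Keep only the low `size` bytes so the value always fits the field.
    value = (symbol_address + adj) & ((1 << (8 * size)) - 1)
    struct.pack_into(_FORMATS[size], data, offset, value)


if __name__ == "__main__":
    frag = bytearray(8)
    apply_relocation(frag, offset=2, size=2, symbol_address=0x1234, adj=4)
    print(frag.hex())
```

A real linker would first resolve `symbol.address` from the fragment's final position plus the symbol's `off` field; that step is left out here.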

Questions:
* Should we support Spin? How?
* Is the Relocation structure capable enough?
* Do we need anything else?

Ale · 2008-09-09 19:51

Why not some *text*-based interchange format... like XML? Binary files do not really bring any benefit at this point IMHO.
Spin objects work with addresses much like asm code, so relocation can be taken care of in a similar manner.

Bill Henning · 2008-09-09 20:01

Simple... because it takes fewer resources to use a binary format, and as eventually I hope to see self-hosting Propeller tools, simplicity and compactness win :)

On that note, almost a year ago I suggested a simple object file format for LMM code, it's in the forums somewhere... I am thinking of something a bit better now, but still simpler than the proposal above.

Ale said...
Why not some *text*-based interchange format... like XML? Binary files do not really bring any benefit at this point IMHO.
Spin objects work with addresses much like asm code, so relocation can be taken care of in a similar manner.


OwenS · 2008-09-09 20:02

Because an object file already contains binary code: the output of the assembler for whatever language you're using, whether it be Spin, C, or even raw assembly. Storing this in XML would just make it large, and be silly. Nobody really wants to edit an object file anyway.

OwenS · 2008-09-09 20:07

Bill:
Please do submit your object format. It would be very interesting to hear how you would do it, and it would certainly count as a suggestion.

Ale · 2008-09-10 09:39

It is not that silly, especially if you consider that when you want to add a new thing, it will not break the existing ones. It will be endian-agnostic and self-explanatory. I'm all (or was) for blobs, but lately I have come to realize that for exchange of information... it could be a good idea(tm).

If you want to ultimately compile in the propeller itself... it could be a problem, but for pcs?... with GBs of memory and disk space... not a problem(tm).

hippy · 2008-09-10 11:21

As an aside on XML, ignoring the massive bloat: provided it's well designed, one <...> per line, content on next line, it's quite easy to parse without any specific XML handlers and can easily be converted between XML and 'block format' and vice versa, and therefore also to binary format. The point is that it doesn't really matter what the specific format is, provided the content structure is well defined and there's a means to reasonably easily convert between formats.

I had to do this in a project where I was using compact block text files and another developer insisted on using XML. It was far easier than I expected.


<name="Symbol Table" count="2">
   <symbol>
      <name>
        Fred
      </name>
      <address>
        0000
      </address>
   </symbol>
   <symbol>
      <name>
        Bill
      </name>
      <address>
        0120
      </address>
   </symbol>
</name>

or

SYMBOL TABLE
2
Fred
0000
Bill
0120
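
To make the claimed ease of conversion concrete, here is a small Python sketch that reads the count-prefixed block format and emits equivalent XML with the standard library. The element name `table` and the helper names `parse_block` and `to_xml` are assumptions for illustration; the post does not specify them.

```python
import xml.etree.ElementTree as ET


def parse_block(text):
    """Parse 'title, count, then count (name, address) line pairs'."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    title, count = lines[0], int(lines[1])
    body = lines[2:2 + 2 * count]
    if len(body) != 2 * count:
        raise ValueError("fewer entries than the count line promises")
    # Pair up consecutive lines: name first, then address.
    symbols = [(body[i], body[i + 1]) for i in range(0, 2 * count, 2)]
    return title, symbols


def to_xml(title, symbols):
    """Emit the same content as XML; 'table' is a hypothetical root element."""
    root = ET.Element("table", {"name": title, "count": str(len(symbols))})
    for name, address in symbols:
        sym = ET.SubElement(root, "symbol")
        ET.SubElement(sym, "name").text = name
        ET.SubElement(sym, "address").text = address
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    block = "SYMBOL TABLE\n2\nFred\n0000\nBill\n0120\n"
    print(to_xml(*parse_block(block)))
```

Because the count comes first, the reader never has to guess where a section ends, which is the property the block format relies on.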

Ale · 2008-09-10 12:50

Hippy: Exactly like that.

Binary data could be in pairs of hex digits, left lower address, right higher address as in a hex file. Addresses in big endian (so human readable).

OwenS · 2008-09-10 15:31

The problem with XML is not parsing it; it's generating it. For example, if you have the input string "a > b", then you need to convert it to "a &gt; b", else you produce invalid XML. Also, XML doesn't like redundant information like the count="2" in Hippy's example above; parsers should read it as an XML document, not by any "count" attribute or such (XML parser libs can be quite small and fast anyway, like Expat).

One option would be to change to an IFF-style format, that is each datatype has the following header:

struct header {
long fourcc;
long size; // size = sizeof(type) - sizeof(header);
};

Conventionally, that is, following the IFF specification, the numbers would be stored in big endian. Since we're targeting a little-endian processor, I propose storing them in little endian instead.

I, from that, propose the following layout:

The following diagram uses the format:
Name [Quantity] (FourCC)

Header  (POBJ)
|-- ExportedSymbol [0..x] (ESYM)
|-- ImportedSymbol [0..x] (ISYM)
|-- PrivateSymbol [0..x] (PSYM)
\-- Fragment [0..x] (FRAG)
     |-- Data  (DATA)
     \-- Relocation [0..x] (RLOC)

Any type which can contain other predefined types can contain custom types. Readers should ignore types they've never seen before. You may make subtypes of your own types.
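
A reader for this kind of layout might look like the Python sketch below. It assumes the 8-byte header from the struct above (four FourCC bytes stored as raw characters, then a little-endian 32-bit payload size), chunks laid end to end with no padding, and child chunks nested directly in the parent's payload. None of these details are fixed by the proposal, and `iter_chunks`, `make_chunk`, and `known_children` are hypothetical names.

```python
import struct

HEADER = struct.Struct("<4sI")  # FourCC bytes, then payload size (little-endian)
KNOWN = {b"ESYM", b"ISYM", b"PSYM", b"FRAG"}


def make_chunk(fourcc: bytes, payload: bytes) -> bytes:
    """Build one chunk: header followed by its payload."""
    return HEADER.pack(fourcc, len(payload)) + payload


def iter_chunks(buf: bytes):
    """Yield (fourcc, payload) for each chunk laid end to end in buf."""
    pos = 0
    while pos < len(buf):
        if pos + HEADER.size > len(buf):
            raise ValueError("truncated chunk header")
        fourcc, size = HEADER.unpack_from(buf, pos)
        pos += HEADER.size
        if pos + size > len(buf):
            raise ValueError("chunk payload runs past end of buffer")
        yield fourcc, buf[pos:pos + size]
        pos += size


def known_children(payload: bytes):
    """Return only the child chunks this reader understands, skipping the rest."""
    return [(f, p) for f, p in iter_chunks(payload) if f in KNOWN]


if __name__ == "__main__":
    frag = make_chunk(b"FRAG", make_chunk(b"DATA", b"\x01\x02"))
    obj = make_chunk(b"POBJ", make_chunk(b"XTRA", b"custom") + frag)
    for fourcc, payload in iter_chunks(obj):
        print(fourcc, [f for f, _ in known_children(payload)])
```

Because every chunk carries its own size, an unknown FourCC can be skipped without understanding its contents, which is what makes the custom-type rule workable.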

Ale · 2008-09-10 16:10

OwenS said...

The problem with XML is not parsing it; it's generating it. For example, if you have the input string "a > b", then you need to convert it to "a &gt; b", else you produce invalid XML. Also, XML doesn't like redundant information like the count="2" in Hippy's example above;

Actually, no. Generating XML is by no means difficult. You just use printf and format as you wish. Redundant information? The parser does not know that it is redundant, nor does it care (as long as two attributes have different names...).

There is a convention to transfer special chars (like '>') between tags: you just write &lt; or &gt;, like in HTML.
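
For reference, the escaping convention under discussion can be sketched in a few lines of Python using the standard library's `xml.sax.saxutils`. The helper `text_element` is a hypothetical name, and the sketch assumes the tag name itself is already a valid XML name.

```python
from xml.sax.saxutils import escape, unescape


def text_element(tag: str, text: str) -> str:
    """Wrap text in <tag>...</tag>, escaping &, < and > in the content."""
    return f"<{tag}>{escape(text)}</{tag}>"


if __name__ == "__main__":
    element = text_element("expr", "a > b")
    print(element)
    # Reading the content back reverses the escaping.
    print(unescape("a &gt; b"))
```

A printf-style generator works as long as every string passes through a step like `escape` before it is written.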

jazzed · 2008-09-10 16:41

You are talking about compiled objects right ? What's wrong with using ELF other than NIH (not invented here) ?

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
--Steve

Ale · 2008-09-10 16:44

what is wrong with ELF ? libbfd of course !

hippy · 2008-09-10 17:16

OwenS said...
The problem with XML is not parsing it; it's generating it. For example, if you have the input string "a > b", then you need to convert it to "a &gt; b", else you produce invalid XML. Also, XML doesn't like redundant information like the count="2" in Hippy's example above; Parsers should read it as an XML document, not by any "count" attribute or such

Converting > is easy (and I bet many are confused by the forum converting it back automatically), no different to converting plain text to HTML.

Attributes I don't see as a problem either. Arguably, for my 'KISS' approach I should have added an extra <count>2</count> entity rather than use an attribute, but it doesn't really matter. Any XML interfacing engine should be able to extract any specific entity out of the XML just like picking any <pre>...</pre> out of HTML etc. The end-user programmer still has to know about the meta-format and meta-data to deal with what is just 'data' even though it's XML (although there might be some 'content aware' processing tools available, not my field).

Perhaps the significant difference is that in both cases, XML and my text format, there is an indicator of count rather than the case where a parser would scan a line at a time and work out which section it was now in or whether there was more data or sub-data, or data blocks being terminated by blank line etc.

IMO there's only one rule for a portable and convertible data representation -

Every entity must have a count of how many sub-entities there are plus those sub-entities.

Whether the end-user needs to know what those sub-entities are by pre-definition or the sub-entities describe themselves is icing on the cake. It doesn't matter; it just affects how easy it is to use a parsing engine library to get at that data.

I got the syntax of my example wrong, but with a couple of simple tweaks it seems to be valid enough XML. If I could remember how to it would be easy enough to turn that into a page view in IE/Firefox using the JavaScript XML engine.

jazzed · 2008-09-10 17:27

Ale said...
what is wrong with ELF ? libbfd of course !

Maybe libbfd is too big? Sorry I haven't drilled down to the nether reaches of ELF, it's just so pervasive that one should at least consider it. Anything "general use" probably has more definition than necessary to do the job though.

XML is not so bad if done simply, but the data representation *is* bigger than necessary. If you use it for IPC, marshalling etc... is required. You don't get marshalling with "tv.out" or whatever :)


OwenS · 2008-09-10 18:28

Ale said...
There is a convention to transfer special chars (like '>') between tags, you just write &lt; or &gt; like in html.

XML requires that you output &lt; for < and &gt; for >. You can declare a section CDATA in a DTD, but XML doesn't support that feature IIRC.

Ale said...
what is wrong with ELF ? libbfd of course !

I have two issues with libBFD. The first is that it's licensed under the GPL, and while I intend to open source my assembler, I have pragmatic issues with libraries being licensed under the GPL (Namely, I'm of the opinion that source access doesn't matter; what does is the data). Secondly, documentation for porting BFD to another target architecture is lacking.

If I were to use a pre-existing object format, it would be COFF, since ELF's additional features (mainly shared libraries and their support, with monstrosities like the GOT) are unlikely to be used on the Prop. (And if someone wishes to use shared libraries, then the ELF format's overhead would be unwarranted anyway, and a better method would be using a custom format.)

However, even COFF has stuff we don't need: do we really need text, data and bss sections? On a system with a flat memory map and no support for memory protection, they are superfluous, and add complexity.

The format I have laid out can be considered to be a "simplified COFF". It reduces the features to only those useful on the Prop, excluding sections and such. It could be said that this is a COFF with the unneeded fat trimmed off and extensibility added.

As for XML, I am all for it as a data interchange method for textual data which humans may need to read. But is there really any point for storing assembled instructions in an XML file? I can't, personally, see many users wanting to hand assemble/disassemble their object files!

jazzed · 2008-09-10 19:03

OwenS,
The sections you mention for text, bss, etc.... are very useful for C programs embedded or otherwise. I'm working on the concept of using external memory to store "text" executable instructions and internal propeller memory for data (this requires some fixes in ICC so I'm on hold for now). If there was no way to partition the sections, the code would not work because the tool would not generate proper references.


OwenS · 2008-09-10 19:23

Jazzed,

That is a good point, but it only applies to code.

Would storing whether or not a fragment contains code or data in the object file not work equally well, however? Then your custom linker could just put the code at addresses above the prop's memory. Of course the code would need to not access data in code fragments, but the same applies for segments.

Alternatively, the object file format could record whether a segment was text (=0), data (=1), or bss (=2), like COFF does (COFF also stores names, but they're just for display).
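
As a rough illustration of how a custom linker could use such a per-fragment tag, the Python sketch below places text fragments in an external address range and everything else in hub memory. The tag values follow the text (=0), data (=1), bss (=2) numbering above; the base addresses, the `Kind` enum, and the `place` function are assumptions made only for this example.

```python
from enum import IntEnum


class Kind(IntEnum):
    TEXT = 0
    DATA = 1
    BSS = 2


def place(fragments, hub_base=0x0000, ext_base=0x1_0000):
    """Assign a start address to each (kind, size) fragment, in input order.

    Text goes to the (assumed) external range starting at ext_base;
    data and bss are packed into hub memory starting at hub_base.
    """
    next_ext, next_hub = ext_base, hub_base
    addresses = []
    for kind, size in fragments:
        if kind == Kind.TEXT:
            addresses.append(next_ext)
            next_ext += size
        else:
            addresses.append(next_hub)
            next_hub += size
    return addresses


if __name__ == "__main__":
    frags = [(Kind.TEXT, 8), (Kind.DATA, 4), (Kind.TEXT, 4), (Kind.BSS, 16)]
    print([hex(a) for a in place(frags)])
```

A one-field tag like this gives the linker enough information to split code from data without introducing named sections.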
