Some Propeller reverse-engineering info
Cliff L. Biffle
Posts: 206
Work has slowed down for a week or two, so I finally got around to reverse-engineering the Propeller binary format and some of the Spin bytecodes. (Chip sent me an example machine-code loader routine, but it didn't work, for reasons I outlined in a PM to him -- so I cleanroom-engineered the file, because it's more fun that way.)
Info here:
www.cliff.biffle.org/software/propeller/binary-format.html
www.cliff.biffle.org/software/propeller/spin-bytecode.html
These are both works in progress; I'd appreciate any additional info from the community.
My assembler is coming along quickly, and currently has frontends for the Parallax ASM syntax and a different, somewhat cleaner syntax (for folks who fondly remember 68000 assembler). It's feature-complete, but I haven't built out the full instruction set yet. Source and binaries soon.
Post Edited (Cliff L. Biffle) : 10/19/2006 5:53:13 AM GMT
Comments
Mike
The best way to reverse engineer Spin would be to write trivial sample programs that exercise each feature of the language and examine the resulting byte code. Personally, though, I'm more interested in being able to take over the Propeller completely and not having to use Spin to load code -- although in an ideal world, it would also be interesting to invoke Spin from assembly code in certain situations.
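A minimal sketch of that approach in Python (the file names are hypothetical: compile two Spin sources that differ by a single statement, save both binaries from the Propeller Tool, then diff them byte-for-byte):

# Diff two compiled binaries to isolate the bytecodes for the changed statement.
def load(path):
    with open(path, 'rb') as f:
        return f.read()

baseline = load('baseline.binary')     # e.g. an empty PUB Main
variant  = load('one_change.binary')   # same program plus the statement under study

for offset, (a, b) in enumerate(zip(baseline, variant)):
    if a != b:
        print('offset %04x: %02x -> %02x' % (offset, a, b))
if len(baseline) != len(variant):
    print('lengths differ: %d vs %d bytes' % (len(baseline), len(variant)))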
Basically, what I find limiting is the "32k static load on startup" -- I'd like to see the ability to swap code in, dynamically build code on the fly for cogs, and much more; things that would not be easy to implement right now. Heck, I'd even be interested in the ability to run Spin apps in a constrained portion of the hub memory... as I said, I like Spin :)
I am busy moving at the moment but as soon as I can I will have a look at the kernel myself.
*Peter*
That said, the current SPIN compiler seems to do nearly no compile-time optimization -- in particular, it does not do constant folding, does not inline short functions, and seems to keep local variables off the stack (resulting in more memory accesses).
These three are particularly unfortunate, because they encourage unreadable code in the name of speed. (Though, I suppose, you're already working in a language that uses |< and ~~ as unary operators...readability may not be a concern.)
Here are a few concrete examples where the two versions are semantically equivalent but differ in execution speed. They are nonsense code, of course, but were we using a real compiler (say, GCC), each pair would be equivalent at runtime.
v := 2 + 2 + 2 ' slow.
v := constant(2 + 2 + 2) ' fast
' Slow version:
r := (v & $4) << 8
v := r | $42
' Fast version: inlining is good for you! Or something.
v := ((v & $4) << 8) | $42
' Slow version
PUB ILikeStructuredProgramming(v) : r
  r := twiddle(v, 2)
  r := twiddle(r, 4)
  r := twiddle(r, 8)

PRI twiddle(x, y) : r
  r := (x >> y) | (y << (32 - y))
' Fast version
PUB ILovePoorCode(x) : r
  r := (((((x >> 2) | constant(2 << 30)) >> 4) | constant(4 << 28)) >> 8) | constant(8 << 24)
  ' Or, after constant folding:
  r := (x >> 14) | $8480000
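For what it's worth, the folding itself is the easy part. Here's a rough sketch over a toy expression tree, in Python; it has nothing to do with the Propeller Tool's internals and only shows the idea:

# Constant folding over a toy expression tree: fold any subtree whose
# operands are both literals, leave everything else alone.
import operator

OPS = {'+': operator.add, '-': operator.sub, '|': operator.or_,
       '&': operator.and_, '<<': operator.lshift, '>>': operator.rshift}

def fold(node):
    # a node is an int literal, a variable name, or a tuple (op, lhs, rhs)
    if not isinstance(node, tuple):
        return node
    op, lhs, rhs = node
    lhs, rhs = fold(lhs), fold(rhs)
    if isinstance(lhs, int) and isinstance(rhs, int):
        return OPS[op](lhs, rhs) & 0xFFFFFFFF   # keep results in a 32-bit long
    return (op, lhs, rhs)

print(fold(('+', ('+', 2, 2), 2)))                  # v := 2 + 2 + 2  ->  6
print(fold(('|', ('>>', 'x', 14), ('<<', 2, 30))))  # constants fold, 'x' stays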
As you've noted, the compiler generates byte code based quite directly on the source code. Though it does not perform optimizations, the interpreted byte-code language itself is quite memory-efficient. In the future, an optimizing compiler could shrink programs (somewhat) and make them (a little) faster, but this is a huge step up in complexity (read 'reliability') and is not a priority for us right now. What we have works perfectly and it serves its purpose well. I do agree that constant folding would be a huge win. I'll look into implementing that, as it might not be too difficult to do.
I bet you'll make an awesome C compiler!
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Chip Gracey
Parallax, Inc.
In this case, the code in question does something like this:
push 2
push 2
add
pop (or tuck) value to local variable
push that same local variable
push 2
add
pop to same local variable.
That's what I mean by more memory accesses: it's shuffling the stack a lot more than it ought to. Optimized stack code would omit the intermediate pop/push.
(Note: I'm still working out how exactly the local variable stores work. I suspect it's simply moving TOS down into the stack by an offset, but it's still unnecessary.)
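To make the comparison concrete, here's a toy stack machine in Python (the opcode names are invented for illustration, not the real Spin bytecodes, and I'm assuming the trace above corresponds to something like v := 2 + 2 + 2); dropping the intermediate store/reload gives the same result with two fewer operations:

# A toy stack machine: enough to run the two instruction sequences and
# check that they leave the same value in the local.
def run(program):
    stack, local_vars = [], {}
    for op, *args in program:
        if op == 'push':
            stack.append(args[0])
        elif op == 'add':
            stack.append(stack.pop() + stack.pop())
        elif op == 'pop_var':        # store top of stack into a local
            local_vars[args[0]] = stack.pop()
        elif op == 'push_var':       # reload a local onto the stack
            stack.append(local_vars[args[0]])
    return local_vars

# The sequence traced above:
observed  = [('push', 2), ('push', 2), ('add',), ('pop_var', 'v'),
             ('push_var', 'v'), ('push', 2), ('add',), ('pop_var', 'v')]
# What optimized stack code could emit: keep the intermediate on the stack,
# store only once.
optimized = [('push', 2), ('push', 2), ('add',),
             ('push', 2), ('add',), ('pop_var', 'v')]

assert run(observed) == run(optimized) == {'v': 6}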
I'm not disputing the value of a stack-oriented bytecode machine.
Chip Gracey
Parallax, Inc.
Can you please show the source that created this byte stream?
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Chip Gracey
Parallax, Inc.
I don't have the original -- I'm playing around in PropTool and not saving my sources -- but here's an example. It's pathological, but it shows what I'm referring to.
Spin bytecodes:
(Intriguingly, r+=2 and r := r + 2 generate different bytecodes. More fodder for study!)
So, Chip, it's like you say -- each statement is being compiled separately, without regard for the stack effects of the others.
By contrast, a "canonical" compiler for a stack machine would leave locals as intermediate values on the stack, rather than stashing them in an activation record below the current data stack. The code generated would simply omit the 65/64 pairs:
Between that and constant folding, you'd wind up with this:
An interested party could write a relatively simple two-wide peephole optimizer to take care of this; the artifact (the tuck/push sequence) is very consistent. I had to address a similar issue in javac in JDK1.1, back when Java was interpreted. (Unfortunately, the PropTool can't be invoked from scripts and the like, so you'd wind up manually writing binaries to the disk, processing them, and loading them back in.)
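If anyone wants to experiment, here's roughly the shape of such a pass, sketched in Python over a decoded instruction stream. The opcode names are placeholders; a real version would match the actual 65/64 store/load encodings from the bytecode tables:

# Two-wide peephole pass over a list of decoded instructions.
def peephole(code):
    out = []
    for insn in code:
        if out and out[-1][0] == 'pop_var' and insn == ('push_var', out[-1][1]):
            # "store local N, then immediately reload local N": fold the pair
            # into a single hypothetical store-and-keep ("tuck") so the value
            # stays on the stack. Deleting the pair outright would also need
            # a liveness check on the local.
            out[-1] = ('tuck_var', out[-1][1])
            continue
        out.append(insn)
    return out

before = [('push', 2), ('push', 2), ('add',), ('pop_var', 'r'),
          ('push_var', 'r'), ('push', 2), ('add',), ('pop_var', 'r')]
print(peephole(before))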
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Chip Gracey
Parallax, Inc.
Post Edited (Chip Gracey (Parallax)) : 10/20/2006 9:04:54 AM GMT
It would be nice if there were a way to run an external program on the Spin source before it was compiled, and yet another external program on the binary that was generated. This way we could add our own typing restrictions to Spin, or add a peephole optimizer, or the like, without having to delve into Delphi. We could (for a simple case) run the m4 preprocessor on the source, so you could have full preprocessing semantics. Or someone could (if they really, really wanted to) change >= and <= to behave more like they "normally" do, and use something different, like >:= and <:=, for the extended semantics.
Of course this does open a Pandora's box when it comes to compatibility and the like, but it would allow many people to (for instance) write a trivial Perl script to catch the errors they make time and again.
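As a purely hypothetical example of such a script (in Python rather than Perl; the check itself is just a placeholder for whatever mistake you personally keep making):

# Pre-compile check: flag '=' in what looks like an assignment statement,
# where Spin almost certainly wants ':='. Exit nonzero if anything is found.
import re
import sys

def check(path):
    clean = True
    for lineno, line in enumerate(open(path), start=1):
        if re.match(r'\s+\w+\s*=(?![=<>])\s*\S', line) and ':=' not in line:
            print('%s:%d: "=" used where ":=" may be intended' % (path, lineno))
            clean = False
    return clean

if __name__ == '__main__':
    sys.exit(0 if check(sys.argv[1]) else 1)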
And perhaps enough experimentation with a simple peephole optimizer may
actually yield something that could be easily and profitably integrated with
the "normal" IDE.
I believe the Microsoft IDEs generally support this type of functionality.
It's also the basis for every Unix development tool, ever. I'm totally with you.
Though in my case, it'd need to somehow run these programs outside of my VM on my real computer, where I have scripting languages and m4.
I've refined my Propeller binary preamble and gotten it working for the general case. With some help from Chip, I've worked out the mystery 8-byte sequence that Propeller Tool requires to load a file (yet which the Propeller seems to disregard).
The file layout table and the preamble have been updated on my webpage:
www.cliff.biffle.org/software/propeller/binary-format.html
I've also got a lot more of the SPIN bytecode worked out; I'll update that page shortly.
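For anyone who wants to compare their own compiled output against that table, a few lines of Python will dump the preamble bytes (the file name is just an example):

# Print the first bytes of a compiled binary, eight per row, so the preamble
# fields can be read off against the layout table.
def dump_preamble(path, count=32):
    with open(path, 'rb') as f:
        data = f.read(count)
    for offset in range(0, len(data), 8):
        row = data[offset:offset + 8]
        print('%04x: %s' % (offset, ' '.join('%02x' % b for b in row)))

dump_preamble('blink.binary')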