SPIN Interpreter Internals (link included)
asterick
In light of some interest (one person) in my SPIN interpreter research, I went ahead and wrote up everything I know about the inner workings.
www.sublab.net/spin.html
This covers all the operation codes I could get the compiler to produce. Anything that is not known is marked.
EDIT: Added opcode 0x07 (Call indexed object)
Comments
(You can test this by adding and removing variables and watching it change.)
Also, your Python code fails for me; I'm trying to understand why. (All I get is numbers, no opcodes.)
Based on these results you might share with people what makes the most compact code. (I.e., the first seven arguments + locals are accessed more concisely than the rest; the first 8 long VARs are also accessed more concisely; and longs are generally faster than either bytes or words.)
It looks to me like the "PC relatives" (branch offsets and the like) use a two-byte form frequently when a one-byte form would
suffice.
Maybe I'll just write a binary-to-binary optimizer that does the obvious peephole improvements.
The relative branches are 7-bit signed, so they can only be between -64 and 63 (the MSB set means it's a 16-bit branch).
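Here's a minimal Python sketch of a decoder for that offset format, going purely by the description above; the exact bit packing of the two-byte form is my guess, so check it against real compiler output before trusting it.

def decode_branch_offset(data, i):
    # One-byte form: MSB clear, low 7 bits are a signed offset (-64..63).
    first = data[i]
    if first & 0x80 == 0:
        offset = first - 0x80 if first & 0x40 else first
        return offset, i + 1
    # Two-byte form: MSB set.  Combining the remaining 7 bits with the
    # next byte and sign-extending from bit 14 is an assumption here.
    value = ((first & 0x7F) << 8) | data[i + 1]
    if value & 0x4000:
        value -= 0x8000
    return value, i + 2

# e.g. a single byte 0x7F decodes to -1, and 0x3F decodes to +63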
Local 0 is the RESULT (return value). Locals 1-7 are the call arguments, followed by the local variables.
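To make that concrete, here's a tiny Python helper; the names and the one-long-per-slot view are my own, based on the layout described above and the LONG-sized local opcodes later in this thread.

def local_slot(index, n_args):
    # Slot 0 is RESULT; slots 1..n_args are the call arguments;
    # declared locals follow.  Each slot is a long.
    if index == 0:
        return "RESULT"
    if index <= n_args:
        return f"arg{index}"
    return f"local{index - n_args}"

# e.g. for PUB mul(x, y): slot 0 = RESULT, slot 1 = x, slot 2 = y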
Yes, longs are more efficient, as long as they take up the first 8 entries in the VAR table. Also, interleaving BYTEs and LONGs results in alignment padding that will steal bytes from you.
As for your "var" offset, I don't know; it seems pretty ambiguous. 1st- and 2nd-tier objects have a zero (the 1st tier is ignored), whereas 3rd-tier objects tend to have wildly large values (0x3000+).
I also discovered another opcode, which leaves only 2 unknowns in the main instruction set.
I've included an updated disassembler, which dumps all the craziness.
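If anyone wants to poke at binaries alongside the disassembler, here's a quick Python sketch of reading the standard binary header. The field layout is my reading of the usual .binary format, and 'program.binary' is just a placeholder filename, so verify it against your own dumps.

import struct

def read_binary_header(image):
    clkfreq, clkmode, checksum, pbase, vbase, dbase, pcurr, dcurr = \
        struct.unpack_from('<IBBHHHHH', image, 0)
    return {
        'clkfreq': clkfreq,   # system clock frequency in Hz
        'clkmode': clkmode,   # clock mode bits
        'checksum': checksum,
        'pbase': pbase,       # start of the top object
        'vbase': vbase,       # start of the VAR space
        'dbase': dbase,       # start of the stack/free space
        'pcurr': pcurr,       # initial program counter
        'dcurr': dcurr,       # initial stack pointer
    }

with open('program.binary', 'rb') as f:   # placeholder filename
    print(read_binary_header(f.read()))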
to see if we can get a dump for an emulator. I don't actually own a Propeller board yet, so I can't do it myself. I'm almost certain that the people at Parallax thought to do boundary checking on that function, but you never know. They had the foresight to encrypt the ROM, after all.
My interpretation of the second word of the object is as follows.
Given an object with a VAR ptr of v: this object has two sub-objects, o1 and o2. The VAR offset of o1 is v1, and the VAR offset of o2 is v2.
When o1 or o2 is invoked, the VAR ptr should be set to v + v1 or v + v2, respectively.
Essentially, it's an offset from the VAR ptr of the enclosing object that gets you to the VAR pointer of the current object.
(Try including the same object multiple times; you'll get multiple subobject entries,
each with a different var offset, reflecting that they are different objects, yet they
share the same code.)
If you have an example that shows this not to be the case, I'm interested in it.
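Put in code form, here's a small Python sketch of that rule. The two-WORDs-per-entry packing follows what's described in this thread, but exactly where the sub-object table starts inside the object header is an assumption, so 'table_off' is left as a parameter you fill in yourself.

import struct

def read_subobject_entries(image, obj_base, table_off, count):
    # Each entry (per this thread): a WORD giving the sub-object's offset
    # from the parent object, then a WORD giving its VAR offset from the
    # parent's VAR pointer.
    entries = []
    for n in range(count):
        code_off, var_off = struct.unpack_from(
            '<HH', image, obj_base + table_off + 4 * n)
        entries.append((obj_base + code_off, var_off))
    return entries

def child_var_ptr(parent_var_ptr, var_off):
    # When the sub-object is invoked, its VAR pointer is the enclosing
    # object's VAR pointer plus the stored VAR offset.
    return parent_var_ptr + var_off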
I was hoping to get a debugger working by modifying the main-memory code in place, but I have not yet figured out everything I need. (I was going to modify the main-memory code to insert a branch-back-to-itself, or maybe a WAITPNE or something.)
What we should consider doing is coming up with a symbol table format, since that's clearly lacking and would be useful.
Mike
PS - I'm interested as well.. so make it TWO people....
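On the symbol-table idea: here's a purely hypothetical strawman record, written in Python only for readability; every field name is my own invention, not anything the compiler emits.

from dataclasses import dataclass

@dataclass
class SpinSymbol:
    name: str          # identifier as written in the source
    kind: str          # 'method', 'var', 'dat', 'local', 'con', ...
    object_index: int  # which object in the image it belongs to
    offset: int        # byte offset within that object's block

A flat list of these, serialized next to the binary, would probably be enough for a debugger or tracer to map addresses back to names.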
The WORD after the child object offset is the VAR block offset relative to the parent object's VAR block offset (nrr)
I didn't really mess with custom objects; I usually just messed with the library objects, so it makes sense that they would go crazy, since they can have massive variable blocks.
I'm going to be cleaning it up after a bit, so it's easier to read. Right now it's a bit slapped together. I also need to describe what the codes themselves do, and how they affect the stack.
Those voids in the instruction set are glaring at me, though. :)
(The JIT BASIC would just be a stunt; only the smallest and simplest loops would stay in assembly, but they would run fast!)
Using the bytecodes as a BASIC target would ease the memory pressure on the microcomputer I'm building because I would reuse the
existing ROM code rather than taking up space for my own code. And for simplicity I'd use as few opcodes as possible to keep the "compiler"
simple.
Other tricks you can do pretty easily with this information:
- dynamic loading and execution of SPIN binaries
- loading and executing a SPIN binary with assembly just to load up and start a COG, and then free that memory back up. This way you could load a *bunch* of functionality into some cogs (with assembly language) and then free up all that shared RAM for other uses.
- debuggers/tracers
- passing an object by reference (this is a tiny bit tricky)
It wouldn't be a huge challenge if you used a custom compiler (since you could embed the VAR block inside the object itself)
I don't think the system write protects code space.
If you ignore multithreading, passing objects by reference is not that hard; you simply need to make sure the method signatures match. Everything else falls into place pretty easily.
With multithreading, it's harder, since the code space is shared and you only have a byte
index to use for a trampoline. But it should still be reasonably possible.
I like the Spin bytecodes since they almost act as a form of compression when used correctly.
Complex tasks can be done with a few Spin opcodes that would normally take many 32-bit ASM instructions!
I once wrote complex code in ASM for the Prop, only to later rewrite it in SPIN to save space! (At the cost of speed.)
A perfect example of Spin's compression is doing some multiplication.
How many 32-bit ASM ops would we need? Maybe 12?
How many Spin ops would we need? I know it's less than 12. (lol)
Correct me if I'm wrong, but I think of the Spin ops as a bunch of useful little routines. You just call the routines as you need them.
I mean, the same could be done in ASM (make a whole bunch of routines for complex functions used a lot in your program and call them), but Parallax has already done this for us.
Not to mention that these "routines" are already stored in ROM and don't cost us any memory!
Maybe I see things from the wrong angle, but this is how I view the Spin opcodes.
5 instructions.
- Mikael
PUB mul( x, y )
  x *= y
translates down to
PUSH_LOCALMEM_LONG_2
EFFECT_LOCALMEM_LONG_1 POP MULTIPLY
which turns into 3 bytes.
Still better than 20 needed for assembly. :)
Here's some old Z80 code for the task that I pulled up on the net.
It still needs far more opcodes!
; INPUT: THE VALUES IN REGISTERS B AND C
; OUTPUT: HL = B * C
; CHANGES: AF,DE,HL,B
;
      LD   HL,0       ; clear the running total
      LD   A,B
      OR   A          ; test B for zero
      RET  Z          ; B = 0, so the product is 0
      LD   D,0
      LD   E,C        ; DE = C
LOOP: ADD  HL,DE      ; HL += C...
      DJNZ LOOP       ; ...repeated B times
      RET
I've made a bootloader that does everything except the serial comms / detection.
Would someone mind giving me a heads-up on how it works?
Yeah, now that's what I call compression. :)
I had a feeling multiplication would be a good example of the Spin ops.
Hey look, I wrote multiplication in one ASM op. (lol)
"shl foo,bar"
The only thing is you have to multiply by a power of two. *laughs*
Here's some ARM! Mwahaha.
MUL r0, r1, r3
Good news. Someone from Parallax left some example code up here for that!
Let me see if I can find it.
(P.S.) I like the Z80, with its 150,000-some instructions. (lol) MSX!!!
Something relating to a serial loader. Not sure if you've seen this or not.
http://forums.parallax.com/showthread.php?p=622354
Yeah, if you keep the program in the first 256 bytes (zero-page mode), you may even get it to run faster than on the Z80...
(That is, if they ran at the same clock speed...)
Those 16-bit ADD and SUB commands, and the BIT commands... who needs them?
Not to mention those pesky IX and IY registers?
I mean, seriously, who would ever do anything as daft as work with databases, or even linked lists, on an 8-bit CPU?
But where the Z80 really suffers is the interrupts...
It can barely bit-bang a 9600bps full-duplex serial link in the background, because the shadow registers are such a clumsy solution to use during tight interrupts...
And the ability to handle up to 128 different interrupts? Well, that is just plain confusing, isn't it?
Ooops...
Left my brain in Irony-mode...
Anyway, that piece of Z80 code is among the worst I've ever seen.
(One of my hobbies is disassembling ROMs from assorted computers)
Do you know what it does?
To multiply B by C, it copies C into DE (a 16-bit register pair), then ADDs it to HL (also 16-bit) B times.
If the B register contains 255, it'll loop 255 times, and the DJNZ instruction isn't kind to such loops.
(The DJNZ instruction on the Propeller is much nicer for such loops.)
This one takes a bit more space, but it executes a heck of a lot faster:
http://map.tni.nl/sources/external/z80bits.html#1.1
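Just to show the difference in rough Python terms (a model, not cycle-accurate): the routine above adds C to HL up to 255 times, while a shift-and-add routine like the one linked always makes 8 passes for an 8-bit multiplier.

def mul_repeated_add(b, c):
    # What the routine above does: HL += DE, B times.
    hl = 0
    for _ in range(b):
        hl += c
    return hl

def mul_shift_add(b, c):
    # Shift-and-add: one pass per bit of the 8-bit multiplier.
    hl = 0
    for _ in range(8):
        hl <<= 1
        if b & 0x80:
            hl += c
        b = (b << 1) & 0xFF
    return hl

for b, c in [(0, 7), (1, 9), (3, 5), (200, 250), (255, 255)]:
    assert mul_repeated_add(b, c) == mul_shift_add(b, c) == b * c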
Or read this page to learn why the Z80 was considered THE 8bit number-cruncher:
http://home.arcor.de/andreadrian/oldcpu/Z80_number_cruncher.html
That sucks for me. :(
I learned ASM on the 8-bit Z80.
Then again, some people have called the Propeller's opcodes weak and pointless and joke about its lack of interrupts, but I like the Propeller too, and I like its ops as well.
So I guess it's what you're used to.
exciting stuff,
Marty
A Sinclair machine?
(Timex Sinclair in the USA)
A Gameboy?
Anyway, I think it's time to revive the 'Old School Hacker' thread in the Sandbox...
http://forums.parallax.com/forums/default.aspx?f=15&p=001&m=387