Spin Interpreter ( Source Code )

hippy · 2007-12-24 21:07

For your Christmas delectation and delight; one Spin Interpreter written in PASM.

I'd hoped to get further than I have, but there's only so many hours in a day, the coding elves have packed up tools and run off claiming to have some other job they need to do, and I seem to have just five longs of code space left with a fair chunk still to implement, but here it is anyway

It runs a reasonable amount of code but there are some notable absences ( a += b, a++, and no subroutine calling ).

It's become very clear that I need a redesign as there's some 'trick' I'm not seeing. I'm currently using 80 longs as lookup / jump vector table which if I can reclaim most of that it might be enough to squeeze it into 496 longs, but I'm still doubtful. I've a long way to go before I can match the skill of Chip's programming. In the meantime I'll probably resort to overlays.

For now though, back to the mince pies and alcohol.

deSilva · 2007-12-24 21:43

Quite... astonishing!

bambino · 2007-12-25 00:01

Someone has been extremely busy!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Rayman · 2007-12-25 01:17

Cool! At first, I though you meant Spin bytecode interpreter... And, I was wondering what good that would be... But, this is more interesting... I think this takes text input, right?

hippy · 2007-12-25 04:43

Rayman said...
Cool! At first, I though you meant Spin bytecode interpreter... And, I was wondering what good that would be... But, this is more interesting... I think this takes text input, right?

This is a Spin bytecode Interpreter

In its own right I'm not sure what use it is, but l can probably think of something when I'm more awake. Being able to single-step, breakpoint and debug bytecode should have its uses and the porting possibilities appeal to me; PICmicro / AVR / ARM / PC etc. With a bit of time-slicing ( and obviously loss of speed ) it should be possible to emulate eight cogs on a single CPU. I can see still see advantages to running Spin code even if the hardware isn't Propeller and doesn't support all the hardware attributes.

deSilva · 2007-12-25 11:14

Its a little bit like running "VirtualPC" on a PC.. lots of advantages!

Rayman · 2007-12-25 13:37

Ok. I see it now... You give text output of the commands...

Rayman · 2007-12-26 00:27

I would be sure nice to go the other way... Take text and convert to byte code... Then, maybe wouldn't need the Prop Tool at all and have the Prop able to edit it's own code... I've heard they're doing this for the next version of the Prop. But, maybe a lite version (FemtoSpin?) could be made now...

Paul Baker · 2007-12-27 19:58

Rayman, I think you're still not seeing what it is, everything before the DAT is sort of window dressing. He is trying to reverse engineer the Spin interpretor in the ROM of the Propeller.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Paul Baker
Propeller Applications Engineer

Parallax, Inc.

hippy · 2007-12-28 16:36

Paul Baker (Parallax) said...
He is trying to reverse engineer the Spin interpretor in the ROM of the Propeller.

Clean-room implementation from scratch lest anyone thinks I'm simply trying to decode the ROM ( no offence taken ).

The post Christmas haze has lifted so time for version two. This one makes extensive use of overlays which really does free ones mind from trying to work within Cog memory constraints, and it's now easier to say what it doesn't do rather than does ...

* No calls to Object Methods
* No Abort / Abort trapping
* No LookUp / LookDown
* No SPR bit or bits access, INA[noparse][[/noparse]0], INA[noparse][[/noparse]0..7]
* No CogInit / CogNew
* Random numbers aren't random

For comparison with the genuine Parallax Spin Interpreter; size is approximately 3700 bytes, 925 longs.

Approximate benchmark timings (seconds/mm:ss), calling a subroutine to increment a long in hub memory 10 million times ...

170 / 02:50 - Genuine Parallax Spin Interpreter
480 / 08:00 - Using overlays
225 / 03:45 - Without using overlays
220 / 03:40 - Fetch loop optimised without using overlays

So it's at best 30% slower than Parallax's ( vice-versa, Parallax is 23% faster ). It's in the right ballpark so I'm very happy with that.

Paul Baker · 2007-12-28 19:42

You're correct Hippy, my description wasn't entirely accurate. Now if you started doing execution time comparisions and modify your interpretor to match the internal ROM interpretor's execution speed, I would catagorize your efforts as reverse eengineering (via the black box method).

One interesting pursuit of this effort of yours would be to see if by deletion of a few of the lesser used operators to free up some space, if a substantial speeding up of remaining operators is possible by optimizing for speed rather than space.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Paul Baker
Propeller Applications Engineer

Parallax, Inc.

hippy · 2007-12-29 00:02

Paul Baker (Parallax) said...
You're correct Hippy, my description wasn't entirely accurate. Now if you started doing execution time comparisions and modify your interpretor to match the internal ROM interpretor's execution speed, I would catagorize your efforts as reverse eengineering (via the black box method).

If I can match the speed of the ROM interpreter I may come asking for a job

I think the furthest I'm going to go is the incrementing loop and a stopwatch. Someone else can worry about hyper-optimisation and appreciate the sterling work Chip's done. I'm more than happy with what I've got, but obviously will optimise where I see an opportunity.

If I had to set a 'success criteria' on speed, I'd say bit-banging 19200 baud serial.

Paul Baker (Parallax) said...
One interesting pursuit of this effort of yours would be to see if by deletion of a few of the lesser used operators to free up some space, if a substantial speeding up of remaining operators is possible by optimizing for speed rather than space.

The way I'm doing it ( looked-up jump table ) there'd be no gain. Handlers can either be fixed in Cog or overlayed in as needed so there's obviously a speed advantage which can be given to most commonly used opcodes. I was quite surprised that with everything overlayed it was only half the speed, I'd expected worse.

There are gains to be had in un-rolling code and focused optimisations but I'm trying to keep it as clean as possible for maintenance at present.

One interesting comparison would be, instead of overlay loading and executing, to perform LMM-style interpretation of the overlay code. That would benefit non-looped overlays but the looped ones would suffer; I can however see an easy way to do both.

A favour to ask ...

As it's Christmas ( playing the "season to be giving presents" card ); you don't happen to have an example line of code which generates the $3C opcode you'd be prepared to share ? Try as hard as I can, just cannot seem to generate one. It would be much appreciated.

hippy · 2007-12-29 03:52

Earlier Version 002 benchmarks ...

170 / 02:50 - Genuine Parallax Spin Interpreter
480 / 08:00 - Using overlays
225 / 03:45 - Without using overlays
220 / 03:40 - Fetch loop optimised without using overlays

Current Development Version having undergone some tweaking ...

170 / 02:50 - Genuine Parallax Spin Interpreter
450 / 07:30 - Using overlays, load and execute
395 / 06:35 - Using overlays, interpret LMM-style when possible
170 / 02:50 - Favoured in-cog handlers, overlays interpreted LMM-style when possible

So a significant saving using LMM-style interpretation rather than load and execute overlays. Load and execute takes 14% longer than LMM and I already do demand-sized loads for speed. Something significant perhaps for everyone looking at LMM-style implementations.

The real hoot though is getting execution speed to match the ROM Interpreter !

It's perhaps not a fair comparison as I have all the opcodes used in the benchmark as favoured opcodes held in-cog for maximum performance, particularly the three subroutine related opcodes ( Frame / Call / Return ) and the Cog is once again near full so some of that will have to move back to overlay.

One thing I hadn't appreciated ( coming from a background of much slower micros ) is that it doesn't really amount to saving much by optimising away a single instruction, and conversely even an extra unnecessary one left in does little damage, not even over 10 million loops. In this case optimizing to hit hub access sweet spots is the key to fast execution.

The current footprint is down to around 2800 bytes, 700 longs. A new release will be along sometime soon.

hippy · 2008-01-02 02:30

I've got as far as I can so this is likely to be the final release for a while unless I ( or anyone else ) finds any bugs which need fixing.

I'm in the wrong OS so cannot determine footprint or benchmarks at present but at one point it was running my increment loop faster than the ROM interpreter does. I'll give details later.

Everything working except for whatever the missing $3C opcode is used with, and I hit a brick wall with CogNew / CogInit. That uses an unidentified sub-opcode which needs further investigation.

If the PropTool generated a suitable symbol table / listing file it would be possible to do high-level Spin debugging, statement stepping, breakpoints with variable content viewing and change ( like VB6 and other similar development tools ). Background Debugging Mode would be something very worthwhile Parallax could add to the Prop II's extra ROM, a Spin Interpreter designed for this purpose. Just a listing file and leaving the Propeller community to it would be a good second best.

Two side-tracks were multi-tasking ( running two or more Spin Programs on the same cog ) and embedding other bytecode interpreters within this one. Both worked well. The bottom line is that it should be relatively easy to emulate any arbitrary bytecode architecture. Likewise any other processor instruction set when cycle timing isn't important.

LCC, an ANSI C compiler, looks to have a fairly well defined bytecode / VM so I may consider that next, or maybe a PICmicro/AVR emulator so their compilers can be used. I might also take a look at rewriting the BS1/PICAXE emulator in PASM.

hippy · 2008-01-09 03:34

A bug fix. Cock-up introduced when adding support for ABORT or more likely when changing the stack-framing to support inter-object calls.

Bug found while investigating "Where does an ABORT trap to when it occurs in a method being used to form a parameter for a method " ? The answer is; it doesn't continue after the method being called, but falls out the method that call was attempted within. No, I can't phrase it better !!!

This, using Parallax ROM Spin Interpreter, prints ...

Started
Trap Start
Abort
Finished

Some may have expected "Trap Finish" to have been executed but it isn't.

PUB Main
  tv.Start( TV_PIN )
  tv.Str( String("Started",$0D) )
  \Trap
  tv.Str( String("Finished") )

PRI Trap
  tv.Str( String("Trap Start",$0D) )
  \Method( Parameter(1) )
  tv.Str( String("Trap Finish",$0D ) )
  
PRI Parameter( dummy )
  tv.Str( String("Abort",$0D) )
  abort

PRI Method( dummy )

Spin Interpreter ( Source Code )

Comments