Ways to Break the 32KB Spin Code Barrier

jazzed · 2009-11-02 06:15

This post is in response to James' query in another topic. Answering it here to avoid polluting the other thread.

JMH said...
Please clarify about using Spin stubs and 32kb limit.

There is more than one way to do it.

First consider the easiest way to use more than 32KB for code ... not exactly a direct answer to "stubs", but similar and the answer some would prefer. That would be to load PASM services into all cogs (except cog 0) that can be used by small Spin methods ... more or less what can be done today as described with one or more methods in Post Boot PASM COG Loader?.

Secondly, one can use a virtual memory method (paging, but not a full swap implementation). The Spin interpreter can be intercepted [noparse]:)[/noparse] I'm doing this in Spud. Again this requires Spin stubs to activate code, but in this case a code cache can be used where spin byte code is stored on SD card and pulled in to the cache as required, executed, and maintained there as required based on statistical usage.

Thirdly and the fastest way, one could use multiple Propellers in a RPC methodology. This would require Spin stubs on the host to activate a COG based management/transport layer for sending remote procedure call requests (RPCs) to one of a cluster of Propellers as discussed in the OctoProp Thread. That method is on ice for the time being.

Also for multiple devices, it is possible to just build different images for each Propeller and use a predetermined protocol for predetermined method invocation/results, but this is less attractive for me at least mainly because it is not possible to throw "all resources" at solving "any one problem" in a generic and easily replaceable (that is without a difficult reprogramming cycle) method.

So those are ways to break the 32KB Spin code barrier.

I have most of this worked out. The problem is getting enough interest in seeing it finished. Some of the work is pretty tough and requires interest to be finished. The usage model is difficult to grasp for some here, but anyone who has done distributed parallel computing or studied computing systems architecture should understand the approaches discussed.

There has been a ton of external memory interest this year. It seems to me that a way to run bigger applications at native rates in the case of distributed parallel computing would be more interesting than waiting to load instructions at a time or even in blocks from external devices. Having a way to use 32KB+ Spin programs even if it is slower because of "paging" is attractive too ... this is what your PC is doing when it runs out of memory.

The question is what is worth to break the 32KB Spin code barrier? Maybe the problems being addressed by the community are too small to consider the extra effort?

Well, then perhaps it's not worth the bother. There are other "solid" shiny things on the horizon.

Added: It seems that mpark added the ability for larger Spin programs to be produced for Catalina at some point. I'm not exactly sure what utility that brings, but am pretty sure it is different from what I described except for possibly the first paragraph methodology.

Post Edited (jazzed) : 11/2/2009 6:42:01 AM GMT

Ale · 2009-11-02 14:30

The spin interpreter addresses are limited to 15 bits. With bit 14 being sometimes just sign so you end up with 15 bits anyways. It would need a new interpreter. It is not only a matter of putting more code in it. More memory (external) has to come with a lmm-enabled spin compiler/interpreter.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Visit some of my articles at Propeller Wiki:
MATH on the propeller propeller.wikispaces.com/MATH
pPropQL: propeller.wikispaces.com/pPropQL
pPropQL020: propeller.wikispaces.com/pPropQL020
OMU for the pPropQL/020 propeller.wikispaces.com/OMU

BradC · 2009-11-02 14:57

Ale said...
The spin interpreter addresses are limited to 15 bits. With bit 14 being sometimes just sign so you end up with 15 bits anyways. It would need a new interpreter. It is not only a matter of putting more code in it. More memory (external) has to come with a lmm-enabled spin compiler/interpreter.

The spin interpreter is going to require changes to work for the new chip anyway.

It would be interesting to look into what changes are actually required to enable the current interpreter work with more ram in an XMM format. Preferably without resorting to LMM, paging or any other tricks that will slow the interpreter down.

I'm sure the compilers can be relatively easily modified to suit.

I suspect the word size in the coginit/new statements are going to be limiting factors.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
If you always do what you always did, you always get what you always got.

Ale · 2009-11-02 15:49

Bard

the current interpreter is slow (I think) because of all the conditional code that just lays in the middle. I'd hope that a table based one would be much faster. But I do not think 2x, but less.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Visit some of my articles at Propeller Wiki:
MATH on the propeller propeller.wikispaces.com/MATH
pPropQL: propeller.wikispaces.com/pPropQL
pPropQL020: propeller.wikispaces.com/pPropQL020
OMU for the pPropQL/020 propeller.wikispaces.com/OMU

jazzed · 2009-11-02 16:34

Ale said...
The spin interpreter addresses are limited to 15 bits. With bit 14 being sometimes just sign so you end up with 15 bits anyways.

The number of address bits used by the interpreter is irrelevant for the "stub solutions" I've described. A "stub solution" means that most methods compiled in the main binary have nothing more than a signature and a method encoding. The bodies of the stubbed methods are kept in a separate location as either a file or a library running on one or more other Propellers.

Ale said...
It would need a new interpreter.

Yes, the "stub solutions" are two COG minimum. Some modifications to the interpreter are necessary for sharing the PC, stack pointer, and object context. Virtual Memory management is needed running in a second COG. In one case, the VMM will manage the pages and calls to the "library" method body. In the cluster case, the VMM will manage dispatch of RPC.

Ale said...
It is not only a matter of putting more code in it. More memory (external) has to come with a lmm-enabled spin compiler/interpreter.

This is true in the case of paging from an external device for best performance, but the "library" code could just as easily be stored on SD card.

jazzed · 2009-11-02 16:49

BradC said...
It would be interesting to look into what changes are actually required to enable the current interpreter work with more ram in an XMM format. Preferably without resorting to LMM, paging or any other tricks that will slow the interpreter down.

I'm sure the compilers can be relatively easily modified to suit.

I suspect the word size in the coginit/new statements are going to be limiting factors.

I agree it is worth investigating. I wonder if Chip has a solution for this already for Propeller II. If Spin can run using a bigger space it means Spin can become more of a general purpose language. That little stack pointer start word is definitely a problem. Maybe it can be merged with the current stack pointer to define the start? Then there is the chip id which lives at $FFFF ... that is a problem too which kind of makes a new booter necessary. Too bad we can't just use the chip ID to select the model.

Bill Henning · 2009-11-02 16:57

Interesting idea.

Personally, I wonder if it would not be simpler to do a segmented memory model for spin.

Yes, I know, flat memory space is nicer, and segmentation is evil.

However it segmentation can also be easier to implement...

(all approaches would be significantly simplified if only code space was expanded)

Simple approach:

Separate code, data and stack segments - although the stack would be relatively small, and could be part of the data segment.

Allow 64KB space for both - add XMM to the mix, and we could have 4x as much space for Spin code/data. Make device drivers loadable, and we have even more [noparse]:)[/noparse]

For extra simplicity, data/stack could stay in the hub, and only the bytecodes get moved to XMM.

More complex approach: (similar to Jazzed's stubs)

add segment bases for data, code (and possibly stack) to each object

change BST so that it accesses each method by code_seg[noparse]:o[/noparse]ffset instead of just offset

no need for stubs, would allow as much code and data as you have XMM, only real limit is each object would be limited to ~32k on Prop1

More radical approach

New byte codes for longjump, longcall, (read|write)(long|word|byte), change to BST for flat 32 bit (or at least 24 bit) addressing, XMM interpreter

jazzed said...

Ale said...
The spin interpreter addresses are limited to 15 bits. With bit 14 being sometimes just sign so you end up with 15 bits anyways.

The number of address bits used by the interpreter is irrelevant for the "stub solutions" I've described. A "stub solution" means that most methods compiled in the main binary have nothing more than a signature and a method encoding. The bodies of the stubbed methods are kept in a separate location as either a file or a library running on one or more other Propellers.

Ale said...
It would need a new interpreter.

Yes, the "stub solutions" are two COG minimum. Some modifications to the interpreter are necessary for sharing the PC, stack pointer, and object context. Virtual Memory management is needed running in a second COG. In one case, the VMM will manage the pages and calls to the "library" method body. In the cluster case, the VMM will manage dispatch of RPC.

Ale said...
It is not only a matter of putting more code in it. More memory (external) has to come with a lmm-enabled spin compiler/interpreter.

This is true in the case of paging from an external device for best performance, but the "library" code could just as easily be stored on SD card.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com Please use mikronauts _at_ gmail _dot_ com to contact me off-forum, my PM is almost totally full
Morpheusdual Prop SBC w/ 512KB kit $119.95, Mem+2MB memory IO board kit $89.95, both kits $189.95
Propteus and Proteus for Propeller prototyping 6.250MHz custom Crystals run Propellers at 100MHz
Las - Large model assembler for the Propeller Largos - a feature full nano operating system for the Propeller

jazzed · 2009-11-02 18:49

Bill, you get it. I think BradC does too. Ale was probably just a little distracted by the bit count.

Some of the things mentioned like segmenting would require an adjustment to the stack frame which takes two (three?) longs today for book-keeping. Making the stack frame bigger might reduce interpreter size a little because of the two embedded bits in the object offset used as flags. The current single COG interpreter is very tight - there is one wasted long though[noparse]:)[/noparse] Even going to a 2 COG solution (not the best thing obviously) is not so bad if the result is worth it though.

Being able to make better use of today's hardware is to allow higher code density is one immediate goal for me whether it is big EEPROM, Flash, XMM, SD card, or even Propeller clusters, etc... that provides the solution.

Another goal which could make Spin less foreign or obscure to others (and may benefit Parallax) would be to make Spin a PC-able language like Java, C, VB, whatever (some disagree ... oh well). Obviously to me at least for making it PC-able demands a bigger memory range one way or another.

BradC · 2009-11-03 00:22

Bill Henning said...

More complex approach: (similar to Jazzed's stubs)

add segment bases for data, code (and possibly stack) to each object

change BST so that it accesses each method by code_seg[noparse]:o[/noparse]ffset instead of just offset

My gut feeling is that would be fairly complex to implement and end up being a bit of a nightmare. Check out x86 for the kind of gunk we want to avoid.

Bill Henning said...

More radical approach

New byte codes for longjump, longcall, (read|write)(long|word|byte), change to BST for flat 32 bit (or at least 24 bit) addressing, XMM interpreter

That would be a much nicer solution. We like clean and simple.
bst[noparse][[/noparse]c] already uses 32bit addressing internally, it's even already geared to emit 32 bit addresses for spin code.

I'm not entirely sure new bytecodes are required. Because the interpreter works effectively with variable length constants, making it clean might not be as difficult as all that.
The biggest issue is making a memory hole around $8000-$FFFF

I don't see how it will fit into one cog though, and as evidenced by the fact nobody has used or even finished the alternative interpreters put forward, it'd have to have some stunning pluses to offset the complete b0rkage required to actually use it.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
If you always do what you always did, you always get what you always got.

Bill Henning · 2009-11-03 00:29

Hmm... I like clean...

I don't think we need a hole from $8000-$FFFF... bear with me.

The "new" interpreter would run code out of XMM, right? No need for a hole there!

I'd probably keep the stack in the hub, possibly simple variables as well.

The first pass at this could be just moving the code to XMM, leaving everything else in the hub.

As far as the new interpreter being much bigger... move some less used functionality into the hub.

BradC said...

Bill Henning said...

add segment bases for data, code (and possibly stack) to each object

My gut feeling is that would be fairly complex to implement and end up being a bit of a nightmare. Check out x86 for the kind of gunk we want to avoid.

Bill Henning said...

More radical approach

New byte codes for longjump, longcall, (read|write)(long|word|byte), change to BST for flat 32 bit (or at least 24 bit) addressing, XMM interpreter

That would be a much nicer solution. We like clean and simple.
bst[noparse][[/noparse]c] already uses 32bit addressing internally, it's even already geared to emit 32 bit addresses for spin code.

I'm not entirely sure new bytecodes are required. Because the interpreter works effectively with variable length constants, making it clean might not be as difficult as all that.
The biggest issue is making a memory hole around $8000-$FFFF

I don't see how it will fit into one cog though, and as evidenced by the fact nobody has used or even finished the alternative interpreters put forward, it'd have to have some stunning pluses to offset the complete b0rkage required to actually use it.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com Please use mikronauts _at_ gmail _dot_ com to contact me off-forum, my PM is almost totally full
Morpheusdual Prop SBC w/ 512KB kit $119.95, Mem+2MB memory IO board kit $89.95, both kits $189.95
Propteus and Proteus for Propeller prototyping 6.250MHz custom Crystals run Propellers at 100MHz
Las - Large model assembler for the Propeller Largos - a feature full nano operating system for the Propeller

jazzed · 2009-11-03 00:38

Hmm. Again it's a matter of "interest" [noparse]:)[/noparse]

So one waits until Chip delivers the next solution. It's very likely I will not even be alive then.

Cheers.
-Steve

BradC · 2009-11-03 00:41

Bill Henning said...
Hmm... I like clean...

I don't think we need a hole from $8000-$FFFF... bear with me.

The "new" interpreter would run code out of XMM, right? No need for a hole there!

I'd probably keep the stack in the hub, possibly simple variables as well.

The first pass at this could be just moving the code to XMM, leaving everything else in the hub.

As far as the new interpreter being much bigger... move some less used functionality into the hub.

The more I think about this, the less it makes sense. If you are moving to XMM then memory becomes less of a pressure. SPIN is brilliant because it uses some really cool tweaks to save code space. It was obviously designed with 1 cog for the interpreter and minimal program space usage to fit the constraints of the propeller.

If you really want to blow out so much code that you need to start extending the interpreter, then just use one of the XMM C implementations. You have XMM already, so the fact the code is much larger is really no issue.

Just mark me as one of those who remain unsure of the utility of XMM on the Propeller.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
If you always do what you always did, you always get what you always got.

Sapieha · 2009-11-03 10:41

Hi ALL.

Not sure if I understand what this thread write on.

BUT if it is NEW Prop II chip ..... Chip already have said .... IT will have LMM posiblites directly in COG. It will have indirect addresing regs to brack 64KB barrier.

That give new posiblites to write SPIN interpreter to take advantages of more memory.

Regards
ChJ

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Nothing is impossible, there are only different degrees of difficulty.
For every stupid question there is at least one intelligent answer.
Don't guess - ask instead.
If you don't ask you won't know.
If your gonna construct something, make it·as simple as·possible yet as versatile as posible.

Sapieha

Cluso99 · 2009-11-03 14:44

As it currently stands, XMM for an interpreter would be a one cog only solution because memory arbitration between multiple cogs running XMMspin would be unworkable. However, this is not a reason that it will not work for just the one cog. Remember, other micros effectively only have 1 cog anyway.

There is space in my version of the interpreter to add the extras required. Just unsure if it is worth the trouble.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Links to other interesting threads:

· Home of the MultiBladeProps: TriBlade,·RamBlade,·SixBlade, website
· Single Board Computer:·3 Propeller ICs·and a·TriBladeProp board (ZiCog Z80 Emulator)
· Prop Tools under Development or Completed (Index)
· Emulators: CPUs Z80 etc; Micros Altair etc;· Terminals·VT100 etc; (Index) ZiCog (Z80) , MoCog (6809)
· Search the Propeller forums·(uses advanced Google search)
My cruising website is: ·www.bluemagic.biz·· MultiBladeProp is: www.bluemagic.biz/cluso.htm

Ways to Break the 32KB Spin Code Barrier

Comments