Parallax Forums > Public Forums > Propeller Chip > Ways to Break the 32KB Spin Code Barrier
jazzed _oOo_(^^)_oOo_
Date Joined Jan 2008 Total Posts : 2114 | Posted 11/1/2009 10:15 PM (GMT -8)
This post is in response to James' query in another topic. Answering it here to avoid polluting the other thread.
JMH said...
Please clarify about using Spin stubs and 32kb limit.
There is more than one way to do it.
First, consider the easiest way to use more than 32KB for code. It is not exactly a direct answer to "stubs", but it is similar, and it is the answer some would prefer: load PASM services into all cogs (except cog 0) so they can be used by small Spin methods. This is more or less what can be done today, as described with one or more methods in Post Boot PASM COG Loader?.
Secondly, one can use a virtual memory method (paging, but not a full swap implementation). The Spin interpreter can be intercepted :) I'm doing this in Spud. Again, this requires Spin stubs to activate code, but in this case a code cache can be used: Spin byte code is stored on SD card, pulled into the cache as required, executed, and kept there based on statistical usage.
Thirdly, and the fastest way, one could use multiple Propellers in an RPC methodology. This would require Spin stubs on the host to activate a COG-based management/transport layer for sending remote procedure call requests (RPCs) to one of a cluster of Propellers, as discussed in the OctoProp Thread. That method is on ice for the time being.
Also for multiple devices, it is possible to just build a different image for each Propeller and use a predetermined protocol for predetermined method invocation/results. This is less attractive, for me at least, mainly because it is not possible to throw "all resources" at solving "any one problem" in a generic and easily replaceable way (that is, without a difficult reprogramming cycle).
So those are ways to break the 32KB Spin code barrier.
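A minimal sketch of the RPC stub idea, in Python rather than Spin and only to illustrate the flow. Everything here is an assumption for illustration: the wire format, the names `encode_call`, `RemoteNode`, and `Host`, and the round-robin choice of node. None of it comes from Spud or the OctoProp work.

```python
import struct

# Hypothetical wire format: method id (1 byte), argument count (1 byte),
# then each argument as a 32-bit little-endian signed long.
def encode_call(method_id, args):
    header = struct.pack("<BB", method_id, len(args))
    return header + b"".join(struct.pack("<l", a) for a in args)

def decode_call(packet):
    method_id, argc = struct.unpack_from("<BB", packet, 0)
    args = [struct.unpack_from("<l", packet, 2 + 4 * i)[0] for i in range(argc)]
    return method_id, args

class RemoteNode:
    """Stands in for another Propeller that holds the real method bodies."""
    def __init__(self, methods):
        self.methods = methods  # method id -> callable

    def handle(self, packet):
        method_id, args = decode_call(packet)
        result = self.methods[method_id](*args)
        return struct.pack("<l", result)

class Host:
    """Host side: the stub only carries a method id and its arguments."""
    def __init__(self, nodes):
        self.nodes = nodes
        self.next_node = 0

    def call(self, method_id, *args):
        # Round-robin dispatch is an arbitrary choice for this sketch.
        node = self.nodes[self.next_node]
        self.next_node = (self.next_node + 1) % len(self.nodes)
        reply = node.handle(encode_call(method_id, args))
        return struct.unpack("<l", reply)[0]
```

In this sketch the host never holds a method body; a call such as `host.call(0, 2, 3)` only ships an id and two longs to whichever node is next.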
I have most of this worked out. The problem is getting enough interest in seeing it finished. Some of the work is pretty tough and requires interest to be finished. The usage model is difficult to grasp for some here, but anyone who has done distributed parallel computing or studied computing systems architecture should understand the approaches discussed.
There has been a ton of external memory interest this year. It seems to me that a way to run bigger applications at native rates, as in the distributed parallel computing case, would be more interesting than waiting to load instructions one at a time or even in blocks from external devices. Having a way to use 32KB+ Spin programs, even if it is slower because of "paging", is attractive too ... this is what your PC is doing when it runs out of memory.
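The paging approach can be sketched as a small code cache that keeps method bodies based on how often they are used. This is a Python illustration under stated assumptions: the `BytecodeCache` class, the dictionary standing in for the SD card, and the least-frequently-used eviction rule are all hypothetical and are not how Spud is known to work.

```python
class BytecodeCache:
    """Keeps a few method bodies resident; the rest stay on the backing store."""
    def __init__(self, backing_store, capacity):
        self.backing_store = backing_store  # method name -> bytes (stands in for SD card)
        self.capacity = capacity            # number of method bodies that fit in the cache
        self.slots = {}                     # resident method name -> bytes
        self.hits = {}                      # resident method name -> use count
        self.loads = 0                      # how many times the backing store was read

    def fetch(self, name):
        if name not in self.slots:
            if len(self.slots) >= self.capacity:
                # Evict the least-used resident body (assumed policy).
                victim = min(self.slots, key=lambda n: self.hits[n])
                del self.slots[victim]
                del self.hits[victim]
            self.slots[name] = self.backing_store[name]
            self.hits[name] = 0
            self.loads += 1
        self.hits[name] += 1
        return self.slots[name]
```

A stub would call `fetch` with its own method name and hand the returned bytes to the interpreter; frequently used bodies stay resident and rarely used ones are paged out first.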
The question is: what is it worth to break the 32KB Spin code barrier? Maybe the problems being addressed by the community are too small to justify the extra effort?
Well, then perhaps it's not worth the bother. There are other "solid" shiny things on the horizon.
Added: It seems that mpark added the ability for larger Spin programs to be produced for Catalina at some point. I'm not exactly sure what utility that brings, but I am pretty sure it is different from what I described, except possibly for the first-paragraph methodology.
Post Edited (jazzed) : 11/2/2009 6:42:01 AM GMT
Ale Registered Member
Date Joined May 2007 Total Posts : 1267 | Posted 11/2/2009 6:30 AM (GMT -8)

Ale Registered Member
Date Joined May 2007 Total Posts : 1267 | Posted 11/2/2009 7:49 AM (GMT -8)
Bill Henning Registered Member
Date Joined Sep 2006 Total Posts : 962 | Posted 11/2/2009 8:57 AM (GMT -8)
Interesting idea.
Personally, I wonder if it would not be simpler to do a segmented memory model for spin.
Yes, I know, flat memory space is nicer, and segmentation is evil.
However, segmentation can also be easier to implement...
(all approaches would be significantly simplified if only code space was expanded)
Simple approach:
-------------------------
Separate code, data and stack segments - although the stack would be relatively small, and could be part of the data segment.
Allow 64KB space for both - add XMM to the mix, and we could have 4x as much space for Spin code/data. Make device drivers loadable, and we have even more :)
For extra simplicity, data/stack could stay in the hub, and only the bytecodes get moved to XMM.
More complex approach: (similar to Jazzed's stubs)
---------------------------------------------------------------------------
add segment bases for data, code (and possibly stack) to each object
change BST so that it accesses each method by code_seg:offset instead of just offset
no need for stubs, would allow as much code and data as you have XMM, only real limit is each object would be limited to ~32k on Prop1
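The code_seg:offset idea above can be sketched as a lookup of a per-object segment base plus a bounded offset. This is a Python illustration only; the `resolve` function, the base table, and the exact $8000 limit are assumptions chosen to match the "~32k per object" remark, not anything BST does.

```python
SEGMENT_LIMIT = 0x8000  # assumed per-object limit (~32 KB on Prop1)

def resolve(segment_bases, code_seg, offset):
    """Turn a hypothetical code_seg:offset pair into a flat XMM address."""
    if not 0 <= offset < SEGMENT_LIMIT:
        raise ValueError("offset does not fit in one segment")
    return segment_bases[code_seg] + offset
```

With bases such as `[0x00000, 0x08000, 0x12000]`, a call to method offset `$10` in segment 2 would resolve to `$12010`, so the total code size is bounded by XMM size rather than by the offset width.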
More radical approach
--------------------------------
New byte codes for longjump, longcall, (read|write)(long|word|byte), change to BST for flat 32 bit (or at least 24 bit) addressing, XMM interpreter
jazzed said...
Ale said...
The spin interpreter addresses are limited to 15 bits. With bit 14 being sometimes just sign so you end up with 15 bits anyways.
The number of address bits used by the interpreter is irrelevant for the "stub solutions" I've described. A "stub solution" means that most methods compiled in the main binary have nothing more than a signature and a method encoding. The bodies of the stubbed methods are kept in a separate location as either a file or a library running on one or more other Propellers.
Ale said...
It would need a new interpreter.
Yes, the "stub solutions" are two COG minimum. Some modifications to the interpreter are necessary for sharing the PC, stack pointer, and object context. Virtual Memory management is needed running in a second COG. In one case, the VMM will manage the pages and calls to the "library" method body. In the cluster case, the VMM will manage dispatch of RPC.
Ale said...
It is not only a matter of putting more code in it. More memory (external) has to come with a lmm-enabled spin compiler/interpreter.
This is true in the case of paging from an external device for best performance, but the "library" code could just as easily be stored on SD card.
www.mikronauts.com
Please use mikronauts _at_ gmail _dot_ com to contact me off-forum, my PM is almost totally full
Morpheus dual Prop SBC w/ 512KB kit $119.95, Mem+2MB memory IO board kit $89.95, both kits $189.95
Propteus and Proteus for Propeller prototyping
6.250MHz custom Crystals run Propellers at 100MHz
Las - Large model assembler for the Propeller
Largos - a feature full nano operating system for the Propeller
jazzed _oOo_(^^)_oOo_
Date Joined Jan 2008 Total Posts : 2114 | Posted 11/2/2009 10:49 AM (GMT -8)
Bill, you get it. I think BradC does too. Ale was probably just a little distracted by the bit count.
Some of the things mentioned, like segmenting, would require an adjustment to the stack frame, which takes two (three?) longs today for book-keeping. Making the stack frame bigger might reduce interpreter size a little because of the two embedded bits in the object offset used as flags. The current single COG interpreter is very tight - there is one wasted long though :) Even going to a 2 COG solution (not the best thing, obviously) is not so bad if the result is worth it.
Making better use of today's hardware to allow higher code density is one immediate goal for me, whether it is big EEPROM, Flash, XMM, SD card, or even Propeller clusters, etc. that provides the solution.
Another goal, which could make Spin less foreign or obscure to others (and may benefit Parallax), would be to make Spin a PC-able language like Java, C, VB, whatever (some disagree ... oh well). To me at least, making it PC-able obviously demands a bigger memory range one way or another.
BradC Gronk
Date Joined Jul 2007 Total Posts : 1577 | Posted 11/2/2009 4:22 PM (GMT -8)
Bill Henning said...
More complex approach: (similar to Jazzed's stubs)
---------------------------------------------------------------------------
add segment bases for data, code (and possibly stack) to each object
change BST so that it accesses each method by code_seg:offset instead of just offset
My gut feeling is that would be fairly complex to implement and end up being a bit of a nightmare. Check out x86 for the kind of gunk we want to avoid.
Bill Henning said...
More radical approach
--------------------------------
New byte codes for longjump, longcall, (read|write)(long|word|byte), change to BST for flat 32 bit (or at least 24 bit) addressing, XMM interpreter
That would be a much nicer solution. We like clean and simple. bst[c] already uses 32bit addressing internally, it's even already geared to emit 32 bit addresses for spin code.
I'm not entirely sure new bytecodes are required. Because the interpreter works effectively with variable length constants, making it clean might not be as difficult as all that. The biggest issue is making a memory hole around $8000-$FFFF
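The memory-hole issue can be sketched as an address decoder for a flat space wider than 15 bits. This is a Python illustration under assumptions: hub RAM at $0000-$7FFF is the existing 32KB, the $8000-$FFFF range is left alone as the hole mentioned above (on the Propeller 1 that range is the hub ROM), and placing XMM from $10000 upward is purely a hypothetical layout. The `decode` name is also made up.

```python
HUB_RAM_END = 0x8000   # hub RAM occupies $0000-$7FFF
HOLE_END = 0x10000     # $8000-$FFFF is left as a hole (assumed layout)

def decode(addr):
    """Classify a flat address and return (region, offset within region)."""
    if addr < 0:
        raise ValueError("negative address")
    if addr < HUB_RAM_END:
        return ("hub", addr)
    if addr < HOLE_END:
        return ("hole", addr - HUB_RAM_END)
    return ("xmm", addr - HOLE_END)
```

A compiler emitting flat addresses under this layout would have to avoid placing code or data where `decode` reports the hole.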
I don't see how it will fit into one cog though, and as evidenced by the fact nobody has used or even finished the alternative interpreters put forward, it'd have to have some stunning pluses to offset the complete b0rkage required to actually use it.
If you always do what you always did, you always get what you always got.
Bill Henning Registered Member
Date Joined Sep 2006 Total Posts : 962 | Posted 11/2/2009 4:29 PM (GMT -8)
Hmm... I like clean...
I don't think we need a hole from $8000-$FFFF... bear with me.
The "new" interpreter would run code out of XMM, right? No need for a hole there!
I'd probably keep the stack in the hub, possibly simple variables as well.
The first pass at this could be just moving the code to XMM, leaving everything else in the hub.
As far as the new interpreter being much bigger... move some less used functionality into the hub.
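The "first pass" described above (code in XMM, everything else in the hub) can be sketched as a toy interpreter loop that fetches opcodes from one memory and reads or writes data in another. This is a Python illustration only: the four opcodes below are invented for the sketch and are not Spin bytecodes, and the Python list used as a stack simply stands in for a stack kept in hub RAM.

```python
# Invented opcodes for this sketch only; these are not Spin bytecodes.
HALT, PUSH, ADD, STORE = 0, 1, 2, 3

def run(xmm_code, hub):
    """Fetch opcodes from xmm_code (XMM) while data lives in hub (hub RAM)."""
    pc = 0
    stack = []  # stands in for a stack kept in the hub
    while True:
        op = xmm_code[pc]
        pc += 1
        if op == HALT:
            return pc
        if op == PUSH:
            stack.append(xmm_code[pc])   # immediate operand comes from code space
            pc += 1
        elif op == ADD:
            b = stack.pop()
            a = stack.pop()
            stack.append((a + b) & 0xFF)
        elif op == STORE:
            hub[xmm_code[pc]] = stack.pop()  # data write goes to the hub
            pc += 1
        else:
            raise ValueError("unknown opcode")
```

Only the fetch path touches `xmm_code`, which is why this split leaves variable and stack handling unchanged.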
jazzed _oOo_(^^)_oOo_
Date Joined Jan 2008 Total Posts : 2114 | Posted 11/2/2009 4:38 PM (GMT -8)
Hmm. Again it's a matter of "interest" :)
So one waits until Chip delivers the next solution. It's very likely I will not even be alive then.
Cheers. -Steve