Spin32: If we had a 32 bit Spin compiler/interpreter how would it work? — Parallax Forums

jazzed Posts: 11,803
edited 2010-02-06 04:43 in Propeller 1
I've been toying with 32-bit values for the Spin interpreter's key variables (dcurr, pbase, vbase, etc.) and have something that works correctly so far within hub RAM. Another question, beyond the subject line: can an interface to fast XMM memory be squeezed into the interpreter? TBD

1. Is it possible to just copy everything from HUB RAM to XMM and flip a switch?
2. Given that the ROM occupies $8000-$FFFF and the stack grows up, a big hack would be needed for the stack.
3. Is it possible to just make the stack always start above $FFFF to push the stack over the "ROM Hole" code?
4. Maybe a big dummy array would serve to fix the "ROM Hole" instead of asking for a special compiler mode?

I understand that Homespun allows for images > 32K. Is there any limit at all in Homespun today that would prevent a "Spin32" from working?

TIA
--Steve

Comments

  • MagIO2 Posts: 2,243
    edited 2010-02-04 09:22
    Currently you don't have real access to the stack because there is no support for changing stack pointers. What SPIN does with the stack is its own secret. There is no push or pop. If you want to keep it a secret, you can have completely different stack concepts. For example, each SPIN COG has its own stack starting at $0000_0000, and it can even grow into the ROM address space if these addresses are mapped into some real XMM memory.

    Just an idea for a 32 bit SPIN memory model:
    We have 8 COGs, so let's say each COG gets 3kB of HUB-RAM: a 1kB page for code and 2 x 1kB pages for data (one for source, one for destination addresses, just in case these are too far apart from each other?). The 32-bit address can then easily be mapped: the 10 LSBs give the address within the page, and all the MSBs tell you where to find that page in external memory. If an operation accesses data outside of the current page, it has to wait until an external memory driver has swapped the page. Otherwise it can run at full speed. And 1kB of SPIN code can really contain a lot.

    This makes 8x3kB = 24kB. The rest of the HUB-RAM (8kB) can be used for COG to COG communication, (and maybe for stack?)
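
    The address split described above can be sketched in a few lines of Python. This is only an illustration of the idea, not interpreter code: the names `split_address` and `CogPage`, the `hub_base` value, and the swap counter are all hypothetical, and the actual copy from external memory is left out.

```python
PAGE_BITS = 10                 # 1 kB pages, as in the proposal above
PAGE_SIZE = 1 << PAGE_BITS
OFFSET_MASK = PAGE_SIZE - 1


def split_address(addr):
    """Split a 32-bit address into (page number, offset within the page)."""
    addr &= 0xFFFFFFFF
    return addr >> PAGE_BITS, addr & OFFSET_MASK


class CogPage:
    """One resident 1 kB page slot in hub RAM (hypothetical structure)."""

    def __init__(self, hub_base):
        self.hub_base = hub_base   # where this slot lives in hub RAM
        self.page = None           # external page currently resident, if any
        self.swaps = 0             # number of page swaps performed so far

    def translate(self, addr):
        """Return the hub address for addr, swapping the page in on a miss."""
        page, offset = split_address(addr)
        if page != self.page:
            # A real driver would copy PAGE_SIZE bytes from external RAM here
            # and the requesting cog would wait until that copy is finished.
            self.page = page
            self.swaps += 1
        return self.hub_base + offset


# Accesses within the resident page cost no swap; crossing a page does.
slot = CogPage(hub_base=0x0C00)
first = slot.translate(0x0001_2345)
second = slot.translate(0x0001_2000)
third = slot.translate(0x0001_2400)
```

    With three such slots per COG (one code, two data), only an access that leaves the resident page pays the swap penalty.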
  • Cluso99 Posts: 18,069
    edited 2010-02-04 12:17
    I have answered (expanded) this on the spin interpreter thread http://forums.parallax.com/forums/post.aspx?f=25&r=r&m=273607

    The main problem I see is XMM (SRAM) bus contention between multiple cogs if the interpreter were to use XMM for anything other than loading overlays of code (or as a cache). Individual byte/word/long access would be monumentally slow for the contention handler, given the way the interpreter works.

    @jazzed:
    Changing to 32 bits for dcurr, pbase, vbase should be fairly simple as IIRC it is actually 32 bit within the cog now. However, when these are pushed and pulled from the stack, a long would need to be pushed & popped.

    1. Yes, but only for 1 modified interpreter. This could be done with only the rdxxxx & wrxxxx instructions changed to access XMM. Due to contention, only 1 interpreter could be run, or else contention handling must be in place using a lock or something similar.
    2. I don't agree. It just depends on where to place the stack to ensure it has enough space.
    3. I believe so, but see my answer to MagIO below.
    4. Yes.

    @MagIO:
    There is really a push & pop. Don't forget, you need considerable space for the video buffer in hub.

    IMHO, a mechanism for copying code blocks from XMM may be the best, served up by a single cog. The other alternative may be to fetch a block of code from XMM (using contention locks) and then execute it from hub, sort of like a cache, but only acting on the code, not stack or variables.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Links to other interesting threads:

    · Home of the MultiBladeProps: TriBlade, RamBlade, SixBlade, website
    · Single Board Computer: 3 Propeller ICs and a TriBladeProp board (ZiCog Z80 Emulator)
    · Prop Tools under Development or Completed (Index)
    · Emulators: CPUs Z80 etc; Micros Altair etc; Terminals VT100 etc; (Index) ZiCog (Z80), MoCog (6809)
    · Prop OS: SphinxOS, PropDos, PropCmd
    · Search the Propeller forums (uses advanced Google search)
    My cruising website is: www.bluemagic.biz
    MultiBlade Props: www.cluso.bluemagic.biz
  • MagIO2 Posts: 2,243
    edited 2010-02-04 13:14
    You don't need locks ... every COG has a communication area for requesting new pages. The one COG which drives the external RAM simply does the same as the HUB ... it loops over the communication areas of the COGs to see if there is a request and processes it. When it's done, it signals the waiting COG and proceeds with the next. Another propeller in the propeller, so to say ;o)
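
    A rough Python model of that lock-free scheme, assuming one mailbox per COG that only its owner writes requests into and only the driver completes. The `Mailbox` fields and the `load_page` callback are hypothetical stand-ins; on real hardware the mailboxes would be longs in hub RAM and the driver loop would be PASM.

```python
NUM_COGS = 8


class Mailbox:
    """Per-COG communication area (hypothetical layout)."""

    def __init__(self):
        self.requested_page = None   # written by the client COG
        self.ready = False           # set by the driver once the page is loaded


def service_once(mailboxes, load_page):
    """One round-robin pass of the driver COG over all mailboxes.

    Returns the ids of the COGs that were served during this pass.
    """
    served = []
    for cog_id, box in enumerate(mailboxes):
        if box.requested_page is not None:
            load_page(cog_id, box.requested_page)   # stand-in for the XMM transfer
            box.requested_page = None
            box.ready = True                        # signal the waiting COG
            served.append(cog_id)
    return served


# Two COGs post requests; the driver serves them in COG order, hub-style.
mailboxes = [Mailbox() for _ in range(NUM_COGS)]
loaded = []
mailboxes[5].requested_page = 72
mailboxes[2].requested_page = 7
order = service_once(mailboxes, lambda cog, page: loaded.append((cog, page)))
```

    Because each mailbox has exactly one writer for requests and one for completion, no lock is needed in this model; the cost is that a COG may wait up to one full pass before its request is seen.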

    Are push and pop supported by the SPIN language, or are you talking about SPIN bytecode?

    Real video is not what a Propeller is really good at. So, for simple text output, some kB can still be used. Maybe the page size can be kept flexible, so if you really need graphics you can reduce it to 512 bytes. Each COG used with PASM code or old SPIN will give you extra memory available for other things.
    For really good graphics, external RAM is needed anyway - or maybe a second Propeller or other support (TFT with controller + RAM / FPGA / XMOS)
  • BradC Posts: 2,601
    edited 2010-02-04 13:21
    MagIO2 said...
    There is no push or pop.

    I'm not sure that means what you think it means. There are both push and pop. There is also a pop in the context of "discard x items off the stack".

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Life may be "too short", but it's the longest thing we ever do.
  • MagIO2 Posts: 2,243
    edited 2010-02-04 13:43
    What I mean is that there are no push and pop instructions in the SPIN language (see the SPIN language reference). There are bytecodes for push and pop, but these are not usable by SPIN programmers, are they? Which means that you can change the routines behind the bytecodes to map the stack to RAM wherever you want. That won't harm any code that's been written in original SPIN, as original SPIN hides how it uses the stack - maybe it means harm for some code that manipulates the stack and its pointers directly - dunno if such code exists somewhere.
  • jazzed Posts: 11,803
    edited 2010-02-04 17:15
    @Cluso99 mentions (in another thread for some reason) bus contention with XMM. Yes, since more than one instruction is required to fetch from XMM with the read/write done in the same COG as the interpreter, this would be a problem.

    Thing is, there will "never" be enough room in one COG for the interpreter and the XMM fetch code.

    So the solution is to have a separate COG do the read/write. Not only does it solve most contention, but it would also allow Spin32 to easily run on any XMM hardware solution provided a driver exists. It seems locks would still be required to manage ownership of the IO COG.

    Bill's VMCOG interface when complete would serve the solution nicely I'm sure. Using 2 COGs for 32 bit addressable Spin -vs- 1 COG is a fair trade for big programs assuming a working compiler is available.

    Post Edited (jazzed) : 2/4/2010 7:26:39 PM GMT
  • BradC Posts: 2,601
    edited 2010-02-04 23:23
    MagIO2 said...
    What I mean is that there are no push and pop instructions in the SPIN language (see the SPIN language reference). There are bytecodes for push and pop, but these are not usable by SPIN programmers, are they? Which means that you can change the routines behind the bytecodes to map the stack to RAM wherever you want. That won't harm any code that's been written in original SPIN, as original SPIN hides how it uses the stack - maybe it means harm for some code that manipulates the stack and its pointers directly - dunno if such code exists somewhere.

    Ok, I see what you mean about "user accessible". I guess exposing the stack to the user takes away a lot of the simplicity and safety of spin.

    If you are a bit of an experimenter, you can experiment directly with the spin bytecodes using bstc. It implements a bytecode() primitive that directly injects the code into the method.
    Just use comma delimited numbers to squirt in whatever you like.

  • BradC Posts: 2,601
    edited 2010-02-04 23:30
    jazzed said...

    Bill's VMCOG interface when complete would serve the solution nicely I'm sure. Using 2 COGs for 32 bit addressable Spin -vs- 1 COG is a fair trade for big programs assuming a working compiler is available.

    The only obvious limitations I see are really in the headers. Both .binary and object. If you blew them out to 32 bits then you could pretty much place anything wherever you liked.

    There is nothing preventing you from putting PBASE = $10000, VBASE = PBASE+CODESIZE and DBASE = $10+NEW_INTERPRETER_SIZE.
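
    As a worked example of that placement, here is a small Python sketch that just evaluates the three formulas. The function name `spin32_layout` and the sizes used in the example ($4000 bytes of code, $7C0 bytes of interpreter) are made-up illustration values, not figures from any real image.

```python
ROM_START, ROM_END = 0x8000, 0xFFFF   # P1 ROM range mentioned in the first post


def spin32_layout(code_size, interpreter_size, pbase=0x10000):
    """Evaluate the base-address formulas quoted above (sizes in bytes)."""
    vbase = pbase + code_size            # variables follow the code
    dbase = 0x10 + interpreter_size      # stack base, as given in the post
    return {"pbase": pbase, "vbase": vbase, "dbase": dbase}


def in_rom_hole(addr):
    """True if addr falls inside the $8000-$FFFF ROM range."""
    return ROM_START <= addr <= ROM_END


# Hypothetical sizes, chosen only to make the arithmetic concrete.
layout = spin32_layout(code_size=0x4000, interpreter_size=0x7C0)
for name, addr in layout.items():
    print(f"{name} = ${addr:05X}  rom_hole={in_rom_hole(addr)}")
```

    With PBASE at $10000 the code and variables sit entirely above the ROM range, which is the point of the suggestion.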

    Why not ask Chip what he is intending to do for the upcoming device? I suspect it's a problem that has already been thought through, and it would be logical to do it in such a fashion that it does not require completely re-architecting the compiler.

  • jazzed Posts: 11,803
    edited 2010-02-05 00:40
    I've already had the PNUT interpreter running with 32 bit pbase, etc... variables for a few days. That was easy. The rest is hard given how packed the interpreter is.

    I thought about waiting for Parallax several times over the last few years ;) Maybe Chip or someone else not buried in PropII development will speak up. Trying to make something compatible with what Parallax is doing would offer everyone a head start.

    Running everything offset from $10000 is not a bad start. I guess the problem with PBASE = $10000, etc... is that hardware device drivers need to have buffers at $0-$7FFF for fast memory access. That could be adjusted if control variable addresses were pointers .... As in other times, the preprocessor conditionals using SPIN32 or whatever could serve to differentiate versions for private library code.

    I've managed to save 6 longs in the original PNUT interpreter by replacing MULT and SQRT PASM thanks to CessnaPilot and Lonesock. Looking for other savings. I'm not sure what it will take to finish. An LMM type spin interpreter may be a better approach. Interesting challenge :)
  • jazzed Posts: 11,803
    edited 2010-02-05 04:36
    BradC,

    Since DAT variables/code are "global", wouldn't they always be referenced by Spin in their original address regardless of pbase, dbase, etc... ?
  • BradC Posts: 2,601
    edited 2010-02-05 06:42
    jazzed said...
    BradC,

    Since DAT variables/code are "global", wouldn't they always be referenced by Spin in their original address regardless of pbase, dbase, etc... ?

    Nope. DAT variables are only "global" to the object. All addresses are relative to PBASE. The hacky @@@ operator was invented to return the real hub address.
    Using @label in spin when referencing an object will return the hub address of whatever it is you are asking for, but it's a runtime thing in the interpreter.
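
    The point about PBASE-relative addressing can be illustrated with a tiny Python sketch. It only models the runtime case described above (a stored offset plus the current object's base); `absolute_address` and the example bases are hypothetical, and nothing here reproduces how @@@ is actually computed by a compiler.

```python
def absolute_address(pbase, offset):
    """Resolve an object-relative DAT offset to an absolute 32-bit address."""
    return (pbase + offset) & 0xFFFFFFFF


# The same DAT label offset resolves differently depending on where the
# object's PBASE has been placed, which is why relocating PBASE is enough.
DAT_LABEL_OFFSET = 0x24                       # hypothetical offset of a label
in_hub = absolute_address(0x0010, DAT_LABEL_OFFSET)
in_xmm = absolute_address(0x10000, DAT_LABEL_OFFSET)
```

    Under this model, moving an object above $FFFF changes nothing in the bytecode; only the base the interpreter adds at run time differs.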

  • Cluso99 Posts: 18,069
    edited 2010-02-05 07:27
    jazzed: I have plenty of room in my version of the interpreter. You really should take a look. If you think about giving it a go, let me know and I will find the bug that I introduced. Probably I will just go back 2 steps. The bug was why I built my debugger. A reason for everything, hey!!!

    Initially I thought my version of the Interpreter would not fit in cog, so I built my overlay loader and used that to place some of the less used routines in hub. Once I had the decode working I found I had plenty of space (plus a big speed improvement) and began improving the other sections of the code. The hub decoding takes 256 longs in hub, but would be common for all copies of my interpreter. This allowed me to simplify the arithmetic code. I then still had enough room to place the overlays back into the interpreter.

    IIRC there is still enough room for direct XMM access, but of course that would require locks. I believe it would be faster to do the access to XMM directly in cog even with locks. Otherwise, you are going via another cog which accesses sram (or wherever) to hub to cog and vice versa. Direct bypasses the hub.

    If Bill's VMCOG works then perhaps it may be no slower than direct access. We must wait and see.

    So, what I am saying is, there is room in the interpreter and that will also gain a runtime improvement of 20-25% which may then be traded for XMM access.

  • jazzed Posts: 11,803
    edited 2010-02-05 18:24
    Can you post the version you've mentioned here? All I've seen is the one with the SQR hack.
    It would be useful if you summarize your overlay approach. I may adopt this if it's not a nightmare.
  • Cluso99 Posts: 18,069
    edited 2010-02-05 19:49
    There is an example of the overlay loader in the thread (see the tools link in my signature).

    My faster spin interpreter also has a thread in the tools link.

  • Bill Henning Posts: 6,445
    edited 2010-02-05 20:10
    Something occurred to me after I posted on the ZPU thread.

    VMCOG could be used to provide an on-propeller (ie no PC needed) Spin debugger!

    It would only work for "pure" Spin code, but would it not be cool for beginners to the prop? Combined with Sphinx?

    In a related thought... the proposed XSpin virtual machine ought to be able to speed up Sphinx immensely, as it would get rid of temp files etc.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    www.mikronauts.com E-mail: mikronauts _at_ gmail _dot_ com 5.0" VGA LCD in stock!
    Morpheus dual Prop SBC w/ 512KB kit $119.95, Mem+2MB memory/IO kit $89.95, both kits $189.95 SerPlug $9.95
    Propteus and Proteus for Propeller prototyping 6.250MHz custom Crystals run Propellers at 100MHz
    Las - Large model assembler Largos - upcoming nano operating system
  • jazzed Posts: 11,803
    edited 2010-02-05 20:18
    ClusoInterpreter_260C_007F.spin? About 30 longs free apparently. So, what's failing in it?
  • jazzed Posts: 11,803
    edited 2010-02-05 20:20
    Bill Henning said...
    Something occurred to me after I posted on the ZPU thread.

    VMCOG could be used to provide an on-propeller (ie no PC needed) Spin debugger!

    It would only work for "pure" Spin code, but would it not be cool for beginners to the prop? Combined with Sphinx?
    Yes, as long as Sphinx source level debugging is available ... but that's another thread.
  • Cluso99 Posts: 18,069
    edited 2010-02-06 04:43
    Bill: My debugger uses a serial terminal for debugging, runs in spin, and uses zero footprint. It does both pasm & spin (in the one program, so it can trace the spin execution in pasm if desired by a command). It is easy to redirect the serial to screen/keyboard (and 1-pin versions would be fine).

    jazzed: I will answer shortly. Yes, I spawned a new version and added each mod separately, so it appears that v260C_007F is the latest and indeed may be fully functional (just checking). My comments indicate it SHOULD be a fully operational version. I believe the regression worked. You will of course need to test it. I had done a lot of testing of the individual sections as I optimised them, but recall that I was unravelling around the j6??? sections to see if I could speed it up with the newfound free space.


    Post Edited (Cluso99) : 2/6/2010 4:51:09 AM GMT