Spin Interpreter Needed
jazzed
Posts: 11,803
We need a Spin interpreter that can fetch instructions from external memory such as I2C EEPROM, RAM, SD card, etc.
With the Propeller JVM and a small, simple on-board cache, I have been able to demonstrate that I2C EEPROM (or some other address-mapped device such as SRAM, DRAM, or Flash) can be used effectively this way. With Bill's VMCOG, it would also be possible to use an SD card as the physical store for Spin byte-codes.
I have a version of a Big Spin Interpreter that is very close to achieving this goal, but there are limits that I just can't seem to get around (my time is also a limit). Given that, I'm posting what I have in case someone is willing to take up the cause. The example runs, and it is easy to see the changes required to support > 64KB (HUB+ROM) code space, but it does not support external memory yet because there is no room to put in a byte-code fetcher.
Changes from Chip's original interpreter are easy to spot: they look like the code below, where the original code is commented out and the replacement follows it.
{
                        rdword  dcall,dcall             'set old dcall
                        wrword  pcurr,dbase             'set return pcurr
                        add     dbase,#2                'set call dbase
'}
'{
                        rdlong  dcall,dcall             'set old dcall
                        wrlong  pcurr,dbase             'set return pcurr
                        add     dbase,#4                'set call dbase
'}
What remains is a byte-code fetcher. Only a few places need it (one example is shown below). If someone can save enough space in the COG (the wall I can't seem to penetrate), a "spinner mailbox" can be added that posts the address to another COG, which then fetches the data. I'm hoping this does not require a huge rewrite of the interpreter, because of the amount of testing that would take (and the chance of introducing bugs). There is some waste in the interpreter as it stands, in the "mask" variables and some of the math routines, but not enough to make any of this easy.
loop                    mov     x,#0                    'reset x
                        rdbyte  op,pcurr                'get opcode
                        add     pcurr,#1
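To make the mailbox idea a little more concrete, here is a minimal Python model of it. It is a sketch, not PASM and not the Big Spin code: the class names, the `service` method, and the 16-byte cache line are all assumptions made for illustration. In a real cog the `service` call would be a hub long that the interpreter writes and then polls while the fetcher cog fills in the answer.

```python
class FetcherCog:
    """Model of the helper cog that owns the external memory (hypothetical)."""
    LINE = 16  # assumed cache line size in bytes

    def __init__(self, external_memory):
        self.mem = external_memory
        self.line_base = None      # base address of the cached line
        self.line = b""
        self.device_reads = 0      # how often the slow device was touched

    def service(self, address):
        """Stand-in for the mailbox: take an address, hand back one byte."""
        base = address - (address % self.LINE)
        if base != self.line_base:             # cache miss: reload the line
            self.line = bytes(self.mem[base:base + self.LINE])
            self.line_base = base
            self.device_reads += 1
        return self.line[address - base]


class Interpreter:
    """Only the opcode fetch of the interpreter loop is modelled."""

    def __init__(self, fetcher):
        self.fetcher = fetcher
        self.pcurr = 0

    def fetch_opcode(self):
        # Takes the place of:  rdbyte op,pcurr  /  add pcurr,#1
        op = self.fetcher.service(self.pcurr)
        self.pcurr += 1
        return op
```

With sequential byte-codes most fetches hit the cached line, so the slow device is only touched once per line. That is the effect a small on-board cache is meant to have.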
Well, have a look. If the collective power of the forum can get together and solve the problem, the Propeller community can benefit in a big way. Thanks for reading.
Cheers,
--Steve
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Propeller Pages: Propeller JVM
spin
43K
Comments
You could continue to support the square root operator by implementing it in the EEPROM fetcher cog.
Dave
- remove WAITVID() ... no one will be able to write a video driver in spin until Prop2 at least
FOR NOW
- remove SQRT (someone else already suggested this)
- remove STRSIZE
- remove STRCOMP
Note ops below all take 3 args, can share arg code
- remove BYTEFILL
- remove WORDFILL
- remove LONGFILL
- remove BYTEMOVE
- remove WORDMOVE
- remove LONGMOVE
THEN
- Use free space to make a small "FCACHE" area
- Add a "fetch_byte_from_pcur_and_inc_pcur" subroutine
- Add a TINY LMM interpreter, only complex enough to support three arguments to a function
- Use LMM/FCACHE to put back in SQRT, STR*, *FILL, *MOVE
The ops selected for conversion to LMM/FCACHE will only take a small performance hit, but save more than enough memory!
Due to the limited scope of changes, only changed ops need thorough testing.
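The "can share arg code" point can be sketched as follows. This is a Python model under stated assumptions, not interpreter code: `pop3`, `op_bytefill`, `op_bytemove`, and the name-keyed `HANDLERS` table are hypothetical, and the hub is just a `bytearray`. The only thing it is meant to show is that the fill and move ops can all start from one shared three-argument pop before branching to a small per-op loop, which is the part that would live in FCACHE.

```python
def pop3(stack):
    """Shared argument code: the three arguments were pushed left to right,
    so the count is on top of the stack."""
    count = stack.pop()
    second = stack.pop()
    first = stack.pop()
    return first, second, count


def op_bytefill(hub, stack):
    start, value, count = pop3(stack)
    for i in range(count):                 # the tight loop an FCACHE would hold
        hub[start + i] = value & 0xFF


def op_bytemove(hub, stack):
    dest, src, count = pop3(stack)
    # The right-hand slice is copied first, so overlapping ranges are safe.
    hub[dest:dest + count] = hub[src:src + count]


# Keyed by name rather than by byte-code value, to avoid guessing opcodes.
HANDLERS = {"BYTEFILL": op_bytefill, "BYTEMOVE": op_bytemove}
```

The word and long variants would reuse `pop3` in the same way and differ only in the element size of the loop.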
ALSO
Doing it this way allows limited in-line LMM code in Spin code :)
EDIT:
Dave posted while I was writing this response off-line... I think it was David who I saw suggest removing SQRT for a small LMM loop before!
My additional suggestions should free up more longs :)
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com E-mail: mikronauts _at_ gmail _dot_ com
My products: Morpheus / Mem+ / PropCade / FlexMem / VMCOG / Propteus / Proteus / SerPlug
and 6.250MHz Crystals to run Propellers at 100MHz & 5.0" OEM TFT VGA LCD modules
Las - Large model assembler Largos - upcoming nano operating system
Post Edited (Bill Henning) : 6/21/2010 5:42:59 PM GMT
Dave,
This is a fair idea. I suppose some flags could be set in the fetcher cog's command mailbox to multiplex features without giving up too much address space. To me, a separate COG that abstracts away the hardware is ideal and is worth losing a COG over. Off-loading some code to the separate cog would hopefully allow some simple changes. The problem is in pulling apart the math spaghetti code to allow such a feature :) All of it could be moved easily, but that would impact performance.
Thanks,
--Steve
You're right. Lots of things can be removed from the Spin definition, but I fear that would not serve the community very well. Longer term some redefinition would be useful especially to allow things like in-line LMM/PASM.
If there were a way through a third-party compiler to specify SpinLight or whatever, then that might work, but I doubt BradC, mpark, or the next guy/gal to do a compiler would be up for that. I have no idea how one would add LMM stuff at this point.
IMHO, a Spin-clone that works by fetching from a separate COG would be better than redefining the language near term.
Thanks for giving up some of your time,
--Steve
I'd also like to point out that bigger chunks of the Spin interpreter could be temporarily swapped out if a larger FCACHE area was needed. After the large FCACHE is no longer needed the portion of the Spin interpreter that was swapped out would be copied back into the cog.
Dave
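The swap-out idea can be modelled in a few lines. This is a Python sketch with hypothetical names (`Cog`, `swap_in`, `restore`); cog RAM is a plain list and each entry stands for one long. On real hardware the saved longs would have to live in hub RAM, or simply be reloaded from the interpreter image if that region is never modified at run time.

```python
class Cog:
    """Cog RAM modelled as a list of longs (hypothetical model)."""

    def __init__(self, resident_code):
        self.ram = list(resident_code)

    def swap_in(self, start, overlay):
        """Overwrite part of the interpreter with FCACHE code.
        Returns the displaced longs so they can be put back later."""
        end = start + len(overlay)
        saved = self.ram[start:end]
        self.ram[start:end] = overlay
        return saved

    def restore(self, start, saved):
        """Copy the displaced part of the interpreter back into place."""
        self.ram[start:start + len(saved)] = saved
```

The cost is two block copies per use, so it only pays off when the FCACHE'd loop runs long enough to amortize them.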
I think you missed the part about adding them back in as LMM/FCACHE'd code :)
At user level there would be no difference, except a tiny slowdown on the startup of those Spin bytecodes!
The Spin byte codes would be the same, except maybe a "LMM" intro byte.
$3C could be it! for running in-line code
Or, we could hijack "cognew(0,0)" to mean "go to next long boundary and exec LMM code"
As for the replacements, same byte codes, except they would now run FCACHE'd LMM code
You are very right about swapping out part of the interpreter... but I think removing the str*, *fill, and *move ops would make enough room for LMM/FCACHE, and as you noticed, they typically run a tight loop - perfect for FCACHE - which is why I selected them. Minimal execution time impact!
Seems like j3 and j4 (case, lookup, lookdown and friends) are more swappable and minimum code change friendly.
I'm worried that any swapping will cause Spin to be not cognew-able though.
Thanks,
--Steve
Thanks,
--Steve
The fetcher cog can be lock()'d, however multiple cogs contending for the fetcher may slow things down too much - in which case bigspin would be for the "main" large business logic app.
If several big apps are needed, it may make more sense to make bigspin multi-task several spin contexts, switching every X byte codes.
Frankly, I would personally be THRILLED to just run ONE bigspin against VMCOG!
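For the multi-tasking variant, "switching every X byte codes" amounts to a round-robin loop with a fixed quantum. The following is a Python sketch only: `run_round_robin`, the `pcurr`/`end` context fields, and the `fetch` callback are hypothetical, and a real context would also carry its own stack and base pointers. Since one interpreter owns the fetcher here, no lock is needed.

```python
from collections import deque


def run_round_robin(contexts, fetch, quantum):
    """Interleave several Spin contexts, switching every `quantum` byte-codes.

    contexts: list of dicts with 'pcurr' (next address) and 'end' (one past last)
    fetch:    callable returning the byte-code at an address
    Returns the execution trace as (context_id, byte_code) pairs.
    """
    trace = []
    ready = deque(range(len(contexts)))
    while ready:
        cid = ready.popleft()
        ctx = contexts[cid]
        for _ in range(quantum):
            if ctx["pcurr"] >= ctx["end"]:
                break
            trace.append((cid, fetch(ctx["pcurr"])))
            ctx["pcurr"] += 1
        if ctx["pcurr"] < ctx["end"]:      # not finished: back of the queue
            ready.append(cid)
    return trace
```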
One thing I did that was very useful was to create an instruction class of the form %00_xx_xxxx. It inserts the xx_xxxx bits into the INSTR field of a synthetic PASM instruction, forces generic source and destination registers, pops the arguments off the stack, executes the synthetic instruction with wc wz wr, then pushes the result and saves the flags for later retrieval. This allows a relatively small amount of cog code to implement all of the PASM native math instructions in stack machine form. This frees up beaucoup cog RAM for the other primitives and EEPROM or FlexMem drivers.
(Oh, and yes while it's a whole different way of optimizing I was inspired by LMM. Thanks, Bill.)
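The bit manipulation behind the %00_xx_xxxx class can be sketched in Python. This assumes the Propeller 1 instruction layout (INSTR in bits 31..26, ZCRI in 25..22, condition in 21..18, destination in 17..9, source in 8..0). The function name and the register addresses passed to it are hypothetical stand-ins for the "generic source and destination".

```python
ALWAYS = 0b1111      # condition field: execute unconditionally
WZ_WC_WR = 0b1110    # ZCRI field: write Z, write C, write result, register source


def synth_instruction(bytecode, dest, src):
    """Build a 32-bit PASM instruction word from a %00_xx_xxxx byte-code."""
    if bytecode >> 6 != 0b00:
        raise ValueError("byte-code is not in the %00_xx_xxxx class")
    instr = bytecode & 0x3F                 # the xx_xxxx bits
    return ((instr << 26)
            | (WZ_WC_WR << 22)
            | (ALWAYS << 18)
            | ((dest & 0x1FF) << 9)
            | (src & 0x1FF))


# %00_100000 carries the ADD opcode (%100000); registers 1 and 2 are placeholders.
word = synth_instruction(0b00_100000, dest=1, src=2)
```

In the interpreter the resulting long would be written into the instruction slot just ahead of where it executes, with the two popped arguments already copied into the generic registers.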
Cheers,
--Steve
Remove SQRT. You may also be able to remove some other functions. Use this space to implement what you need and get that working.
Hopefully by then I can help and get my faster Spin Interpreter working. But remember, my version uses a hub table for decoding vectors. I can always regress my code to where I had it working but it is offline so I have to search my backups. The way I wrote the code was to first place an overlay handler into the code. That permitted me to free enough space to place the decode vectors in place (IIRC saved about 20+ instructions in speed), and remove the overlay again. Each phase was fully tested. Then I started speeding up each section of the code, verifying as I went. I included my debugger to verify results. I fed all variables into the section I sped up to verify it worked.
FWIW I saved a huge amount of time in the maths section by utilising some of the saved space. Chip also found a faster way for one of the maths functions (divide or sqrt???).
The last thing I did was make some changes to a group of functions by utilising the new space I found to unravel some complex code, and here is where I introduced a bug that I never really looked for. So it is only a matter of regression to the working version.
Now, I certainly have enough space to implement LMM or overlays for more code functions. So I suggest you get them working and hopefully I can then help in getting the whole thing going. Otherwise I will dig out what I have done for you.
The interpreter has $3C free. There are other sub-codes that are not used. Use $3C for now anyway.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Links to other interesting threads:
- Home of the MultiBladeProps: TriBlade, RamBlade, SixBlade, website
- Single Board Computer: 3 Propeller ICs and a TriBladeProp board (ZiCog Z80 Emulator)
- Prop Tools under Development or Completed (Index)
- Emulators: CPUs Z80 etc; Micros Altair etc; Terminals VT100 etc; (Index) ZiCog (Z80), MoCog (6809)
- Prop OS: SphinxOS, PropDos, PropCmd
- Search the Propeller forums (uses advanced Google search)
My cruising website is: www.bluemagic.biz   MultiBlade Props: www.cluso.bluemagic.biz
Ok, I'm keen to see 'Big Spin' too. And I think a fantastic use for such a program would be the HTML display that jazzed has proposed in another very active thread. I just posted a long reply over there.
And, just as a general vibe of the thing, I think this is entirely possible without having to wait for the propII.
But in essence - add a serial RAM chip to a Propeller, free up the entire hub RAM for video buffer, drop a couple of instructions out of the Spin interpreter cog to free up space for a serial RAM driver, then instead of loading instructions from hub to interpret, load them from the serial RAM chip, and re-add those removed Spin instructions using some (possibly slow, inefficient) code from the serial RAM chip. Later, to speed things up, think about clever cache code. But keep it simple to begin with - just load bytes one at a time from external serial RAM instead of hub RAM for the Spin interpreter. Is this possible? Could the hardware be as simple as one 8-pin memory chip? Or adding as many of these 70c SPI 8 KB chips as needed? www.futurlec.com/Memory/23K640.shtml
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.smarthome.viviti.com/propeller
Post Edited (Dr_Acula) : 6/22/2010 9:28:45 AM GMT
Right so far :)
Sorry, not quite.
Adding just a serial RAM chip would be quite slow for fetching byte codes directly: it would take about 8 us per byte with a small, simple implementation of SPI, and about 3.2 us with a 10 Mbps timer-based one, which would take a timer away from Spin and need much more memory in the cog.
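Those per-byte figures can be reproduced with a little arithmetic. This is a Python sketch under an assumption: a random single-byte read from an SPI SRAM such as the 23K640 costs 32 clocks (8-bit command, 16-bit address, 8 data bits). Under that assumption 3.2 us corresponds to 10 Mbps, and 8 us would correspond to roughly 4 Mbps, which is an inference rather than something stated above.

```python
NS_PER_SECOND = 1_000_000_000


def byte_fetch_ns(bit_rate_hz, bits_per_transfer=32):
    """Time to fetch one random byte, in nanoseconds.
    bits_per_transfer=32 assumes command + 16-bit address + 8 data bits."""
    return bits_per_transfer * NS_PER_SECOND // bit_rate_hz


def bytes_per_second(fetch_ns):
    """Byte-code fetch rate if every fetch is a separate random read."""
    return NS_PER_SECOND // fetch_ns


fast = byte_fetch_ns(10_000_000)    # 10 Mbps SPI: 3200 ns per byte
slow = byte_fetch_ns(4_000_000)     # assumed 4 Mbps bit-banged SPI: 8000 ns per byte
```

At 8 us per byte the interpreter could fetch at most 125,000 byte-codes per second before doing any work on them, which is why a mailbox to a caching cog looks more attractive than a direct driver.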
What we were talking about was freeing enough space to add mailbox handling code, so it could talk to VMCOG, or enough space for Jazzed's caching eeprom code. Later there might be versions targeting particular memory designs directly.
The 8 KB chips are a bad deal on a $/byte basis; the $1.50 32 KB ones cost about half as much per byte.
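The "about half" claim checks out if the 8 KB part is priced at the 70c quoted earlier in the thread. A quick Python check using exact fractions (the function name is just for illustration):

```python
from fractions import Fraction


def cents_per_kb(price_cents, kilobytes):
    """Exact cost per kilobyte in cents."""
    return Fraction(price_cents, kilobytes)


small = cents_per_kb(70, 8)      # 8 KB chip at 70c
large = cents_per_kb(150, 32)    # 32 KB chip at $1.50
ratio = large / small            # cost per byte of the 32 KB part relative to the 8 KB part
```

That gives 8.75c per KB against about 4.69c per KB, a ratio of 15/28, or roughly 0.54.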
Yes, that is a bit slow. What is that, 125 kHz? Yikes!
Ok, vmcog is the smart long term answer. Short term, I wonder if I have enough hardware sitting in front of me right now: a 512K RAM chip that can be accessed at about 3.8 MHz per byte, with a driver that is about 20 longs of PASM?
SpinXMM certainly makes for interesting reading. Is the byte read code this bit?
and could you just call a subroutine that reads from external ram instead? I'm sure it isn't that easy though. Got to find all the pcurr for instance.
There probably is a catch somewhere. How far can calls go - 32k, 64k, something else?
The full 512KB could be supported for code pretty easily: just find all references to pcurr, and modify as needed.
I think a great first step would be a bigspin that just supported a large binary image, used for code/constants/initial data values, and only executed code from xmm, using the hub for stack and variables.
The reason I suggest that as a first step is that then (for now) RD{BYTE|WORD|LONG} WR{BYTE|WORD|LONG} as well as VAR and stack references don't have to change much (except for initializing VAR sections from data in the code image)
After that worked, and people got used to the possibility of very large spin programs, VAR could be moved to xmm-only, allowing for HUGE arrays. Frankly, it may be worthwhile to ONLY move arrays, and leave simple global variables in the hub.
DAT sections will be a bit of a pain in xmm until cognew etc get modified to only expect them in xmm, copying the code referenced by cognew to the hub before starting it.
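The proposed first step (code in xmm, stack and variables in hub) can be modelled like this. It is a Python sketch with hypothetical names (`BigSpinMemory`, `fetch_code`, `init_var`). The 32 KB hub size is the Propeller 1's, and the 512 KB image size in the usage below follows the RAM chip mentioned earlier in the thread.

```python
HUB_SIZE = 32 * 1024   # Propeller 1 hub RAM


class BigSpinMemory:
    """Two separate address spaces: byte-codes in xmm, data in hub."""

    def __init__(self, xmm_image):
        self.xmm = bytes(xmm_image)        # read-only code/constants/initial data
        self.hub = bytearray(HUB_SIZE)     # stack and VAR stay here

    def fetch_code(self, pcurr):
        """Opcode fetch: the only access that goes to external memory."""
        return self.xmm[pcurr]

    def rdbyte(self, address):
        """Data read: unchanged, still served from hub."""
        return self.hub[address]

    def wrbyte(self, address, value):
        """Data write: unchanged, still goes to hub."""
        self.hub[address] = value & 0xFF

    def init_var(self, image_offset, hub_address, length):
        """Copy initial VAR values from the code image into hub at startup."""
        self.hub[hub_address:hub_address + length] = \
            self.xmm[image_offset:image_offset + length]


image = bytes(i & 0xFF for i in range(512 * 1024))
mem = BigSpinMemory(image)
mem.init_var(image_offset=16, hub_address=0x100, length=4)
```

Because `pcurr` is the only address routed to xmm, code addresses can exceed 64 KB without touching how RD/WR and stack references work. Moving arrays or DAT sections out of hub would need the data paths to learn about xmm as well.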
Jonathan
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
lonesock
Piranha are people too.
(geek humor escalation => "There are 100 types of people: those who understand binary, and those who understand bit masks")
- SQRT (rarely if ever used)
- STR* (tight loops, perfect for FCACHE)
- *FILL (tight loops, perfect for FCACHE)
- *MOVE (tight loops, perfect for FCACHE)
Tossing all of the above out should leave plenty of room for a small LMM interpreter, a small FCACHE area, and VMCOG interfacing code :)
REALLY NICE WORK STEVE!
I can see I need to play with this once I am back from UPEW...