Embedded support for LMM within SPIN (discussion & proposals)

Cluso99 · 2008-12-21 10:25

Support for some type of LMM within the SPIN Interpreter - discussion and ideas

The following is copied from the thread :
IDE for linux / Mac (and Windows) users - Ver 0.14 - The "Yet another" release (page 19)
http://forums.parallax.com/showthread.php?p=755835

BradC said...
It's interesting you bring this up. I've been playing with and pondering the addition of a "BAS" block with a PBASIC syntax-ish style. I had envisioned a helper cog, perhaps in an LMM style that dropped pre-defined blocks together, something along the lines of SX/B to enable a "BASIC" on the Propeller. So I don't believe we are way out of line in our thinking.

Mike Green said...
Here's a previous thread where I mentioned a native Prop compiler-compiler and the beginnings of a Basic compiler. I really have not worked on it much since this was posted other than to look at the possibility of compiling into Spin byte codes rather than LMM object code.

Cluso99 said...
...there is room in my ClusoInterpreter (not finally debugged...) to possibly add LMM. Basically I have LMM (basic style) currently in my interpreter (as a zero footprint debugger). Anyway ... we would need the compiler to be modified for this and there are two versions out there that the writers could modify.

One caveat - LMM variables have to be in hub unless the ClusoInterpreter is specifically compiled with overlays, or with features removed.

BradC said...
I like the idea of adding LMM support to the compiler. ... I just need to sit down with someone and dedicate some time to understanding precisely what it is that you have in mind and how it needs to be put together...

Ale said...
Maybe just adding some macro support can help with the LMM support.

I'm writing a simple user interface, config panels, with 4 or 5 "controls". All drawing is done natively in pasm; user input and responses are done in LMM: relative jump, constant loading, subroutine call to LMM code, just 5 routines. It gets... complicated, I mean there are loads of:...

call  #krnl_loadcnt
long  @k4_cnt_FONT+(krnl_r0<<26)+16

call  #krnl_call
long  @lmm_config+16

and other niceties (with the scheme I proposed some time ago). But it works. Some macros could easily help and make the code a bit more flexible without fixing the compiler to a specific LMM type. (I thought of having my compiler do some sort of find-and-replace method, just for simplicity.)
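To make the find-and-replace idea concrete, here is a minimal Python sketch of such a macro pass. The macro names LMM_LOADCNT and LMM_CALL and their argument order are hypothetical, invented only for this illustration; the expansions simply reproduce the two call/long pairs quoted above, and nothing here is taken from any real compiler.

```python
# Hypothetical macro table: name -> template lines.
# {0}, {1} are the comma-separated macro arguments.
MACROS = {
    "LMM_LOADCNT": ["call  #krnl_loadcnt", "long  @{1}+({0}<<26)+16"],
    "LMM_CALL":    ["call  #krnl_call",    "long  @{0}+16"],
}

def expand(source):
    """Replace each macro line with its template lines; pass other lines through."""
    out = []
    for line in source.splitlines():
        parts = line.split(None, 1)          # macro name, then the argument text
        if parts and parts[0] in MACROS:
            args = [a.strip() for a in parts[1].split(",")] if len(parts) > 1 else []
            out.extend(t.format(*args) for t in MACROS[parts[0]])
        else:
            out.append(line)
    return "\n".join(out)

print(expand("LMM_LOADCNT krnl_r0, k4_cnt_FONT\nLMM_CALL lmm_config"))
```

Because the kernel routine names live only in the table, switching to a different LMM scheme would mean editing the templates rather than the compiler, which seems to be the flexibility being asked for.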

Cluso99 said...
@Ale:

The LMM (interpreter) that I implemented in the debugger uses 5 longs: 4 for the loop and execution and 1 for data. However, the actual LMM code is more complex because it has to take care of jumps.

A call to an LMM routine would make sense for the interpreter to implement. The completion of the LMM would then just be another fixed call to return to the interpreter. I could see this as being relatively easy to implement from both the interpreter's and the compiler's perspectives. The main complexity is the fact that mov instructions will not work as expected because variables are not in cog memory. This means it is not true LMM, but a variant or subset. However, I could see some real speed advantages of mixing spin and LMM.

Let the discussion begin...

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Prop Tools under Development or Completed (Index)
http://forums.parallax.com/showthread.php?p=753439

Phil Pilgrim (PhiPi) · 2008-12-21 19:21

There's another option: port the Spin interpreter to LMM code. That would free up cog memory for variables, allowing MOVs to operate on cog data. It would also permit a vast expansion of the interpreter's capabilities. Spin programs would run at least 4X slower, though.

-Phil

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
'Just a few PropSTICK Kit bare PCBs left!

Ale · 2008-12-21 22:01

Phil... that's an excellent idea... I do not think it will make it considerably slower, but benchmarks will say!

Cluso99 · 2008-12-22 02:40

Overlays would be faster than LMM for the spin interpreter. I have overlays working which hit the 16 cycle sweetspot, which LMM cannot - unless you run LMM backwards, which is in itself an interesting idea. However, if the compiler reordered the LMM backwards it would make writing code easier.

There are certain routines in the interpreter that I am sure are used less often, such as multiply, divide, square root. Moving these out of the cog into overlays would free quite a bit of code space for variables to be stored within the cog. I even think this would be good for spin.

If spin were in forward LMM, it would run around 8 times slower, because the sweetspot cannot be hit (see below). However, as Phil's and my comments below show, we can hit the sweetspot for LMM (in reverse) with dual fetching, yielding 4 times slower than pasm. The decoding would probably be similar in timing as I am using a lookup table held in hub. However, some routines could be rolled out to deliver speed gains, but at the expense of hub memory.

My ClusoInterpreter is showing a likely performance increase of 25-30%, so we have that gain to begin with. The mathematic routines are now very tight and much faster than previously, so in LMM they would be 4 times slower than my version. I get a huge gain here.

A mix of pasm, overlays and LMM for the interpreter could be a good mix. Sounds like an interesting challenge.

Postedit: I was just checking the execution loop. It will be around 8 times slower for forward LMM, as it is not possible to catch the hub in the sweetspot, so it will be 32 cycles for each LMM instruction. The simple loop for forward LMM is shown below:

loop  rdlong  lmm,ptr  ' fetch LMM instruction from hub
      add     ptr,#4
lmm   nop              ' <-- LMM instruction executes here
      jmp     #loop    ' immediate target (without # the jump would be indirect)

Postedit: Here is the reverse LMM execution loop, at 16 cycles per LMM instruction:

loop  rdlong    LMM1,ptr                'copy LMM from hub to cog   (hptr ignores last 2 bits!)
      sub       ptr,#7                  'decrement hub ptr by 1 long (prev by 1, now by 7)
LMM1  nop                               ' <-- first LMM executed
      rdlong    LMM2,ptr                'copy LMM from hub to cog
LMM2  nop                               ' <-- second LMM executed
      djnz      ptr,#loop               'decrement hub ptr by 1 long (now by 1, next by 7)
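
As a rough cross-check of the two cycle counts, here is a small Python model of the two loops above. It is only a sketch under assumptions not spelled out in this thread: each cog's hub window comes round every 16 clocks, an already-aligned rdlong costs 8 clocks, and every other instruction (including the fetched LMM instruction and a taken djnz) costs 4. The function names are made up for illustration. The second helper walks ptr through the sub #7 / djnz sequence to show which hub longs get fetched; it assumes ptr enters the loop with its low two bits set, which is the only alignment where the arithmetic yields consecutive longs.

```python
HUB_PERIOD = 16  # assumed: one hub window per cog every 16 clocks
HUB_MIN = 8      # assumed: clocks for a hub instruction that is already aligned
ALU = 4          # assumed: clocks for any non-hub instruction

FORWARD = ["hub", "alu", "alu", "alu"]                # rdlong, add, <lmm>, jmp
REVERSE = ["hub", "alu", "alu", "hub", "alu", "alu"]  # rdlong, sub, <lmm1>, rdlong, <lmm2>, djnz

def cycles_per_lmm(loop, lmm_per_pass, passes=1000):
    """Average clocks per LMM instruction once the loop has settled."""
    t = 0
    start = 0
    for n in range(passes + 1):
        if n == 1:
            start = t                     # pass 0 only synchronises with the hub
        for kind in loop:
            if kind == "hub":
                t += (-t) % HUB_PERIOD    # stall until the next hub window
                t += HUB_MIN
            else:
                t += ALU
    return (t - start) / (passes * lmm_per_pass)

def reverse_walk(ptr, pairs):
    """Hub byte addresses fetched by the reverse loop (low 2 bits ignored)."""
    fetched = []
    for _ in range(pairs):
        fetched.append(ptr & ~3)  # rdlong LMM1,ptr
        ptr -= 7                  # sub ptr,#7
        fetched.append(ptr & ~3)  # rdlong LMM2,ptr
        ptr -= 1                  # djnz ptr,#loop
    return fetched

print(cycles_per_lmm(FORWARD, 1))   # 32.0
print(cycles_per_lmm(REVERSE, 2))   # 16.0
print([hex(a) for a in reverse_walk(0x100 + 3, 2)])  # ['0x100', '0xfc', '0xf8', '0xf4']
```

Under these assumptions the forward loop does 20 clocks of work per pass, overshoots the 16-clock window and stalls to 32, while each half of the reverse loop is exactly 16.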

Post Edited (Cluso99) : 12/22/2008 8:24:15 PM GMT

heater · 2008-12-22 15:31

What we need here is for Brad to sit down and get his BST Spin compiler to emit LMM code instead of spin byte codes, then include an LMM kernel in the finished binary.

Whilst he's at it he could introduce an LMM block to contain hand coded LMM.

Given the rate at which Brad works he could probably have this up and running whilst everyone is sleeping off their Christmas dinner :)

I guess my serious point is to not introduce yet another language, Basic, Forth or whatever but to use the one we already know and love. C has already been done.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.

Post Edited (heater) : 12/22/2008 3:36:50 PM GMT

Phil Pilgrim (PhiPi) · 2008-12-22 16:23

You can still get a 1:4 performance ratio if you use a reverse LMM. Writing reversed code by hand is a rather onerous task, but performing a reversal on normal code can be automated.
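
A minimal Python sketch of what such an automated reversal could look like, assuming a kernel that fetches with a decrementing pointer. The function names are hypothetical and the values stand in for instruction longs; they are not real PASM encodings. The sketch deliberately ignores jump and call targets, which a real tool would also have to recompute.

```python
def reverse_block(longs):
    """Store the instruction longs in reverse so a descending fetch runs them in source order."""
    return list(reversed(longs))

def fetch_descending(image, start_index):
    """Order in which a kernel with a decrementing pointer would fetch the longs."""
    return [image[i] for i in range(start_index, -1, -1)]

program = [0xA1, 0xA2, 0xA3, 0xA4]   # placeholder instruction longs, in source order
image = reverse_block(program)       # layout as it would sit in hub memory
entry = len(image) - 1               # execution starts at the highest address

print(fetch_descending(image, entry) == program)  # True
```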

-Phil

Cluso99 · 2008-12-22 19:54

@Phil:
Damn. I hadn't seen this thread. You alerted me to the fact that the sweetspot could be hit by dual fetches in reverse and that is what I used in my overlay loader.
Assembly Overlay Loader for Cog FAST (renamed & released)
http://forums.parallax.com/forums/default.aspx?f=25&m=272823

One thing to keep in mind is the dual fetch/execute and its impact on any jumps. I have not read the whole article yet, but it looks interesting.

Cluso99 · 2008-12-22 23:01

Does anyone want to do some playing with LMM if I knock up a quick version of the Interpreter to run LMM (forwards) ?

hippy · 2009-01-16 19:52

An LMM interpreted Spin VM is viable. Reverse LMM is the best choice for execution speed but the worst choice for developing / debugging IMO, though it is easy(ish) to convert for a release.

Don't forget though that a large Spin VM written in LMM takes away Hub memory ( quite quickly ) which would be used by your Spin program.

So ... Minimal footprint Spin VM running the LMM from external I2C Eeprom. Slower still but maximum RAM for Spin code. Would need a mechanism to get the LMM into that Eeprom space which may have to be separate to any user I2C.

Afraid I don't have any time at present to get involved in Prop development.

Mike Green · 2009-01-16 20:47

An interesting thought for the future would be an LMM Spin interpreter in ROM. That would free up much of the cog RAM for overlays and their data and would allow ready mixing of Spin interpretation and assembly routines (with cog data). Since assembly routines could be loaded from either ROM or hub ram, this would provide for easy extensibility of Spin.

hippy · 2009-01-17 17:10

The Spin interpreter is already in ROM and can be emulated although it needs a VM which has to be more complicated than LMM in that it needs to emulate PASM entirely. It also needs to decrypt the PASM in ROM which is the Spin interpreter.

I like the idea of an LMM-based Spin interpreter in ROM with some means of easily switching from PASM interpreter to LMM interpreter or other mechanism for extensibility. Not sure how that would fit though with protecting the IP Rights of such an interpreter. I get the impression that the Spin interpreter will be more protected in the Prop II.

BradC · 2009-01-17 19:14

hippy said...
I get the impression that the Spin interpreter will be more protected in the Prop II.

I asked off the record a little while back, and received the reply that the intention was the interpreter would be made available similar to the current one.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Cardinal Fang! Fetch the comfy chair.

hippy · 2009-01-18 15:42

@ BradC : Thanks for that, you can tell I've been away, so a belated "Whoop!", thanks Chip, Ken & Co if that does happen.

If it doesn't I doubt it would be the end of the world. I'm sure what we have can be stretched into what will be and we'll have third party Spin interpreters if needed.

Either way we can complain about not having the challenge of reverse engineering and decoding the new interpreter, or that the effort is too hard.
