Spin + LMM Pasm — Parallax Forums

Spin + LMM Pasm

Dave Hein Posts: 6,347
edited 2010-07-13 02:35 in Propeller 1
There was some discussion about extending the Spin interpreter to execute instructions from EEPROM, RAM, etc. The thread is located at http://forums.parallax.com/showthread.php?p=917166 . That thread also discussed the idea of adding an LMM Pasm interpreter to the Spin interpreter. I decided to try to implement a simple LMM interpreter -- and it works!

I have attached a demo program that implements a serial output driver in LMM Pasm. The program runs at 57,600 baud, but I have tested it up to a rate of 460,800 baud with an 80 MHz system clock. I started with the Spin interpreter source that Chip posted a while ago. I commented out the 14 instructions that implement the square root instruction and replaced them with 9 instructions that implement a simple LMM interpreter, plus 5 registers that can be used by the LMM program.
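For a rough sense of the timing budget behind those baud rates, here is a small Python sketch (not part of the attached demo) that computes how many system clock ticks fit in one serial bit. The function name `bit_ticks` is made up for illustration, and truncating integer division is an assumption about how a bit cycle count would be derived.

```python
CLKFREQ_HZ = 80_000_000  # system clock quoted in the post


def bit_ticks(baud, clkfreq=CLKFREQ_HZ):
    """Clock ticks per serial bit, truncated to a whole number of ticks."""
    return clkfreq // baud


if __name__ == "__main__":
    for baud in (57_600, 460_800):
        print(baud, "baud ->", bit_ticks(baud), "ticks per bit")
```

At 460,800 baud that leaves about 173 clocks per bit, against about 1388 at 57,600 baud.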

Bill Henning has suggested that the STR*, *FILL and *MOVE instructions could be moved out of the Spin interpreter to implement an LMM interpreter with more pseudo-ops and a cache area for running small loops. The STR*, *FILL and *MOVE instructions could then be implemented as cached programs. This would also allow adding back the square root instruction. The LMM interpreter could be invoked with the unused $3C Spin bytecode, or possibly an unused combination of Spin bytecodes and parameters.

Dave

Comments

  • Bill Henning Posts: 6,445
    edited 2010-06-24 19:02
    Excellent work Dave!

    I will try it after UPEW. I am working on a new demo for UPEW right now :)

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    www.mikronauts.com E-mail: mikronauts _at_ gmail _dot_ com
    My products: Morpheus / Mem+ / PropCade / FlexMem / VMCOG / Propteus / Proteus / SerPlug
    and 6.250MHz Crystals to run Propellers at 100MHz & 5.0" OEM TFT VGA LCD modules
    Las - Large model assembler Largos - upcoming nano operating system
  • jazzed Posts: 11,803
    edited 2010-06-24 19:19
    Dave Hein said...
    I have attached a demo program that implements a serial output driver in LMM Pasm. The program runs at 57,600 baud, but I have tested it up to a rate of 460,800 baud with an 80 MHz system clock. I started with the Spin interpreter source that Chip posted a while ago. I commented out the 14 instructions that implement the square root instruction and replaced them with 9 instructions that implement a simple LMM interpreter, plus 5 registers that can be used by the LMM program.
    That looks great Dave :)

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Propeller Pages: Propeller JVM
  • Brian Riley Posts: 626
    edited 2010-06-24 21:40
    This seems great ..... I have a question. I 'understand' what LMM is, but only in the broadest generalities. Is there somewhere a formal document or exhaustive Forum thread that lays out exactly what is LMM?

    cheers ... BBR

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    cheers ... brian riley, n1bq, underhill center, vermont
    The Shoppe at Wulfden
    www.wulfden.org/TheShoppe/
    www.wulfden.org/TheShoppe/prop/ - Propeller Products
    www.wulfden.org/TheShoppe/k107/ - Serial LCD Display Gear
  • Dave Hein Posts: 6,347
    edited 2010-06-24 21:56
    LMM stands for Large Memory Model, and it is described here -- http://propeller.wikispaces.com/Large+Memory+Model . It takes advantage of the fact that PASM instructions can be executed from anywhere in cog memory, and that the carry and zero flags are not changed unless the instruction explicitly specifies it. This way a program such as the LMM interpreter can fetch instructions from Hub RAM and execute them one at a time. Jumps have to be handled differently, by manipulating the Hub address index (i.e., the program counter).
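As a toy illustration of that fetch-and-execute idea, here is a Python model (not PASM, and not the code in the demo). The opcode names, the `run_lmm` function and the dictionary standing in for Hub RAM are all invented for this sketch. The only point is that the program counter is a Hub address that advances by one long per instruction, and that a jump simply overwrites it.

```python
def run_lmm(hub, pc=0, max_steps=1000):
    """Toy LMM loop: fetch one instruction from 'hub', advance pc, execute it.

    hub maps byte addresses (multiples of 4) to (opcode, argument) pairs.
    Returns the accumulator value and the number of instructions executed.
    """
    acc = 0
    steps = 0
    while steps < max_steps:
        op, arg = hub[pc]   # fetch from hub (a rdlong in real PASM)
        pc += 4             # hub program counter moves on by one long
        steps += 1
        if op == "add":
            acc += arg
        elif op == "jmp":   # a jump rewrites the hub pc, not the cog pc
            pc = arg
        elif op == "exit":
            break
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return acc, steps


# Hypothetical four-instruction program with one skipped instruction at address 8.
program = {
    0: ("add", 5),
    4: ("jmp", 12),
    8: ("add", 100),   # never executed
    12: ("add", 1),
    16: ("exit", 0),
}

if __name__ == "__main__":
    print(run_lmm(program))
```

A real LMM kernel executes the fetched long directly as a cog instruction; the `if`/`elif` chain here only stands in for that step.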

    One thing I noticed in running the demo is that the Prop Serial Terminal should be loaded before you start the demo. The demo generates a lot of serial traffic, and it seems to hang PST if you start PST after loading the Prop.

    Dave
  • Dr_Acula Posts: 5,484
    edited 2010-06-24 23:26
    This is very exciting. I think this might be the first step towards 'big spin'. I'm still not 100% sure in my mind if this is 'big spin' or 'big pasm' but either would be amazingly useful. Hopefully these questions are not too complicated;

    1) Which variable in the program is the 'program counter'?
    2) Where is the routine that reads and writes memory?
    3) Are you reading from eeprom at the moment or somewhere else? If eeprom, how do writes work?
    4) A general question to myself and others - what is the smallest external ram driver code possible? Eg SPI vs I2C vs direct drive sram vs latch drive sram.
    5) How would this model handle a big array - eg myarray[50000]? I have a vague feeling LMM might easily be able to handle that, but I'm not sure.
    6) How would a compiler handle a big array that takes more than hub ram, then some code? Can you just mindlessly code your big array/big program or are there special cases to watch out for?

    I'm still trying to work out what this can and can't do. There seem to be a lot of building blocks already working - eg could you take Kye's code, load 512k of data from an sd card into an external ram, load this driver program into a cog, set it going and have a huge program running that used virtually none of the hub ram (thus leaving it free for other things). Are there essential things needed for hub - eg the program counter?

    Addit: @Brian This seems great ..... I have a question. I 'understand' what LMM is, but only in the broadest generalities. Is there somewhere a formal document or exhaustive Forum thread that lays out exactly what is LMM?

    From a practical point of view, LMM means you can start writing pasm code for a cog and keep writing and writing and not have to worry that it is too big to fit in a cog. It is the secret to getting all the retro emulations into one cog - they don't really fit in one cog but you can write the program as if they do.

    It doesn't take long to run out of cog space (2k), but of course, eventually this model falls over when you run out of hub space. I *think* what Dave has done here is solve the problem by storing the program in external memory.

    I'm still a bit confused about it myself, but as I see it there are two problems - 'big spin' where you have a spin interpreter pulling instructions from external memory rather than from hub ram, and 'external memory LMM' where pasm instructions are being read from external memory. I have a vague feeling Dave might have solved both these problems.

    I suspect there are speed tradeoffs to consider with external memory. Clearly, eeprom and spi ram and other serial chips are going to be slower than parallel. In terms of speed, the order is, from slow to fast - serial eeprom/ram, latched sram, direct drive sram. And of course, the cost in terms of propeller pins works the other way. Where is the optimum balance between speed and pins? I suspect that may change for any project using this model, so in an ideal world one would have driver code that you could drop in for each type of memory.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    www.smarthome.viviti.com/propeller

    Post Edited (Dr_Acula) : 6/25/2010 12:05:40 AM GMT
  • BradC Posts: 2,601
    edited 2010-06-24 23:59
    If you removed the sqrt and built it as part of the interpreter sitting in hub ram, then used the sqrt instruction to execute it, you could then use the $3C opcode to execute arbitrary LMM instructions. That way the interpreter would perform the same as it does now with the only addition to the compiler of the $3C instruction. At the moment, bst supports the Bytecode() extension to build arbitrary spin bytecodes which might be useful for testing purposes and we could look at making some "official" command to utilise the LMM instructions as things firm up.

    Great work Dave!

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    "I mean, if I went around sayin' I was an emperor just because some moistened bint had lobbed a scimitar at me they'd put me away!"
  • jazzed Posts: 11,803
    edited 2010-06-25 00:52
    Dr_Acula said...
    I suspect there are speed tradeoffs to consider with external memory. Clearly, eeprom and spi ram and other serial chips are going to be slower than parallel. In terms of speed, the order is, from slow to fast - serial eeprom/ram, latched sram, direct drive sram. And of course, the cost in terms of propeller pins works the other way. Where is the optimum balance between speed and pins?
    I'm using Propeller boot EEPROM (2 pins) with a 2KB HUB Cache with the Propeller JVM to fetch/execute byte-code at a fair clip (less than 2x slower than fetching from HUB). A SPI SRAM (2 more pins) or SD Card (4 more pins) would be slightly faster and easier for development (less EEPROM replacement cost).

    It would be reasonable to expect something similar from a Spin interpreter (for fetch/execute) and have most of HUB memory left for Video buffers, etc.... Relative performance for the FIBO tests are presented in the log10 chart below.

    @BradC, it's nice to see your active interest in this LMM approach.

    Cheers,
    --Steve

    [Attachment: log10 chart of relative FIBO test performance]

  • Dr_Acula Posts: 5,484
    edited 2010-06-25 01:08
    I have to confess that the more I read this code, the more confused I become. I'm not even sure now what an "LMM interpreter in a Spin interpreter" actually is!

    I'm even more confused trying to work out what is new code, what is old code, and things like
    ' This LMM Spin routine transmits the byte at location $7fff using the asynchronous serial
    ' protocol on P30.  The baud rate is determined by the bit cycle count stored in reg5.
    DAT
                            org     0
                            ' Create bit mask in reg1 for P30 and set OUTA and DIRA
    txbyte                  mov     reg1, #1
                            shl     reg1, #30
                            or      outa, reg1
                            or      dira, reg1
    
    



    where it says this is an "LMM Spin routine" and then immediately launches into pasm, not spin.

    My understanding is that the existing standard spin interpreter runs in one cog and pulls bytes in one at a time from hub memory and interprets them.

    My understanding of a pure LMM pasm routine is like in the qZ80 emulation where there is a huge chunk of pasm code and these are pulled in one at a time into an interpreter.

    I'm really confused about the idea of a spin interpreter that replaces the square root instruction to run a pasm LMM. Does the spin code keep running too? How do you mix and match spin and pasm in this new environment?

    And where are the bytes coming from for the LMM pasm? I got the impression they were coming from external eeprom but I can't see any eeprom driver code. Only instructions like rdbyte and wrbyte which are sending bytes to and from hub.

    I'll post again when I am less confused. I still have this vague feeling that there are a number of threads with some brilliant code that just need to be brought together somehow.

  • jazzed Posts: 11,803
    edited 2010-06-25 01:15
    Dr_Acula said...
    .... And where are the bytes coming from for the LMM pasm? I got the impression they were coming from external eeprom but I can't see any eeprom driver code. Only instructions like rdbyte and wrbyte which are sending bytes to and from hub. ...
    I'm afraid your enthusiasm preceded your investigation. "Ready, Fire, Aim" is OK sometimes ;)
    Dave's code is just demonstrating a proof of a possibility. A fully developed idea will come later.

    Cheers,
    --Steve

  • RossH Posts: 5,519
    edited 2010-06-25 01:15
    @Dr_Acula,

    This is neither 'big SPIN' nor 'big PASM' - I don't know if anyone is still working on 'big SPIN', but 'big PASM' has been available for quite a long time now. I think Bill's assembler will do it, and Catalina certainly does (once the initial C to PASM translation is complete, Catalina essentially just uses Homespun to assemble a 'big PASM' program which can be executed by the Catalina LMM Kernel - if you wanted to 'hand write' an LMM PASM program instead then you can do that too - up to 16Mb of it. I already handwrite some of the C library functions in LMM PASM).

    This is something different - it's a clever way of allowing SPIN to easily execute LMM PASM extensions "on the fly". This will make SPIN a lot more usable since one of its biggest drawbacks has been that SPIN programs execute very slowly, but speeding them up by adding PASM extensions is way too complex for most beginners since it requires you to fire up another cog and then interact with it - and even then you can only execute 496 instructions at a time. Even worse, the cost of loading a cog to execute an arbitrary PASM function "on the fly" made the whole idea pointless - till now!

    Congratulations, Dave. I can see applications of this in Parallax's forthcoming SPIN/C Hybrid.

    Ross.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Catalina - a FREE C compiler for the Propeller - see Catalina
  • Dr_Acula Posts: 5,484
    edited 2010-06-25 01:34
    Ah, I think I can see where this is going. RossH said: "This is something different - it's a clever way of allowing SPIN to easily execute LMM PASM extensions 'on the fly'."

    So ok, I have my spin code and I want to include a tiny pasm program that I pass one variable, and the pasm program adds one to that variable and then returns? I can do that using this method without having to start up a cog?

    Ok, combine that with the graph jazzed posted. Add an spi ram. Can I now pull my pasm instructions for the above out of an external ram?

    I'm looking at the pasm ram driver code Cluso has, and my latch ram driver code. It isn't very big. Borrow another 20-30 instructions off some spin instructions and the ram driver code can be included in the cog code.

    If so, could you start to think of a library of pasm routines you might load into ram at startup (or sd card, or eeprom) and they are sitting there ready to be used? A library that started by replacing the square root function for instance?

    jazzed - you have code that loads eeprom into a little cache in hub? I've been playing around with Kye's code and it would be dead easy to load that from an sd card instead.

    Yes, "ready, fire aim" here again, but could this lead to the idea of a library of LMM functions you store on an sd card (or move on bootup to an spi ram for faster access) and only run if you need them? This would be perfect for those instructions that you only use infrequently.

    Now I'm racing ahead here, but what if you had a very simple system for such a library, where you only pass one variable to a library function. That variable is a memory location in hub, and the library expects to find up to n variables (whatever it needs) at that location. It then returns n variables at the same location. This could work for passing and returning one long, and it could work for passing and returning strings and arrays. It could solve the problem of predefining certain locations in hub which I think is where the definition of a spin operating system is getting bogged down. Define them on the fly (and if possible, recycle hub locations left over from loading up cogs, so you don't even need any extra memory).
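To make the "pass one address, find the arguments there, leave the results there" idea concrete, here is a hedged Python sketch. It only models the calling convention: the `bytearray` stands in for hub RAM, and `lib_add_pair`, `read_long`, `write_long` and the scratch address are all invented for illustration, not taken from any existing Propeller library.

```python
import struct

HUB_SIZE = 32 * 1024  # Propeller 1 hub RAM size in bytes


def write_long(hub, addr, value):
    """Store a 32-bit little-endian long at a hub byte address."""
    struct.pack_into("<I", hub, addr, value & 0xFFFFFFFF)


def read_long(hub, addr):
    """Load a 32-bit little-endian long from a hub byte address."""
    return struct.unpack_from("<I", hub, addr)[0]


def lib_add_pair(hub, block_addr):
    """Hypothetical library routine: read two longs at block_addr and
    write their sum back to the first long of the same block."""
    a = read_long(hub, block_addr)
    b = read_long(hub, block_addr + 4)
    write_long(hub, block_addr, a + b)


hub = bytearray(HUB_SIZE)
block = 0x7F00                 # arbitrary long-aligned scratch address
write_long(hub, block, 40)     # first argument
write_long(hub, block + 4, 2)  # second argument
lib_add_pair(hub, block)       # the only thing passed is the block address

if __name__ == "__main__":
    print(read_long(hub, block))
```

The caller and the routine only have to agree on the layout of the block, so the same single-address convention could carry one long, a string, or an array.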

    Hmm - I hope that isn't 'ready, miss, fire, aim'?


    But going right back to the beginning, yes you don't have to fire up a cog. Normal code:
    PUB main | i
      ifnot cog_loaded
        cog_loaded := 1
        coginit(0, @spin_interp, 4)
    ....
    
    DAT                     org
    
    spin_interp             mov     x,#$1F0-pbase           'entry, load initial parameters
                            mov     y,par
    
    



    ie, some data code which goes into a cog with a coginit somewhere.

    Dave's nifty code - the subtlety is hard to spot:
    DAT
                            org     0
                            ' Create bit mask in reg1 for P30 and set OUTA and DIRA
    txbyte                  mov     reg1, #1
                            shl     reg1, #30
                            or      outa, reg1
    
    



    But there is no coginit associated with txbyte.

    Hmm - even without external ram, you can do some clever things scattering tiny DAT bits of code all through your program. Have a library of these routines. eg BST could list them all and you could click a button that pastes them into the code. You could put them all at the end, or put them near the code that called them.

    Faster and less code space than spin.

    Yes - it could be much faster. Consider a tiny 4 long pasm routine. Right now, to load that into a cog loads up 496 longs, then you have to fire it up and interact with it. And putting new different code into that cog is not easy.

    But this method means you could have hundreds of tiny DAT pasm routines in a program.


    Post Edited (Dr_Acula) : 6/25/2010 1:55:03 AM GMT
  • jazzed Posts: 11,803
    edited 2010-06-25 01:55
    Dr_Acula said...
    Ah, I think I can see where this is going.
    By George, I think you've got it!

    Your racing ahead has merit and is not without precedent.
    There are still many possibilities as this thread shows, it just takes
    some inspiration to break it out and perspiration to make it functional.

    Cheers,
    --Steve

  • Cluso99 Posts: 18,069
    edited 2010-06-25 01:59
    Congratulations Dave! Nice progress.

    There is room for the LMM in my Faster Interpreter, so guess you have now found the motivation for me to debug it! Now to find the time. Meanwhile, you can continue with the fine work.

    Drac: What LMM does is that it effectively executes an inline PASM sequence that is run from memory other than within the cog. If the pasm code resides in hub then by definition, the best performance that can be achieved is 25% (1/4) of raw pasm code because you have to fetch the long from hub (copy it into cog) and then execute it in a loop. The tightest loop implementation takes 16 clocks and depends on the code of LMM, but this requires complex coding. An easier LMM loop takes 32 clocks, which is 12.5% (1/8) of raw pasm code.
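The percentages above follow from simple arithmetic if a native PASM instruction is taken to cost 4 clocks, which is the usual figure on the Propeller 1. A short Python sketch, with the 80 MHz clock assumed from earlier in the thread and the function names invented here:

```python
NATIVE_CLOCKS = 4  # typical clocks per native PASM instruction on Propeller 1


def lmm_fraction(loop_clocks):
    """Fraction of native speed when each LMM instruction costs loop_clocks."""
    return NATIVE_CLOCKS / loop_clocks


def lmm_mips(loop_clocks, clkfreq_hz=80_000_000):
    """Millions of LMM instructions per second at the given system clock."""
    return clkfreq_hz / loop_clocks / 1e6


if __name__ == "__main__":
    for clocks in (16, 32):
        print(clocks, "clocks:", lmm_fraction(clocks), "of native,",
              lmm_mips(clocks), "MIPS")
```

Under these assumptions the 16-clock loop works out to 5 MIPS and the 32-clock loop to 2.5 MIPS at 80 MHz, against 20 MIPS for code running natively in the cog.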

    Of course, fetching from external SRAM, EEPROM, etc will slow this considerably. These are byte or bit accesses, not long accesses, so you can imagine the speed hit here. Perhaps this may be where Bill's VMxxx coding will come in handy.

    The real ability is to be able to add some inline LMM code to increase the spin interpreter speed on specific code sequences and to be able to intermix this with spin.

    Dave: FWIW I wrote a zero-footprint debugger which only uses 4 longs and totally resides in the shadow registers. I used this to debug my Interpreter, so it can trace both the spin bytecodes and the underlying pasm, operated by commands. This debugger is actually an LMM version but it is not built for speed. You may find it handy for debugging purposes. It is in the obex.

    Brad: Thanks for your enthusiasm, and of course your compiler IDE.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Links to other interesting threads:

    · Home of the MultiBladeProps: TriBlade, RamBlade, SixBlade, website
    · Single Board Computer: 3 Propeller ICs and a TriBladeProp board (ZiCog Z80 Emulator)
    · Prop Tools under Development or Completed (Index)
    · Emulators: CPUs Z80 etc; Micros Altair etc; Terminals VT100 etc; (Index) ZiCog (Z80), MoCog (6809)
    · Prop OS: SphinxOS, PropDos, PropCmd; Search the Propeller forums (uses advanced Google search)
    My cruising website is: www.bluemagic.biz; MultiBlade Props: www.cluso.bluemagic.biz
  • localroger Posts: 3,452
    edited 2010-06-25 02:06
    Now that I've had a chance to look at it I'd just like to chip in my USD$0.02 and say this is a very cool hack. At last we have an answer for all those people who want to know how to run an asm routine inline without starting another cog, waiting for a response, etc. etc. This is the answer to that.
  • Dave Hein Posts: 6,347
    edited 2010-06-25 02:19
    I may have made it confusing by referencing jazzed's thread about extending Spin so it can execute from external EEPROM/RAM. I thought it was important to point out where some of the ideas originated. My approach was to enable the Spin interpreter to execute both Spin and PASM code from the internal 32K RAM.

    Ross is correct in pointing out that there are existing compilers that use LMM PASM. This includes the Catalina and ImageCraft C compilers, PropBasic, and other implementations. I understand that Bill Henning is working on an assembler that will use LMM as well.

    I think there is some value in integrating an LMM PASM interpreter into the Spin interpreter. Portions of Spin code that are running too slowly could be converted into PASM and executed without having to load another cog. This would be good in applications that are already using all 8 cogs and need more speed improvement in the Spin code.

    If the LMM PASM interpreter were standardized we could run compiled C, PropBasic or LAS code in the same cog that is running Spin. It's something to think about.

    Dave
  • jazzed Posts: 11,803
    edited 2010-06-25 02:34
    @BradC, I'm just a little confused about needing to overload 2 separate byte-codes to implement this in-line LMM. Can't we just define a new symbol PASM(...), LMM(...), or something for opcode $3f and have the parameters that follow be the arguments for the LMM?
    Dave Hein said...
    I thought it was important to point out where some of the ideas originated. My approach was to enable the Spin interpreter to execute both Spin and PASM code from the internal 32K RAM.
    Most of us get it I think. I certainly appreciate the reference :) Of course I didn't necessarily have any influence on the idea. They are of course separate topics and we have trouble with that around here sometimes :)
    Cluso99 said...
    Of course, fetching from external SRAM, EEPROM, etc will slow this considerably. These are byte or bit accesses, not long accesses, so you can imagine the speed hit here. Perhaps this may be where Bill's VMxxx coding will come in handy.
    It has already been demonstrated in several ways that external parallel SRAM is not necessary to achieve *functional* performance. All we need now is for you (or someone else) to finish a *roomy* spin interpreter re-write ;)

    Cheers,
    --Steve

  • RossH Posts: 5,519
    edited 2010-06-25 03:38
    @Dave,

    There will probably never be a "standard" for LMM PASM - and in my view that's a good thing. Anything beyond the basic LMM loop involves significant and complex tradeoffs in terms of time, space and functionality. Quite rightly, each current LMM implementation is "tuned" to be good at different things - because 496 instructions is simply not enough to do everything that would be required in an ideal implementation. This is why SPIN is so limited - I would bet a free copy of Catalina that it is not by design, but simply because Chip ran out of space!

    A classic example is floating point - if you want to add floating point to a SPIN program (and don't have a spare cog) then implementing it by converting the current PASM floating point functions to LMM PASM is quite feasible - but for many applications would still be too slow to be useful (i.e. 4 or 5 times slower). For fast floating point you need to implement it as an LMM PASM "primitive" - i.e. as a normal PASM function that executes within the LMM cog itself (this is what Catalina does). This gets you the best floating point speed - but of course it means you now need to find yet more space in your cog. Which means you have to make a decision about what else to throw out. Which causes more functional limitations, or means more things have to be implemented in LMM PASM (and therefore run slower) etc etc.

    I think in such cases it is better to have different LMM implementations - e.g. one for those who want fast floating point, and another one with either no floating point support (or at least significantly slower floating point), but which is faster in all other respects for those who don't.

    Ross.

  • Dr_Acula Posts: 5,484
    edited 2010-06-25 03:58
    Ok, inline LMM Pasm for dummies (like me!)
    con
    
    load spin cog
    main
      repeat
        routine1
        routine2
        routine3
    
    pub routine1
      byte[$7fff] := value
      result := ^^ @routine1lmm
    
    dat
      org 0
    routine1lmm
      some pasm code
      jmp     #push
    
    pub routine2
      byte[$7fff] := value
      result := ^^ @routine2lmm
    
    dat
      org 0
    routine2lmm
      some pasm code
      jmp     #push
    
    pub routine3
      byte[$7fff] := value
      result := ^^ @routine3lmm
    
    dat
      org 0
    routine3lmm
      some code
      jmp     #push
    
    *****
    modified spin interpreter code
    
    



    Is that the rough skeletal structure?

    What does ^^ @ mean?

    Would you associate each pub with its lmm code, or do you group all the lmm code together (or does it not really matter?)

    Would $7FFF be a convenient post box for passing data, and could you count down from that for things like strings? If you do count down from the top of memory, can you protect this memory to prevent strange errors creeping in as the entire program gets bigger and bigger and starts overwriting this upper memory? (Would that be a job for the compiler?)

    Does every bit of lmm code need to have an associated bit of spin driver code? Or can you call lmm code directly (eg via postbox locations above)?


    Post Edited (Dr_Acula) : 6/25/2010 4:04:10 AM GMT
  • Cluso99 Posts: 18,069
    edited 2010-06-25 06:22
    jazzed said...
    It has already been demonstrated in several ways that external parallel SRAM is not necessary to achieve *functional* performance.
    Yes. I was merely pointing out the obvious to us. Having the ability is the main thing. We can then dissect code to make the best use of the tools available.

    I noted a few ideas flowing through here, so I thought I would throw a couple out there to consider...

    1. I built a fast and small overlay loader which I used initially in my faster spin development. We use this in ZiCog. For any code that has repeating loops this is much faster than LMM. A mix of both could be a good way to go. For instance, divide might be in an overlay whereas coginit could be LMM. Floating point could perhaps be a mix of both (I have never looked at it).

    2. It was mentioned that this (spin interpreter with LMM PASM) is a quick way to invoke a small routine in a cog. There is a much better way (if spin is not required) and that is to have a stub loaded into a cog which waits for a task to run, either using Overlay or LMM. The stub could have both abilities, and of course much more room available for pasm, so some routines could even be present.

    There are some very interesting possibilities coming out here !

  • jazzed Posts: 11,803
    edited 2010-06-25 09:52
    Dr_Acula said...

      byte[$7fff] := value
      result := ^^ @routine1lmm
    
    


    What does ^^ @ mean?

    Would $7FFF be a convenient post box for passing data, and could you count down from that for things like strings?

    Normally "result := ^^ @routine1lmm" means calculate the square root of the address of routine1lmm and assign it to result. In the case where ^^ is overloaded in the interpreter, it means run the lmm interpreter, fetching code starting at address routine1lmm.

    byte[$7fff] would not be necessary if a special compiler statement and code was created that took arguments for running the lmm. Until a compiler modification is made, such things would be necessary though.

    If $3F could be redefined and compiler changed, the new statement could look like this: "ASM(@routine1lmm,@value)" Of course such things would be up to BradC/mpark or some other compiler person.

    Cheers,
    --Steve

  • BradC Posts: 2,601
    edited 2010-06-25 10:06
    jazzed said...
    @BradC, I'm just a little confused about needing to overload 2 separate byte-codes to implement this in-line LMM. Can't we just define a new symbol PASM(...), LMM(...), or something for opcode $3f and have the parameters that follow be the arguments for the LMM?

    Isn't that what I said? (Except I thought the unused opcode was $3C, not $3F.) It's certainly what I meant. What I tried to say was you can insert arbitrary bytecode into a spin routine for testing now using bytecode(), and if things firm up nicely I see absolutely no reason not to turn that into a dedicated SPIN command of some kind to invoke LMM code in the future.

    Read that as "I'm not averse to supporting some extension to the compiler, but for right now you can use the $3C bytecode by using bytecode() anyway"

  • jazzed Posts: 11,803
    edited 2010-06-25 10:15
    BradC said...
    What I tried to say was you can insert arbitrary bytecode into a spin routine for testing now using bytecode() and if things firm up nicely I see absolutely no reason not to turn that into a dedicated SPIN command of some kind to invoke LMM code in the future.
    Ooops, yes I saw the reference to bytecode(), but I missed the intention of it.
    What is the bytecode() syntax, and how might it be used in this example?

  • BradC Posts: 2,601
    edited 2010-06-25 10:47
    jazzed said...
    BradC said...
    What I tried to say was you can insert arbitrary bytecode into a spin routine for testing now using bytecode() and if things firm up nicely I see absolutely no reason not to turn that into a dedicated SPIN command of some kind to invoke LMM code in the future.
    Ooops, yes I saw the reference to bytecode(), but I missed the intention of it.
    What is the bytecode() syntax, and how might it be used in this example?

    It's in the manual!

    (I've been waiting to say that!)

    Seriously, it's like string() but it inserts arbitrary bytecode into the method where you use it..

    Local Parameter DBASE:0000 - Result
    Local Variable  DBASE:0004 - X
    Local Variable  DBASE:0008 - Y
    |===========================================================================|
    2                        X := Y
    Addr : 0018:             68  : Variable Operation Local Offset - 2 Read
    Addr : 0019:             65  : Variable Operation Local Offset - 1 Write
    3                        Bytecode($68,$65)
    Addr : 001A:             68  : Variable Operation Local Offset - 2 Read
    Addr : 001B:             65  : Variable Operation Local Offset - 1 Write
    Addr : 001C:             32  : Return

    So in this instance the lines are equivalent.

    Don't forget to push the parameters prior to calling the routine!

  • TonyWaiteTonyWaite Posts: 219
    edited 2010-06-25 11:04
    Is this the *breakthrough* ?

    I can envisage a generalised 'utility' programming environment for the Prop that might make it much *simpler* to work with:

    So that the PropBASIC user could easily interact with Spin objects...

    So that the memory limitations would not usually be relevant...

    So that the beginner could use the power of the Propeller almost immediately...

    So that all of the Prop 'lurkers' like me could take the plunge without hitting a wall of complexity.

    Are these assumptions correct or have I got needlessly excited?

    Regards,

    T o n y
  • Cluso99Cluso99 Posts: 18,069
    edited 2010-06-25 11:54
    Dave: I just looked at your code.

    May I suggest you add the following statements to fill any unused space and also report an error (the :gaps line and the fit $1E5 line). Otherwise you may spend a lot of time looking for a bug because these locations must be kept by the Interpreter - I found out the hard way!!
    '
    :gaps                   long    $0[$1E5 - $]            'fill the gaps
                            fit     $1E5
    '
    masklong                long    $FFFFFFFF               '(temporarily used by runner code)
    masktop                 long    $80000000               '(temporarily used by runner code)
    maskwr                  long    $00800000               '(temporarily used by runner code)
    
    

    Chip found a sqrt that is 6 longs shorter, but that still does not give you enough space.

    It is an interesting way you have started the interpreter. I went through a fairly convoluted way to start mine.

    I noted that you cannot use a constant easily in LMM. This is something I had not thought about. Maybe Bill has a solution?

    BTW: You may require Chip's permission to use the Interpreter code as it is not MIT licensed as far as I am aware. I do not think this will be an issue though. My modifications to his code, as much as they apply, are MIT licensed.

    result := ^^ @txbyte
    


    That is certainly a cunning way to use the SQRT function to run the LMM code at @txbyte.

    I have just been motivated to dig out my interpreter archive. It looks as though the latest published code, ClusoInterpreter_260C_006F.spin, was the last working version. This version has $1C7-$1E5 free (31 longs free). However, my latest version _015F (with bug) has only 2 longs free (plus 11 being used for internal debugging). I have attached v260C_006F here.



    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Links to other interesting threads:

    Home of the MultiBladeProps: TriBlade, RamBlade, SixBlade, website
    Single Board Computer: 3 Propeller ICs and a TriBladeProp board (ZiCog Z80 Emulator)
    Prop Tools under Development or Completed (Index)
    Emulators: CPUs Z80 etc; Micros Altair etc; Terminals VT100 etc; (Index) ZiCog (Z80), MoCog (6809)
    Prop OS: SphinxOS, PropDos, PropCmd
    Search the Propeller forums (uses advanced Google search)
    My cruising website is: www.bluemagic.biz
    MultiBlade Props: www.cluso.bluemagic.biz

    Post Edited (Cluso99) : 6/25/2010 12:05:15 PM GMT
  • RossHRossH Posts: 5,519
    edited 2010-06-25 12:45
    TonyWaite said...
    Is this the *breakthrough* ?

    No, I don't think so. While it makes it easier to integrate LMM PASM with SPIN, I don't see that as something Propeller beginners would want. Nor do they need it - SPIN by itself is a very good tool for beginners. It can take you all the way from knowing very little about programming to implementing sophisticated concurrent algorithms.

    I think this idea is more useful for intermediate-level Prop users who are pushing the boundaries of what is possible using SPIN alone, but who do not want to switch to another language.

    To those users I think it will be quite valuable.

    Ross.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Catalina - a FREE C compiler for the Propeller - see Catalina
  • Dave HeinDave Hein Posts: 6,347
    edited 2010-06-25 14:01
    @Ross, floating point is one of the applications I had in mind for inline PASM. I think there are lots of uses for Spin + LMM PASM. It could be used for any bit of code that needs to run faster than Spin, but that the programmer would prefer to run inline instead of starting up another cog.

    @Cluso, I like the idea of an overlay loader. I plan on adding an FCACHE pseudo-op, which would provide that functionality. The FCACHE area would overlap the user register area, so there would be a trade-off between using lots of registers and running a tight loop in the cache. Larger chunks of code could be loaded by temporarily overwriting a portion of the Spin interpreter, and then reloading that portion when done.

    I don't understand the reasons for the :gaps space. Does the Spin interpreter require masklong, masktop and maskwrite to be at a specific location? I did put a FIT 496 at the end to make sure it fits, and I do intend to add more user registers to use up all the space. I don't understand Chip's comment that says "temporarily used by the runner code". Is there something special about these three locations?

    @Brad, a bytecode($3C) command will work great for this application. I was planning on doing some self-modifying code to insert the $3C, but you already have a way to do it. Now I just need to figure out how the Spin interpreter handles parameters so I can pass the address of the LMM PASM routine to it.

    @Tony, I wouldn't consider this a "breakthrough", but more like a different path along the evolutionary tree. Spin + LMM PASM might work out, or it might be a dead end. I haven't looked at the PropBasic LMM interpreter, but if that could be fully integrated into the Spin interpreter we could run both languages in the same cog. I think that opens up some interesting possibilities. It may be possible to have a compile-time switch for the C compilers that would make them compatible as well.

    There have been discussions in the past about extending the Spin language. I proposed the idea of adding CPUB, CPRI, CDAT and CVAR instructions to Spin that would cause PASM code to be generated that would run directly in another cog. I even prototyped an xspin preprocessor that would generate PASM code and convert xspin source files to Spin. Now I think this could include LMM PASM as well.
  • lonesocklonesock Posts: 917
    edited 2010-06-25 17:44
    So, semi-random question: is there any way to modify a cog register inside a running SPIN Interpreter? If so, we could have a simple function Patch_LMM_SPIN( cog_ID ), no need to include the code for the whole modified PNUT in our applications.

    Jonathan

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    lonesock
    Piranha are people too.
    (geek humor escalation => "There are 100 types of people: those who understand binary, and those who understand bit masks")
  • BeanBean Posts: 8,129
    edited 2010-06-25 19:45
    lonesock,
    I don't think so.
    I tried setting "lmm_pc := @txbyte" in Spin so I could use the parameter as the value (instead of putting it into $7FFF), and it doesn't seem to work.

    [Edit]
    Okay, I figured it out. You need to set them BEFORE you start the cog.
    In this edited version, I set the LMM starting address in "lmm_start", then use the ^^ parameter as the data (instead of having to put the data into $7FFF).

    This does have a lot of possibilities... This simple example allows sending 57,600 baud serial data without using another cog.

    Thanks for posting this, Dave. Very cool.

    Bean.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    Use BASIC on the Propeller with the speed of assembly language.
    PropBASIC thread http://forums.parallax.com/showthread.php?p=867134

    March 2010 Nuts and Volts article http://www.parallax.com/Portals/0/Downloads/docs/cols/nv/prop/col/nvp5.pdf
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    There are two rules in life:
      1) Never divulge all information
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    If you choose not to decide, you still have made a choice. [RUSH - Freewill]

    Post Edited (Bean) : 6/26/2010 12:00:04 AM GMT
  • Cluso99Cluso99 Posts: 18,069
    edited 2010-06-26 00:06
    We all have different pieces to the puzzle of the Interpreter. I came at it from a replacement point of view, so I replaced code sections with working and tested code sections, without understanding the actual bytecodes. I now have a much better understanding of how the bytecodes work, but absolutely no understanding of how they are created from Spin, which is a function of the compiler.
    Dave said...
    I don't understand the reasons for the :gaps space. Does the Spin interpreter require masklong, masktop and maskwrite to be at a specific location?
    Absolutely. IIRC the runner is used by the cogxxx functions. So locations $1E5 onwards MUST be maintained.
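
    As a rough illustration of that constraint, here is a minimal Python sketch (not part of the interpreter source) of the arithmetic behind `long $0[$1E5 - $]` followed by `fit $1E5`. The function name `gap_fill_count` and the example address $1D0 are hypothetical; only the $1E5 boundary comes from the posts in this thread.

```python
FIXED_BASE = 0x1E5  # cog address where masklong must sit (from this thread)


def gap_fill_count(here: int, fixed_base: int = FIXED_BASE) -> int:
    """Return how many zero longs `long $0[fixed_base - $]` would emit.

    `here` models `$`, the cog address the assembler has reached.
    A ValueError models the overrun case that `fit $1E5` would flag.
    """
    if here > fixed_base:
        raise ValueError(f"code ends at ${here:03X}, past ${fixed_base:03X}")
    return fixed_base - here


# Hypothetical example: the modified code ends at $1D0, so 21 longs of
# padding keep masklong at $1E5.
print(gap_fill_count(0x1D0))  # 21
```

    If added code grows past $1E5, the count would go negative and the three mask longs would move, which is the failure the padding and FIT lines are meant to catch at compile time.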

    My overlay loader (it maintains a load at maximum hub speed of 16 clocks per long) is in the OBEX. If you have any questions, just ask.

    To expand the Interpreter, I suggest we use 2 bytecodes... $3C $xx where xx defines up to 256 new opcodes. Note that these opcodes should be allocated in blocks of 3 because the decoding makes them easy for this. Let us start at $04 (we may need to do some return type functions for $00-$03).

    So, for LMM inline, you would use $39 $hh $ll $3C $04 ($39 $hh $ll is a push and hhll = @LMM_start). {Note: use $3B $xx $xx $xx $xx for a full long address.}
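
    The proposed sequence can be sketched as a small byte builder. This is only a Python model of the encoding described above, not compiler or interpreter code: the names `lmm_call_bytes` and `show` are hypothetical, the $04 sub-opcode is just the proposal in this post, and the high-byte-first order for the $3B long form is an assumption carried over from the $39 $hh $ll form.

```python
PUSH_WORD = 0x39   # push a 2-byte constant ($39 $hh $ll, per this post)
PUSH_LONG = 0x3B   # push a 4-byte constant ($3B $xx $xx $xx $xx, per this post)
EXT_PREFIX = 0x3C  # unused Spin bytecode proposed as the extension prefix
EXT_LMM = 0x04     # proposed sub-opcode for "run inline LMM" (not finalized)


def lmm_call_bytes(lmm_start: int) -> bytes:
    """Build the bytes that push @LMM_start and then invoke $3C $04."""
    if not 0 <= lmm_start <= 0xFFFFFFFF:
        raise ValueError("address must fit in 32 bits")
    if lmm_start <= 0xFFFF:
        # Short form: high byte first, as in $39 $hh $ll.
        push = bytes([PUSH_WORD]) + lmm_start.to_bytes(2, "big")
    else:
        # Long form: byte order assumed to match the short form.
        push = bytes([PUSH_LONG]) + lmm_start.to_bytes(4, "big")
    return push + bytes([EXT_PREFIX, EXT_LMM])


def show(code: bytes) -> str:
    """Format bytes the way they are written in this thread."""
    return " ".join(f"${b:02X}" for b in code)


print(show(lmm_call_bytes(0x1234)))  # $39 $12 $34 $3C $04
```

    The resulting values are what would be passed to bytecode() for testing, once the address of the LMM routine is known.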

    We are also going to need a fixed location for the return-to-Spin address. If the interpreter changes, then the "PUSHx" address you currently use will change. Let me think on this. We do not want to waste code space in the interpreter. The interpreter always has the fetch loop at cog $020, and my interpreter maintains this. So, if you wish to just return (no push required), just jmp #$020. Therefore, maybe all pushes should be done in LMM? What do you think?

    lonesock: Yes, it is possible to patch the ROM interpreter to take control. Hippy did it first. I have done it to run LMM in shadow RAM, taking control to run my zero-footprint debugger. However, if we work with my Interpreter, we also get a speed gain plus the space, and the ability to compile it. It does use 256 longs in hub for the decode table, and of course the 2KB for the Interpreter to reside in hub, although we could perhaps load this from SD with some extra code.
