CMM in hardware? — Parallax Forums

David Betz Posts: 14,516
edited 2014-08-10 08:04 in Propeller 1
I know it's crazy but I've been wondering how practical it would be to implement the PropGCC CMM engine in hardware. Mostly, it unpacks a byte stream into PASM instructions that it executes one at a time. The idea would be to fetch a long at a time from hub memory and step through bytes decoding CMM instructions and building a PASM instruction which would be executed as soon as it was complete. I've attached a document describing the PropGCC CMM instruction set. Tell me if I'm insane! :-)

CMM.txt

By the way, the advantage of this approach is that there is already a PropGCC code generator for this instruction set.
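A minimal Python sketch of the fetch-and-step idea described above, for illustration only; it is not the PropGCC CMM engine. It handles just the two encodings quoted later in this thread (`$4y` brw and `$06` lcall). The function names, the tuple output format, and the low-byte-first order of the 16-bit operand are all assumptions, since CMM.txt is not reproduced here.

```python
import struct

def bytes_from_longs(hub, start=0):
    """Yield bytes one at a time while reading hub memory a 32-bit long at a time."""
    for addr in range(start, len(hub) - 3, 4):
        (long_value,) = struct.unpack_from("<I", hub, addr)  # one hub access
        for shift in (0, 8, 16, 24):                         # step through its bytes
            yield (long_value >> shift) & 0xFF

def read_word(it):
    """Read a 16-bit operand, low byte first (assumed order); None if truncated."""
    lo = next(it, None)
    hi = next(it, None)
    if lo is None or hi is None:
        return None
    return lo | (hi << 8)

def decode(stream):
    """Accumulate bytes until an instruction is complete, then emit it."""
    it = iter(stream)
    out = []
    for opcode in it:
        if opcode & 0xF0 == 0x40:        # $4y brw: y = condition, 2 address bytes
            addr = read_word(it)
            if addr is None:
                break
            out.append(("brw", opcode & 0x0F, addr))
        elif opcode == 0x06:             # $06 lcall <word>
            addr = read_word(it)
            if addr is None:
                break
            out.append(("lcall", addr))
        else:                            # everything else is outside this sketch
            out.append(("unknown", opcode))
    return out

hub = bytes([0x06, 0x34, 0x12, 0x4F, 0x78, 0x56, 0x00, 0x00])
decoded = decode(bytes_from_longs(hub))
```

In this example the brw opcode is the last byte of the first long and its address sits in the second long, so the decoder has to carry state across hub fetches. A hardware version would need the same kind of state.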

Comments

  • Cluso99 Posts: 18,069
    edited 2014-08-08 14:43
    David,
    You're insane ;)
    As indeed most of us are here ;)
    Now, let me go look at what you posted.
  • David Betz Posts: 14,516
    edited 2014-08-08 14:48
    Cluso99: Thanks for the encouragement! :-)
  • Cluso99 Posts: 18,069
    edited 2014-08-08 14:58
    I think that's doable. Be fun trying anyway.
  • David Betz Posts: 14,516
    edited 2014-08-08 14:59
    Cluso99 wrote: »
    I think that's doable. Be fun trying anyway.
    I'm going to at least make a stab at it although I have very little Verilog experience. Just one class in college.
  • Cluso99 Posts: 18,069
    edited 2014-08-08 15:05
    That's one class more than most of us, me thinks.
  • overclocked Posts: 80
    edited 2014-08-08 15:09
    Is it fair to say that this is just a separate instruction set, so actually it has nothing to do with the original Propeller?
    I second that, a good hack. Probably no problem at all adding it.
    Just make sure to implement simulation so you can try your core within the development environment. It will save you time and be good to have a test-platform later.

    Good luck!
  • David Betz Posts: 14,516
    edited 2014-08-08 15:11
    Cluso99 wrote: »
    That's one class more than most of us, me thinks.
    I have a number of FPGA boards that I bought with the idea of trying to run the Verilog code I wrote for a simple processor. I never got around to it of course. The DE0-Nano I'm using for the P1 source is one of those boards. I had it before the P2 image became available for it. Chip releasing that image and this P1 source code gave me an opportunity to dig it out and actually use it. Maybe I should try my CPU code as well! :-)
  • jazzed Posts: 11,803
    edited 2014-08-08 16:16
    Is it fair to say that this is just a separate instruction set, so actually it has nothing to do with the original Propeller?
    I second that, a good hack. Probably no problem at all adding it.

    The CMM instruction set is closely related to Propeller PASM but up to 65% smaller than comparable LMM*. It's like a Thumb mode, except that most of the instructions get decoded into PASM by a COG core virtual machine, which is relatively slow.

    The idea seems to be to get rid of the PASM virtual machine by changing the COG (as an experiment for a CMM-native propeller) so that all the time-consuming decoding gets done in the COG core hardware (for the set of instructions that are used most often). I don't think that any optimizations for HUB window stalls have been considered, and they probably don't need to be in a first cut at least.

    Advantages:

    1. Higher code density over LMM*
    2. Speed on par with LMM* for non-fcached code, second only to COG PASM speed
    3. CMM allows block read/execute of up to 64 PASM longs at COG speed
    4. Getting rid of the VM saves global HUB RAM space.
    5. A proven compiler already exists that runs the CMM code

    Disadvantages:

    1. Bigger COGs?
    2. Development time.

    *Notes: LMM was suggested by Bill Henning as a way to read and execute PASM code from HUB RAM one long at a time. Fcache is a larger fill and execute mechanism Bill mentioned as a higher performance overlay method. The only problem with LMM is the macros needed to do certain things and the fetch window stall problem. Macros used for LMM have been considered for a HUB-EXEC mechanism, and auto increment index variants of RDLONG have been considered to fix the fetch window stall problem.
  • Bob Lawrence (VE1RLL) Posts: 1,720
    edited 2014-08-08 17:14
    re: I have very little Verilog experience. Just one class in college.

    That's two more than me :)
  • jmg Posts: 15,173
    edited 2014-08-08 21:40
    David Betz wrote: »
    .. Tell me if I'm insane! :-)

    By the way, the advantage of this approach is that there is already a PropGCC code generator for this instruction set.

    Not insane, the idea has strong merit.
    The PropGCC code generator is a serious plus.

    It probably needs a pause/enable signal added to the Prop opcode fetch, so it can idle clocks until the CMM reader is ready. ( I think such a signal is already needed for the WAIT opcodes. )

    When this works from HUB, it should be possible to expand it to stream from a QuadSPI device :)
  • David Betz Posts: 14,516
    edited 2014-08-09 04:33
    jmg wrote: »
    Not insane, the idea has strong merit.
    The PropGCC code generator is a serious plus.

    It probably needs a pause/enable signal added to the Prop opcode fetch, so it can idle clocks until the CMM reader is ready. ( I think such a signal is already needed for the WAIT opcodes. )

    When this works from HUB, it should be possible to expand it to stream from a QuadSPI device :)
    Anything is possible. We have the source code. We have the technology!
    I guess the problem is that the resulting COG might be more than double the size of the current COG, so you'd need a much bigger FPGA to get a P1+CMM with 8 COGs.
  • David Betz Posts: 14,516
    edited 2014-08-10 08:04
    I just remembered something about the CMM instruction set that might make this project less useful.
    $4y   brw: y is the condition for the branch, next 2 bytes are address
    
    Macro Instructions
    $06   lcall <word>: call subroutine
          mov __TMP0, #<word>
          call #_LMM_CALL_INDIRECT
    
    CMM uses 16-bit code addresses, since on the real P1 there isn't really any need to address more than 32K of memory. This was going to be a problem when porting CMM to P2 as well, and it will require at least minor changes to the CMM code generator in PropGCC to address the larger hub memory people have been talking about. We could, of course, handle 64K perfectly well, but anything more will require extending the immediate addresses to at least three bytes and hence will require tools changes.
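The address arithmetic behind that last point can be checked with a short Python sketch. It assumes byte-granular addresses, as the 64K figure above implies; the helper names are made up for illustration.

```python
def addressable_bytes(operand_bytes):
    """Bytes reachable with an immediate address of the given width."""
    return 1 << (8 * operand_bytes)

def fits(hub_size, operand_bytes):
    """True if every byte of a hub of hub_size bytes can be addressed."""
    return hub_size <= addressable_bytes(operand_bytes)

P1_HUB_RAM = 32 * 1024                 # the 32K mentioned in the post

two_byte_reach = addressable_bytes(2)    # 65536 bytes (64K)
three_byte_reach = addressable_bytes(3)  # 16777216 bytes (16M)
```

Under those assumptions, a two-byte operand covers the P1's 32K and any hub up to 64K, while anything larger needs the third byte.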