Break the 2KB Cog RAM barrier running PASM ! — Parallax Forums

Break the 2KB Cog RAM barrier running PASM !

Cluso99 Posts: 18,069
edited 2014-09-10 18:56 in Propeller 1
I have been wanting my 10,000th post to be something special :):):):):)

So, for your enjoyment...
Break the 2KB Cog RAM barrier running PASM!


Some of you know I have been working on this for a little while.
My version of P1V has 4 @ 4KB Cogs and 48KB Hub RAM with remapping for ROM use.
This is a reasonably simple mechanism, pioneered for hubexec on the P2.

New Instructions:
                ' iiiiii_zcri_cccc_ddddddddd_sssssssss
AUGDS #D,#S     ' 000110_00xx_xxxx_ddddddddd_sssssssss
AUGS  #S        ' 000110_01xs_ssss_sssssssss_sssssssss

A single normal instruction following the AUGDS #D,#S instruction will have its:
* "S" 9-bit field left extended by #S (9-bits)
* "D" 9-bit field left extended by #D (9-bits)
If the #D and/or #S is not required, just use the value of "0".
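
As a rough illustration of the AUGDS rule above, here is a minimal Python sketch (a model of the bit arithmetic, not the FPGA Verilog). The helper names `encode_augds` and `extend_field` are made up for this example. It assumes that "left extended" means the 9-bit AUGDS value supplies bits 17..9 above the instruction's own 9-bit field, and it sets the don't-care r and i bits to 0 and 1.

```python
# Sketch only: models the AUGDS encoding listed above; helper names are hypothetical.
AUGDS_BASE = 0b000110_0001_0000 << 18   # opcode 000110, zcri=0001 (r, i assumed), cccc=0000


def encode_augds(d_ext, s_ext):
    """Build an AUGDS #D,#S word from two 9-bit extension values."""
    if not (0 <= d_ext < 512 and 0 <= s_ext < 512):
        raise ValueError("extensions must fit in 9 bits")
    return AUGDS_BASE | (d_ext << 9) | s_ext


def extend_field(ext, field):
    """Assumed meaning of 'left extended': ext gives bits 17..9, field gives bits 8..0."""
    return (ext << 9) | (field & 0x1FF)


print(hex(encode_augds(1, 0)))        # 0x18400200
print(hex(extend_field(1, 0x000)))    # 0x200, first register above the normal 512
print(hex(extend_field(0, 0x1F0)))    # 0x1f0, unchanged when the extension is 0
```

With an extension of 0 the following instruction addresses the normal lower cog RAM, which is why "0" is the value to use for a field that does not need extending.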

A single normal instruction following the AUGS #S instruction will have its:
* "S" 9-bit field left extended by #S (23-bits) (ie works as a 32-bit immediate value)

A single instruction following the AUGS #S instruction which in turn follows from AUGDS #D,#S will have its:
* "S" 9-bit field left extended by #S (23-bits from the AUGS instruction)
* "D" 9-bit field left extended by #D (9-bits from the AUGDS instruction)
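
To make the 32-bit immediate case concrete, the following Python sketch splits a value into the 23-bit AUGS payload and the 9-bit S field of the following instruction, then joins them again. The function names are hypothetical, and the don't-care r bit is assumed to be 0.

```python
# Sketch only: AUGS carries bits 31..9 of the immediate, the next instruction's S field bits 8..0.
AUGS_BASE = 0b000110_010 << 23          # opcode 000110, z=0, c=1, r=0 (r is an assumption)


def split_imm32(value):
    """Return (upper 23 bits for AUGS, lower 9 bits for the S field)."""
    value &= 0xFFFFFFFF
    return value >> 9, value & 0x1FF


def encode_augs(upper23):
    """Build an AUGS #S word from a 23-bit payload."""
    if not 0 <= upper23 < (1 << 23):
        raise ValueError("payload must fit in 23 bits")
    return AUGS_BASE | upper23


def join_imm32(upper23, s_field):
    """Rebuild the 32-bit immediate seen by the following instruction."""
    return ((upper23 << 9) | (s_field & 0x1FF)) & 0xFFFFFFFF


upper, low = split_imm32(0xCCCC_CC50)
print(hex(upper), hex(low))             # 0x666666 0x50
print(hex(encode_augs(upper)))          # 0x19666666
print(hex(join_imm32(upper, low)))      # 0xcccccc50
```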

Example code snippets:
CON
_AUGDS  = %000110_0001_0000 << 18
_AUGS   = %000110_010       << 23
'       long  _AUGS  | ($FFFF_FE00 >> 9)               ' AUGS  #S    (23bits)
'       long  _AUGS  | %1100_1100_1100_1100_1100_110   ' AUGS  #S    (23bits)
'       long  _AUGDS | (%00_0000_000 << 9) | (%00_0000_000)  ' AUGDS #D,#S (9+9bits)

DAT
        ........
' fill upper ram & lower ram (do lower first)
              mov       count,#16 '#511
              movs      :ind,#$050                      ' init the value stored            
              movd      :ind,#$100                      ' init the cog ram base addr      
              movs      :inc,#$080                      ' init the value stored            
              movd      :inc,#$100                      ' init the cog ram base addr      
              nop
:fill
' lower
:ind          mov       0-0,#0-0                        ' mov lower $000++, #$4444_44nn++
              add       :ind,#1                         ' S++ (can overflow into D++)
              add       :ind,h200                       ' D++

'             jmp       #:stepover

' upper
:augds  long  _AUGDS | (%00_0000_001 << 9) | (%00_0000_000)  ' set D=$200 (ie upper cog ram) & S=0
:augs   long  _AUGS  | ($C7CC_0C00 >> 9)                     ' set S=$CCCC_CCxx


:inc          mov       0-0,#0-0                        ' mov upper $200++, #$4444_44nn++
              add       :inc,#1                         ' S++                    
              add       :augs,#8                        ' AUG(S)++8              
              add       :inc,h200                       ' D++

' loop
:stepover     djnz      count,#:fill

''-----------------------------------------------------------------------------------
' readback upper ram & lower ram & special regs $1Fx
              mov       count,#16 '511
              movs      :inx,#$100                      ' init the cog ram base addr
              movs      :iny,#$100                      ' init the cog ram base addr
              movs      :inz,#$1F0                      ' init the cog ram special addr    
              nop
:rd
' upper
              long      _AUGDS | (%00_0000_000 << 9) | (%00_0000_001) ' set D=0 & S=$200 (ie upper cog ram)
:inx          mov       t0,0-0                          ' mov upper $200++ to t0
              add       :inx,#1                         ' S++                    
' lower
:iny          mov       t1,0-0                          ' mov lower $000++ to t1
              add       :iny,#1                         ' S++
' special
:inz          mov       t2,0-0                          ' mov $1F0++ to t2
              add       :inz,#1                         ' S++
              movd      :inz,#t2                        ' ensure wrap S
' send
              call      #send
              djnz      count,#:rd
''-----------------------------------------------------------------------------------
        ........
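
The loops in the listing step their pointers by adding to the instruction words themselves. The Python sketch below models only that arithmetic. It assumes the field layout from the encoding header (D in bits 17..9, S in bits 8..0) and assumes that `h200` holds $200, which the elided part of the listing does not show. The names `fields` and `add` are hypothetical.

```python
# Sketch only: models the self-modifying pointer arithmetic, not the full instruction.
H200 = 0x200                            # assumed value of the h200 constant


def fields(word):
    """Return the (D, S) fields of an instruction word."""
    return (word >> 9) & 0x1FF, word & 0x1FF


def add(word, value):
    """32-bit wrapping add, as performed on the instruction word."""
    return (word + value) & 0xFFFFFFFF


word = (0x100 << 9) | 0x050             # D=$100, S=$050 (upper bits ignored here)
word = add(word, 1)                     # add :ind,#1   -> S++
word = add(word, H200)                  # add :ind,h200 -> D++
print([hex(f) for f in fields(word)])   # ['0x101', '0x51']

# When S is $1FF, the S++ carries into D, as the listing's comment warns.
print([hex(f) for f in fields(add((0x100 << 9) | 0x1FF, 1))])   # ['0x101', '0x0']

# The AUGS payload is bits 31..9 of the immediate, so adding 8 to the AUGS word
# raises the 32-bit immediate by 8 << 9.
print(hex(8 << 9))                      # 0x1000
```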

2014/09/09: Now working properly :)

Oops! Still some problems with this code. I do have code executing in extended cog RAM that works around the bugs.

I had some obscure bugs in the initialisation section of the FPGA code, and these were being masked by my Spin bugs :( Of course, I wasn't looking at my Spin, was I?
Below is the updated FPGA code and the Spin/PASM test program.


I anticipate being able to utilise the AUGDS instruction before a JMPRET (CALL/JMP/RET) instruction.
Currently I have not modified the JMPRET to utilise the AUGDS instruction properly, although I think a JMP should work.
Alternatively, a new JMPRETX instruction could be added.

Code is posted for the DE0. Otherwise just recompile.
Spin & PASM code is also posted (uses 3 cogs: Spin, PASM, FDX)

Comments

  • Publison Posts: 12,366
    edited 2014-09-06 03:06
    Congrats Ray, a new VIP!
  • pik33 Posts: 2,366
    edited 2014-09-06 03:20
    A good piece of good work :) And my time is very limited now :(
  • ozpropdev Posts: 2,792
    edited 2014-09-06 03:29
    Nice work Mr VIP! :)
  • Bill Henning Posts: 6,445
    edited 2014-09-06 11:33
    Way to go Ray!
  • Cluso99 Posts: 18,069
    edited 2014-09-06 15:09
    Thanks guys.
    Marvelous what an overnight rethink can do. Before retiring I realised that my understanding of the states m[0]..m[4] was slightly off. This is likely my problem, but not sure I will get much time today - it's Father's Day here in Oz. Not sure what is planned, but likely catching up with our two boys, their wives, and our 4 grandkids. Our daughter and husband and their 4.5 month old twin boys live in S. Korea and will be here in under two weeks for a visit. We spent some time with them just after the boys were born, so it will be nice to see them again, even tho' we see them on FaceTime most evenings. Another magic of the internet!

    BTW I am not sure how much this breakthrough means... It is the basics of hubexec!!! The PC bit width just needs to be expanded (now a parameter) and the instruction fetch mechanism needs to determine if the fetch is from hub, in which case it needs to wait for a hub slot. All this, and much more, was done in the P2, so it's all possible - but in a simpler fashion.
  • Cluso99 Posts: 18,069
    edited 2014-09-08 20:16
    AUGDS & AUGS are now working properly (as far as I can tell anyway) :)
    I have updated my first post with code and notes.
  • Cluso99 Posts: 18,069
    edited 2014-09-09 01:14
    OOPS. Still some problems.

    I can work around the bugs.

    I have been able to jump to extended cog ram and then execute a couple of instructions and then jump back to lower cog :)
  • Bill Henning Posts: 6,445
    edited 2014-09-09 14:02
    Very cool!
  • rogloh Posts: 5,791
    edited 2014-09-10 17:33
    @Cluso,
    I was looking over your instruction encodings for your AUGDS/AUGS enhancements in your top post and was wondering if you might want to encode it such that WC and WZ modifiers are both left free for other purposes or instructions. I think the NR bit would be a good way to differentiate between the AUGDS and AUGS variants. Right now you show it as "x" or don't care in your first post, but this is potentially wasteful. It might be nice to define it now so we don't bake it in this way once people begin to use it and find out later it is hard to change/reuse. If you are still fixing your bugs now might make a good opportunity for doing this.

    rogloh
  • Cluso99 Posts: 18,069
    edited 2014-09-10 18:23
    rogloh,
    I have defined the first 7 bits (opcode+wz) as 7'b000110_0. This way I have the ability of using 7'b000110_1 as other special instructions.
    The AUGS uses 23 immediate bits, being i+cccc+ddddddddd+sssssssss
    I am using the wc to differentiate between AUGDS and AUGS.
    My testing used the NR=0. It's not a big job to change the structure once I have it fully debugged.
    I intend to re-enable the cccc bits on the AUGDS instruction so that a group of them could be used as alternates using the condition codes.
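
For readers following the encoding discussion, the scheme described in this post can be sketched as a small Python decoder. It is a minimal model under stated assumptions, not the FPGA logic: the function name `decode_aug` is hypothetical, bit positions follow the iiiiii_zcri_cccc_ddddddddd_sssssssss layout from the first post, and the cccc bits are ignored, as they currently are for AUGDS.

```python
# Sketch only: classify a 32-bit word as AUGDS, AUGS, or neither.
AUG_PREFIX = 0b000110_0                 # first 7 bits: opcode 000110 plus wz=0


def decode_aug(word):
    """Return ('AUGDS', d_ext, s_ext), ('AUGS', imm23), or None for other words."""
    word &= 0xFFFFFFFF
    if word >> 25 != AUG_PREFIX:
        return None                     # 7'b000110_1 and all other opcodes land here
    if (word >> 24) & 1:                # wc bit set selects AUGS
        return ("AUGS", word & 0x7FFFFF)
    return ("AUGDS", (word >> 9) & 0x1FF, word & 0x1FF)


print(decode_aug(0x18400200))           # ('AUGDS', 1, 0)
print(decode_aug(0x19666666))           # ('AUGS', 6710886)
print(decode_aug(0xA0BC0000))           # None
```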

    My current problem is to do with when I enable the modifications to the following instruction. I am modifying the previous and subsequent instructions in different ways at present. So it's just a matter of correcting this. Been fun tho' following the timing.

    I can compile extended cog code with proptool and just load it up from hub into upper cog directly. And I can jump between upper and normal (lower) cog ram easily already using the normal jmpret instruction preceded by AUGDS.

    Once this is done, I want to look at masking INA with AND, OR and SHR (perhaps not ROR as I suggested). Then I can look at OUTA.

    Your RDxxxx will come in handy for loading up upper cog ram :)
  • rogloh Posts: 5,791
    edited 2014-09-10 18:56
    Cluso99 wrote: »
    rogloh,
    I have defined the first 7 bits (opcode+wz) as 7'b000110_0. This way I have the ability of using 7'b000110_1 as other special instructions.
    The AUGS uses 23 immediate bits, being i+cccc+ddddddddd+sssssssss
    I am using the wc to differentiate between AUGDS and AUGS.
    My testing used the NR=0. It's not a big job to change the structure once I have it fully debugged.
    I intend to re-enable the cccc bits on the AUGDS instruction so that a group of them could be used as alternates using the condition codes.

    Sounds good. That could be handy.

    Cluso99 wrote: »
    My current problem is to do with when I enable the modifications to the following instruction. I am modifying the previous and subsequent instructions in different ways at present. So it's just a matter of correcting this. Been fun tho' following the timing.

    I hear you there.

    Cluso99 wrote: »
    I can compile extended cog code with proptool and just load it up from hub into upper cog directly. And I can jump between upper and normal (lower) cog ram easily already using the normal jmpret instruction preceded by AUGDS.

    Once this is done, I want to look at masking INA with AND, OR and SHR (perhaps not ROR as I suggested). Then I can look at OUTA.

    Your RDxxxx will come in handy for loading up upper cog ram :)

    This is sounding really good. The ability to extend the COG RAM size is excellent, particularly for increasing instruction space, so that hub-based LMM is needed only for much larger programs. Storing data in the high COG memory will naturally be doable too; you just trade off extra instruction space (for the prepended AUGDS) to access it.