Break the 2KB Cog RAM barrier running PASM!
Cluso99
I have been wanting my 10,000th post to be something special.
So, for your enjoyment...
Some of you know I have been working on this for a little while.
My version of P1V has 4 cogs, each with 4KB of cog RAM, and 48KB of hub RAM, with remapping for ROM use.
This is a reasonably simple mechanism, pioneered for hubexec on the P2.
New Instructions:
              ' iiiiii_zcri_cccc_ddddddddd_sssssssss
AUGDS #D,#S   ' 000110_00xx_xxxx_ddddddddd_sssssssss
AUGS  #S      ' 000110_01xs_ssss_sssssssss_sssssssss

A single normal instruction following the AUGDS #D,#S instruction will have its:
* "S" 9-bit field extended on the left by #S (9 bits), giving an 18-bit S
* "D" 9-bit field extended on the left by #D (9 bits), giving an 18-bit D
If the #D and/or #S extension is not required, just use a value of 0.
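A minimal Python sketch of the AUGDS arithmetic, assuming the bit layout above with the "x" (don't care) bits left at 0. `encode_augds` and `effective_ds` are hypothetical helper names, not part of the FPGA code. With 4KB of cog RAM (1024 longs) a cog address needs 10 bits, so #D or #S = 1 is already enough to reach the upper half at $200..$3FF.

```python
# Hypothetical model of AUGDS #D,#S, based on the layout
#   000110_00xx_xxxx_ddddddddd_sssssssss
# The "x" (don't care) bits are left at 0 here.

AUGDS_BASE = 0b000110_00 << 24  # opcode 000110, z=0, c=0


def encode_augds(d_ext: int, s_ext: int) -> int:
    """Build an AUGDS long carrying 9-bit #D and #S extensions."""
    if not (0 <= d_ext <= 0x1FF and 0 <= s_ext <= 0x1FF):
        raise ValueError("#D and #S are 9-bit values")
    return AUGDS_BASE | (d_ext << 9) | s_ext


def effective_ds(augds: int, d_field: int, s_field: int) -> tuple[int, int]:
    """Left-extend the next instruction's 9-bit D and S fields to 18 bits."""
    d_ext = (augds >> 9) & 0x1FF
    s_ext = augds & 0x1FF
    return (d_ext << 9) | d_field, (s_ext << 9) | s_field


if __name__ == "__main__":
    augds = encode_augds(1, 0)  # D=$200 (upper cog RAM), S=0
    print(hex(augds))           # 0x18000200
    print([hex(v) for v in effective_ds(augds, 0x100, 0x050)])  # ['0x300', '0x50']
```

So an instruction with D=$100 that follows `encode_augds(1, 0)` would address cog register $300, in the upper half of the 4KB cog RAM.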
A single normal instruction following the AUGS #S instruction will have its:
* "S" 9-bit field extended on the left by #S (23 bits), i.e. S works as a 32-bit immediate value
When AUGS #S immediately follows AUGDS #D,#S, the single instruction after the AUGS will have its:
* "S" 9-bit field extended on the left by #S (23 bits, from the AUGS instruction)
* "D" 9-bit field extended on the left by #D (9 bits, from the AUGDS instruction)
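Along the same lines, a sketch of AUGS and the AUGDS + AUGS pairing, assuming the 23 payload bits occupy bits 22..0 (i + cccc + D + S) as in the layout above. `split_imm32` mirrors the `_AUGS | ($FFFF_FE00 >> 9)` idiom in the example code snippets: the upper 23 bits go into AUGS and the low 9 bits stay in the instruction's own S field. All function names are hypothetical.

```python
# Hypothetical model of AUGS #S (23 bits) and the AUGDS + AUGS pairing.
# Layout from the post: 000110_01xs_ssss_sssssssss_sssssssss, so the
# 23-bit payload occupies bits 22..0 (i + cccc + D + S).

AUGS_BASE = 0b000110_010 << 23  # opcode 000110, z=0, c=1, r=0
PAYLOAD_MASK = 0x7F_FFFF        # 23 bits


def encode_augs(s_ext: int) -> int:
    """Build an AUGS long carrying a 23-bit #S extension."""
    if not 0 <= s_ext <= PAYLOAD_MASK:
        raise ValueError("#S for AUGS is a 23-bit value")
    return AUGS_BASE | s_ext


def split_imm32(value: int) -> tuple[int, int]:
    """Split a 32-bit constant into (AUGS payload, 9-bit S field)."""
    value &= 0xFFFF_FFFF
    return value >> 9, value & 0x1FF


def effective_s(augs: int, s_field: int) -> int:
    """After AUGS, S is a full 32-bit value: 23 bits from AUGS + 9 from S."""
    return ((augs & PAYLOAD_MASK) << 9) | s_field


def effective_ds_pair(augds: int, augs: int, d_field: int, s_field: int) -> tuple[int, int]:
    """AUGDS then AUGS: D uses AUGDS's 9-bit #D, S uses AUGS's 23 bits."""
    d = (((augds >> 9) & 0x1FF) << 9) | d_field
    return d, effective_s(augs, s_field)


if __name__ == "__main__":
    hi23, lo9 = split_imm32(0xFFFF_FE00)              # same split as _AUGS | ($FFFF_FE00 >> 9)
    print(hex(hi23), hex(lo9))                        # 0x7fffff 0x0
    print(hex(effective_s(encode_augs(hi23), lo9)))   # 0xfffffe00
```

Note that in the pairing the #S of the AUGDS is simply superseded, which is why the post suggests setting unused fields to 0.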
Example code snippets:
CON
  _AUGDS = %000110_0001_0000 << 18
  _AUGS  = %000110_010 << 23

'             long    _AUGS  | ($FFFF_FE00 >> 9)                       ' AUGS #S (23bits)
'             long    _AUGS  | %0000_1100_1100_1100_1100_110           ' AUGS #S (23bits)
'             long    _AUGDS | (%00_0000_000 << 9) | (%00_0000_000)   ' AUGDS #D,#S (9+9bits)

DAT
........
' fill upper ram & lower ram (do lower first)
              mov     count,#16                 '#511
              movs    :ind,#$050                ' init the value stored
              movd    :ind,#$100                ' init the cog ram base addr
              movs    :inc,#$080                ' init the value stored
              movd    :inc,#$100                ' init the cog ram base addr
              nop
:fill
' lower
:ind          mov     0-0,#0-0                  ' mov lower $000++, #$4444_44nn++
              add     :ind,#1                   ' S++ (can overflow into D++)
              add     :ind,h200                 ' D++
'             jmp     #:stepover
' upper
:augds        long    _AUGDS | (%00_0000_001 << 9) | (%00_0000_000)   ' set D=$200 (ie upper cog ram) & S=0
:augs         long    _AUGS | ($C7CC_0C00 >> 9)                       ' set S=$CCCC_CCxx
:inc          mov     0-0,#0-0                  ' mov upper $200++, #$4444_44nn++
              add     :inc,#1                   ' S++
              add     :augs,#8                  ' AUG(S)++8
              add     :inc,h200                 ' D++
' loop
:stepover     djnz    count,#:fill
''-----------------------------------------------------------------------------------
' readback upper ram & lower ram & special regs $1Fx
              mov     count,#16                 '511
              movs    :inx,#$100                ' init the cog ram base addr
              movs    :iny,#$100                ' init the cog ram base addr
              movs    :inz,#$1F0                ' init the cog ram special addr
              nop
:rd
' upper
              long    _AUGDS | (%00_0000_000 << 9) | (%00_0000_001)   ' set D=0 & S=$200 (ie upper cog ram)
:inx          mov     t0,0-0                    ' mov upper $200++ to t0
              add     :inx,#1                   ' S++
' lower
:iny          mov     t1,0-0                    ' mov lower $000++ to t1
              add     :iny,#1                   ' S++
' special
:inz          mov     t2,0-0                    ' mov $1F0++ to t2
              add     :inz,#1                   ' S++
              movd    :inz,#t2                  ' ensure wrap S
' send
              call    #send
              djnz    count,#:rd
''-----------------------------------------------------------------------------------
........
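The fill and readback loops step through cog RAM by adding to the instruction longs themselves. A rough Python model of that arithmetic, assuming the stock P1 field positions (D in bits 17..9, S in bits 8..0) and assuming `h200` holds $200 (its definition is not shown in the snippet). Opcode bits are omitted because the adds only touch the low fields, and the AUGS extension of the upper loop's immediate is left out. `fields`, `add_long` and `fill_loop_targets` are hypothetical names.

```python
# Hypothetical model of the self-modifying steps in the fill loop.
# Assumes stock P1 fields (D = bits 17..9, S = bits 8..0) and that the
# h200 register (not shown in the snippet) holds $200. Opcode bits omitted.


def fields(instr: int) -> tuple[int, int]:
    """Return the (D, S) 9-bit fields of an instruction long."""
    return (instr >> 9) & 0x1FF, instr & 0x1FF


def add_long(instr: int, value: int) -> int:
    """32-bit wrap-around ADD, as performed on a cog register."""
    return (instr + value) & 0xFFFF_FFFF


def fill_loop_targets(count: int, d_start: int, s_start: int,
                      d_ext: int = 0) -> list[tuple[int, int]]:
    """(cog address, #immediate) pairs written by the patched MOV per pass.

    d_ext models a preceding AUGDS #D (1 = upper cog RAM).
    """
    instr = (d_start << 9) | s_start      # movd/movs set D and S
    out = []
    for _ in range(count):
        d, s = fields(instr)
        out.append(((d_ext << 9) | d, s))
        instr = add_long(instr, 1)        # add :ind,#1    -> S++
        instr = add_long(instr, 0x200)    # add :ind,h200  -> D++
    return out


if __name__ == "__main__":
    for addr, imm in fill_loop_targets(3, 0x100, 0x050):
        print(hex(addr), hex(imm))
```

If S starts at $1FF, `add :ind,#1` carries into D, which is the "(can overflow into D++)" case noted in the comments.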
2014/09/09: Now working properly
Oops! There are still some problems with this code. I do have code executing in extended cog RAM that works around the bugs.
I had some obscure bugs in the initialisation section of the FPGA code, and these were being masked by bugs in my Spin code. Of course, I wasn't looking at my Spin, was I?
Below is the updated FPGA code and the Spin/PASM test program.
I anticipate being able to utilise the AUGDS instruction before a JMPRET (CALL/JMP/RET) instruction.
Currently I have not modified the JMPRET to utilise the AUGDS instruction properly, although I think a JMP should work.
Alternatively, a new JMPRETX instruction could be added.
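One plausible reason JMPRET needs changing, sketched in Python: on the stock P1, JMPRET D,S jumps to S and writes the return address (PC+1) into the 9-bit S field of register D, which is how CALL/RET work. With a 10-bit PC that field cannot hold a return address in upper cog RAM, whereas a plain JMP only needs the extended target. `PC_BITS`, `jmpret` and `ret` are assumptions for illustration, not the FPGA implementation.

```python
# Hypothetical sketch: stock-style JMPRET with a 10-bit PC.
# JMPRET D,S jumps to S and writes PC+1 into the 9-bit S field of register D.
# The jump target may be extended by a preceding AUGDS (s_ext).

PC_BITS = 10                  # 4KB cog RAM = 1024 longs (assumption)
PC_MASK = (1 << PC_BITS) - 1


def jmpret(regs: dict[int, int], pc: int, d: int, s: int, s_ext: int = 0) -> int:
    """Execute JMPRET and return the new PC."""
    ret_addr = (pc + 1) & PC_MASK
    # Only 9 bits fit in D's S field, so bit 9 of the return address is lost.
    regs[d] = (regs.get(d, 0) & ~0x1FF) | (ret_addr & 0x1FF)
    return ((s_ext << 9) | s) & PC_MASK


def ret(regs: dict[int, int], d: int) -> int:
    """RET is a JMP through the address stored in D's S field."""
    return regs[d] & 0x1FF


if __name__ == "__main__":
    regs: dict[int, int] = {}
    pc = jmpret(regs, 0x300, d=0x020, s=0x010)  # CALL from upper cog RAM
    print(hex(pc), hex(ret(regs, 0x020)))        # 0x10 0x101 (not 0x301)
```

In this model a CALL made from $300 would return to $101 in lower cog RAM, which a modified JMPRET or a JMPRETX would need to avoid.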
Code is posted for the DE0 board; for other boards, just recompile.
Spin & PASM code is also posted (it uses 3 cogs: Spin, PASM, FDX).
Comments
Marvelous what a rethink and a night's sleep can do. Before retiring I realised that my understanding of the states m[0]..m[4] was slightly off. This is likely my problem, but I'm not sure I will get much time today: it's Father's Day here in Oz. Not sure what is planned, but likely catching up with our two boys, their wives, and our 4 grandkids. Our daughter, her husband and their 4.5-month-old twin boys live in S. Korea and will be here in under two weeks for a visit. We spent some time with them just after the boys were born, so it will be nice to see them again, even though we see them on FaceTime most evenings. Another magic of the internet!
BTW, I am not sure everyone realises how much this breakthrough means... It is the basis of hubexec!!! The PC bit width just needs to be expanded (it is now a parameter), and the instruction fetch mechanism needs to determine whether the fetch is from hub, in which case it must wait for a hub slot. All this, and much more, was done in the P2, so it's all possible, but here in a simpler fashion.
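A toy model of that fetch decision, under loudly stated assumptions: PCs below 1024 (the 4KB cog RAM) fetch from cog RAM with no wait, and higher PCs fetch from hub and wait for this cog's hub window, which on a stock P1 comes round every 16 clocks. `fetch_wait`, `COG_LONGS`, `HUB_CYCLE` and `slot_phase` are hypothetical; the real P1V hub timing may differ, especially with 4 cogs.

```python
# Hypothetical hubexec fetch model (assumptions, not the P1V source):
# PCs below COG_LONGS fetch from cog RAM immediately; higher PCs fetch
# from hub and wait for this cog's hub window.

COG_LONGS = 1024   # 4KB cog RAM
HUB_CYCLE = 16     # stock P1: a cog's hub window recurs every 16 clocks


def fetch_wait(pc: int, clock: int, slot_phase: int) -> tuple[str, int]:
    """Return (source, clocks to wait) for the instruction fetch at pc.

    slot_phase is the clock (mod HUB_CYCLE) at which this cog's hub slot occurs.
    """
    if pc < COG_LONGS:
        return "cog", 0
    return "hub", (slot_phase - clock) % HUB_CYCLE


if __name__ == "__main__":
    print(fetch_wait(0x3FF, clock=5, slot_phase=0))  # ('cog', 0)
    print(fetch_wait(0x400, clock=5, slot_phase=0))  # ('hub', 11)
```

Widening the PC is what lets the same fetch path cover both cases; only the wait differs.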
I have updated my first post with code and notes.
I can work around the bugs.
I have been able to jump to extended cog RAM, execute a couple of instructions, and then jump back to lower cog RAM.
@Cluso,
I was looking over your instruction encodings for your AUGDS/AUGS enhancements in your top post and was wondering whether you might want to encode them so that the WC and WZ modifiers are both left free for other purposes or instructions. I think the NR bit would be a good way to differentiate between the AUGDS and AUGS variants. Right now you show it as "x" (don't care) in your first post, but this is potentially wasteful. It might be nice to define it now, so we don't bake it in this way once people begin to use it and then find out later that it is hard to change or reuse. If you are still fixing your bugs, now might be a good opportunity to do this.
rogloh
I have defined the first 7 bits (opcode+wz) as 7'b000110_0. This way I can use 7'b000110_1 for other special instructions.
The AUGS uses 23 immediate bits, being i+cccc+ddddddddd+sssssssss.
I am using the wc bit to differentiate between AUGDS and AUGS.
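A sketch of a decoder for this scheme, assuming the c bit is bit 24 as in the stock P1 zcri layout. `decode_aug` is a hypothetical name; the point is that anything in the 7'b000110_1 group falls through untouched, leaving it free for other special instructions.

```python
# Hypothetical decoder for the AUG encoding described above.
# Top 7 bits (opcode + z) = 000110_0 selects the AUG group; 000110_1 is
# left free for other special instructions. The c bit (bit 24) picks
# AUGDS (c=0) or AUGS (c=1).


def decode_aug(instr: int):
    """Return ("AUGDS", d_ext, s_ext), ("AUGS", s_ext), or None."""
    top7 = (instr >> 25) & 0x7F
    if top7 != 0b000110_0:
        return None                           # not an AUG instruction
    if (instr >> 24) & 1:                     # c = 1 -> AUGS
        return ("AUGS", instr & 0x7F_FFFF)    # i + cccc + D + S = 23 bits
    return ("AUGDS", (instr >> 9) & 0x1FF, instr & 0x1FF)


if __name__ == "__main__":
    print(decode_aug(0x1800_0200))   # ('AUGDS', 1, 0)
    print(decode_aug(0x197F_FFFF))   # ('AUGS', 8388607)
    print(decode_aug(0x1A00_0000))   # None (000110_1 group)
```

Switching to rogloh's NR-bit suggestion would only change which bit the second test reads (bit 23 instead of bit 24).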
My testing used NR=0. It's not a big job to change the structure once I have it fully debugged.
I intend to re-enable the cccc bits on the AUGDS instruction so that a group of them could be used as alternates using the condition codes.
My current problem is to do with when I enable the modifications to the following instruction. At present I am modifying the previous and subsequent instructions in different ways, so it's just a matter of correcting this. It's been fun following the timing, though.
I can compile extended cog code with PropTool and just load it up from hub into upper cog RAM directly. And I can already jump between upper and normal (lower) cog RAM easily, using the normal JMPRET instruction preceded by AUGDS.
Once this is done, I want to look at masking INA with AND, OR and SHR (perhaps not ROR as I suggested). Then I can look at OUTA.
Your RDxxxx will come in handy for loading up upper cog RAM.
I hear you there.
This is sounding really good. The ability to extend the COG RAM size is excellent, particularly for increasing instruction space, so that hub-based LMM is only needed for much larger programs. Storing data in the high COG memory will naturally be doable too; you just trade off extra instruction space (for the prepended AUGDS) when accessing it.