New P1V capabilities added - a COGRAM stack

rogloh · 2014-09-15 07:11

Today, for my second Verilog project, I took a pipe wrench to the pipeline (slightly) and worked some fresh push/pop functionality into the P1V, this time using the internal COG RAM to hold a stack and making use of a new instruction opcode. I've been testing each operation and it appears to be working well so far. Anyone wanting to play around with it just needs to unzip the attached zipfile, copy its two Verilog files into a clean P1V build folder and give it a try. The changes are isolated to this feature only for my testing and are independent of my other hub-based push/pop changes.

The new instructions provide the following functionality:

PUSH   S  ' push 32 bit register onto COGRAM stack
PUSH  #S  ' push constant/pointer onto COG stack
POP    D  ' pop 32 bits from COG stack into register
RETX  #S  ' pop COG PC from the COG stack and return to that location, optionally add amount #S to SP following that process.
CALLX #S  ' absolute call, pushes PC onto stack, then calls function at the absolute address
CALLX  S  ' indirect call (eg calling a function pointer), pushes PC onto stack and calls function at address in register S
JMPX  #S  ' absolute jump (no actual stack usage)
JMPX   S  ' indirect jump (no actual stack usage)

With this change we now have a way to push/pop/call/return using an internal COG stack, all within the standard 4-clock-cycle instruction timing. Though this is also beneficial for general COG PASM, I see it being extremely useful when running high-level language code (e.g. C) within an expanded COG memory space. With that combination we can hopefully get decent-sized C apps running at 20 MIPS+ (80 MHz / 4 clocks per instruction) when we keep both the code and stack in COG RAM and minimize hub access.

This latest mod uses the unused "ONES" instruction opcode along with the NR/WC/IM flags to expand into the different instruction combinations required. Note that I have not yet used the WZ modifier flag; it still optionally writes the Z flag, which is set to 1 if the stack pointer reaches 0 after the instruction executes, and cleared otherwise. This WZ feature could be useful for detecting that the stack is about to underflow below 0 without explicitly checking SP each time.

The stack grows downwards and the internal COG stack pointer has been mapped to COG RAM location $1ef. Any writes to $1ef adjust the SP and it can be read back from there. It currently uses the bottom 9 bits of the 32 bit register and ignores the rest of the register bits (though they will still be read back).
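To make the SP-at-$1EF and WZ behavior concrete, here is a small behavioral model in C. It is only a sketch, not the P1V Verilog: it assumes a pre-decrement PUSH and post-increment POP (the post does not say which), and assumes stack operations keep the unused upper bits of $1EF intact. The names `cog_t`, `cog_push` and `cog_pop` are hypothetical, and popping into $1EF itself is not special-cased here.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical behavioral model of the COG RAM stack (not the Verilog). */
#define COG_LONGS 512u
#define SP_REG    0x1EFu   /* stack pointer lives in COG RAM $1EF */
#define SP_MASK   0x1FFu   /* only the low 9 bits are used as SP */

typedef struct {
    uint32_t ram[COG_LONGS];
    bool z;                /* Z flag, written only when WZ is given */
} cog_t;

static uint32_t sp_get(const cog_t *c) { return c->ram[SP_REG] & SP_MASK; }

/* Assumption: the upper 23 bits of $1EF are preserved by stack ops. */
static void sp_set(cog_t *c, uint32_t sp) {
    c->ram[SP_REG] = (c->ram[SP_REG] & ~SP_MASK) | (sp & SP_MASK);
}

/* PUSH S / PUSH #S: assumed pre-decrement, stack grows downwards. */
static void cog_push(cog_t *c, uint32_t value, bool wz) {
    uint32_t sp = (sp_get(c) - 1u) & SP_MASK;
    c->ram[sp] = value;
    sp_set(c, sp);
    if (wz) c->z = (sp == 0);          /* Z=1 when SP reaches 0 */
}

/* POP D: read top of stack into register d, then post-increment SP. */
static void cog_pop(cog_t *c, uint32_t d, bool wz) {
    uint32_t sp = sp_get(c);
    c->ram[d & SP_MASK] = c->ram[sp];
    sp = (sp + 1u) & SP_MASK;
    sp_set(c, sp);
    if (wz) c->z = (sp == 0);
}
```

Under the pre-decrement assumption, initializing SP to $1EF makes the first push land at $1EE, just below the SP register itself.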

The #S value used in the RETX instruction can also be non-zero, which increases SP by that constant amount after the PC has been popped. This could come in handy for cleaning up the stack frame when the caller pushes arguments. E.g., if a caller pushes 3 arguments on the stack and then calls a function, the function knows how many arguments it was passed and can adjust SP by 3 at return time by executing RETX #3, saving the caller an extra instruction. If you don't need this feature, just ignore it and leave #S as zero.
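The RETX #3 cleanup can be traced with a tiny C model. This is a hedged sketch: it assumes pre-decrement pushes, keeps SP in a separate field for brevity rather than in $1EF, and treats `pc` as the address of the CALLX itself so the pushed return address is `pc + 1`. `cpu_t`, `push`, `callx` and `retx` are hypothetical names.

```c
#include <stdint.h>

#define SP_MASK 0x1FFu

/* Hypothetical minimal model: 9-bit SP indexing a 512-long COG RAM. */
typedef struct {
    uint32_t ram[512];
    uint32_t sp;   /* stands in for the low 9 bits of $1EF */
    uint32_t pc;   /* address of the instruction being executed */
} cpu_t;

static void push(cpu_t *c, uint32_t v) {   /* assumed pre-decrement */
    c->sp = (c->sp - 1u) & SP_MASK;
    c->ram[c->sp] = v;
}

/* CALLX #target: push the return address, then jump. */
static void callx(cpu_t *c, uint32_t target) {
    push(c, c->pc + 1u);
    c->pc = target;
}

/* RETX #n: pop the return PC, then add n to SP to drop n arguments. */
static void retx(cpu_t *c, uint32_t n) {
    c->pc = c->ram[c->sp];
    c->sp = (c->sp + 1u + n) & SP_MASK;
}
```

With SP starting at $1EF, three argument pushes plus the CALLX leave SP at $1EB, and RETX #3 restores it to $1EF while returning to the instruction after the call.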

The raw instruction encodings I am using right now (subject to change) are these:

' iiiiii_zcri_cccc_ddddddddd_sssssssss

PUSH   S        ' 000111_x100_xxxx_---------_sssssssss
PUSH  #S        ' 000111_x101_xxxx_---------_sssssssss
POP    D        ' 000111_x110_xxxx_ddddddddd_---------  (this may potentially be changed over to use 000111_x110_xxxx_---------_sssssssss, see below)
RETX  #S        ' 000111_x111_xxxx_---------_sssssssss

JMPX   S        ' 000111_x000_xxxx_---------_sssssssss
JMPX  #S        ' 000111_x001_xxxx_---------_sssssssss
CALLX  S        ' 000111_x010_xxxx_---------_sssssssss
CALLX #S        ' 000111_x011_xxxx_---------_sssssssss  Note: ------- are reserved
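Reading the table above, the WC bit appears to select the stack group (PUSH/POP/RETX) versus the jump group (JMPX/CALLX), while the R bit (cleared by NR) and the I bit pick the variant. The C decoder below is a sketch of that reading of the provisional encodings only; the field positions follow the standard P1 `iiiiii_zcri_cccc_ddddddddd_sssssssss` layout, and `decode`, `decoded_t` and the `OP_*` names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical decoder for the provisional opcode-000111 encodings. */
typedef enum {
    OP_NOT_COGSTACK, OP_JMPX, OP_CALLX, OP_PUSH, OP_POP, OP_RETX
} cogstack_op_t;

typedef struct {
    cogstack_op_t op;
    int imm;          /* I bit: #S form */
    int wz;           /* WZ still writes the Z flag */
    uint32_t d, s;    /* 9-bit D and S fields */
} decoded_t;

static decoded_t decode(uint32_t insn) {
    decoded_t r;
    uint32_t c_bit = (insn >> 24) & 1u;   /* WC */
    uint32_t r_bit = (insn >> 23) & 1u;   /* R (cleared by NR) */
    r.imm = (int)((insn >> 22) & 1u);
    r.wz  = (int)((insn >> 25) & 1u);
    r.d   = (insn >> 9) & 0x1FFu;
    r.s   = insn & 0x1FFu;
    if ((insn >> 26) != 0x07u) {          /* not opcode 000111 */
        r.op = OP_NOT_COGSTACK;
        return r;
    }
    if (c_bit) {                          /* WC set: stack group */
        if (!r_bit) r.op = OP_PUSH;       /* x100 / x101 */
        else r.op = r.imm ? OP_RETX : OP_POP;  /* x111 / x110 */
    } else {                              /* WC clear: jump group */
        r.op = r_bit ? OP_CALLX : OP_JMPX;     /* x01i / x00i */
    }
    return r;
}
```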

As you can see this is very easily extended to allow for 18 bit sized register pointers and constants for all variants of the instructions using the free bits remaining. I see this work ultimately dovetailing very nicely into Cluso's >2kB PASM barrier project (allowing up to a ridiculous 1MB of COG RAM). I want to try to look at that part soon as well. For things to all fit consistently, I am thinking my POP D may actually be changed to POP S soon (with S getting used as a destination register), but I'm not 100% sure on that part yet.
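The "1MB" figure follows from the free bits: an instruction that only needs S could reuse the 9-bit D field as the upper half of an 18-bit long address, and 2^18 longs × 4 bytes = 1,048,576 bytes. A hedged C sketch of that concatenation follows; `addr18` and `addr18_space_bytes` are hypothetical helpers, and the D:S split is an assumption about how the extension might be laid out.

```c
#include <stdint.h>

/* Hypothetical: D field (bits 17:9) supplies the upper 9 bits and
   S field (bits 8:0) the lower 9 bits of an 18-bit long address. */
static uint32_t addr18(uint32_t insn) {
    uint32_t d = (insn >> 9) & 0x1FFu;
    uint32_t s = insn & 0x1FFu;
    return (d << 9) | s;             /* 0 .. 0x3FFFF */
}

/* 2^18 longs * 4 bytes per long = 1 MB of addressable COG RAM. */
static uint32_t addr18_space_bytes(void) { return (1u << 18) * 4u; }
```

Note that `addr18` is simply the low 18 bits of the instruction word, which is why the D:S pairing extends so cleanly.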

It would be nice to keep the other opcode (000110) free for Cluso's AUGDS and perhaps other functions. I have a few other ideas brewing there for some expanded/indirect addressing modes which might come in handy for reading/writing data from/to the expanded COG memory. This would be important for C code too, unless its data segment is restricted to the first ~480 locations or so, which would be too limiting in many cases.

Also, I have kept the original JMP, JMPRET, CALL and RET totally untouched for now. In theory the original JMP S would do the same thing as JMPX S, but I like keeping them separate so you know the original one jumps within the lower 2kB (for compatibility), while the extended jump and call have the new behavior and can jump anywhere in the full COGRAM space (which could be more than 2kB). Alternatively, we could use WC in the original JMP to indicate the new 18-bit behavior, or save that for something else altogether.

cogstack.zip

Update: Version 2 with minor fix. This update now allows the POP 0x1ef operation to function as you would expect when popping SP off the stack.
cogstackv2.zip

Willy Ekerslyke · 2014-09-15 12:57

Very clever and very useful!

Cluso99 · 2014-09-15 18:23

Congrats - looking good!

rogloh · 2014-09-15 18:24

Cool thx, yep I hope it is useful. I ultimately want to use it to get mid-sized C application objects (say ~16kB-32kB) running fast from COG RAM.

I dreamt up and found one corner case: if you pop the top-of-stack data back into the SP register itself (0x1ef), the just-popped value gets overwritten by the incremented previous stack pointer value, which isn't what you would want.

I believe I can fix this discrepancy by ignoring the second write step when I detect this special case. I'll try that out today and update the zip when I can.
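The corner case and the proposed fix can be illustrated with a two-write model of POP in C. This is a sketch under assumptions, not the actual pipeline: it assumes POP writes D first and then writes the incremented SP back to $1EF (preserving the upper bits), and the `fixed` flag simply skips that second write when D is $1EF. `pop` is a hypothetical name.

```c
#include <stdint.h>
#include <stdbool.h>

#define SP_REG  0x1EFu
#define SP_MASK 0x1FFu

/* Hypothetical two-write model of POP D. */
static void pop(uint32_t ram[512], uint32_t d, bool fixed) {
    uint32_t sp = ram[SP_REG] & SP_MASK;
    uint32_t value = ram[sp];
    ram[d & SP_MASK] = value;                  /* write 1: popped data */
    if (fixed && (d & SP_MASK) == SP_REG)
        return;                                /* fix: keep the popped SP */
    /* write 2: incremented SP; clobbers a popped SP without the fix */
    ram[SP_REG] = (ram[SP_REG] & ~SP_MASK) | ((sp + 1u) & SP_MASK);
}
```

Without the fix, popping a saved SP of $1C0 from a stack at $1E0 leaves SP at $1E1; with it, SP becomes $1C0 as intended.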

Roger.

rogloh · 2014-09-15 18:26

Cluso99 wrote: »

Congrats - looking good!

With this and your AUGDS we will be looking good for fast/larger COG code.

I am thinking next about putting in two more instructions for COG memory pointer accesses. These will be called LOAD D,S and STORE D,S/#, and behave as D=*S, *D=S/# respectively, allowing single cycle indirect access to data in high memory without necessarily preceding with AUGDS.
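The intended D=*S and *D=S/# semantics can be written out in C as below. This is a sketch only: the 2048-long (8kB) COG size and the wrap-around masking of pointer values are assumptions for illustration, and `op_load`/`op_store` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical semantics for the proposed LOAD/STORE instructions. */
#define COG_LONGS 2048u              /* assumed 8kB expanded COG RAM */
#define ADDR_MASK (COG_LONGS - 1u)   /* assumed wrap of pointer values */

/* LOAD D,S : D = *S  (register S holds a long address) */
static void op_load(uint32_t ram[], uint32_t d, uint32_t s) {
    ram[d] = ram[ram[s] & ADDR_MASK];
}

/* STORE D,S : *D = S (register)   STORE D,#S : *D = #S (immediate) */
static void op_store(uint32_t ram[], uint32_t d, uint32_t s, int imm) {
    uint32_t value = imm ? s : ram[s];
    ram[ram[d] & ADDR_MASK] = value;
}
```

Here D and S name low-memory registers, and only the pointer they hold reaches into high memory, matching the "without necessarily preceding with AUGDS" idea.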

This would be very useful for C using pointers.

Cluso99 · 2014-09-15 18:45

rogloh,
I have been assuming a cog only has a max of 8KB. Nothing particular about this size.

I have been thinking about my AUGxx and the required JMPRET. I can compile each block as a page of 2KB with PropTool.
My thoughts have been different to yours.

By using the standard JMPRET and the WC modifier (which is generally unused and useless), any JMPRET would be within the same 2KB page unless preceded by AUGxx.
When the WC modifier is used, the JMPRET behaviour would be to use Page 0 as the source and destination, allowing quick returns to Page 0.
So, pretty much, only the JMPRET to other pages would require the AUGxx instruction(s).
However, this would not be an easy solution for hubexec, which will require something more like what you have done.

FWIW I had some problems with my P1V intermittently failing in unusual ways. Thanks to erco, I bought a cheap USB V/A meter. It showed that my special debug config was drawing ~250mA and my laptop was having problems supplying it. I solved the problem using a dual USB plug that came with one of my external USB HDDs.

rogloh · 2014-09-15 19:49

Hi Cluso. So let me try to understand what you want...I'm still not 100% sure after reading your last post.

Case 1)
JMPRET D,S/# WC - jumps to S/# within same 2kB, return address stored where, same page or low page?

Case 2)
JMPRET D,S/# - normal behavior as we have today - all low page only for both D, S

Case 3)
AUGDS #DH, #SH
JMPRET DL, #SL WC - jumps to given address SH:SL in arbitrary page #SH and return address is stored in #DH:#DL ?

We should try to come up with something that fits together well. I currently see a need for these sorts of things...

AUGDS #,# to allow any source/dest of the next instruction to be in high memory. Allows DJNZ and JMP/JMPRET to reach high COG memory.
AUGDS D,S, read D, S as input args for addresses to be used in next instruction - allows indirect addressing/augmentation
LOAD D,S - behaves as D=*S in a single instruction cycle, D is in low memory unless preceded by AUGDS perhaps
STORE D,S/# - behaves as *D=S/# in a single instruction cycle, S is from low memory unless preceded by AUGDS perhaps
AUGS #S - perhaps useful for loading 32 bit constants, but you could just store the 32 bit constant somewhere instead of the extra AUGS couldn't you?

Also if we can fit all of this into the spare opcode (000110), that could work out well for everyone.

jmg · 2014-09-15 19:51

rogloh wrote: »

Dreamt up and found one corner case issue, if you pop off the top of stack data back into the SP register itself (0x1ef) it will overwrite the just popped value with the incremented value of the previous stack pointer value which isn't what you would want.

Some might consider that a stack overflow case?

rogloh · 2014-09-15 19:57

LOL. I know it's weird. Would anyone normally push/pop the SP onto the stack? Maybe to change contexts, I guess, but not for much else really.

Actually you could always use the pop SP to increment the stack to an earlier value and quickly jump over some pushed arguments like when restoring a stack frame. However my RETX # allows that type of behavior.

rogloh · 2014-09-15 21:51

POP SP fix added to first post.

Cluso99 · 2014-09-15 22:37

rogloh wrote: »

Hi Cluso. So let me try to understand what you want...I'm still not 100% sure after reading your last post.

Case 1)
JMPRET D,S/# WC - jumps to S/# within same 2kB, return address stored where, same page or low page?

Case 2)
JMPRET D,S/# - normal behavior as we have today - all low page only for both D, S

Case 3)
AUGDS #DH, #SH
JMPRET DL, #SL WC - jumps to given address SH:SL in arbitrary page #SH and return address is stored in #DH:#DL ?

I will try to explain with examples. Consider a page to be 2KB (that suits PropTool nicely). Page 0 is the normal 2KB in cogs where $000..$1EF is cog ram and $1F0..$1FF are the special registers and shadow ram. Page 1 is cog address $200..$3FF, etc.
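The page layout described here reduces to simple arithmetic on long addresses: a 2KB page holds 512 longs, so the page number is `addr >> 9` and the in-page offset is `addr & $1FF`. The C helpers below are a hedged sketch of that arithmetic; the names are hypothetical, and `is_special_reg` assumes only page 0 maps the $1F0..$1FF special registers.

```c
#include <stdint.h>

/* 2KB pages of 512 longs: page 0 = $000..$1FF, page 1 = $200..$3FF, ... */
#define PAGE_LONGS 512u

static uint32_t page_of(uint32_t addr)   { return addr / PAGE_LONGS; } /* addr >> 9 */
static uint32_t offset_of(uint32_t addr) { return addr % PAGE_LONGS; } /* addr & $1FF */

static uint32_t make_addr(uint32_t page, uint32_t off) {
    return page * PAGE_LONGS + (off % PAGE_LONGS);
}

/* Assumption: only page 0 maps the special registers at $1F0..$1FF. */
static int is_special_reg(uint32_t addr) {
    return page_of(addr) == 0 && offset_of(addr) >= 0x1F0u;
}
```

With the 8KB-per-cog maximum assumed earlier in the thread, this gives 8KB / 2KB = 4 pages (0 to 3).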

A normal program just uses JMPRET (JMP/CALL/RET) as normal.

A Page 0 program can JMPRET to Pages 1/2/3... by preceding the JMPRET with an AUGDS #D,#S.

  AUGDS   0,0   'AUGDS is always immediate D & S values and is the Page# for both the goto address S, and the return address D (if used)
  CALL    <page-n-label>

The <page-n-label> can be in any page, and the return label <page-n-label-ret> may also be in any page. Typically, they would be in the same page, or the return address would be in page 0.
The JMPRET only saves the 9-bit return address, not the page#, at the designated return address (AUGDS D value for the page).

Now code is running in Page N. Any JMPRET (JMP/CALL/RET) will stay within the current page unless the WC modifier is used, in which case the D & S addresses are in Page 0, or the JMPRET instruction is preceded by an AUGDS #D,#S instruction, in which case the page bits are taken from the AUGDS instruction.

  JMP     <page-0-label> WC    'WC = always use Page 0 for D & S addresses

What this means is that for most instructions the normal JMPRET (JMP/CALL/RET) will be used, and the code will stay within a page. When the code returns to Page 0, the WC modifier is appended to the JMPRET instruction. Only when a JMPRET target is outside both the current page and Page 0 will an extra AUGDS prefix instruction be required.
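The page-selection rule above can be modeled in C to check the three cases (AUGDS prefix, WC, plain). This is a sketch, not Cluso's implementation: it assumes an AUGDS prefix takes priority if both it and WC are present, and that, as in the standard P1, JMPRET writes the 9-bit return address into bits 8:0 of D. `augds_t`, `cog_t` and `jmpret` are hypothetical names.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model of the paged JMPRET rule; pages are 512 longs. */
typedef struct {
    bool     valid;   /* an AUGDS #D,#S prefix preceded this instruction */
    uint32_t dpage;   /* page for the return-address (D) location */
    uint32_t spage;   /* page for the jump target (S) */
} augds_t;

typedef struct {
    uint32_t *ram;    /* whole paged COG RAM */
    uint32_t pc;      /* full long address of the executing JMPRET */
} cog_t;

/* JMPRET d9,s9 [WC]: d9/s9 are the instruction's 9-bit fields. */
static void jmpret(cog_t *c, uint32_t d9, uint32_t s9, bool wc, augds_t *aug) {
    uint32_t cur = c->pc >> 9;
    uint32_t dpage = cur, spage = cur;          /* default: current page */
    if (aug->valid) { dpage = aug->dpage; spage = aug->spage; }
    else if (wc)    { dpage = 0; spage = 0; }   /* WC: Page 0 for D & S */
    aug->valid = false;                         /* prefix covers one instruction */
    uint32_t ret = (c->pc + 1u) & 0x1FFu;       /* only 9 bits, no page number */
    uint32_t *dst = &c->ram[(dpage << 9) | (d9 & 0x1FFu)];
    *dst = (*dst & ~0x1FFu) | ret;              /* return address into D[8:0] */
    c->pc = (spage << 9) | (s9 & 0x1FFu);
}
```

Because only the 9-bit return address is saved, a routine in Page N that returns to a Page 0 caller would use its RET with WC, as described above.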

The existing PropTool can be used to compile these pages of code.

This method avoids the issue of making the JMPRET instruction relative, avoids the need for a new JMPRET, and will work with the existing compiler. A little care is required when making code into page blocks.

This method does not preclude using an advanced new JMPRET instruction(s).

We should try to come up with something that fits together well. I currently see a need for these sorts of things...
AUGDS #,# to allow any source/dest of the next instruction to be in high memory. Allows DJNZ and JMP/JMPRET to reach high COG memory.

AUGDS D,S, read D, S as input args for addresses to be used in next instruction - allows indirect addressing/augmentation

LOAD D,S - behaves as D=*S in a single instruction cycle, D is in low memory unless preceded by AUGDS perhaps

STORE D,S/# - behaves as *D=S/# in a single instruction cycle, S is from low memory unless preceded by AUGDS perhaps

AUGS #S - perhaps useful for loading 32 bit constants, but you could just store the 32 bit constant somewhere instead of the extra AUGS couldn't you?

AUGDS #D,#S does indeed allow the next instruction to reach any and all pages, including page 0 - just make D=0 or S=0 for page 0.
I am testing using the MOV and ADD instructions, preceded by AUGDS.

AUGDS D,S as input args for the next instruction.
Not quite sure what you mean here??? Need an example please. Probably doable.

LOAD D,S and STORE D,S/#
Again, not sure what you mean here. I am moving code from Page 0 to Page 1+ and back using the AUGDS & MOV instruction pair. This does not require a special instruction.

AUGS #S is useful for storing a 32-bit constant.
Yes, it could be stored elsewhere, but when used with hubexec, the constant needs to be embedded between the 2 instructions AUGS and the following instruction. This is why Chip introduced the AUGD and AUGS instructions in the P2.
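For reference, here is how a 32-bit constant might be split between an AUGS prefix and the following instruction's 9-bit #S field. This C sketch assumes the P2-style split of 23 upper bits in AUGS plus 9 lower bits in #S; the actual P1V encoding of AUGS is not given in the thread, and `split_const`/`join_const` are hypothetical helpers.

```c
#include <stdint.h>

/* Assumed P2-style split: AUGS carries bits 31:9, #S carries bits 8:0. */
typedef struct { uint32_t augs23; uint32_t s9; } augs_pair_t;

static augs_pair_t split_const(uint32_t k) {
    augs_pair_t p = { k >> 9, k & 0x1FFu };
    return p;
}

/* What the core would reassemble when executing AUGS + instruction #S. */
static uint32_t join_const(augs_pair_t p) {
    return ((p.augs23 & 0x7FFFFFu) << 9) | (p.s9 & 0x1FFu);
}
```

Since both halves live in the instruction stream, the constant travels with the code, which is what makes this pairing attractive for hubexec.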

Bill Henning · 2014-09-17 09:13

NICE addition!

jac_goudsmit · 2014-09-17 12:09

Added to the Github TODO list!

===Jac
