New P1V capabilities added - a COGRAM stack
rogloh
Posts: 5,809
Today for my second Verilog project I took a pipe wrench to the pipeline (slightly) and worked some fresh push/pop functionality into the P1V, this time using the internal COG RAM to hold a stack and making use of a new instruction opcode. I've been testing each operation and it appears to be working well so far. Anyone wanting to play around with it just needs to unzip the attached zipfile, copy the two Verilog files into a clean P1V build folder and give it a try. The changes are isolated to this feature only for my testing and are exclusive of my other hub based push/pop changes.
The new instructions provide the following functionality:
    PUSH S      ' push 32 bit register onto COGRAM stack
    PUSH #S     ' push constant/pointer onto COG stack
    POP D       ' pop 32 bits from COG stack into register
    RETX #S     ' pop COG PC from the COG stack and return to that location, optionally add amount #S to SP following that process
    CALLX #S    ' absolute call, pushes PC onto stack, then calls function at the absolute address
    CALLX S     ' indirect call (eg calling a function pointer), pushes PC onto stack and calls function at address in register S
    JMPX #S     ' absolute jump (no actual stack usage)
    JMPX S      ' indirect jump (no actual stack usage)
With this change we will now have a way to push/pop/call/return from an internal COG stack and do it all within the standard 4 clock cycle instruction timing. Though also beneficial for generalized COG PASM, I see this being extremely useful when running high level language code (eg. C) within an expanded COG memory space. With that combination we can hopefully get decent sized C apps running at 20MIPS+ when we can keep both the code and stack in COG RAM and minimize access to the hub.
This latest mod uses the unused "ONES" instruction opcode as well as the NR/WC/IM flags to expand into the different combinations of instructions required. Note that I have not yet used the WZ modifier flag. It still functions to optionally overwrite the Z flag: Z will be set to 1 if the stack pointer reaches 0 after the instruction has executed, and cleared otherwise. This WZ feature could possibly be useful for detecting that the stack is about to underflow below 0, without explicitly checking SP each time.
The stack grows downwards and the internal COG stack pointer has been mapped to COG RAM location $1ef. Any writes to $1ef adjust the SP and it can be read back from there. It currently uses the bottom 9 bits of the 32 bit register and ignores the rest of the register bits (though they will still be read back).
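To make the SP mapping concrete, here is a minimal Python sketch of the behaviour described above. It is a toy model, not the Verilog: the class and method names are made up for illustration, the pre-decrement push / post-increment pop ordering is an assumption, and so is the choice to leave the upper 23 bits of $1ef untouched when PUSH/POP move the pointer.

```python
SP_ADDR = 0x1EF          # COG RAM address the stack pointer is mapped to
SP_MASK = 0x1FF          # only the bottom 9 bits act as the stack pointer
LONG_MASK = 0xFFFFFFFF


class CogStackModel:
    """Toy model of a 512-long COG RAM with a memory-mapped stack pointer."""

    def __init__(self):
        self.ram = [0] * 512
        self.sp_reg = 0  # full 32-bit value visible at $1ef

    def write(self, addr, value):
        if addr == SP_ADDR:
            self.sp_reg = value & LONG_MASK  # all 32 bits are stored
        else:
            self.ram[addr] = value & LONG_MASK

    def read(self, addr):
        return self.sp_reg if addr == SP_ADDR else self.ram[addr]

    @property
    def sp(self):
        return self.sp_reg & SP_MASK

    def _set_sp(self, new_sp):
        # Assumption: push/pop only change the low 9 bits of the register.
        upper = self.sp_reg & ~SP_MASK & LONG_MASK
        self.sp_reg = upper | (new_sp & SP_MASK)

    def push(self, value):
        # Assumption: decrement first, then store (stack grows downwards).
        self._set_sp(self.sp - 1)
        self.ram[self.sp] = value & LONG_MASK

    def pop(self):
        # Assumption: load from SP, then increment.
        value = self.ram[self.sp]
        self._set_sp(self.sp + 1)
        return value
```

Writing a value such as 0xABCD0100 to $1ef in this model sets SP to $100 while the full 32 bits remain readable, which matches the "ignored but still read back" description.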
The optional #S value used in the RETX instruction can also be non-zero, which allows SP to be increased by a constant amount after the return PC has been popped. This could come in handy for cleaning up the stack frame if the caller pushes arguments. Eg. if a caller pushes 3 arguments on the stack then calls a function, the function knows how many args it gets passed and can automatically adjust SP by 3 at return time by executing RETX #3, saving the caller an extra instruction to do this work. If you don't need this feature just ignore it and leave it as zero.
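The RETX #3 example can be traced with a small Python sketch. The helper names are hypothetical, and two details are assumptions rather than anything stated in the post: that PUSH pre-decrements SP, and that CALLX saves the address of the instruction following it.

```python
SP_MASK = 0x1FF  # 9-bit stack pointer


def push(ram, sp, value):
    """Assumed pre-decrement push. Returns the new SP."""
    sp = (sp - 1) & SP_MASK
    ram[sp] = value
    return sp


def callx(ram, sp, pc, target):
    """Push the return address, then jump. Returns (new_sp, new_pc)."""
    # Assumption: the saved PC is the address of the instruction after CALLX.
    sp = push(ram, sp, pc + 1)
    return sp, target


def retx(ram, sp, cleanup=0):
    """Pop the return PC, then add `cleanup` to SP. Returns (new_sp, new_pc)."""
    return_pc = ram[sp]
    sp = (sp + 1 + cleanup) & SP_MASK
    return sp, return_pc


ram = [0] * 512
sp = start_sp = 0x100
for arg in (10, 20, 30):            # caller pushes 3 arguments
    sp = push(ram, sp, arg)
sp, pc = callx(ram, sp, pc=0x040, target=0x080)
sp, pc = retx(ram, sp, cleanup=3)   # RETX #3 drops the 3 arguments as well
```

After the return, SP is back at its starting value without the caller issuing a separate adjustment, which is the saving described above.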
The raw instruction encodings I am using right now (subject to change) are these:
                ' iiiiii_zcri_cccc_ddddddddd_sssssssss
    PUSH S      ' 000111_x100_xxxx_---------_sssssssss
    PUSH #S     ' 000111_x101_xxxx_---------_sssssssss
    POP D       ' 000111_x110_xxxx_ddddddddd_--------- (this may potentially be changed over to use 000111_x110_xxxx_---------_sssssssss, see below)
    RETX #S     ' 000111_x111_xxxx_---------_sssssssss
    JMPX S      ' 000111_x000_xxxx_---------_sssssssss
    JMPX #S     ' 000111_x001_xxxx_---------_sssssssss
    CALLX S     ' 000111_x010_xxxx_---------_sssssssss
    CALLX #S    ' 000111_x011_xxxx_---------_sssssssss
    Note: --------- fields are reserved
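For anyone wanting to hand-assemble these while no tool knows the mnemonics, the table above can be turned into a small Python encoder. This is only a sketch: the function name and argument names are invented here, reserved fields are written as zero, the z bit is treated as the optional WZ flag, and the condition field defaults to %1111 (always execute), which is an assumption about how the x bits would normally be filled.

```python
OPCODE = 0b000111  # the otherwise unused "ONES" opcode

# (mnemonic, immediate operand?) -> c, r, i bits taken from the table
CRI_BITS = {
    ("PUSH", False): 0b100,
    ("PUSH", True): 0b101,
    ("POP", False): 0b110,
    ("RETX", True): 0b111,
    ("JMPX", False): 0b000,
    ("JMPX", True): 0b001,
    ("CALLX", False): 0b010,
    ("CALLX", True): 0b011,
}


def encode(mnemonic, operand, immediate=False, wz=False, cond=0b1111):
    """Build a 32-bit instruction word: iiiiii_zcri_cccc_ddddddddd_sssssssss."""
    cri = CRI_BITS[(mnemonic, immediate)]
    word = (OPCODE << 26) | (int(wz) << 25) | (cri << 22) | ((cond & 0xF) << 18)
    if mnemonic == "POP":
        word |= (operand & 0x1FF) << 9   # POP currently uses the D field
    else:
        word |= operand & 0x1FF          # everything else uses the S field
    return word


push_reg = encode("PUSH", 0x123)
retx_3 = encode("RETX", 3, immediate=True)
pop_sp = encode("POP", 0x1EF, wz=True)
```

An unsupported combination such as an immediate POP raises a KeyError, since the table has no encoding for it.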
As you can see this is very easily extended to allow for 18 bit sized register pointers and constants for all variants of the instructions using the free bits remaining. I see this work ultimately dovetailing very nicely into Cluso's >2kB PASM barrier project (allowing up to a ridiculous 1MB of COG RAM). I want to try to look at that part soon as well. For things to all fit consistently, I am thinking my POP D may actually be changed to POP S soon (with S getting used as a destination register), but I'm not 100% sure on that part yet.
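The 2kB and 1MB figures follow directly from the address widths, since COG RAM is addressed in 32 bit longs. A quick Python check (the function name is just for illustration):

```python
BYTES_PER_LONG = 4


def cog_ram_bytes(address_bits):
    """Bytes of long-addressed COG RAM reachable with the given address width."""
    return (1 << address_bits) * BYTES_PER_LONG


standard = cog_ram_bytes(9)    # 512 longs
extended = cog_ram_bytes(18)   # 262,144 longs
```

Nine address bits give the familiar 2kB COG, and 18 bits give 1MB.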
It would be nice to keep the other opcode (000110) free for Cluso's AUGDS and perhaps other functions. I have a few other ideas brewing there for some expanded/indirect addressing modes which might come in handy for reading/writing data from/to the expanded COG memory. This stuff would be important for C code too unless its data segment is restricted to the first ~480 locations or so, which would be too limiting in many cases.
Also I have kept the original JMP and JMPRET, CALL and RET totally untouched right now. In theory the original JMP S would do the same thing as JMPX S, but I like keeping them separate so you know the original one jumps within the lower 2kB (for compatibility) and the extended jump and call includes new behavior and can jump wherever it wants over the full COGRAM space (which could be more than 2kB). Alternatively we could use WC in the original JMP to indicate the new 18bit behavior or save that for something else altogether.
cogstack.zip
Update: Version 2 with minor fix. This update now allows the POP 0x1ef operation to function as you would expect when popping SP off the stack.
cogstackv2.zip
Comments
Dreamt up and found one corner case issue: if you pop the top-of-stack data back into the SP register itself (0x1ef), the just-popped value gets overwritten with the incremented value of the previous stack pointer, which isn't what you would want.
I believe I can fix this discrepancy by ignoring the second write step when I detect this special case. I'll try that out today and update the zip when I can.
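The corner case and the proposed fix can be illustrated with a short Python sketch. This models the two write steps described above and is not the Verilog itself; the function name and the `fixed` switch are invented for the illustration, and the load-then-increment ordering is an assumption.

```python
SP_ADDR = 0x1EF
SP_MASK = 0x1FF


def pop(ram, sp, dest, fixed=True):
    """Pop ram[sp] into register `dest`; return the resulting stack pointer."""
    value = ram[sp]
    incremented = (sp + 1) & SP_MASK
    if dest == SP_ADDR:
        # The first write step puts the popped value into SP. Without the
        # fix, the second write step then clobbers it with the old SP + 1.
        return (value & SP_MASK) if fixed else incremented
    ram[dest] = value
    return incremented


ram = [0] * 512
ram[0x0F0] = 0x180                               # a saved SP sitting on the stack
broken = pop(ram, 0x0F0, SP_ADDR, fixed=False)   # ends up at $0F1
repaired = pop(ram, 0x0F0, SP_ADDR)              # ends up at $180
```

With the special case detected, popping into 0x1ef leaves SP holding the popped value, which is the behaviour one would expect.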
Roger.
With this and your AUGDS we will be looking good for fast/larger COG code.
I am thinking next about putting in two more instructions for COG memory pointer accesses. These will be called LOAD D,S and STORE D,S/#, and behave as D=*S, *D=S/# respectively, allowing single cycle indirect access to data in high memory without necessarily preceding with AUGDS.
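As a rough picture of what LOAD and STORE would do, here is a Python sketch of the D=*S and *D=S/# semantics. These instructions are only a proposal at this point, so everything here is an assumption: the function names, the 1024-long RAM used in the example, and in particular how many bits of the pointer register would be honoured (the sketch simply wraps at the RAM size).

```python
def load(ram, d, s):
    """LOAD D,S: D = *S (register S holds the address to read from)."""
    ram[d] = ram[ram[s] % len(ram)]


def store(ram, d, s, immediate=False):
    """STORE D,S/#: *D = S or #S (register D holds the address to write to)."""
    value = s if immediate else ram[s]
    ram[ram[d] % len(ram)] = value


ram = [0] * 1024          # hypothetical expanded COG RAM
ram[5] = 600              # register 5 holds a pointer into high memory
ram[600] = 0xCAFE
load(ram, d=6, s=5)       # register 6 now holds the long at address 600
ram[7] = 0x1234
store(ram, d=5, s=7)      # address 600 now holds the contents of register 7
```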
This would be very useful for C using pointers.
I have been assuming a cog only has a max of 8KB. Nothing particular about this size.
I have been thinking about my AUGxx and the required JMPRET. I can compile each block as a page of 2KB with PropTool.
My thoughts have been different to yours.
By using the standard JMPRET and the WC (which is generally unused and useless), any JMPRET would be within the same 2KB page unless preceded by AUGxx.
When the WC modifier is used, the JMPRET behaviour would be to use Page 0 as the source and destination, allowing quick returns to Page 0.
So, pretty much, only the JMPRET to other pages would require the AUGxx instruction(s).
However, this would not be an easy solution for hubexec, which will require something more like what you have done.
FWIW I had some problems with my P1V intermittently failing in unusual ways. Thanks to erco, I bought a cheap USB V/A meter. It showed that my special debug config was using ~250mA and my laptop was having problems. Solved the problem using a Dual USB plug that came with one of my external USB HDD.
Case 1)
JMPRET D,S/# WC - jumps to S/# within same 2kB, return address stored where, same page or low page?
Case 2)
JMPRET D,S/# - normal behavior as we have today - all low page only for both D, S
Case 3)
AUGDS #DH, #SH
JMPRET DL, #SL WC - jumps to given address SH:SL in arbitrary page #SH and return address is stored in #DH:#DL ?
We should try to come up with something that fits together well. I currently see a need for these sorts of things...
Also if we can fit all of this into the spare opcode (000110), that could work out well for everyone.
Some might consider that a stack overflow case?
Actually you could always use the pop SP to increment the stack to an earlier value and quickly jump over some pushed arguments like when restoring a stack frame. However my RETX # allows that type of behavior.
A normal program just uses JMPRET (JMP/CALL/RET) as normal.
A Page 0 program can JMPRET to Pages 1/2/3... by preceding the JMPRET with an AUGDS #D,#S. The <page-n-label> can be in any page, and the return label <page-n-label-ret> may also be in any page. Typically, they would be in the same page, or the return address would be in page 0.
The JMPRET only saves the 9-bit return address, not the page#, at the designated return address (AUGDS D value for the page).
Now code is running in Page N. Any JMPRET (JMP/CALL/RET) will stay within the current page unless either the WC modifier is used, in which case the D & S addresses will be in Page 0, or the JMPRET instruction is preceded by an AUGDS #D,#S instruction, in which case the page bits will be formed from the AUGDS instruction.
What this means is that for most instructions, the normal JMPRET (JMP/CALL/RET) will be used, and the code will be within a page. When the code is resorting back to Page 0, the WC modifier will be appended to the JMPRET instruction. Only when a JMPRET instruction is outside the current page or page 0, will an extra prefixed AUGDS instruction be required.
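The page selection rules above can be summed up in a few lines of Python. This is only a sketch of the proposal as described: the function names are invented, pages are taken to be 512 longs (2KB) each, and letting an AUGDS prefix take priority when WC is also present is an assumption.

```python
PAGE_BITS = 9  # 512 longs (2KB) per page


def jmpret_pages(current_page, wc=False, augds=None):
    """Return (d_page, s_page) used by a JMPRET under the proposed rules.

    augds is None, or the (d_page, s_page) pair supplied by a preceding AUGDS.
    """
    if augds is not None:
        return augds                       # page bits come from the prefix
    if wc:
        return (0, 0)                      # WC forces both addresses to Page 0
    return (current_page, current_page)    # default: stay in the current page


def full_address(page, addr9):
    """Combine a page number with the 9-bit address held in the instruction."""
    return (page << PAGE_BITS) | (addr9 & 0x1FF)


same_page = jmpret_pages(2)
back_to_zero = jmpret_pages(2, wc=True)
far_jump = jmpret_pages(2, augds=(0, 3))
```

Only the third case costs an extra instruction, which is why most code compiled as 2KB page blocks would not need the prefix.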
The existing PropTool can be used to compile these pages of code.
This method avoids the issue of making the JMPRET instruction relative, avoids the requirement for a new JMPRET, and will work with the existing compiler. A little care is required when arranging code into page blocks.
This method does not preclude using an advanced new JMPRET instruction(s).
AUGDS #D,#S does indeed allow the next instruction to reach any and all pages, including page 0 - just make D=0 or S=0 for page 0.
I am testing using the MOV and ADD instructions, preceded by AUGDS.
AUGDS D,S as input args for the next instruction.
Not quite sure what you mean here??? Need an example please. Probably doable.
LOAD D,S and STORE D,S/#
Again, not sure what you mean here. I am moving code from Page 0 to Page 1+ and back using the AUGDS & MOV instruction pair. This does not require a special instruction.
AUGS #S is useful for storing a 32-bit constant.
Yes, it could be stored elsewhere, but when used with hubexec, the constant needs to be embedded between the 2 instructions AUGS and the following instruction. This is why Chip introduced the AUGD and AUGS instructions in the P2.
===Jac