I moved the RET/RETA/RETB instructions to overlap with CALL/CALLA/CALLB, since they could be differentiated by opposite immediate bits. This freed up three {#}D instruction slots, into which I placed SKIP and two other new ones that really take care of business:
Ah, more instruction encoding changes. So much for a frozen design. :-)
It's a good thing that I wrote a program to parse your instruction spreadsheet to generate the tables to drive the assembler.
I think I agree, but SKIP is going to have to become SKP to make:
SKP
JMPSKP
CALLSKP
...Wait. There's a reason I didn't do that, in the first place. These instructions are not regular jumps, since they are confined to cog memory. Also, they do not use separate jump addresses and skip patterns, as one would expect. They are a special case of instruction that exists to exploit two things: fast branching and fast skipping, only possible inside cog memory.
ROBOJMP
ROBOSUB
I don't have a problem with JMPSKP and CALLSKP and being limited to cog/lut addresses. They are quite specific (not generic) instructions.
For example, IIRC the original Tachyon interpreter executes bytecodes very quickly because each bytecode is basically the address of the routine implementing it. The outer loop for such an interpreter is extremely small and fast, and skip is unlikely to be of much help.
Yes, IIRC Peter changed to a 16-bit code in V4 to get more speed, and it would be interesting to see what his inner loops look like on the newest P2.
It is a good idea to test this on more than one 'byte-code' engine.
E.g. I see in the Lua thread that it has a 6-bit instruction choice.
@jmg, you have asked ...
@Peter_Jakacki might not have seen your question.
So I take the liberty to present you the latest Tachyon-P2 kernel snippet I found on Peter's DROPBOX.
It is a wordcode interpreter, uses the first 256 addresses to directly address COG memory.
Actually this code seems a little 'old' - 17 days ;-). The code for Prop1 below for Tachyon V4.2 shows some enhancements.
DAT
'*************** TACHYON COG KERNEL for Propeller 2 *******************
org 0
RESET
call #INITCOG ' run non-time critical init from hubexec
jmp #doNEXT
'
' main Forth wordcode interpreter
'
exec
cmp X,wordcodes wc
if_nc jmp #ENTER
call X ' could call cog or hub code - use ret to return
doNEXT rdword X,PTRA++ ' read word code instruction address,
' PTRA used as instruction pointer
cmp ops,X wc
if_nc jmp #exec
call #dolit8 ' handle 8 bit literal inside wordcode, saves a rdbyte
jmp #doNEXT
ops long $FF00-1
wordcodes long endcode
' Save IP and load with new IP from call
ENTER test X,#1 wz
andn X,#1
if_z wrlut PTRA,retptr ' save IP onto return stack
if_z add retptr,#1
mov PTRA,X ' jump to new wordcode
jmp #doNEXT
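For anyone who finds the compare/branch chain above hard to follow, here is a rough Python model of how a fetched wordcode appears to be classified by doNEXT, exec and ENTER. This is only a sketch of one reading of the listing, not Peter's code: `END_CODE` is a placeholder for the `endcode` label (its real value is not shown), and the function name and result tags are made up for illustration.

```python
OPS = 0xFF00 - 1      # same constant as 'ops' in the listing
END_CODE = 0x0400     # ASSUMPTION: placeholder for 'endcode', real value not shown


def classify_p2_wordcode(x, end_code=END_CODE):
    """Mirror the compares in doNEXT / exec / ENTER for a 16-bit wordcode x."""
    x &= 0xFFFF
    # cmp ops,X wc sets C when ops < X; only NC (X <= ops) jumps to exec
    if x > OPS:
        return ("literal8", x)        # falls through to 'call #dolit8'
    # exec: cmp X,wordcodes wc sets C when X < endcode, so 'call X' runs
    if x < end_code:
        return ("call_code", x)
    # ENTER: bit 0 clear saves PTRA on the return stack, bit 0 set does not
    tag = "enter_save_ip" if (x & 1) == 0 else "enter_jump"
    return (tag, x & ~1)              # andn X,#1 then mov PTRA,X
```

With the placeholder `END_CODE`, `classify_p2_wordcode(0x1001)` gives `("enter_jump", 0x1000)`, i.e. a jump into high-level code without pushing the return address.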
Prop1 V4.2 code for inner loop
{ *** RUNTIME BYTECODE INTERPRETER for Propeller 1 *** }
' * * * *
' Fetch the next byte code instruction in hub RAM pointed to by the instruction pointer IP
' This is the very heart of the runtime interpreter
'
doNEXT rdword R0,IP ' read word code instruction
add IP,#2 wc ' advance IP to next wordcode (clears the carry too!)
shr R0,#9 nr,wz ' cog or hub?
if_z jmp R0 ' execute the code by directly indexing the first 512 longs in cog
' literal?
mov X,R0
shr X,#15 nr,wz ' embedded 15-bit literal?
if_nz and X,mask15
if_nz jmp #PUSHX ' push this literal without having to do a call/return
' conditional jump?
cmp jumpop,X wc ' test for jump + disp
if_c jmp #doJUMP
' register op?
cmp regop,X wc ' test for REG + disp
if_c jmp #doREG
' must be a call/jump
test X,#1 wz ' skip pushing the IP if bit 0 is set
if_nz andn X,#1 ' and fixup
if_z call #SAVEIP ' otherwise this is an address of a high-level word code
mov IP,X ' so after saving the IP, load it with new address
jmp #doNEXT
' Conditional - Jumps are encoded with an 8-bit signed word displacement
doJUMP ' unconditional jump ? c=1
tjnz tos,#DROP ' continue without jumping - discard flag
test X,#$80 wz ' reverse jump? nz
and X,#$7F ' mask displacement
shl X,#1 ' index words
sumnz IP,X ' +/- jump
jmp #DROP ' discard flag if conditional
' any opcodes from here on are jumps with encoded +/-128 displacement
jumpop long $7F00-1
doREG '( -- addr )
and X,#$FF
add X,regptr
jmp #PUSHX
' address 256 bytes of task registers per cogid
regop long rg-1
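The Prop1 decode order above can be modelled the same way. Again this is a hedged sketch of one reading of the listing: `MASK15` is assumed to be $7FFF (the `mask15` value is not shown), `rg` has to be passed in because the listing does not give its value, and all Python names are hypothetical. `jump_target` follows the SUMNZ usage in doJUMP (bit 7 set means a backward jump) and applies only when the branch is actually taken.

```python
MASK15 = 0x7FFF       # ASSUMPTION: value of 'mask15', not shown in the listing
JUMPOP = 0x7F00 - 1   # 'jumpop' from the listing


def classify_p1_wordcode(r0, rg):
    """Classify a 16-bit wordcode in the same order as the doNEXT listing.

    rg stands in for the 'rg' label, whose value is not given in the post.
    """
    r0 &= 0xFFFF
    if (r0 >> 9) == 0:                # shr R0,#9 nr,wz / if_z jmp R0
        return ("cog_jump", r0)
    if r0 >> 15:                      # bit 15 flags an embedded literal
        return ("literal15", r0 & MASK15)
    if r0 > JUMPOP:                   # cmp jumpop,X wc sets C when jumpop < X
        return ("jump", r0)
    if r0 > rg - 1:                   # cmp regop,X wc, with regop = rg-1
        return ("reg", r0 & 0xFF)     # doREG masks to 8 bits before adding regptr
    if r0 & 1:                        # bit 0 set: do not save IP
        return ("goto", r0 & ~1)
    return ("call", r0)               # bit 0 clear: SAVEIP, then load IP


def jump_target(ip, x):
    """New IP for a taken doJUMP; ip is the already-advanced instruction pointer."""
    disp = (x & 0x7F) << 1            # 7-bit magnitude, counted in words
    return ip - disp if x & 0x80 else ip + disp
```

Note that the displacement is sign-and-magnitude rather than two's complement, at least as the `test X,#$80` / `sumnz` pair reads.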
I like JMPSKP & CALLSKP much more than SHRED & SHREDR, and am fine with them being cog/lut only while the other JMP/CALL instructions are not.
Also, SKIP can stay SKIP, it doesn't have to shorten to SKP.
Some day I'll convince you that the instructions can be longer than 7 characters. Fitting in a single 8 char tab stop space is not important. It's a detriment.
Hub jump could be added with probably little effort with something like.
SETQ #base
CALLSKP reg
Where the resulting branch address would be (base << 10) + reg[9:0]. If base is zero, then it would act as it does now (cog/lut exec). This would give you a skip table of 256 entries (because of the byte addressing).
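Just to spell out the arithmetic of that proposal, a tiny Python sketch follows. The function name is made up, and this models only the suggested address formula, not anything that exists in the hardware.

```python
def callskp_target(base, reg):
    """Proposed branch address: (base << 10) + reg[9:0]."""
    return (base << 10) + (reg & 0x3FF)   # only the low 10 bits of reg are used
```

With `base` = 0 the result is simply `reg[9:0]`, which matches the "acts as it does now" case; `callskp_target(2, 0x005)` would land at $805.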
The question is: do we need it?
If hub jump/call were to be supported (I'm quite happy without it) then it would be better to be consistent and use the AUG + JMPSKP/CALLSKP sequence.
I would use JUMPSKIP, CALLSKIP, etc.
+1
Perhaps it's more a restriction in pnut that's clouding Chip's view?
Anyway this topic can wait until the silicon is underway.
Whoa! This would blow up in cog-exec mode, because the REP logic counts instructions. If SKIP is causing fewer or variable numbers of instructions to execute, it's not going to work, unless you planned very carefully with perhaps constant instruction counts in every SKIP case, and set the REP block size accordingly.
All REP does is get you out of needing a 4-clock branch instruction. It's only useful/beneficial for blocks of only several instructions, I think. There's probably no need for SKIP inside REP. But, you may find some value in it. Seems like a recipe for madness, to me.
That means code that works in HUBexec, will break if moved into COG ? (and vice versa)
That rather breaks one of the fundamental, hard won P2 gains, where binary code did not care where it ran, it just ran a bit slower in HUB. Nothing broke.
Yup, the days of needing to be 8 modulus paranoid, are well behind us.
Better to design for the human eyeball, not some archaic TAB setting.
Thanks, so how much can that improve, with Chip's new SKIP variants ?
ok then - let's see how the amateur (me) and the P2EG (Prop2 experts group) can improve Peter's Tachyon Forth word code engine for Prop2 before Peter wakes up and joins in ;-)
As Mike writes, this requires combining the COG code of some words in order to use SKIP.
Let's take this snippet:
' C@ ( caddr -- byte ) Fetch a byte from hub memory
CFETCH rdbyte tos,tos
jmp unext
' W@ ( waddr -- word ) Fetch a word from hub memory
WFETCH rdword tos,tos
jmp unext
' @ ( addr -- long ) Fetch a long from hub memory
FETCH rdlong tos,tos
jmp unext
could be rewritten to:
' C@ ( caddr -- byte ) Fetch a byte from hub memory | C@ | W@ | @ | JMP
CFETCH rdbyte tos,tos ' 0 1 1 0
' W@ ( waddr -- word ) Fetch a word from hub memory
WFETCH rdword tos,tos ' 1 0 1 0
' @ ( addr -- long ) Fetch a long from hub memory
FETCH rdlong tos,tos ' 1 0 1 0
jmp unext ' 1 1 0 0
But we use the lower 9 bits of the wordcode directly to address COG, and there is currently no redirection via LUT decoding. Bit 15 is used to flag 15-bit immediate literals. Other bits are used to encode relative jumps and register access.
So there are no bits free in the wordcode to store a SKIP map.
Also most kernel words are only 2 or 3 PASM instructions anyhow, so not really much to save.
I just spent quite some time reading V4.2 kernel code again ... I do NOT see right now where SKIP would help a lot with the current architecture. :-(
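To make the merged C@ / W@ / @ idea concrete, here is a small Python sketch of what a per-word skip pattern would do to the four-instruction block. It assumes one pattern bit per instruction in program order, with 1 meaning "skip" and 0 meaning "execute". The three patterns are one reading of the table in the post (the row next to `rdlong` appears to repeat the W@ pattern, so 1 1 0 0 is assumed for @), and all Python names are hypothetical.

```python
MERGED = ["rdbyte tos,tos", "rdword tos,tos", "rdlong tos,tos", "jmp unext"]

# ASSUMPTION: patterns as read from the table, one bit per instruction in order
PATTERNS = {
    "C@": (0, 1, 1, 0),
    "W@": (1, 0, 1, 0),
    "@":  (1, 1, 0, 0),
}


def run_with_skip(instructions, pattern):
    """Return the instructions that still execute; a 1 bit skips that slot."""
    return [ins for ins, skip in zip(instructions, pattern) if not skip]


for word, pattern in PATTERNS.items():
    print(word, "->", run_with_skip(MERGED, pattern))
```

Each word ends up executing exactly one fetch plus the shared `jmp unext`, so the saving over the three separate routines is two `jmp` instructions of cog space, which fits the point that there is not much to gain for such short words.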
@cgracey, what happens if there's an active skip mask inside of a rep block? Will the rep still work correctly?
When REP is running in hub-exec, it actually does a JMP to get back to the top of the block, which entails resetting and reloading the FIFO. In cog-exec, the PC just gets changed in order to loop. So, I think SKIP would work in either case, but it may not work the same, as there'd be an extra SKIP bit consumed on each block iteration in hub-exec mode. In any case, SKIP would behave the same in both modes if it was just used within the code block, and not when it looped.
Obviously skip works out a count to know how many instructions to skip over. Could that value be used to decrement the REP block size?
If yes, is it easy or complex to do? If it's complex, then it's a caveat for skip instructions.
It's way too complicated. It's like mixing polar and Cartesian domains. I imagine brushing my teeth while riding a dirt bike.
Chip,
Why are we "hidebound to keeping it under 7 characters" ???? I think only you are thinking that is a requirement, and it's just NOT.
I know. I'm just used to an 8-spaces-per-tab world.
Upon someone pointing out that we could use longer names, I reviewed all of our mnemonics and I think they are sufficiently expressive at under eight characters, each.
The only problem is this new instruction that CALLs and SKIPs, but with a single operand.
+1. So much more clear.
That's my favourite type of shredding! :cool:
ROBOJMP
ROBOSUB
CALLPAT
That sounds like "Call Skype"
Twice in a week we agree. What happened?
Except AUG is for extending immediate values, whereas SETQ is often used to modify the behavior of an instruction.
Easy to read, & maintain, always trumps terse.
or HOPSKPNJMP
LOL
SHRED does not sound like a serious suggestion, more whimsical.
If we weren't hidebound to keeping it under 7 characters, we could just call it CHA_CHA_CHA. But I don't know what we'd call the one that returns.
WOCKA_WOCKA_WOCKA