Instinct says a pick-pair opcode should also be useful for Quadrature Encoders, but I'm not seeing an elegant outcome yet...
Found a solution that is state-based (and also a good argument for making a Macro Assembler standard).
Idle state is 2 lines (test and jump to same) and
Change action is 3 lines (test, inc/dec, jump to new).
Approx. 28 lines of code for each Quad block.
If Chip has the index choice, then that 28 lines may be mostly share-able across 3 Quad instances.
Could do 3 threads of Quad, leaving 4th for serious lifting.
At a USB compatible, leisurely 48MHz, main thread is (say) ~30 mips and each quad can run at ~6 mips.
I'm not sure how jumps and threads play together in the fine print, but that's roughly 3 x 1MHz quad-capable + Main.
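To make the state-based idea concrete, here is a minimal sketch in Python rather than PASM, so it can be run and checked. It is not the 28-line routine described above: the names `FORWARD`, `TRANSITION` and `decode`, the Gray-code direction, and the choice to ignore illegal double-bit changes are all assumptions for illustration.

```python
# Hypothetical model of a state-based quadrature decoder.
# A state is the last sampled (A, B) pair packed as 2 bits: (A << 1) | B.
# Assumed Gray-code order for the "up" direction: 00 -> 01 -> 11 -> 10 -> 00.
FORWARD = [0b00, 0b01, 0b11, 0b10]


def build_transitions():
    """Map (old_state, new_state) to a count delta of 0, +1 or -1."""
    table = {}
    for i, state in enumerate(FORWARD):
        table[(state, state)] = 0                      # idle: no change
        table[(state, FORWARD[(i + 1) % 4])] = +1      # one step forward
        table[(state, FORWARD[(i - 1) % 4])] = -1      # one step backward
        # The remaining pair (both bits changed) is illegal and is left out.
    return table


TRANSITION = build_transitions()


def decode(samples, start=0b00):
    """Accumulate a position count from a sequence of 2-bit pin samples."""
    state = start
    count = 0
    for sample in samples:
        count += TRANSITION.get((state, sample), 0)    # illegal jump counts as 0
        state = sample
    return count
```

Each of the four states has one "stay" entry and two "change" entries, which mirrors the idle/change split in the post: the idle case only tests and loops, the change case adjusts the count and moves to the new state.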
Allows trivial decoding of differential digital input pair...
PICKZC ina,#16 ' 16&17 are differential +/- pair
if_00 jmp #S0 state
if_01 RCL data,#1 ' received a 1
if_10 SHL data,#1 ' received a 0
if_11 jmp #S1
This would be better in this case (but not useful for USB because you also need the previous state of the pair)
PICKZC ina,#16 ' 16&17 are differential +/- pair
if_diff RCR data,#1 ' save received bit (LSB first)
if_same jmp #S0S1 ' jmp if SE0 or SE1
.....
S0S1 if_11 jmp #S1
S0...
S1...
There are a lot of times where I have needed to decode pairs of bits and needed the full 4 cases. This instruction will help a lot.
BTW I think SETZC is still a better name than PICKZC. If it is not too much trouble, I suggest the compiler permit SETZC <reg> as SETZC <reg>,#0.
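For readers who want to see the four-case decode in action, here is a rough Python model of the if_diff / if_same receive loop above. It is only a sketch under assumptions: that PICKZC exposes bits n and n+1 of the source, that the lower pin carries the data bit, and that RCR shifts right with the carry entering the top bit. The helper names (`pick_pair`, `rcr1`, `receive`) are invented for illustration.

```python
def pick_pair(value, n):
    """Return (hi, lo): bits n+1 and n of value (assumed PICKZC behaviour)."""
    return (value >> (n + 1)) & 1, (value >> n) & 1


def rcr1(data, carry, width=32):
    """Rotate-carry-right by one: shift right, carry enters the top bit."""
    return (data >> 1) | (carry << (width - 1))


def receive(samples, n=16, width=8):
    """Collect bits LSB first until the pair reads the same (SE0 or SE1)."""
    data = 0
    for ina in samples:
        hi, lo = pick_pair(ina, n)
        if hi == lo:                      # if_same: single-ended state, stop
            return data, ("SE1" if hi else "SE0")
        data = rcr1(data, lo, width)      # if_diff: assumed data bit = lower pin
    return data, None


def encode(byte, n=16):
    """Test helper: turn a byte into differential pin samples, LSB first."""
    out = []
    for k in range(8):
        b = (byte >> k) & 1
        out.append((b << n) | ((1 - b) << (n + 1)))
    return out
```

With `width=8` the first received bit ends up in bit 0 after eight shifts; on a 32-bit register the byte would sit in the top eight bits until shifted down.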
Yep. I was trying for a small, simple generic piece of sample code... yours is simpler and has the right bit-order for USB
I am thrilled at the SETZC / PICKZC instruction... it will make a lot of decoding easier!
Yes. SETZC / PICKZC and RESD / RESULT are both simple & elegant, and extremely useful.
I really love the extra extension of RESD / RESULT to be used with first following executed instruction permitting a set of alternate conditional instructions. It's the little gems like these that continue to boost the performance for those of us who wish to spend the time.
You have to love tool flows, that are somehow not quite the same across families...
Google does not find much, might be time to contact Altera to find the magic preserve button ?
Contact Altera? They've made that an exercise in futility.
They've got lots of online help, but I can never find anything I need. One other thing I have never been able to find is any kind of benchmark comparisons between device families. If they don't want to "talk" about something, you will find no mention of the matter, anywhere. It all seems very controlled.
Here is an updated Instruction Summary with...
* Column for WZ & WC now has it for each op
* adds new instructions REPD & PICKZC (note PICKZC may not be correct opcode bits)
* removes instruction SETZC
Tip to view without wrap - reduce text size (In IE Ctrl-Scroll)
You know we have this new RESD D/# instruction that lets you redirect the next result to another register.
What about expanding this capability a little (rdr = task's redirection register):
RESD D/# 'set rdr to D/#, redirect next write to [rdr]
RESDP D/# 'set rdr to D/#, redirect next write to [rdr++]
RESDN D/# 'set rdr to D/#, redirect next write to [rdr--]
RESDX D/# 'set rdr to D/#, redirect all subsequent writes to [rdr] until RESDOFF
RESDPX D/# 'set rdr to D/#, redirect all subsequent writes to [rdr++] until RESDOFF
RESDNX D/# 'set rdr to D/#, redirect all subsequent writes to [rdr--] until RESDOFF
RESDOFF D/# 'set rdr to D/#, cancel write redirection
RESD ' redirect next write to [rdr]
RESDP ' redirect next write to [rdr++]
RESDN ' redirect next write to [rdr--]
RESDX ' redirect all subsequent writes to [rdr] until RESDOFF
RESDPX ' redirect all subsequent writes to [rdr++] until RESDOFF
RESDNX ' redirect all subsequent writes to [rdr--] until RESDOFF
RESDOFF ' cancel write redirection
This scheme takes two more state bits and an incrementer/decrementer on the task's redirection register.
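As a way to reason about the proposal, here is a small Python model of one task's redirection register. It is a sketch of the semantics as listed above, not of any hardware: the class `Task`, its fields, and folding the seven mnemonics into `resd(addr, step, sticky)` and `resd_off(addr)` are assumptions for illustration.

```python
class Task:
    """Toy model of one task with a redirection register (rdr)."""

    def __init__(self, size=512):
        self.reg = [0] * size
        self.rdr = 0
        self.step = 0          # 0 = RESD, +1 = RESDP, -1 = RESDN
        self.pending = False   # redirect the next write
        self.sticky = False    # "X" forms: keep redirecting until resd_off

    def resd(self, addr=None, step=0, sticky=False):
        """Arm redirection; the operand-less forms leave rdr unchanged."""
        if addr is not None:
            self.rdr = addr
        self.step, self.sticky, self.pending = step, sticky, True

    def resd_off(self, addr=None):
        """Cancel redirection, optionally loading rdr first."""
        if addr is not None:
            self.rdr = addr
        self.pending = self.sticky = False

    def write(self, d, value):
        """Result write-back: goes to [rdr] when redirection is armed."""
        if self.pending:
            self.reg[self.rdr] = value
            self.rdr = (self.rdr + self.step) % len(self.reg)  # post-inc/dec
            self.pending = self.sticky
        else:
            self.reg[d] = value

    def add(self, d, s):
        """ADD D,S with a 32-bit result."""
        self.write(d, (self.reg[d] + self.reg[s]) & 0xFFFFFFFF)
```

For example, `t.resd(10, step=+1)` followed by `t.add(1, 2)` leaves register 1 untouched, stores the sum in register 10 and advances rdr to 11; the next `add` writes to register 1 as usual.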
WOW! That's some scheme Chip.
So each task has its own task register, and it can auto increment/decrement each time it gets used. That is now a really powerful instruction.
Rather than the RESDP & RESDN in pnut, perhaps they could be RESD #/D++ or RESD #/D-- to indicate increment or decrement, instead of revealing there are separate opcodes used.
I don't know about this one. I'm just trying to imagine when we would redirect a whole lot of sequential instructions to a register (or a register block), so far all I could dream up was maybe inside some tight REP loop doing a form of SIMD or something. We have INDA++ already which could help do the writing to a block. So the only real benefit is in not destroying both the operands to the opcodes.
Chip, wouldn't post increment/decrement be better?
Might the instructions RESULT / RESULTX / RESULTN (result = result modify off = never) be better, and the ++ or -- on the #/D operand as pre or post (only 1) ?
Example of a Single Instruction Multiple Data (SIMD) type operation on adding 100 numbers was what came to mind.
REPS #100,#2
RESDPX C
op: ADD A,B
INCDS op
RESDOFF C
Result C vector = A vector + B vector
EDIT: Actually that probably cannot work because there is not enough time in the self modifying code loop to work right, so perhaps it could be this instead:
REPS #100,#1
RESDPX C
op: ADD INDA++,INDB++
RESDOFF C
Actually this is where the incrementing/decrementing RESD stuff became useful because INDA/INDB are already consumed in the loop itself. So yes it definitely could be of some benefit here.
We do 100 adds in 100 cycles above. To do this same operation otherwise takes a lot of instructions if you can't destroy the D operand.
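The vector add can be modelled in a few lines of Python to check the claim that neither source operand is destroyed. This is a sketch only: the function names are made up, registers are a plain list, and "instructions executed" is a simple count, not a cycle-accurate figure. The second function mirrors the four-instruction loop (MOV, ADD, MOV, INCD) posted elsewhere in the thread for comparison.

```python
MASK = 0xFFFFFFFF


def vector_add_redirected(reg, a, b, c, n):
    """Model n passes of ADD INDA++,INDB++ with writes redirected to [rdr++]."""
    inda, indb, rdr = a, b, c
    executed = 0
    for _ in range(n):
        reg[rdr] = (reg[inda] + reg[indb]) & MASK   # reg[inda] is left intact
        inda, indb, rdr = inda + 1, indb + 1, rdr + 1
        executed += 1                                # one ADD per element
    return executed


def vector_add_via_temp(reg, a, b, c, n):
    """Model the alternative: copy to TEMP, add, copy out, bump the pointer."""
    executed = 0
    for i in range(n):
        temp = reg[a + i]                            # MOV TEMP, INDA++
        temp = (temp + reg[b + i]) & MASK            # ADD TEMP, INDB++
        reg[c + i] = temp                            # MOV DEST, TEMP
        executed += 4                                # plus INCD on the MOV
    return executed
```

Under this count the redirected loop executes 100 instructions for 100 elements and the temp-based loop executes 400, which matches the 4x figure quoted in the thread.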
ragloh,
I could see potential in using these instead of INDA/INDB (saved for other uses), as a block move between threads, as a way of using banking (instead of remapping). I am sure there are other uses - we just have to see.
Chip,
I see how RESD,RESDP and RESDN could be useful in multi-tasking as we only have 1 set of INDA/B registers.
Question: The SETMAP instruction now has a #S value. I recall you mentioning somewhere about selecting a
register block using any register rather than INDB or task number? Also can each hw task select a separate block?
I don't know about this one. I'm just trying to imagine when we would redirect a whole lot of sequential instructions to a register (or a register block), so far all I could dream up was maybe inside some tight REP loop doing a form of SIMD or something. We have INDA++ already which could help do the writing to a block. So the only real benefit is in not destroying both the operands to the opcodes.
INDA/INDB are not per-task, but global to the cog, so this would provide some new capability to all tasks.
Chip, wouldn't post increment/decrement be better?
Might the instructions RESULT / RESULTX / RESULTN (result = result modify off = never) be better, and the ++ or -- on the #/D operand as pre or post (only 1) ?
This scheme only does post-inc/dec on the redirect register.
We might need better names, for sure.
Also, these instructions have operand-less versions which don't write rdr, but set a redirect mode.
Chip,
I see how RESD,RESDP and RESDN could be useful in multi-tasking as we only have 1 set of INDA/B registers.
Question: The SETMAP instruction now has a #S value. I recall you mentioning somewhere about selecting a
register block using any register rather than INDB or task number? Also can each hw task select a separate block?
Register remapping is a cog-wide function that you'd probably want to set up once, unless, like for preemptive multi-threading, you'd like to switch around under software control. That S/# in SETMAP is to set a static remap value, rather than using only the current task ID or INDA/INDB for instantaneous remap.
RESDP D/# 'set rdr to D/#, redirect next write to [rdr++]
RESDN D/# 'set rdr to D/#, redirect next write to [rdr--]
This scheme takes two more state bits and an incrementer/decrementer on the task's redirection register.
Destination is only COG RAM registers? or can be HUB RAM too?
Do you think that it could be possible (or useful) to add an OFFSET to those instructions?
For example (instruction for fixed 8 bytes offset):
RESDP8 D/# 'set rdr to D/#, redirect next write to [rdr+8]
RESDN8 D/# 'set rdr to D/#, redirect next write to [rdr-8]
With an offset of 8, maybe some application could use eight synchronized cogs to get the fastest data transfer possible to fill HUB RAM (or use HUB RAM to synchronously send data using pin transfer (XFR)).
I remember that high speed video on P1 was implemented using several synchronized cogs (is this right?).
Maybe some high speed video application could benefit. Or projects that make use of fast pipelined DAC/ADC.
Another example: with a +3 offset, three cogs can send R, G, B data independently to a DAC (EDIT: "to three DACs") using pin transfer instructions. And hopefully with a 3x speed improvement over a single COG handling three colors.
What do you think?
I don't see a need for an offset because you set whatever address you want and it just goes from there.
This is something that the programmer is going to have to put in his head and then recognize when an opportunity to use it arises.
I can think of all kinds of image processing that would benefit from this. Imagine the task of comparing two images, from known optics, which are similar except for rotation and magnification (cyclopean vision), where your calculation of the change in the position of the camera needs repeated CORDIC and math functions. These tools seem to fit the bill nicely.
Rich
In the past what happened was the documentation seemed to be purposefully targeted at various classes of users at various times in the device's life cycle… first being experts, followed by advanced, beginners and intermediate. Where most Prop1 users will probably want to start is with P1 compatible coding… and then adding in advanced programming as needed.
In the mass adoption phase, the biggest problem seems to be directing users to appropriate documentation, based upon their individual skill set. Parallax does a good job of this, mostly through the forums. Even advanced users are going to struggle with some of the refinements that have occurred.
Integrating advanced help utilities into the various tools seems essential to me.
Rich
I didn't mean to go off-topic, just couldn't help myself:)
Comments
Found a solution that is state-based, ( and also a good argument for making Macro Assembler standard).
Idle state is 2 lines (test and jump to same) and
Change action is 3 lines. ( test, inc/dec, jump to new)
appx 28 lines of code for each Quad block.
If Chip has the index choice, then that 28 lines may be mostly share-able across 3 Quad instances.
Could do 3 threads of Quad, leaving 4th for serious lifting.
At a USB compatible, leisurely 48MHz, main thread is (say) ~30 mips and each quad can run at ~6 mips.
I'm not sure how jumps and threads play together in the fine print, but that's roughly 3 x 1MHz quad-capable + Main.
Hard not to be impressed
PICKZC ina,#16 ' 16&17 are differential +/- pair
if_diff RCR data,#1 ' save received bit (LSB first)
if_same jmp #S0S1 ' jmp if SE0 or SE1
.....
S0S1 if_11 jmp #S1
S0...
S1...
There are a lot of times where I have needed to decode pairs of bits and needed the full 4 cases. This instruction will help a lot.
BTW I think SETZC is still a better name than PICKZC. If it is not too much trouble, I suggest the compiler permit SETZC <reg> as SETZC <reg>,#0.
Yep, I forgot about USB having to remember the previous bit.
Because LSB is received first. (realised this error in my USB a day or two ago)
I am thrilled at the SETZC / PICKZC instruction... it will make a lot of decoding easier!
Yes. SETZC / PICKZC and RESD / RESULT are both simple & elegant, and extremely useful.
I really love the extra extension of RESD / RESULT to be used with first following executed instruction permitting a set of alternate conditional instructions. It's the little gems like these that continue to boost the performance for those of us who wish to spend the time.
Contact Altera? They've made that an exercise in futility.
They've got lots of online help, but I can never find anything I need. One other thing I have never been able to find is any kind of benchmark comparisons between device families. If they don't want to "talk" about something, you will find no mention of the matter, anywhere. It all seems very controlled.
* Column for WZ & WC now has it for each op
* adds new instructions REPD & PICKZC (note PICKZC may not be correct opcode bits)
* removes instruction SETZC
Tip to view without wrap - reduce text size (In IE Ctrl-Scroll)
Propeller II Instructions as of 2014/03/12 (+ REPD, PICKZC)
----------------------------------------------------------------------------------------------------------------------------
ZCxS Opcode ZC I Cond Dest Source Instr00 01 10 11 Operand(s) Flags
----------------------------------------------------------------------------------------------------------------------------
ZCWS 00000ff ZC I CCCC DDDDDDDDD SSSSSSSSS RDBYTE RDBYTEC RDWORD RDWORDC D,S/PTRA/PTRB ZC ZC ZC ZC
ZCWS 00001ff ZC I CCCC DDDDDDDDD SSSSSSSSS RDLONG RDLONGC RDAUX RDAUXR D,S/PTRA/PTRB ZC ZC ZC ZC
ZCMS 00010ff ZC I CCCC DDDDDDDDD SSSSSSSSS ISOB NOTB CLRB SETB D,S/# ZC ZC ZC ZC
ZCMS 00011ff ZC I CCCC DDDDDDDDD SSSSSSSSS SETBC SETBNC SETBZ SETBNZ D,S/# ZC ZC ZC ZC
ZCMS 00100ff ZC I CCCC DDDDDDDDD SSSSSSSSS ANDN AND OR XOR D,S/# ZC ZC ZC ZC
ZCMS 00101ff ZC I CCCC DDDDDDDDD SSSSSSSSS MUXC MUXNC MUXZ MUXNZ D,S/# ZC ZC ZC ZC
ZCMS 00110ff ZC I CCCC DDDDDDDDD SSSSSSSSS ROR ROL SHR SHL D,S/# ZC ZC ZC ZC
ZCMS 00111ff ZC I CCCC DDDDDDDDD SSSSSSSSS RCR RCL SAR REV D,S/# ZC ZC ZC ZC
ZCWS 01000ff ZC I CCCC DDDDDDDDD SSSSSSSSS MOV NOT ABS NEG D,S/# ZC ZC ZC ZC
ZCWS 01001ff ZC I CCCC DDDDDDDDD SSSSSSSSS NEGC NEGNC NEGZ NEGNZ D,S/# ZC ZC ZC ZC
ZCMS 01010ff ZC I CCCC DDDDDDDDD SSSSSSSSS ADD SUB ADDX SUBX D,S/# ZC ZC ZC ZC
ZCMS 01011ff ZC I CCCC DDDDDDDDD SSSSSSSSS ADDS SUBS ADDSX SUBSX D,S/# ZC ZC ZC ZC
ZCMS 01100ff ZC I CCCC DDDDDDDDD SSSSSSSSS SUMC SUMNC SUMZ SUMNZ D,S/# ZC ZC ZC ZC
ZCMS 01101ff ZC I CCCC DDDDDDDDD SSSSSSSSS MIN MAX MINS MAXS D,S/# ZC ZC ZC ZC
ZCMS 01110ff ZC I CCCC DDDDDDDDD SSSSSSSSS ADDABS SUBABS INCMOD DECMOD D,S/# ZC ZC ZC ZC
ZCMS 01111ff ZC I CCCC DDDDDDDDD SSSSSSSSS CMPSUB SUBR MUL SCL D,S/# ZC ZC ZC ZC
ZCWS 10000ff ZC I CCCC DDDDDDDDD SSSSSSSSS DECOD2 DECOD3 DECOD4 DECOD5 D,S/# ZC ZC ZC ZC
----------------------------------------------------------------------------------------------------------------------------
Z-WS 1000100 Zf I CCCC DDDDDDDDD SSSSSSSSS ENCOD BLMASK D,S/# Z- Z-
Z-WS 1000101 Zf I CCCC DDDDDDDDD SSSSSSSSS ONECNT ZERCNT D,S/# Z- Z-
-CWS 1000110 fC I CCCC DDDDDDDDD SSSSSSSSS INCPAT DECPAT D,S/# -C --
--WS 1000111 ff I CCCC DDDDDDDDD SSSSSSSSS SPLITB MERGEB SPLITW MERGEW D,S/# -- -- -- --
--MS 10010nn nf I CCCC DDDDDDDDD SSSSSSSSS GETNIB SETNIB D,S/#,#0..7 -- --
--MS 1001100 nf I CCCC DDDDDDDDD SSSSSSSSS GETWORD SETWORD D,S/#,#0..1 -- --
--MS 1001101 ff I CCCC DDDDDDDDD SSSSSSSSS SETWRDS ROLNIB ROLBYTE ROLWORD D,S/# -- -- -- --
--MS 1001110 ff I CCCC DDDDDDDDD SSSSSSSSS SETS SETD SETCOND SETINST D,S/# -- -- -- --
--WS 1001111 ff I CCCC DDDDDDDDD SSSSSSSSS <empty> THALT D,S/# -- --
--MS 101000n nf I CCCC DDDDDDDDD SSSSSSSSS GETBYTE SETBYTE D,S/#,#0..3 -- --
--WS 1010010 ff I CCCC DDDDDDDDD SSSSSSSSS SETBYTS MOVBYTS PACKRGB UNPKRGB D,S/# -- -- -- --
--MS 1010011 ff I CCCC DDDDDDDDD SSSSSSSSS ADDPIX MULPIX BLNPIX MIXPIX D,S/# -- -- -- --
ZCMS 1010100 ZC I CCCC DDDDDDDDD SSSSSSSSS JMPSW D,S/@ ZC
ZCMS 1010101 ZC I CCCC DDDDDDDDD SSSSSSSSS JMPSWD D,S/@ ZC
--MS 1010110 ff I CCCC DDDDDDDDD SSSSSSSSS DJZ DJZD DJNZ DJNZD D,S/@ -- -- -- --
--RS 1010111 ff I CCCC DDDDDDDDD SSSSSSSSS JZ JZD JNZ JNZD D,S/@ -- -- -- --
ZCRS 10110ff ZC I CCCC DDDDDDDDD SSSSSSSSS TESTB TESTN TEST CMP D,S/# ZC ZC ZC ZC
ZCRS 10111ff ZC I CCCC DDDDDDDDD SSSSSSSSS CMPX CMPS CMPSX CMPR D,S/# ZC ZC ZC ZC
-CRS 11000fn nC I CCCC DDDDDDDDD SSSSSSSSS WAITPEQ WAITPNE D,S/#,#0..3 -C -C
---S 110010f nn I CCCC nnnnnnnnn SSSSSSSSS WAITVID WAITVID #0..$DFF,S/# -- --
--LS 110011f fL I CCCC DDDDDDDDD SSSSSSSSS WRBYTE WRWORD WRLONG WRWIDE D/#,S/PTRA/PTRB -- -- -- --
--LS 110100f fL I CCCC DDDDDDDDD SSSSSSSSS WRAUX WRAUXR SETACCA SETACCB D/#,S/#0..$FF/PTRX/PTRY -- -- -- --
--LS 110101f fL I CCCC DDDDDDDDD SSSSSSSSS MACA MACB MUL32 MUL32U D/#,S/# -- -- -- --
--LS 110110f fL I CCCC DDDDDDDDD SSSSSSSSS DIV32 DIV32U DIV64 DIV64U D/#,S/# -- -- -- --
--LS 110111f fL I CCCC DDDDDDDDD SSSSSSSSS SQRT64 QSINCOS QARCTAN QROTATE D/#,S/# -- -- -- --
--LS 111000n nL I CCCC DDDDDDDDD SSSSSSSSS CFGPINS cfgpins cfgpins SETMAP D/#,S/#,#0..2 -- -- -- --
--LS 111001f fL I CCCC DDDDDDDDD SSSSSSSSS SETSERA SETSERB SETCTRS SETWAVS D/#,S/# -- -- -- --
--LS 111010f fL I CCCC DDDDDDDDD SSSSSSSSS SETFRQS SETPHSS ADDPHSS SUBPHSS D/#,S/# -- -- -- --
--LS 111011f fL I CCCC DDDDDDDDD SSSSSSSSS SETXFR SETMIX COGRUN COGRUNX D/#,S/# -- -- -- --
--LS 111100f fL I CCCC DDDDDDDDD SSSSSSSSS FRAC <empty> <empty> <empty> D/#,S/# -- -- -- --
--LS 111101f fL I CCCC DDDDDDDDD SSSSSSSSS JP JPD JNP JNPD D/#,S/@ -- -- -- --
--WS 1111100 ff I CCCC DDDDDDDDD SSSSSSSSS LOCBASE LOCBYTE LOCWORD LOCLONG D,S/@ -- -- -- --
----------------------------------------------------------------------------------------------------------------------------
--W- 1111101 00 f CCCC DDDDDDDDD sssssssss LOCINST JMPLIST D,@s -- --
---- 1111101 01 0 BBAA ddddddddd sssssssss FIXINDA #d,#s / FIXINDB #d,#s / FIXINDS --
---- 1111101 01 1 nnnn nnnnnnnnn nnniiiiii REPS #1..$10000,#1..64 --
---- 1111101 1f n nnnn nnnnnnnnn nnnnnnnnn AUGS AUGD #23bits -- --
----------------------------------------------------------------------------------------------------------------------------
---- 1111110 00 0 CCCC ffnnnnnnn nnnnnnnnn LOCPTRA LOCPTRA LOCPTRB LOCPTRB #abs -- -- -- --
---- 1111110 00 1 CCCC ffnnnnnnn nnnnnnnnn JMP JMP JMPD JMPD #abs -- -- -- --
---- 1111110 01 0 CCCC ffnnnnnnn nnnnnnnnn LINK LINK LINKD LINKD {0} #abs -- -- -- --
---- 1111110 01 1 CCCC ffnnnnnnn nnnnnnnnn CALL CALL CALLD CALLD #abs -- -- -- --
---- 1111110 10 0 CCCC ffnnnnnnn nnnnnnnnn CALLA CALLA CALLAD CALLAD #abs -- -- -- --
---- 1111110 10 1 CCCC ffnnnnnnn nnnnnnnnn CALLB CALLB CALLBD CALLBD #abs -- -- -- --
---- 1111110 11 0 CCCC ffnnnnnnn nnnnnnnnn CALLX CALLX CALLXD CALLXD #abs -- -- -- --
---- 1111110 11 1 CCCC ffnnnnnnn nnnnnnnnn CALLY CALLY CALLYD CALLYD #abs -- -- -- --
----------------------------------------------------------------------------------------------------------------------------
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 0000000ff COGID TASKID LOCKNEW GETLFSR D ZC ZC ZC ZC
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 0000001ff GETCNT GETCNTX GETACAL GETACAH D ZC ZC ZC ZC
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 0000010ff GETACBL GETACBH GETPTRA GETPTRB D ZC ZC ZC ZC
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 0000011ff GETPTRX GETPTRY SERINA SERINB D ZC ZC ZC ZC
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 0000100ff GETMULL GETMULH GETDIVQ GETDIVR D ZC ZC ZC ZC
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 0000101ff GETSQRT GETQX GETQY GETQZ D ZC ZC ZC ZC
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 0000110ff GETPHSA GETPHZA GETCOSA GETSINA D ZC ZC ZC ZC
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 0000111ff GETPHSB GETPHZB GETCOSB GETSINB D ZC ZC ZC ZC
ZCM- 1111111 ZC 0 CCCC DDDDDDDDD 0001000ff PUSHZC POPZC SUBCNT GETPIX D ZC ZC ZC ZC
ZCM- 1111111 ZC 0 CCCC DDDDDDDDD 0001001ff BINBCD BCDBIN BINGRY GRYBIN D ZC ZC ZC ZC
ZCM- 1111111 ZC 0 CCCC DDDDDDDDD 0001010ff ESWAP4 ESWAP8 SEUSSF SEUSSR D ZC ZC ZC ZC
ZCM- 1111111 ZC 0 CCCC DDDDDDDDD 0001011ff INCD DECD INCDS DECDS D ZC ZC ZC ZC
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 0001100ff POPT0 POPT1 POPT2 POPT3 D ZC ZC ZC ZC
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 0001101ff POP <empty> <empty> <empty> D ZC ZC ZC ZC
----------------------------------------------------------------------------------------------------------------------------
--L- 1111111 00 L CCCC DDDDDDDDD 001iiiiii REPD D/#1..512,#1..64 --
----------------------------------------------------------------------------------------------------------------------------
--L- 1111111 00 L CCCC DDDDDDDDD 0100000ff CLKSET COGSTOP LOCKSET LOCKCLR D/# -- -- -C -C
--L- 1111111 00 L CCCC DDDDDDDDD 0100001ff LOCKRET RDWIDE RDWIDEC RDWIDEQ D/# -- -- -- --
ZCL- 1111111 ZC L CCCC DDDDDDDDD 0100010ff GETP GETNP SEROUTA SEROUTB D/# ZC ZC -C -C
-CL- 1111111 0C L CCCC DDDDDDDDD 0100011ff CMPCNT WAITPX WAITPR WAITPF D/# -C -C -C -C
--L- 1111111 00 L CCCC DDDDDDDDD 0100100ff PUSH <empty> SETXCH SETTASK D/# -- ZC -- --
--L- 1111111 00 L CCCC DDDDDDDDD 0100101ff SETRACE SARACCA SARACCB SARACCS D/# -- -- -- --
--L- 1111111 00 L CCCC DDDDDDDDD 0100110ff SETPTRA SETPTRB ADDPTRA ADDPTRB D/# -- -- -- --
--L- 1111111 00 L CCCC DDDDDDDDD 0100111ff SUBPTRA SUBPTRB SETWIDE SETWIDZ D/# -- -- -- --
--L- 1111111 00 L CCCC DDDDDDDDD 0101000ff SETPTRX SETPTRY ADDPTRX ADDPTRY D/# -- -- -- --
--L- 1111111 00 L CCCC DDDDDDDDD 0101001ff SUBPTRX SUBPTRY PASSCNT WAIT D/# -- -- -- --
--L- 1111111 00 L CCCC DDDDDDDDD 0101010ff OFFP NOTP CLRP SETP D/# -- -- -- --
--L- 1111111 00 L CCCC DDDDDDDDD 0101011ff SETPC SETPNC SETPZ SETPNZ D/# -- -- -- --
--L- 1111111 00 L CCCC DDDDDDDDD 0101100ff DIV64D SQRT32 QLOG QEXP D/# -- -- -- --
--L- 1111111 00 L CCCC DDDDDDDDD 0101101ff SETQI SETQZ CFGDACS SETDACS D/# -- -- -- --
--L- 1111111 00 L CCCC DDDDDDDDD 0101110ff CFGDAC0 CFGDAC1 CFGDAC2 CFGDAC3 D/# -- -- -- --
--L- 1111111 00 L CCCC DDDDDDDDD 0101111ff SETDAC0 SETDAC1 SETDAC2 SETDAC3 D/# -- -- -- --
--L- 1111111 00 L CCCC DDDDDDDDD 0110000ff SETCTRA SETWAVA SETFRQA SETPHSA D/# -- -- -- --
--L- 1111111 00 L CCCC DDDDDDDDD 0110001ff ADDPHSA SUBPHSA SETVID SETVIDY D/# -- -- -- --
--L- 1111111 00 L CCCC DDDDDDDDD 0110010ff SETCTRB SETWAVB SETFRQB SETPHSB D/# -- -- -- --
--L- 1111111 00 L CCCC DDDDDDDDD 0110011ff ADDPHSB SUBPHSB SETVIDI SETVIDQ D/# -- -- -- --
--L- 1111111 00 L CCCC DDDDDDDDD 0110100ff SETPIX SETPIXZ SETPIXU SETPIXV D/# -- -- -- --
--L- 1111111 00 L CCCC DDDDDDDDD 0110101ff SETPIXA SETPIXR SETPIXG SETPIXB D/# -- -- -- --
--L- 1111111 00 L CCCC DDDDDDDDD 0110110ff SETPORA SETPORB SETPORC SETPORD D/# -- -- -- --
--L- 1111111 00 L CCCC DDDDDDDDD 0110111ff RDWIDEA RDWIDEB WRWIDEA WRWIDEB D/#1..512 -- -- -- --
--L- 1111111 00 L CCCC DDDDDDDDD 0111000ff JMPT0 JMPT1 JMPT2 JMPT3 D/# -- -- -- --
--L- 1111111 00 L CCCC DDDDDDDDD 0111001ff PUSHT0 PUSHT1 PUSHT2 PUSHT3 D/# -- -- -- --
--L- 1111111 ZC L CCCC DDDDDDDDD 0111010ff COGNEW COGNEWX RESD <empty> {D} D/# ZC ZC -- --
----------------------------------------------------------------------------------------------------------------------------
--R- 1111111 ZC x CCCC DDDDDDDDD 1000000ff LOCPTRA LOCPTRB JMP JMPD D ZC ZC ZC ZC
--R- 1111111 ZC x CCCC DDDDDDDDD 1000001ff LINK LINKD CALL CALLD {0} D ZC ZC ZC ZC
--R- 1111111 ZC x CCCC DDDDDDDDD 1000010ff CALLA CALLAD CALLB CALLBD D ZC ZC ZC ZC
--R- 1111111 ZC x CCCC DDDDDDDDD 1000011ff CALLX CALLXD CALLY CALLYD D ZC ZC ZC ZC
--R- 1111111 00 x CCCC DDDDDDDDD 1000100ff LODINDA LODINDB <empty> <empty> D -- -- -- --
----------------------------------------------------------------------------------------------------------------------------
ZC-- 1111111 ZC x CCCC xxxxxxxxx 1100000ff RETA RETAD RETB RETBD ZC ZC ZC ZC
ZC-- 1111111 ZC x CCCC xxxxxxxxx 1100001ff RETX RETXD RETY RETYD ZC ZC ZC ZC
ZC-- 1111111 ZC x CCCC xxxxxxxxx 1100010ff RET RETD POLCTRA POLCTRB ZC ZC ZC ZC
ZC-- 1111111 ZC x CCCC xxxxxxxxx 1100011ff POLVID CAPCTRA CAPCTRB CAPCTRS ZC -- -- --
---- 1111111 00 x CCCC xxxxxxxxx 1100100ff SETPIXW CLRACCA CLRACCB CLRACCS -- -- -- --
ZC-- 1111111 ZC x CCCC xxxxxxxxx 1100101ff CHKPTRX CHKPTRY SYNCTRA SYNCTRB ZC ZC -- --
---- 1111111 00 x CCCC xxxxxxxxx 1100110ff DCACHEX ICACHEX ICACHEP ICACHEN -- -- -- --
---- 1111111 00 x 0000 xxxxxxxxx 1100111ff TLOCK TFREE LOADT3 SAVET3 -- -- -- --
----------------------------------------------------------------------------------------------------------------------------
ZCL- 1111111 ZC L CCCC DDDDDDDDD 1111iiiii PICKZC D/#,#0-31 ZC
Note: PICKZC may not be correct opcode in S[8:5].
----------------------------------------------------------------------------------------------------------------------------
InstructionSummary_20140312b.spin
You know we have this new RESD D/# instruction that lets you redirect the next result to another register.
What about expanding this capability a little (rdr = task's redirection register):
RESD D/# 'set rdr to D/#, redirect next write to [rdr]
RESDP D/# 'set rdr to D/#, redirect next write to [rdr++]
RESDN D/# 'set rdr to D/#, redirect next write to [rdr--]
RESDX D/# 'set rdr to D/#, redirect all subsequent writes to [rdr] until RESDOFF
RESDPX D/# 'set rdr to D/#, redirect all subsequent writes to [rdr++] until RESDOFF
RESDNX D/# 'set rdr to D/#, redirect all subsequent writes to [rdr--] until RESDOFF
RESDOFF D/# 'set rdr to D/#, cancel write redirection
RESD ' redirect next write to [rdr]
RESDP ' redirect next write to [rdr++]
RESDN ' redirect next write to [rdr--]
RESDX ' redirect all subsequent writes to [rdr] until RESDOFF
RESDPX ' redirect all subsequent writes to [rdr++] until RESDOFF
RESDNX ' redirect all subsequent writes to [rdr--] until RESDOFF
RESDOFF ' cancel write redirection
This scheme takes two more state bits and an incrementer/decrementer on the task's redirection register.
So each task has its own task register, and it can auto increment/decrement each time it gets used. That is now a really powerful instruction.
Rather than the RESDP & RESDN in pnut, perhaps they could be RESD #/D++ or RESD #/D-- to indicate increment or decrement, instead of revealing there are separate opcodes used.
I don't know about this one. I'm just trying to imagine when we would redirect a whole lot of sequential instructions to a register (or a register block), so far all I could dream up was maybe inside some tight REP loop doing a form of SIMD or something. We have INDA++ already which could help do the writing to a block. So the only real benefit is in not destroying both the operands to the opcodes.
Might the instructions RESULT / RESULTX / RESULTN (result = result modify off = never) be better, and the ++ or -- on the #/D operand as pre or post (only 1) ?
REPS #100,#2
RESDPX C
op: ADD A,B
INCDS op
RESDOFF C
Result C vector = A vector + B vector
EDIT: Actually that probably cannot work because there is not enough time in the self modifying code loop to work right, so perhaps it could be this instead:
REPS #100,#1
RESDPX C
op: ADD INDA++,INDB++
RESDOFF C
Actually this is where the incrementing/decrementing RESD stuff became useful because INDA/INDB are already consumed in the loop itself. So yes it definitely could be of some benefit here.
We do 100 adds in 100 cycles above. To do this same operation otherwise takes a lot of instructions if you can't destroy the D operand.
REP #100, #4
NOP
MOV TEMP, INDA++
ADD TEMP, INDB++
copy: MOV DEST, TEMP
INCD copy
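The two loops can be compared with a quick Python model. This is a sketch under stated assumptions: every instruction inside the repeat block is counted as one slot (matching the "100 adds in 100 cycles" figure quoted in the thread), and the function names are made up for illustration.

```python
MASK32 = 0xFFFFFFFF


def vector_add_redirected(a, b):
    """Models the REPS #n,#1 loop: ADD INDA++,INDB++ with the result sent to [rdr++]."""
    c = [0] * len(a)
    rdr = 0      # redirect pointer, post-incremented on every redirected write
    slots = 0    # instructions executed inside the repeat block
    for i in range(len(a)):               # INDA and INDB advance together
        c[rdr] = (a[i] + b[i]) & MASK32   # A and B are left untouched
        rdr += 1
        slots += 1
    return c, slots


def vector_add_copy(a, b):
    """Models the REP #n,#4 loop: MOV TEMP / ADD TEMP / MOV DEST,TEMP / INCD copy."""
    c = [0] * len(a)
    dest = 0     # D field of the MOV at 'copy', bumped by INCD each pass
    slots = 0
    for i in range(len(a)):
        temp = a[i];                     slots += 1   # MOV TEMP, INDA++
        temp = (temp + b[i]) & MASK32;   slots += 1   # ADD TEMP, INDB++
        c[dest] = temp;                  slots += 1   # copy: MOV DEST, TEMP
        dest += 1;                       slots += 1   # INCD copy
    return c, slots


a = list(range(100))
b = [2 * x for x in a]
fast, fast_slots = vector_add_redirected(a, b)
slow, slow_slots = vector_add_copy(a, b)
assert fast == slow
assert (fast_slots, slow_slots) == (100, 400)
```

Both versions leave the source vectors intact; the redirected one simply does not need the temporary register or the self-modifying INCD.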
So we get a 4x speedup above!
I could see potential in using these instead of INDA/INDB (saved for other uses), as a block move between threads, as a way of using banking (instead of remapping). I am sure there are other uses - we just have to see.
I see how RESD,RESDP and RESDN could be useful in multi-tasking as we only have 1 set of INDA/B registers.
Question: The SETMAP instruction now has a #S value. I recall you mentioning somewhere about selecting a register block using any register rather than INDB or task number? Also, can each hw task select a separate block?
INDA/INDB are not per-task, but global to the cog, so this would provide some new capability to all tasks.
This scheme only does post-inc/dec on the redirect register.
We might need better names, for sure.
Also, these instructions have operand-less versions which don't write rdr, but set a redirect mode.
There are some cases where there's a definite benefit. This is practically a freebie, so I say put it in the toolbox.
Register remapping is a cog-wide function that you'd probably want to set up once, unless, like for preemptive multi-threading, you'd like to switch around under software control. That S/# in SETTASK is to set a static remap value, rather than using only the current task ID or INDA/INDB for instantaneous remap.
Is the destination only COG RAM registers, or can it be HUB RAM too?
Do you think it would be possible (or useful) to add an OFFSET to those instructions?
For example (instructions with a fixed offset of 8):
RESDP8 D/# 'set rdr to D/#, redirect next write to [rdr+8]
RESDN8 D/# 'set rdr to D/#, redirect next write to [rdr-8]
With an offset of 8, maybe some application could use eight synchronized cogs to get the fastest data transfer possible to fill HUB RAM (or use HUB RAM to synchronously send data using pin transfer (XFR)).
I remember that high speed video on P1 was implemented using several synchronized cogs (is this right?).
Maybe some high speed video application could benefit. Or projects that make use of fast pipelined DAC/ADC.
Another example: with a +3 offset, three cogs can send R, G, B data independently to a DAC (EDIT: "to three DACs") using pin transfer instructions. And hopefully with a 3x speed improvement over a single COG handling three colors.
What do you think?
I don't see a need for an offset because you set whatever address you want and it just goes from there.
This is something that the programmer is going to have to put in his head and then recognize when an opportunity to use it arises.
Rich
That's fine. When they graduate, or look at someone else's code, they will find these gems being used.
Probably the instruction set should be split into two, for normal and advanced users.
In the past, the documentation seemed to be purposefully targeted at various classes of users at various times in the device's life cycle: first experts, followed by advanced, beginners and intermediate. Most Prop1 users will probably want to start with P1-compatible coding and then add in advanced programming as needed.
In the mass adoption phase, the biggest problem seems to be directing users to appropriate documentation based on their individual skill set. Parallax does a good job of this, mostly through the forums. Even advanced users are going to struggle with some of the refinements that have occurred.
Integrating advanced help utilities into the various tools seems essential to me.
Rich
I didn't mean to go off-topic, just couldn't help myself:)
It's just for cog RAM registers - a little trick to swap out result register addresses.
I can see all sorts of potential uses for this in graphics, string operations and even cryptography.
Sorry, cannot resist...
I wonder if there would be uses for the inverse?
SRCD D/#
SRCDP D/#
SRCDN D/#
SRCDOFF D/#
What would these do?
C.W.