How can SETRACE operate, at the clock-cycle level, at 160 MHz?
SETRACE can output to the XCH pins, which are completely internal and will work fine at the max frequency. All peripherals have been fully hooked into Port D now, so you could have the trace go to the XCH output so that another cog could view it.
To make a tracer, you could have a cog view the target cog's 16 bit state through his XCH (port D) window, while his XFR circuit read those words every clock and stuffed them into the AUX RAM. He would use a WAITPEQ on Port D to capture the trigger event. Once WAITPEQ released, he would disable his XFR and do a GETSPA to discover where the buffer ended. Then he would have a history in the AUX RAM of what the target cog was doing before the breakpoint. Or, he could wait up to 512 clocks after WAITPEQ to get a post-trigger history before stopping XFR.
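The capture scheme described above amounts to a circular pre-trigger buffer. Here is a minimal Python sketch of that idea, not actual cog code. The 512-word buffer size, the `TraceCapture` class and its method names are assumptions made for illustration; `ptr` stands in for what GETSPA would report, and the mask/value compare stands in for WAITPEQ.

```python
AUX_WORDS = 512  # assumption: AUX RAM viewed as 512 16-bit words


class TraceCapture:
    """Hypothetical model of XFR streaming Port D words into AUX RAM."""

    def __init__(self, trigger_mask, trigger_value, post_trigger=0):
        self.buf = [0] * AUX_WORDS
        self.ptr = 0             # next write position (stand-in for GETSPA)
        self.written = 0         # number of valid words, capped at AUX_WORDS
        self.mask = trigger_mask
        self.value = trigger_value
        self.post = post_trigger
        self.remaining = None    # None until the trigger has been seen
        self.running = True

    def clock(self, word):
        """Store one 16-bit word per clock until capture is stopped."""
        if not self.running:
            return
        self.buf[self.ptr] = word & 0xFFFF
        self.ptr = (self.ptr + 1) % AUX_WORDS
        self.written = min(self.written + 1, AUX_WORDS)
        if self.remaining is None:
            # Stand-in for WAITPEQ: release when the masked word matches.
            if (word & self.mask) == self.value:
                self.remaining = self.post
        else:
            self.remaining -= 1
        if self.remaining == 0:
            self.running = False  # stand-in for disabling XFR

    def history(self):
        """Return captured words, oldest first."""
        if self.written < AUX_WORDS:
            return self.buf[:self.written]
        return self.buf[self.ptr:] + self.buf[:self.ptr]


cap = TraceCapture(trigger_mask=0xFFFF, trigger_value=0x1234, post_trigger=2)
for w in list(range(600)) + [0x1234, 7, 8, 9]:
    cap.clock(w)
print(cap.history()[-3:])  # trigger word followed by two post-trigger words
```

With `post_trigger=0` the buffer holds only what happened up to the trigger; raising it trades pre-trigger history for post-trigger history, as the post describes.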
Some of you are concerned about timeline and adding features. To see how innocuous something like SETRACE is, look at the Verilog code to implement it:
'Setrace' goes high when the SETRACE instruction executes. The 4 LSBs of D are captured into 'trace', which was cleared to %0000 on cog start. If bit 3 is high, bits 2..0 determine which word of output pins gets the 16 bits of trace data. 'Trace_outp' gets OR'd into the 128 output pin signals.
Anyway, you can see that something like SETRACE is very simple.
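For readers who want to poke at the logic, here is a small behavioral model in Python of the description above. It is not the Verilog from the post, only a sketch of the same behavior. The `TraceMux` class and its method names are hypothetical, and the assumption that word 0 maps to the lowest 16 output pins is mine.

```python
PIN_COUNT = 128  # 8 words of 16 output pins


class TraceMux:
    """Hypothetical behavioral model of the SETRACE output logic."""

    def __init__(self):
        self.trace = 0b0000  # cleared on cog start

    def setrace(self, d):
        """SETRACE D: capture the 4 LSBs of D."""
        self.trace = d & 0xF

    def trace_outp(self, state16):
        """Place the 16-bit state in the selected word if bit 3 is set."""
        if not (self.trace & 0b1000):
            return 0
        word = self.trace & 0b0111  # assumption: word 0 = lowest 16 pins
        return (state16 & 0xFFFF) << (16 * word)

    def outputs(self, pins, state16):
        """OR the trace word into the 128 output pin signals."""
        return (pins | self.trace_outp(state16)) & ((1 << PIN_COUNT) - 1)


mux = TraceMux()
mux.setrace(0b1011)                 # enable tracing, select word 3
print(hex(mux.outputs(0, 0xABCD)))  # 0xabcd shifted up by 48 bits
```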
I've been tremendously busy with my new job, so I'll just throw out a couple thoughts here.
I already told Chip that I thought CRC instructions were compulsory for doing the stuff he wants; I couldn't see any way to really get a routine below 8 clocks per accumulate.
I've seen Verilog for doing HW CRC. It's not difficult to implement, and there is even a web page that will take a polynomial and spit out static Verilog that does it all.
I want 2 CRC modes, a 16 bit CRC and 32 bit. The polynomial is user defined, so any number of CRC modes can be emulated.
Example:
SETCRC16 D/# ; set the polynomial for CRC16
CRC16 D, S ; accumulate S into D using polynomial
The proposal would then be 4 new instructions:
SETCRC16 ; set CRC16 polynomial
CRC16 ; CRC16 accumulate instruction
SETCRC32 ; set CRC32 polynomial
CRC32 ; CRC32 accumulate instruction
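To make the proposal concrete, here is a rough Python model of what a set-polynomial / accumulate pair might do. The post does not define the operand semantics, so everything here is an assumption: one byte of S is accumulated per instruction, bits are processed MSB first, and D holds the running CRC. `CrcUnit`, `setcrc` and `crc` are hypothetical names; the width is a parameter only to avoid writing the loop twice.

```python
class CrcUnit:
    """Hypothetical model of a SETCRCxx / CRCxx instruction pair."""

    def __init__(self, width):
        self.width = width
        self.mask = (1 << width) - 1
        self.poly = 0

    def setcrc(self, poly):
        """SETCRC16 / SETCRC32: latch the user-defined polynomial."""
        self.poly = poly & self.mask

    def crc(self, d, s):
        """CRC16 / CRC32 D,S: accumulate the low byte of S into D."""
        d ^= (s & 0xFF) << (self.width - 8)
        top = 1 << (self.width - 1)
        for _ in range(8):
            d = ((d << 1) ^ self.poly) if (d & top) else (d << 1)
            d &= self.mask
        return d


unit16 = CrcUnit(16)
unit16.setcrc(0x1021)          # CCITT polynomial x^16 + x^12 + x^5 + 1
acc = 0
for byte in b"123456789":
    acc = unit16.crc(acc, byte)
print(hex(acc))                # 0x31c3
```

The eight-iteration loop is what a hardware implementation would flatten into a single block of XOR gates.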
From looking at the various CRC implementations, it would seem that any length polynomial could be done without requiring different CRC16 and CRC32 instruction sets. At the end, the upper bits would just be cleared if the length was less than CRC32. This would then make only 2 instructions necessary.
How many clocks would be required by the CRC16/32 instruction? I presume you are feeding it a byte to be accumulated?
A lookup table in hub does work fairly well though. Of course memory is used. I think the CLUT is more valuable for other things than a lookup table.
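For comparison, the lookup-table approach mentioned above can be sketched in Python as follows. This is a generic table-driven CRC, not code from the thread; `make_table` and `crc_table_update` are hypothetical names, and the MSB-first bit ordering is an assumption. The same two functions serve any width of 8 bits or more. The cost is memory: 256 entries means 512 bytes for a 16-bit CRC and 1 KB for a 32-bit one.

```python
def make_table(poly, width=16):
    """Precompute the CRC of every possible byte value (MSB first)."""
    mask = (1 << width) - 1
    top = 1 << (width - 1)
    table = []
    for byte in range(256):
        reg = byte << (width - 8)
        for _ in range(8):
            reg = ((reg << 1) ^ poly) if (reg & top) else (reg << 1)
            reg &= mask
        table.append(reg)
    return table


def crc_table_update(crc, byte, table, width=16):
    """Accumulate one byte using a single table lookup."""
    mask = (1 << width) - 1
    index = ((crc >> (width - 8)) ^ byte) & 0xFF
    return ((crc << 8) & mask) ^ table[index]


table = make_table(0x1021, 16)
crc = 0
for byte in b"123456789":
    crc = crc_table_update(crc, byte, table, 16)
print(hex(crc))  # 0x31c3
```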
Chip: Could you post your revised instruction list? I'd like to start updating PropGCC and I've been waiting for the changes to settle down a bit. It sounds like you've pretty much finished revising the instruction encodings. I don't mind if a few new instructions are added but I'd like to get the bulk of the work done now. Could you post your most recent instruction list with the bit encodings?
Thanks!
David
Edit: I just went back to your earlier post of an instruction set dated 10/16. If that is still accurate then I guess there is no need to repost it. Can you confirm?
CRC16 and CRC32 are different algorithms and aren't implemented the same way. CRC32 has some reversing in it that CRC16 doesn't have.
I could be missing something obvious, but the algorithms are different.
Just wondering, will there be any way to get the current PC of a thread other than the current one? I want to do preemptive multitasking in a few cogs for a project I'm designing that will use the P2. I plan on doing it by having one thread (#0) that only gets a turn every 16 instructions, via a SETTASK %%1111111111111110 (with the constant obviously in another register). This thread would sit in a djnz i, $ loop for a while and then SETTASK #0, so only it gets turns. It would then record thread #1's PC and flags, swap thread 1 out to hub RAM, load in a new thread, and put the task register back to only give thread 0 a turn every 16 clocks.
The only thing I think is missing is an instruction to get a thread's PC and flags. This instruction wasn't necessary before threads were added because you could just use $ to get the current PC. But that only works with the current thread. Is there a way to read other threads' PCs and flags, or can you please add one?
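The time-slice pattern in that SETTASK value can be unpacked with a few lines of Python. This is only a sketch: it assumes the register holds sixteen 2-bit task IDs and that slot 0 is the least-significant pair, which the post does not state. `decode_settask` is a hypothetical helper name.

```python
def decode_settask(reg):
    """Split a 32-bit task register into sixteen 2-bit task IDs.

    Assumption: slot 0 is the least-significant pair of bits.
    """
    return [(reg >> (2 * slot)) & 0b11 for slot in range(16)]


# The %% prefix denotes a base-4 (two bits per digit) constant.
pattern = int("1111111111111110", 4)
slots = decode_settask(pattern)
print(slots.count(0), slots.count(1))  # 1 15
```

So the supervisor thread (#0) gets one slot in sixteen and thread #1 gets the remaining fifteen, matching the description above.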
A few things have changed in the last two weeks. Here is the latest. This will have to change just a little bit more to accommodate pixel blending:
Propeller II Instructions as of 11/01/2013
ZCDS (for D column: W=write, M=modify, R=read, L=read/immediate)
---------------------------------------------------------------------------------------------------------------------------------------------------------------
ZCWS 0000000 ZC I CCCC DDDDDDDDD SSSSSSSSS RDBYTE D,S/PTRx (waits for hub)
ZCWS 0000001 ZC I CCCC DDDDDDDDD SSSSSSSSS RDBYTEC D,S/PTRx (waits for hub if cache miss)
ZCWS 0000010 ZC I CCCC DDDDDDDDD SSSSSSSSS RDWORD D,S/PTRx (waits for hub)
ZCWS 0000011 ZC I CCCC DDDDDDDDD SSSSSSSSS RDWORDC D,S/PTRx (waits for hub if cache miss)
ZCWS 0000100 ZC I CCCC DDDDDDDDD SSSSSSSSS RDLONG D,S/PTRx (waits for hub)
ZCWS 0000101 ZC I CCCC DDDDDDDDD SSSSSSSSS RDLONGC D,S/PTRx (waits for hub if cache miss)
ZCWS 0000110 ZC I CCCC DDDDDDDDD SSSSSSSSS RDAUX D,S/#0..FF/SPx
ZCWS 0000111 ZC I CCCC DDDDDDDDD SSSSSSSSS RDAUXR D,S/#0..FF/SPx
ZCMS 0001000 ZC I CCCC DDDDDDDDD SSSSSSSSS ISOB D,S
ZCMS 0001001 ZC I CCCC DDDDDDDDD SSSSSSSSS NOTB D,S
ZCMS 0001010 ZC I CCCC DDDDDDDDD SSSSSSSSS CLRB D,S
ZCMS 0001011 ZC I CCCC DDDDDDDDD SSSSSSSSS SETB D,S
ZCMS 0001100 ZC I CCCC DDDDDDDDD SSSSSSSSS SETBC D,S
ZCMS 0001101 ZC I CCCC DDDDDDDDD SSSSSSSSS SETBNC D,S
ZCMS 0001110 ZC I CCCC DDDDDDDDD SSSSSSSSS SETBZ D,S
ZCMS 0001111 ZC I CCCC DDDDDDDDD SSSSSSSSS SETBNZ D,S
ZCMS 0010000 ZC I CCCC DDDDDDDDD SSSSSSSSS ANDN D,S
ZCMS 0010001 ZC I CCCC DDDDDDDDD SSSSSSSSS AND D,S
ZCMS 0010010 ZC I CCCC DDDDDDDDD SSSSSSSSS OR D,S
ZCMS 0010011 ZC I CCCC DDDDDDDDD SSSSSSSSS XOR D,S
ZCMS 0010100 ZC I CCCC DDDDDDDDD SSSSSSSSS MUXC D,S
ZCMS 0010101 ZC I CCCC DDDDDDDDD SSSSSSSSS MUXNC D,S
ZCMS 0010110 ZC I CCCC DDDDDDDDD SSSSSSSSS MUXZ D,S
ZCMS 0010111 ZC I CCCC DDDDDDDDD SSSSSSSSS MUXNZ D,S
ZCMS 0011000 ZC I CCCC DDDDDDDDD SSSSSSSSS ROR D,S
ZCMS 0011001 ZC I CCCC DDDDDDDDD SSSSSSSSS ROL D,S
ZCMS 0011010 ZC I CCCC DDDDDDDDD SSSSSSSSS SHR D,S
ZCMS 0011011 ZC I CCCC DDDDDDDDD SSSSSSSSS SHL D,S
ZCMS 0011100 ZC I CCCC DDDDDDDDD SSSSSSSSS RCR D,S
ZCMS 0011101 ZC I CCCC DDDDDDDDD SSSSSSSSS RCL D,S
ZCMS 0011110 ZC I CCCC DDDDDDDDD SSSSSSSSS SAR D,S
ZCMS 0011111 ZC I CCCC DDDDDDDDD SSSSSSSSS REV D,S
ZCWS 0100000 ZC I CCCC DDDDDDDDD SSSSSSSSS MOV D,S
ZCWS 0100001 ZC I CCCC DDDDDDDDD SSSSSSSSS NOT D,S
ZCWS 0100010 ZC I CCCC DDDDDDDDD SSSSSSSSS ABS D,S
ZCWS 0100011 ZC I CCCC DDDDDDDDD SSSSSSSSS NEG D,S
ZCWS 0100100 ZC I CCCC DDDDDDDDD SSSSSSSSS NEGC D,S
ZCWS 0100101 ZC I CCCC DDDDDDDDD SSSSSSSSS NEGNC D,S
ZCWS 0100110 ZC I CCCC DDDDDDDDD SSSSSSSSS NEGZ D,S
ZCWS 0100111 ZC I CCCC DDDDDDDDD SSSSSSSSS NEGNZ D,S
ZCMS 0101000 ZC I CCCC DDDDDDDDD SSSSSSSSS ADD D,S
ZCMS 0101001 ZC I CCCC DDDDDDDDD SSSSSSSSS SUB D,S
ZCMS 0101010 ZC I CCCC DDDDDDDDD SSSSSSSSS ADDX D,S
ZCMS 0101011 ZC I CCCC DDDDDDDDD SSSSSSSSS SUBX D,S
ZCMS 0101100 ZC I CCCC DDDDDDDDD SSSSSSSSS ADDS D,S
ZCMS 0101101 ZC I CCCC DDDDDDDDD SSSSSSSSS SUBS D,S
ZCMS 0101110 ZC I CCCC DDDDDDDDD SSSSSSSSS ADDSX D,S
ZCMS 0101111 ZC I CCCC DDDDDDDDD SSSSSSSSS SUBSX D,S
ZCMS 0110000 ZC I CCCC DDDDDDDDD SSSSSSSSS SUMC D,S
ZCMS 0110001 ZC I CCCC DDDDDDDDD SSSSSSSSS SUMNC D,S
ZCMS 0110010 ZC I CCCC DDDDDDDDD SSSSSSSSS SUMZ D,S
ZCMS 0110011 ZC I CCCC DDDDDDDDD SSSSSSSSS SUMNZ D,S
ZCMS 0110100 ZC I CCCC DDDDDDDDD SSSSSSSSS MIN D,S
ZCMS 0110101 ZC I CCCC DDDDDDDDD SSSSSSSSS MAX D,S
ZCMS 0110110 ZC I CCCC DDDDDDDDD SSSSSSSSS MINS D,S
ZCMS 0110111 ZC I CCCC DDDDDDDDD SSSSSSSSS MAXS D,S
ZCMS 0111000 ZC I CCCC DDDDDDDDD SSSSSSSSS ADDABS D,S
ZCMS 0111001 ZC I CCCC DDDDDDDDD SSSSSSSSS SUBABS D,S
ZCMS 0111010 ZC I CCCC DDDDDDDDD SSSSSSSSS INCMOD D,S
ZCMS 0111011 ZC I CCCC DDDDDDDDD SSSSSSSSS DECMOD D,S
ZCMS 0111100 ZC I CCCC DDDDDDDDD SSSSSSSSS CMPSUB D,S
ZCMS 0111101 ZC I CCCC DDDDDDDDD SSSSSSSSS SUBR D,S
ZCMS 0111110 ZC I CCCC DDDDDDDDD SSSSSSSSS MUL D,S (waits one clock)
ZCMS 0111111 ZC I CCCC DDDDDDDDD SSSSSSSSS SCL D,S (waits one clock)
ZCWS 1000000 ZC I CCCC DDDDDDDDD SSSSSSSSS DECOD3 D,S
ZCWS 1000001 ZC I CCCC DDDDDDDDD SSSSSSSSS DECOD4 D,S
ZCWS 1000010 ZC I CCCC DDDDDDDDD SSSSSSSSS DECOD5 D,S
Z-WS 1000011 Z0 I CCCC DDDDDDDDD SSSSSSSSS ENCOD D,S
Z-WS 1000011 Z1 I CCCC DDDDDDDDD SSSSSSSSS BLMASK D,S
Z-WS 1000100 Z0 I CCCC DDDDDDDDD SSSSSSSSS ONECNT D,S (waits one clock)
Z-WS 1000100 Z1 I CCCC DDDDDDDDD SSSSSSSSS ZERCNT D,S (waits one clock)
-CWS 1000101 0C I CCCC DDDDDDDDD SSSSSSSSS INCPAT D,S (waits three clocks)
-CWS 1000101 1C I CCCC DDDDDDDDD SSSSSSSSS DECPAT D,S (waits three clocks)
--WS 1000110 00 I CCCC DDDDDDDDD SSSSSSSSS BINGRY D,S
--WS 1000110 01 I CCCC DDDDDDDDD SSSSSSSSS GRYBIN D,S (waits one clock)
--WS 1000110 10 I CCCC DDDDDDDDD SSSSSSSSS SPLITB D,S
--WS 1000110 11 I CCCC DDDDDDDDD SSSSSSSSS MERGEB D,S
--WS 1000111 00 I CCCC DDDDDDDDD SSSSSSSSS SPLITW D,S
--WS 1000111 01 I CCCC DDDDDDDDD SSSSSSSSS MERGEW D,S
--WS 1000111 10 I CCCC DDDDDDDDD SSSSSSSSS ESWAP4 D,S
--WS 1000111 11 I CCCC DDDDDDDDD SSSSSSSSS ESWAP8 D,S
--MS 10010nn n0 I CCCC DDDDDDDDD SSSSSSSSS GETNIB D,S,#0..7
--MS 10010nn n1 I CCCC DDDDDDDDD SSSSSSSSS SETNIB D,S,#0..7
--MS 1001100 n0 I CCCC DDDDDDDDD SSSSSSSSS GETWORD D,S,#0..1
--MS 1001100 n1 I CCCC DDDDDDDDD SSSSSSSSS SETWORD D,S,#0..1
--MS 1001101 00 I CCCC DDDDDDDDD SSSSSSSSS SWBYTES D,S (switch/copy bytes in D, S = %11_10_01_00 = D same)
--MS 1001101 01 I CCCC DDDDDDDDD SSSSSSSSS ROLNIB D,S
--MS 1001101 10 I CCCC DDDDDDDDD SSSSSSSSS ROLBYTE D,S
--MS 1001101 11 I CCCC DDDDDDDDD SSSSSSSSS ROLWORD D,S
--MS 1001110 00 I CCCC DDDDDDDDD SSSSSSSSS SETS D,S
--MS 1001110 01 I CCCC DDDDDDDDD SSSSSSSSS SETD D,S
--MS 1001110 10 I CCCC DDDDDDDDD SSSSSSSSS SETX D,S
--MS 1001110 11 I CCCC DDDDDDDDD SSSSSSSSS SETI D,S
--MS 1001111 00 I CCCC DDDDDDDDD SSSSSSSSS PACKRGB D,S (8:8:8 -> 5:5:5 << 16 | D >> 16)
--MS 1001111 01 I CCCC DDDDDDDDD SSSSSSSSS UNPKRGB D,S (5:5:5 -> 8:8:8)
--WS 1001111 10 I CCCC DDDDDDDDD SSSSSSSSS SEUSSF D,S
--WS 1001111 11 I CCCC DDDDDDDDD SSSSSSSSS SEUSSR D,S
--MS 101000n n0 I CCCC DDDDDDDDD SSSSSSSSS GETBYTE D,S,#0..3
--MS 101000n n1 I CCCC DDDDDDDDD SSSSSSSSS SETBYTE D,S,#0..3
-CMS 1010010 0C I CCCC DDDDDDDDD SSSSSSSSS COGNEW D,S (waits for hub)
-CMS 1010010 1C I CCCC DDDDDDDDD SSSSSSSSS WAITCNT D,S (waits for CNT, +CNTX if WC)
--MS 1010011 00 I CCCC DDDDDDDDD SSSSSSSSS PIXBLND D,S (waits two clocks)
--MS 1010011 01 I CCCC DDDDDDDDD SSSSSSSSS PIXMUL1 D,S (waits two clocks)
--MS 1010011 10 I CCCC DDDDDDDDD SSSSSSSSS PIXMUL2 D,S (waits two clocks)
--MS 1010011 11 I CCCC DDDDDDDDD SSSSSSSSS PIXADD D,S (waits two clocks)
ZCWS 1010100 ZC I CCCC DDDDDDDDD SSSSSSSSS JMPRET D,S (set D to %1_1111_01xx for JMP/RET)
ZCWS 1010101 ZC I CCCC DDDDDDDDD SSSSSSSSS JMPRETD D,S (set D to %1_1111_01xx for JMP/RET)
--MS 1010110 00 I CCCC DDDDDDDDD SSSSSSSSS IJZ D,S
--MS 1010110 01 I CCCC DDDDDDDDD SSSSSSSSS IJZD D,S
--MS 1010110 10 I CCCC DDDDDDDDD SSSSSSSSS IJNZ D,S
--MS 1010110 11 I CCCC DDDDDDDDD SSSSSSSSS IJNZD D,S
--MS 1010111 00 I CCCC DDDDDDDDD SSSSSSSSS DJZ D,S
--MS 1010111 01 I CCCC DDDDDDDDD SSSSSSSSS DJZD D,S
--MS 1010111 10 I CCCC DDDDDDDDD SSSSSSSSS DJNZ D,S
--MS 1010111 11 I CCCC DDDDDDDDD SSSSSSSSS DJNZD D,S
ZCRS 1011000 ZC I CCCC DDDDDDDDD SSSSSSSSS TESTB D,S
ZCRS 1011001 ZC I CCCC DDDDDDDDD SSSSSSSSS TESTN D,S
ZCRS 1011010 ZC I CCCC DDDDDDDDD SSSSSSSSS TEST D,S
ZCRS 1011011 ZC I CCCC DDDDDDDDD SSSSSSSSS CMP D,S
ZCRS 1011100 ZC I CCCC DDDDDDDDD SSSSSSSSS CMPX D,S
ZCRS 1011101 ZC I CCCC DDDDDDDDD SSSSSSSSS CMPS D,S
ZCRS 1011110 ZC I CCCC DDDDDDDDD SSSSSSSSS CMPSX D,S
ZCRS 1011111 ZC I CCCC DDDDDDDDD SSSSSSSSS CMPR D,S
--RS 11000nn n0 I CCCC DDDDDDDDD SSSSSSSSS COGINIT D,S,#0..7 (waits for hub) (SETNIB :coginit,cog,#6)
---S 11000nn n1 I CCCC nnnnnnnnn SSSSSSSSS WAITVID #0..$DFF,S (waits for vid if single-task, loops if multi-task)
--RS 1100011 11 I CCCC DDDDDDDDD SSSSSSSSS WAITVID D,S (waits for vid if single-task, loops if multi-task)
-CRS 110010n nC I CCCC DDDDDDDDD SSSSSSSSS WAITPEQ D,S,#0..3 (waits for pins, +CNT if WC)
-CRS 110011n nC I CCCC DDDDDDDDD SSSSSSSSS WAITPNE D,S,#0..3 (waits for pins, +CNT if WC)
--LS 1101000 0L I CCCC DDDDDDDDD SSSSSSSSS WRBYTE D,S/PTR (waits for hub)
--LS 1101000 1L I CCCC DDDDDDDDD SSSSSSSSS WRWORD D,S/PTR (waits for hub)
--LS 1101001 0L I CCCC DDDDDDDDD SSSSSSSSS WRLONG D,S/PTR (waits for hub)
--LS 1101001 1L I CCCC DDDDDDDDD SSSSSSSSS FRAC D,S
--LS 1101010 0L I CCCC DDDDDDDDD SSSSSSSSS WRAUX D,S/#0..FF/SPx
--LS 1101010 1L I CCCC DDDDDDDDD SSSSSSSSS WRAUXR D,S/#0..FF/SPx
--LS 1101011 0L I CCCC DDDDDDDDD SSSSSSSSS SETACCA D,S
--LS 1101011 1L I CCCC DDDDDDDDD SSSSSSSSS SETACCB D,S
--LS 1101100 0L I CCCC DDDDDDDDD SSSSSSSSS MACA D,S
--LS 1101100 1L I CCCC DDDDDDDDD SSSSSSSSS MACB D,S
--LS 1101101 0L I CCCC DDDDDDDDD SSSSSSSSS MUL32 D,S
--LS 1101101 1L I CCCC DDDDDDDDD SSSSSSSSS MUL32U D,S
--LS 1101110 0L I CCCC DDDDDDDDD SSSSSSSSS DIV32 D,S
--LS 1101110 1L I CCCC DDDDDDDDD SSSSSSSSS DIV32U D,S
--LS 1101111 0L I CCCC DDDDDDDDD SSSSSSSSS DIV64 D,S
--LS 1101111 1L I CCCC DDDDDDDDD SSSSSSSSS DIV64U D,S
--LS 1110000 0L I CCCC DDDDDDDDD SSSSSSSSS SQRT64 D,S
--LS 1110000 1L I CCCC DDDDDDDDD SSSSSSSSS QSINCOS D,S
--LS 1110001 0L I CCCC DDDDDDDDD SSSSSSSSS QARCTAN D,S
--LS 1110001 1L I CCCC DDDDDDDDD SSSSSSSSS QROTATE D,S
--LS 1110010 0L I CCCC DDDDDDDDD SSSSSSSSS SETSERA D,S (config,baud)
--LS 1110010 1L I CCCC DDDDDDDDD SSSSSSSSS SETSERB D,S (config,baud)
--LS 1110011 0L I CCCC DDDDDDDDD SSSSSSSSS SETCTRS D,S (ctrb,ctra)
--LS 1110011 1L I CCCC DDDDDDDDD SSSSSSSSS SETWAVS D,S (ctrb,ctra)
--LS 1110100 0L I CCCC DDDDDDDDD SSSSSSSSS SETFRQS D,S (ctrb,ctra)
--LS 1110100 1L I CCCC DDDDDDDDD SSSSSSSSS SETPHSS D,S (ctrb,ctra)
--LS 1110101 0L I CCCC DDDDDDDDD SSSSSSSSS ADDPHSS D,S (ctrb,ctra)
--LS 1110101 1L I CCCC DDDDDDDDD SSSSSSSSS SUBPHSS D,S (ctrb,ctra)
--LS 1110110 0L I CCCC DDDDDDDDD SSSSSSSSS SETPIX0 D,S (config, Z)
--LS 1110110 1L I CCCC DDDDDDDDD SSSSSSSSS SETPIX1 D,S (U, V)
--LS 1110111 0L I CCCC DDDDDDDDD SSSSSSSSS SETPIX2 D,S (A, R)
--LS 1110111 1L I CCCC DDDDDDDDD SSSSSSSSS SETPIX3 D,S (G, B)
--LS 111100n nL I CCCC DDDDDDDDD SSSSSSSSS CFGPINS D,S,#0..2 (waits for alt)
--LS 1111001 1L I CCCC DDDDDDDDD SSSSSSSSS JMPTASK D,S (mask,address)
--LS 1111010 0L I CCCC DDDDDDDDD SSSSSSSSS JP D,S
--LS 1111010 1L I CCCC DDDDDDDDD SSSSSSSSS JPD D,S
--LS 1111011 0L I CCCC DDDDDDDDD SSSSSSSSS JNP D,S
--LS 1111011 1L I CCCC DDDDDDDDD SSSSSSSSS JNPD D,S
--RS 1111100 00 I CCCC DDDDDDDDD SSSSSSSSS TJZ D,S
--RS 1111100 01 I CCCC DDDDDDDDD SSSSSSSSS TJZD D,S
--RS 1111100 10 I CCCC DDDDDDDDD SSSSSSSSS TJNZ D,S
--RS 1111100 11 I CCCC DDDDDDDDD SSSSSSSSS TJNZD D,S
--RS 1111101 00 I CCCC DDDDDDDDD SSSSSSSSS TJP D,S
--RS 1111101 01 I CCCC DDDDDDDDD SSSSSSSSS TJPD D,S
--RS 1111101 10 I CCCC DDDDDDDDD SSSSSSSSS TJN D,S
--RS 1111101 11 I CCCC DDDDDDDDD SSSSSSSSS TJND D,S
---- 1111110 0n n nnnn nnnnnnnnn nnniiiiii REPS #1..$40000,#1..64
---- 1111110 10 x BBAA ddddddddd sssssssss SETINDA #s / SETINDB #d / SETINDS #d,#s
---- 1111110 11 x 0B0A ddddddddd sssssssss FIXINDA #d,#s / FIXINDB #d,#s / FIXINDS #d,#s
ZCW- 1111111 ZC x CCCC DDDDDDDDD x00000000 COGID D (waits for hub) (doesn't write if WC)
ZCW- 1111111 ZC x CCCC DDDDDDDDD x00000001 LOCKNEW D (waits for hub)
ZCW- 1111111 ZC x CCCC DDDDDDDDD x00000010 GETCNT D
ZCW- 1111111 ZC x CCCC DDDDDDDDD x00000011 GETCNTX D
ZCW- 1111111 ZC x CCCC DDDDDDDDD x00000100 GETLFSR D
ZCW- 1111111 ZC x CCCC DDDDDDDDD x00000101 GETTOPS D
ZCW- 1111111 ZC x CCCC DDDDDDDDD x00000110 GETACAL D (waits for mac)
ZCW- 1111111 ZC x CCCC DDDDDDDDD x00000111 GETACAH D (waits for mac)
ZCW- 1111111 ZC x CCCC DDDDDDDDD x00001000 GETACBL D (waits for mac)
ZCW- 1111111 ZC x CCCC DDDDDDDDD x00001001 GETACBH D (waits for mac)
ZCW- 1111111 ZC x CCCC DDDDDDDDD x00001010 GETPTRA D
ZCW- 1111111 ZC x CCCC DDDDDDDDD x00001011 GETPTRB D
ZCW- 1111111 ZC x CCCC DDDDDDDDD x00001100 GETSPA D
ZCW- 1111111 ZC x CCCC DDDDDDDDD x00001101 GETSPB D
ZCW- 1111111 ZC x CCCC DDDDDDDDD x00001110 SERINA D (waits for rx if single-task, loops if multi-task, releases if WC)
ZCW- 1111111 ZC x CCCC DDDDDDDDD x00001111 SERINB D (waits for rx if single-task, loops if multi-task, releases if WC)
ZCW- 1111111 ZC x CCCC DDDDDDDDD x00010000 GETMULL D (waits for mul if single-task, loops if multi-task)
ZCW- 1111111 ZC x CCCC DDDDDDDDD x00010001 GETMULH D (waits for mul if single-task, loops if multi-task)
ZCW- 1111111 ZC x CCCC DDDDDDDDD x00010010 GETDIVQ D (waits for div if single-task, loops if multi-task)
ZCW- 1111111 ZC x CCCC DDDDDDDDD x00010011 GETDIVR D (waits for div if single-task, loops if multi-task)
ZCW- 1111111 ZC x CCCC DDDDDDDDD x00010100 GETSQRT D (waits for sqrt if single-task, loops if multi-task)
ZCW- 1111111 ZC x CCCC DDDDDDDDD x00010101 GETQX D (waits for cordic if single-task, loops if multi-task)
ZCW- 1111111 ZC x CCCC DDDDDDDDD x00010110 GETQY D (waits for cordic if single-task, loops if multi-task)
ZCW- 1111111 ZC x CCCC DDDDDDDDD x00010111 GETQZ D (waits for cordic if single-task, loops if multi-task)
ZCW- 1111111 ZC x CCCC DDDDDDDDD x00011000 GETPHSA D
ZCW- 1111111 ZC x CCCC DDDDDDDDD x00011001 GETPHZA D (clears phsa)
ZCW- 1111111 ZC x CCCC DDDDDDDDD x00011010 GETCOSA D
ZCW- 1111111 ZC x CCCC DDDDDDDDD x00011011 GETSINA D
ZCW- 1111111 ZC x CCCC DDDDDDDDD x00011100 GETPHSB D
ZCW- 1111111 ZC x CCCC DDDDDDDDD x00011101 GETPHZB D (clears phsb)
ZCW- 1111111 ZC x CCCC DDDDDDDDD x00011110 GETCOSB D
ZCW- 1111111 ZC x CCCC DDDDDDDDD x00011111 GETSINB D
ZCM- 1111111 ZC x CCCC DDDDDDDDD x00100000 PUSHZC D
ZCM- 1111111 ZC x CCCC DDDDDDDDD x00100001 POPZC D
ZCM- 1111111 ZC x CCCC DDDDDDDDD x00100010 SUBCNT D (subtracts D from CNT, then CNTX if same thread)
ZCM- 1111111 ZC x CCCC DDDDDDDDD x00100011 GETPIX D (waits two clocks, needs two clocks per two prior stages)
--M- 1111111 xx x CCCC DDDDDDDDD x00100100 INCD D (D += $200)
--M- 1111111 xx x CCCC DDDDDDDDD x00100101 DECD D (D -= $200)
--M- 1111111 xx x CCCC DDDDDDDDD x00100110 INCDS D (D += $201)
--M- 1111111 xx x CCCC DDDDDDDDD x00100111 DECDS D (D -= $201)
--L- 1111111 xx L CCCC DDDDDDDDD x00101000 CLKSET D (waits for hub)
--L- 1111111 xx L CCCC DDDDDDDDD x00101001 COGSTOP D (waits for hub)
-CL- 1111111 xC L CCCC DDDDDDDDD x00101010 LOCKSET D (waits for hub)
-CL- 1111111 xC L CCCC DDDDDDDDD x00101011 LOCKCLR D (waits for hub)
--L- 1111111 xx L CCCC DDDDDDDDD x00101100 LOCKRET D (waits for hub)
--L- 1111111 xx L CCCC DDDDDDDDD x00101101 RDQUADC D/PTR (waits for hub if cache miss)
--L- 1111111 xx L CCCC DDDDDDDDD x00101110 RDQUAD D/PTR (waits for hub)
--L- 1111111 xx L CCCC DDDDDDDDD x00101111 WRQUAD D/PTR (waits for hub)
ZCL- 1111111 ZC L CCCC DDDDDDDDD x00110000 GETP D (pin into !Z/C via WZ/WC)
ZCL- 1111111 ZC L CCCC DDDDDDDDD x00110001 GETNP D (pin into Z/!C via WZ/WC)
-CL- 1111111 xC L CCCC DDDDDDDDD x00110010 SEROUTA D (waits for tx if single-task, loops if multi-task, releases if WC)
-CL- 1111111 xC L CCCC DDDDDDDDD x00110011 SEROUTB D (waits for tx if single-task, loops if multi-task, releases if WC)
-CL- 1111111 xC L CCCC DDDDDDDDD x00110100 CMPCNT D (subtracts D from CNT, then CNTX if same thread)
-CL- 1111111 xC L CCCC DDDDDDDDD x00110101 WAITPX D (waits for any edge, +CNT if WC)
-CL- 1111111 xC L CCCC DDDDDDDDD x00110110 WAITPR D (waits for pos edge, +CNT if WC)
-CL- 1111111 xC L CCCC DDDDDDDDD x00110111 WAITPF D (waits for neg edge, +CNT if WC)
ZCL- 1111111 ZC L CCCC DDDDDDDDD x00111000 SETZC D (D[1:0] into Z/C via WZ/WC)
--L- 1111111 xx L CCCC DDDDDDDDD x00111001 SETTASK D
--L- 1111111 xx L CCCC DDDDDDDDD x00111010 SETMAP D
--L- 1111111 xx L CCCC DDDDDDDDD x00111011 SETXCH D
--L- 1111111 xx L CCCC DDDDDDDDD x00111100 SETXFR D
--L- 1111111 xx L CCCC DDDDDDDDD x00111101 SARACCA D (waits for mac)
--L- 1111111 xx L CCCC DDDDDDDDD x00111110 SARACCB D (waits for mac)
--L- 1111111 xx L CCCC DDDDDDDDD x00111111 SARACCS D (waits for mac)
--L- 1111111 xx L CCCC DDDDDDDDD x01iiiiii REPD D,#1..64 (REPD $1FF,#1..64 = infinite repeat)
--L- 1111111 xx L CCCC DDDDDDDDD x10000000 SETSPA D
--L- 1111111 xx L CCCC DDDDDDDDD x10000001 SETSPB D
--L- 1111111 xx L CCCC DDDDDDDDD x10000010 ADDSPA D
--L- 1111111 xx L CCCC DDDDDDDDD x10000011 ADDSPB D
--L- 1111111 xx L CCCC DDDDDDDDD x10000100 SUBSPA D
--L- 1111111 xx L CCCC DDDDDDDDD x10000101 SUBSPB D
--L- 1111111 xx L CCCC DDDDDDDDD x10000110 SETQUAD D
--L- 1111111 xx L CCCC DDDDDDDDD x10000111 SETQUAZ D
--L- 1111111 xx L CCCC DDDDDDDDD x10001000 SETPTRA D
--L- 1111111 xx L CCCC DDDDDDDDD x10001001 SETPTRB D
--L- 1111111 xx L CCCC DDDDDDDDD x10001010 ADDPTRA D
--L- 1111111 xx L CCCC DDDDDDDDD x10001011 ADDPTRB D
--L- 1111111 xx L CCCC DDDDDDDDD x10001100 SUBPTRA D
--L- 1111111 xx L CCCC DDDDDDDDD x10001101 SUBPTRB D
--L- 1111111 xx L CCCC DDDDDDDDD x10001110 PASSCNT D (loops if (CNT - D) msb set)
--L- 1111111 xx L CCCC DDDDDDDDD x10001111 WAIT D (waits)
--L- 1111111 xx L CCCC DDDDDDDDD x10010000 CALLA D
--L- 1111111 xx L CCCC DDDDDDDDD x10010001 CALLB D
--L- 1111111 xx L CCCC DDDDDDDDD x10010010 CALLAR D
--L- 1111111 xx L CCCC DDDDDDDDD x10010011 CALLBR D
--L- 1111111 xx L CCCC DDDDDDDDD x10010100 CALLAD D
--L- 1111111 xx L CCCC DDDDDDDDD x10010101 CALLBD D
--L- 1111111 xx L CCCC DDDDDDDDD x10010110 CALLARD D
--L- 1111111 xx L CCCC DDDDDDDDD x10010111 CALLBRD D
--L- 1111111 xx L CCCC DDDDDDDDD x10011000 OFFP D
--L- 1111111 xx L CCCC DDDDDDDDD x10011001 NOTP D
--L- 1111111 xx L CCCC DDDDDDDDD x10011010 CLRP D
--L- 1111111 xx L CCCC DDDDDDDDD x10011011 SETP D
--L- 1111111 xx L CCCC DDDDDDDDD x10011100 SETPC D
--L- 1111111 xx L CCCC DDDDDDDDD x10011101 SETPNC D
--L- 1111111 xx L CCCC DDDDDDDDD x10011110 SETPZ D
--L- 1111111 xx L CCCC DDDDDDDDD x10011111 SETPNZ D
--L- 1111111 xx L CCCC DDDDDDDDD x10100000 DIV64D D
--L- 1111111 xx L CCCC DDDDDDDDD x10100001 SQRT32 D
--L- 1111111 xx L CCCC DDDDDDDDD x10100010 QLOG D
--L- 1111111 xx L CCCC DDDDDDDDD x10100011 QEXP D
--L- 1111111 xx L CCCC DDDDDDDDD x10100100 SETQI D
--L- 1111111 xx L CCCC DDDDDDDDD x10100101 SETQZ D
--L- 1111111 xx L CCCC DDDDDDDDD x10100110 CFGDACS D
--L- 1111111 xx L CCCC DDDDDDDDD x10100111 SETDACS D
--L- 1111111 xx L CCCC DDDDDDDDD x10101000 CFGDAC0 D
--L- 1111111 xx L CCCC DDDDDDDDD x10101001 CFGDAC1 D
--L- 1111111 xx L CCCC DDDDDDDDD x10101010 CFGDAC2 D
--L- 1111111 xx L CCCC DDDDDDDDD x10101011 CFGDAC3 D
--L- 1111111 xx L CCCC DDDDDDDDD x10101100 SETDAC0 D
--L- 1111111 xx L CCCC DDDDDDDDD x10101101 SETDAC1 D
--L- 1111111 xx L CCCC DDDDDDDDD x10101110 SETDAC2 D
--L- 1111111 xx L CCCC DDDDDDDDD x10101111 SETDAC3 D
--L- 1111111 xx L CCCC DDDDDDDDD x10110000 SETCTRA D
--L- 1111111 xx L CCCC DDDDDDDDD x10110001 SETWAVA D
--L- 1111111 xx L CCCC DDDDDDDDD x10110010 SETFRQA D
--L- 1111111 xx L CCCC DDDDDDDDD x10110011 SETPHSA D
--L- 1111111 xx L CCCC DDDDDDDDD x10110100 ADDPHSA D
--L- 1111111 xx L CCCC DDDDDDDDD x10110101 SUBPHSA D
--L- 1111111 xx L CCCC DDDDDDDDD x10110110 SETVID D
--L- 1111111 xx L CCCC DDDDDDDDD x10110111 SETVIDY D
--L- 1111111 xx L CCCC DDDDDDDDD x10111000 SETCTRB D
--L- 1111111 xx L CCCC DDDDDDDDD x10111001 SETWAVB D
--L- 1111111 xx L CCCC DDDDDDDDD x10111010 SETFRQB D
--L- 1111111 xx L CCCC DDDDDDDDD x10111011 SETPHSB D
--L- 1111111 xx L CCCC DDDDDDDDD x10111100 ADDPHSB D
--L- 1111111 xx L CCCC DDDDDDDDD x10111101 SUBPHSB D
--L- 1111111 xx L CCCC DDDDDDDDD x10111110 SETVIDI D
--L- 1111111 xx L CCCC DDDDDDDDD x10111111 SETVIDQ D
--L- 1111111 xx L CCCC DDDDDDDDD x11000000 SETPIX D
--L- 1111111 xx L CCCC DDDDDDDDD x11000001 SETPIXZ D
--L- 1111111 xx L CCCC DDDDDDDDD x11000010 SETPIXU D
--L- 1111111 xx L CCCC DDDDDDDDD x11000011 SETPIXV D
--L- 1111111 xx L CCCC DDDDDDDDD x11000100 SETPIXA D
--L- 1111111 xx L CCCC DDDDDDDDD x11000101 SETPIXR D
--L- 1111111 xx L CCCC DDDDDDDDD x11000110 SETPIXG D
--L- 1111111 xx L CCCC DDDDDDDDD x11000111 SETPIXB D
--L- 1111111 xx L CCCC DDDDDDDDD x11001000 SETPORA D
--L- 1111111 xx L CCCC DDDDDDDDD x11001001 SETPORB D
--L- 1111111 xx L CCCC DDDDDDDDD x11001010 SETPORC D
--L- 1111111 xx L CCCC DDDDDDDDD x11001011 SETPORD D
--L- 1111111 xx L CCCC DDDDDDDDD x11001100 SETRACE D
---- 1111111 xx x CCCC xxxxxxxxx x11001101 CLRACCA
---- 1111111 xx x CCCC xxxxxxxxx x11001110 CLRACCB
---- 1111111 xx x CCCC xxxxxxxxx x11001111 CLRACCS
ZC-- 1111111 ZC x CCCC xxxxxxxxx x11010000 RETA
ZC-- 1111111 ZC x CCCC xxxxxxxxx x11010001 RETB
ZC-- 1111111 ZC x CCCC xxxxxxxxx x11010010 RETAR
ZC-- 1111111 ZC x CCCC xxxxxxxxx x11010011 RETBR
ZC-- 1111111 ZC x CCCC xxxxxxxxx x11010100 RETAD
ZC-- 1111111 ZC x CCCC xxxxxxxxx x11010101 RETBD
ZC-- 1111111 ZC x CCCC xxxxxxxxx x11010110 RETARD
ZC-- 1111111 ZC x CCCC xxxxxxxxx x11010111 RETBRD
ZC-- 1111111 ZC x CCCC xxxxxxxxx x11011000 TESTSPA
ZC-- 1111111 ZC x CCCC xxxxxxxxx x11011001 TESTSPB
ZC-- 1111111 ZC x CCCC xxxxxxxxx x11011010 POLCTRA (ctra-rollover into !Z/C)
ZC-- 1111111 ZC x CCCC xxxxxxxxx x11011011 POLCTRB (ctrb-rollover into !Z/C)
ZC-- 1111111 ZC x CCCC xxxxxxxxx x11011100 POLVID (vid-ready into !Z/C)
---- 1111111 xx x CCCC xxxxxxxxx x11011101 CAPCTRA
---- 1111111 xx x CCCC xxxxxxxxx x11011110 CAPCTRB
---- 1111111 xx x CCCC xxxxxxxxx x11011111 CAPCTRS
---- 1111111 xx x CCCC xxxxxxxxx x11100000 SYNCTRA (waits for ctra if single-task, loops if multi-task)
---- 1111111 xx x CCCC xxxxxxxxx x11100001 SYNCTRB (waits for ctrb if single-task, loops if multi-task)
---- 1111111 xx x CCCC xxxxxxxxx x11100010 CACHEX
x = don't care, use 0
---------------------------------------------------------------------------------------------------------------------------------------------------------------
ZC effects
------------------------------------------------------------------------------------------------
00 <none>
01 wc
10 wz
11 wz, wc
CCCC condition (easier-to-read list)
-------------------------------------------------------------------------------------------------
0000 never 1111 always (default)
0001 nc & nz 1100 if_c if_b
0010 nc & z 0011 if_nc if_ae
0011 nc 1010 if_z if_e
0100 c & nz 0101 if_nz if_ne
0101 nz 1000 if_c_and_z if_z_and_c
0110 c <> z 0100 if_c_and_nz if_nz_and_c
0111 nc | nz 0010 if_nc_and_z if_z_and_nc
1000 c & z 0001 if_nc_and_nz if_nz_and_nc if_a
1001 c = z 1110 if_c_or_z if_z_or_c if_be
1010 z 1101 if_c_or_nz if_nz_or_c
1011 nc | z 1011 if_nc_or_z if_z_or_nc
1100 c 0111 if_nc_or_nz if_nz_or_nc
1101 c | nz 1001 if_c_eq_z if_z_eq_c
1110 c | z 0110 if_c_ne_z if_z_ne_c
1111 always 0000 never
CCCC inda/indb - CCCC=1111 after first stage of pipeline if inda/indb used (indx=inda/indb)
-------------------------------------------------------------------------------------------------
xx00 source indx
xx01 source indx++
xx10 source indx--
xx11 source ++indx
00xx destination indx
01xx destination indx++
10xx destination indx--
11xx destination ++indx
I SSSSSSSSS source operand
-------------------------------------------------------------------------------------------------
0/na SSSSSSSSS register
1 #SSSSSSSSS immediate, zero-extended
L DDDDDDDDD destination operand
-------------------------------------------------------------------------------------------------
0/na DDDDDDDDD register
1 #DDDDDDDDD immediate, zero-extended
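For anyone updating tools against this list, here is a short Python sketch of packing and unpacking the common ZCDS layout and of evaluating the CCCC field. It assumes the columns in the listing run from the most-significant bit to the least-significant bit (7 opcode bits, ZC, I, CCCC, 9-bit D, 9-bit S, 32 bits in total), which the post does not state. The function names are hypothetical. The condition test follows the CCCC table above: each of the four bits enables one combination of C and Z.

```python
def encode(opcode, zc, i, cond, d, s):
    """Pack fields into a 32-bit instruction (assumed MSB-first layout)."""
    return ((opcode & 0x7F) << 25 | (zc & 0x3) << 23 | (i & 0x1) << 22 |
            (cond & 0xF) << 18 | (d & 0x1FF) << 9 | (s & 0x1FF))


def decode(instr):
    """Unpack a 32-bit instruction into its fields."""
    return {
        "opcode": (instr >> 25) & 0x7F,
        "wz": (instr >> 24) & 0x1,   # ZC = 10 means wz
        "wc": (instr >> 23) & 0x1,   # ZC = 01 means wc
        "i": (instr >> 22) & 0x1,    # 1 = S is an immediate
        "cond": (instr >> 18) & 0xF,
        "d": (instr >> 9) & 0x1FF,
        "s": instr & 0x1FF,
    }


def condition_met(cccc, c, z):
    """Bit (2*C + Z) of CCCC says whether the instruction executes."""
    return bool((cccc >> ((c << 1) | z)) & 1)


# ADD $010,#5 with wz, unconditional (opcode 0101000 from the list).
word = encode(0b0101000, 0b10, 1, 0b1111, 0x010, 5)
print(decode(word))
print(condition_met(0b1100, c=1, z=0))  # if_c with C set -> True
```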
You're welcome, and thanks for your ongoing support.
No problem! It's fun being on the leading edge of this. So, I guess I'll press my luck and ask when we might see a new FPGA configuration that matches this new instruction set?
pedward, I'm pretty sure you have more (and better) experience in this area than me, but from what I found during some research I did, I was led to the conclusion that the reversals (and inversions) in CRC32 are just a convention and not really an integral part of the algorithm. I mean, the reversals and inversions are employed in some implementations and not in others for the same reasons that different polynomials are employed in different situations based on the needs, nature of the data, nature of the noise possibilities, etc.
Am I wrong?
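One way to test that reading is to treat the reversals and inversions as parameters wrapped around a single shift-and-XOR core. The Python sketch below does this and checks the result against `zlib.crc32`. The function and parameter names (`refin`, `refout`, `init`, `xorout`) are my own, chosen to mirror commonly used CRC parameter names; nothing here comes from the thread's attachments.

```python
import zlib


def reflect(value, bits):
    """Reverse the order of the low `bits` bits of value."""
    out = 0
    for _ in range(bits):
        out = (out << 1) | (value & 1)
        value >>= 1
    return out


def crc(data, width, poly, init, refin, refout, xorout):
    """Bitwise CRC with reflection and inversion as optional wrappers."""
    mask = (1 << width) - 1
    top = 1 << (width - 1)
    reg = init
    for byte in data:
        if refin:
            byte = reflect(byte, 8)      # reverse each input byte
        reg ^= byte << (width - 8)
        for _ in range(8):
            reg = ((reg << 1) ^ poly) if (reg & top) else (reg << 1)
            reg &= mask
    if refout:
        reg = reflect(reg, width)        # reverse the final register
    return reg ^ xorout                  # final inversion, if any


msg = b"123456789"
crc32 = crc(msg, 32, 0x04C11DB7, 0xFFFFFFFF, True, True, 0xFFFFFFFF)
crc16 = crc(msg, 16, 0x1021, 0x0000, False, False, 0x0000)
print(crc32 == zlib.crc32(msg), hex(crc16))  # True 0x31c3
```

In this model the only differences between the two results are the width, the polynomial and the wrapper settings; the inner loop is shared.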
On a related note:
For what it's worth, I've been wanting instructions in the Propeller 2 for CRC as well and feel stupid for not thinking of it earlier on in the Propeller 2 development. By the time I had suggested it, Chip said it was too late to put it in. Things have changed now, but I'm not sure if he will be implementing it. My motivation is that they are incredibly handy for communications over untrustworthy mediums and if we had a very fast way to calculate them (even a limited set of key CRCs) we could easily harden communication protocols implemented with the Propeller.
Many years ago, I studied CRC-16-CCITT at a customer's prompting because he wanted to use a BS2-IC to parse and validate the CRC preset in a device's output. Making it reasonably fast but also small, in code size, was also a priority. After heavily studying it, I had a eureka moment and boiled it down to this PBASIC code (essentially two lines of executable code to implement CRC-16-CCITT's polynomial x^16+x^12+x^5+1):
'CRC must be cleared to 0 before start.
'Each byte must be put in CValue, then CalcCRC should be called.
'CRC is then equal to current CRC value for all CValue bytes processed.
CRC    VAR WORD          'Calculated CRC value
CRCL   VAR CRC.LOWBYTE   'Low byte of calculated CRC value
CRCH   VAR CRC.HIGHBYTE  'High byte of calculated CRC value
CValue VAR BYTE          'Temporary holder of value for CRC calculation
'-------------------------------- CRC Checksum Calculation Routine -------------------------------
'Note: PBASIC evaluates expressions strictly left to right.
CalcCRC:
  CValue = CRCH^CValue>>4^(CRCH^CValue)
  CRC = CValue^(CValue<<5)^(CValue<<12)^(CRC << 8)
RETURN
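For readers without a BASIC Stamp at hand, here is a minimal Python sketch of the same two-line update. The parenthesisation is my reading of PBASIC's left-to-right evaluation, and the function names and the bit-at-a-time cross-check are illustrative additions, not part of the posted code.

```python
def ccitt_update(crc, value):
    """Accumulate one byte into a CRC-16-CCITT (x^16+x^12+x^5+1) value."""
    t = ((crc >> 8) ^ value) & 0xFF      # CRCH ^ CValue
    t ^= t >> 4                          # (CRCH ^ CValue) >> 4 ^ (CRCH ^ CValue)
    # CValue ^ (CValue << 5) ^ (CValue << 12) ^ (CRC << 8), kept to 16 bits
    return (t ^ (t << 5) ^ (t << 12) ^ (crc << 8)) & 0xFFFF


def ccitt_bitwise(crc, value):
    """Textbook bit-at-a-time update, used only as a cross-check."""
    crc ^= value << 8
    for _ in range(8):
        crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
        crc &= 0xFFFF
    return crc


def ccitt(data, crc=0):
    """CRC over a bytes object; crc=0 matches the 'cleared to 0' rule above."""
    for b in data:
        crc = ccitt_update(crc, b)
    return crc


if __name__ == "__main__":
    print(hex(ccitt(b"123456789")))      # 0x31c3
```

With the CRC cleared to 0 as the comments require, this is the variant usually catalogued as CRC-16/XMODEM; starting from $FFFF instead gives a different result for the same data.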
In the years since, I hadn't seen anyone else implement it this way and I was never fond of other implementations. Also, I lost my notes (and knowledge) of how the heck I ever came to the conclusion that I could boil it down to that.
Well I finally dove back into studying this last year and relearned what I had figured out that led me to that conclusion, then I applied it to CRC-32. (Incidentally, I did run across someone else implementing it this way since then, but haven't seen it formally documented anywhere).
My goal was to come up with something that could be implemented in Propeller 2 hardware very quickly, and I think I found it. I sent some of this information to Chip for his consideration, but as I said, it was too late at the time.
Without his feedback, I still don't know how few clock cycles it could really have been implemented in, and because it's limited to pre-defined polynomials, it's probably not the method everyone would agree with using anyway... but I thought I would share it here now.
(See Attached)
The CRC.exe was just a Win64 program I wrote to test out a few things and verify that my algorithm worked as I studied CRC-16-CCITT and CRC-32.
The CRC-32.spin code was what I created to test out my CRC-32 algorithm and wrap it to the smallest I could make it.
The two Word documents (very wide format pages) "CRC-16-CCITT 8-bit Data (For Chip).doc" and "CRC-32, 8-bit Data (For Chip).doc" were a summary of the results of my studying this, written to show Chip the patterns that could be exploited in hardware for the given polynomials.
The \Other Info\... docs are the versions where I documented things a bit more fully as I studied the patterns.
The "Sample Delphi Code.txt" is an excerpt of what's used in the CRC.exe as I tested things.
The result of the CRC-32.spin is this:
CON
  _clkmode = xtal1 + pll16x          ' Crystal and PLL settings.
  _xinfreq = 5_000_000               ' 5 MHz crystal (5 MHz x 16 = 80 MHz).
VAR
  long CRC
OBJ
  pst : "Parallax Serial Terminal"   ' Serial communication object
PUB go | value
  pst.Start(115200)                  ' Start the Parallax Serial Terminal cog
  CRC := $FFFF_FFFF                  'Invert Initial CRC
  CalcCRC("A")
' CalcCRC("B")
' CalcCRC("C")
' CalcCRC("D")
  CRC ><= 32                         'Reverse CRC Pre-Final
  CRC ^= $FFFF_FFFF                  'Invert Final CRC
  pst.Hex(CRC, 8)
PUB CalcCRC(DataByte) | B
  DataByte ><= 8                     'Reverse Data Bytes
  CRC ^= (B:=((CRC^=((CRC:=(CRC<-8)^DataByte)>>6)&$3)&$FF)<<1)^B<<1^B<<3^B<<4^B<<6^B<<7^B<<9^B<<10^B<<11^B<<15^B<<21^B<<22^B<<25
...where the real work is done by the single expression CRC ^=... at the very bottom of the code.
In the end, I figured if we could make a single-instruction-cycle instruction to do the core of the CRC calculation for a couple specific CRC polynomials, the reversing and inversion (options) sometimes applied could be handled by other instructions (leaving them "optional" and applied only as needed) without much loss and still a huge gain.
But nothing became of it, so I don't know if this was all sane or not.
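To illustrate the split argued for above (a plain accumulate core, with reversal and inversion applied around it only when a given convention needs them), here is a small Python sketch. It does not reproduce the one-line Spin expression; the core below is an ordinary bit-at-a-time loop, and all names are illustrative. The wrapper follows the same four steps as the Spin test code and is compared against Python's zlib.crc32.

```python
import zlib

POLY32 = 0x04C11DB7  # CRC-32 polynomial in MSB-first form


def reverse_bits(value, width):
    """Return value with its lowest `width` bits in reversed order."""
    out = 0
    for _ in range(width):
        out = (out << 1) | (value & 1)
        value >>= 1
    return out


def crc32_core(crc, byte):
    """Accumulate one byte, MSB first: no reversal and no inversion."""
    crc ^= byte << 24
    for _ in range(8):
        crc = ((crc << 1) ^ POLY32) if crc & 0x80000000 else (crc << 1)
        crc &= 0xFFFFFFFF
    return crc


def crc32_wrapped(data):
    """Standard CRC-32 built from the plain core plus optional steps."""
    crc = 0xFFFFFFFF                                # invert initial CRC
    for b in data:
        crc = crc32_core(crc, reverse_bits(b, 8))   # reverse data bytes
    crc = reverse_bits(crc, 32)                     # reverse CRC pre-final
    return crc ^ 0xFFFFFFFF                         # invert final CRC


if __name__ == "__main__":
    print(crc32_wrapped(b"A") == zlib.crc32(b"A"))  # True
```

Dropping or keeping the three wrapper steps changes which named CRC-32 variant you get, while crc32_core stays the same, which is the case for treating them as conventions around the algorithm rather than part of it.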
I recently coded up a fast CRC algorithm on the prop. Since I use it to just validate large blocks of data it works great.
' /////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
' ///////////////////// CRC-16-CCITT Subroutine ///////////////////////////////////////////////////////////////////////
crcTable long $0000, $1081, $2102, $3183
long $4204, $5285, $6306, $7387
long $8408, $9489, $a50a, $b58b
long $c60c, $d68d, $e70e, $f78f
h0fff long $0fff
hffff long $ffff
crcChecksumBegin mov crc, hffff
crcChecksumBegin_ret ret
crcChecksum mov i, #8
crcChecksumLoop mov j, crc
xor crc, c
and crc, #15
movs crcChecksumLookup, #crcTable
add crcChecksumLookup, crc
shr j, #4
and j, h0fff
crcChecksumLookup mov crc, crcTable
xor crc, j
shr c, #4
djnz i, #crcChecksumLoop
crcChecksum_ret ret
crcChecksumEnd xor crc, hffff
and crc, hffff
crcChecksumEnd_ret ret
i res 1
j res 1
c res 1
crc res 1
Start with calling the begin function. Put the 32-bit value in the c var and call the checksum function. Repeat for any other longs. You can self-modify the crcChecksum label to #4 to work on words and #2 to work on bytes. After you're done call the end function.
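As a cross-check of the nibble-at-a-time routine, here is a Python model of the same begin / accumulate / end sequence. The table is generated from the reversed CCITT polynomial $8408 rather than typed in, and the function names are mine; the assumption is that the PASM consumes the value in c from the least significant nibble upward, as the shr c, #4 suggests.

```python
def make_nibble_table(poly=0x8408):
    """Build the 16-entry table for a right-shifting (reflected) CRC-16."""
    table = []
    for i in range(16):
        t = i
        for _ in range(4):
            t = (t >> 1) ^ poly if t & 1 else t >> 1
        table.append(t)
    return table


TABLE = make_nibble_table()


def crc_begin():
    """Mirror of crcChecksumBegin: start from $FFFF."""
    return 0xFFFF


def crc_accumulate(crc, c, nibbles=8):
    """Mirror of crcChecksum: 8 nibbles per long, 4 per word, 2 per byte."""
    for _ in range(nibbles):
        index = (crc ^ c) & 15           # low nibble of crc XOR data
        crc = TABLE[index] ^ (crc >> 4)  # table entry XOR shifted crc
        c >>= 4                          # next data nibble
    return crc


def crc_end(crc):
    """Mirror of crcChecksumEnd: final inversion, kept to 16 bits."""
    return (crc ^ 0xFFFF) & 0xFFFF


if __name__ == "__main__":
    crc = crc_begin()
    for b in b"123456789":
        crc = crc_accumulate(crc, b, nibbles=2)
    print(hex(crc_end(crc)))             # 0x906e
```

The generated table matches the sixteen crcTable values posted above, and feeding a long is equivalent to feeding its four bytes in little-endian order.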
There's always the table-lookup method if you're willing to use 512 bytes for CRC16 and 1K for CRC32. Here's a CRC16 routine in Spin. It would be quite a bit faster in PASM.
dat
fcstab word
word $0000,$1021,$2042,$3063,$4084,$50a5,$60c6,$70e7
word $8108,$9129,$a14a,$b16b,$c18c,$d1ad,$e1ce,$f1ef
word $1231,$0210,$3273,$2252,$52b5,$4294,$72f7,$62d6
word $9339,$8318,$b37b,$a35a,$d3bd,$c39c,$f3ff,$e3de
word $2462,$3443,$0420,$1401,$64e6,$74c7,$44a4,$5485
word $a56a,$b54b,$8528,$9509,$e5ee,$f5cf,$c5ac,$d58d
word $3653,$2672,$1611,$0630,$76d7,$66f6,$5695,$46b4
word $b75b,$a77a,$9719,$8738,$f7df,$e7fe,$d79d,$c7bc
word $48c4,$58e5,$6886,$78a7,$0840,$1861,$2802,$3823
word $c9cc,$d9ed,$e98e,$f9af,$8948,$9969,$a90a,$b92b
word $5af5,$4ad4,$7ab7,$6a96,$1a71,$0a50,$3a33,$2a12
word $dbfd,$cbdc,$fbbf,$eb9e,$9b79,$8b58,$bb3b,$ab1a
word $6ca6,$7c87,$4ce4,$5cc5,$2c22,$3c03,$0c60,$1c41
word $edae,$fd8f,$cdec,$ddcd,$ad2a,$bd0b,$8d68,$9d49
word $7e97,$6eb6,$5ed5,$4ef4,$3e13,$2e32,$1e51,$0e70
word $ff9f,$efbe,$dfdd,$cffc,$bf1b,$af3a,$9f59,$8f78
word $9188,$81a9,$b1ca,$a1eb,$d10c,$c12d,$f14e,$e16f
word $1080,$00a1,$30c2,$20e3,$5004,$4025,$7046,$6067
word $83b9,$9398,$a3fb,$b3da,$c33d,$d31c,$e37f,$f35e
word $02b1,$1290,$22f3,$32d2,$4235,$5214,$6277,$7256
word $b5ea,$a5cb,$95a8,$8589,$f56e,$e54f,$d52c,$c50d
word $34e2,$24c3,$14a0,$0481,$7466,$6447,$5424,$4405
word $a7db,$b7fa,$8799,$97b8,$e75f,$f77e,$c71d,$d73c
word $26d3,$36f2,$0691,$16b0,$6657,$7676,$4615,$5634
word $d94c,$c96d,$f90e,$e92f,$99c8,$89e9,$b98a,$a9ab
word $5844,$4865,$7806,$6827,$18c0,$08e1,$3882,$28a3
word $cb7d,$db5c,$eb3f,$fb1e,$8bf9,$9bd8,$abbb,$bb9a
word $4a75,$5a54,$6a37,$7a16,$0af1,$1ad0,$2ab3,$3a92
word $fd2e,$ed0f,$dd6c,$cd4d,$bdaa,$ad8b,$9de8,$8dc9
word $7c26,$6c07,$5c64,$4c45,$3ca2,$2c83,$1ce0,$0cc1
word $ef1f,$ff3e,$cf5d,$df7c,$af9b,$bfba,$8fd9,$9ff8
word $6e17,$7e36,$4e55,$5e74,$2e93,$3eb2,$0ed1,$1ef0
PUB ComputeCRC(ptr, num)
  repeat num
    result := ((result << 8) & $ffff) ^ fcstab[(result >> 8) ^ byte[ptr++]]
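If you would rather not type in 256 words, the table can be generated. The Python sketch below builds it from the polynomial $1021 and runs the same lookup loop as the Spin method; the names make_fcstab and compute_crc are illustrative, and the starting value of 0 is an assumption based on Spin clearing result on entry.

```python
def make_fcstab(poly=0x1021):
    """Generate the 256-entry table for an MSB-first CRC-16."""
    table = []
    for i in range(256):
        t = i << 8
        for _ in range(8):
            t = ((t << 1) ^ poly) if t & 0x8000 else (t << 1)
            t &= 0xFFFF
        table.append(t)
    return table


FCSTAB = make_fcstab()


def compute_crc(data, result=0):
    """Same update as the Spin loop: one table lookup per byte."""
    for b in data:
        result = ((result << 8) & 0xFFFF) ^ FCSTAB[(result >> 8) ^ b]
    return result


if __name__ == "__main__":
    print(hex(compute_crc(b"123456789")))   # 0x31c3
```

The first and last generated entries ($0000, $1021, ... $1ef0) agree with the posted table.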
I coded CRC16-IBM (x^16+x^15+x^2+1) in the early 80's. This is different from CRC16-CCITT (x^16+x^12+x^5+1). However, there are also variants that invert (bit-reverse) the polynomial, and others that reverse the CRC16 bits (likely a misinterpretation at the time, e.g. xmodem). The initial value varies too (e.g. CRC16-IBM, at least when used with USB, starts with $FFFF, and the poly $A001 is the XOR value). CRC16-CCITT seems to start with $FFFF and uses XOR $8408, whereas CRC16-CCITT/XMODEM uses XOR $1021 (because xmodem sends the crc16 in reverse order).
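Since these variants are easy to mix up, a parameterised model helps when checking an implementation. The sketch below is a generic bit-at-a-time CRC-16 in Python; the parameter names (init, refin, refout, xorout) follow the commonly used catalogue convention and are an assumption on my part, not something from this thread. The check string "123456789" is the usual test input.

```python
def reflect(value, width):
    """Reverse the lowest `width` bits of value."""
    out = 0
    for _ in range(width):
        out = (out << 1) | (value & 1)
        value >>= 1
    return out


def crc16(data, poly, init=0, refin=False, refout=False, xorout=0):
    """Generic CRC-16: poly is given in MSB-first form, e.g. 0x8005."""
    crc = init
    for byte in data:
        if refin:
            byte = reflect(byte, 8)      # feed the data LSB first
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ poly) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    if refout:
        crc = reflect(crc, 16)           # reverse the final CRC bits
    return crc ^ xorout


if __name__ == "__main__":
    msg = b"123456789"
    print(hex(reflect(0x8005, 16)), hex(reflect(0x1021, 16)))   # 0xa001 0x8408
    print(hex(crc16(msg, 0x8005, 0x0000, True, True)))          # 0xbb3d (IBM/ARC)
    print(hex(crc16(msg, 0x8005, 0xFFFF, True, True, 0xFFFF)))  # 0xb4c8 (USB)
    print(hex(crc16(msg, 0x1021)))                              # 0x31c3 (XMODEM)
```

The first line of output shows that $A001 and $8408 are simply $8005 and $1021 with their bits reversed, which is why right-shifting implementations quote them as the XOR value.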
CRC5 is also used in USB but much of this can be precalculated.
I have seen a generalised implementation block of a variable length CRC generator. I will try to find it again. Beware however, there are some errors on the internet about implementations which are incorrect.
Here is an update to the instruction set (Excel format) in my previous post (includes the [#] options for D & S and the WC/WZ flags) P2_Instruction_Set_20131102a.zip
I have been out of the loop for a while, but it seems now that the instruction set is changing for the P2?
When then is the P2 expected to be in production?
Thanks,
Doug
I think these changes are being done in some down time that was required anyway before another spin of the chip. Supposedly, the changes don't add any delay to the production dates beyond what the change in fabrication and the need to resynthesize already required.
I've been tremendously busy with my new job, so I'll just throw out a couple thoughts here.
I already told Chip that I thought CRC instructions were compulsory for doing the stuff he wants; I couldn't see any way to really get a routine below 8 clocks per accumulate.
I've seen Verilog for doing HW CRC, and it's not difficult to implement. There is even a web page that will take a polynomial and spit out static Verilog that does it all.
I want 2 CRC modes, a 16 bit CRC and 32 bit. The polynomial is user defined, so any number of CRC modes can be emulated.
Example:
SETCRC16 D/# ; set the polynomial for CRC16
CRC16 D, S ; accumulate S into D using polynomial
The proposal would then be 4 new instructions:
SETCRC16 ; set CRC16 polynomial
CRC16 ; CRC16 accumulate instruction
SETCRC32 ; set CRC32 polynomial
CRC32 ; CRC32 accumulate instruction
Perhaps a dedicated USB CRC5 instruction could be made? Is the CRC5 used as part of the fast path of USB comms, or only in places with low performance requirements?
I see CRC5 is used in RFID too, but that's hardly a high speed communication protocol.
CRC5 in USB can mostly be precalculated because once you know your allocated USB address you can precalculate most of the CRC5. This is how BradC did USB LS in P1.
Back on the SETRACE topic, I think I already saw this suggested, but when SETRACE is enabled, could you add an input pin for clock gating so that you can halt a cog by pulling it high or low? Also, if when the cog was halted, the cog and stack ram got mapped to an unused hub region, that would enable us to emulate every debug feature that any other processor/controller has, that I'm aware of.
Back on the SETRACE topic, I think I already saw this suggested, but when SETRACE is enabled, could you add an input pin for clock gating so that you can halt a cog by pulling it high or low?
Yes, I have asked for this with an enable instruction. Would be fantastic to be able to slowly step through some buggy sw.
Also, if when the cog was halted, the cog and stack ram got mapped to an unused hub region, that would enable us to emulate every debug feature that any other processor/controller has, that I'm aware of.
I don't think this is possible, as those blocks are completely separate from the hub RAM. They have been hand laid and are not part of the current changes, which only involve the synthesized sections (the cog instruction sets, counters, etc.).
How many clocks would be required by the CRC16/32 instruction? I presume you are feeding it a byte to be accumulated?
A lookup table in hub does work fairly well though. Of course memory is used. I think the CLUT is more valuable for other things than a lookup table.
Thanks for posting. I didn't think SETTRACE was much Verilog.
BTW Couldn't reply with quote because the (percent sign) causes editing problems.
Thanks!
David
Edit: I just went back to your earlier post of an instruction set dated 10/16. If that is still accurate then I guess there is no need to repost it. Can you confirm?
CRC16 and CRC32 are different algorithms and aren't implemented the same way. CRC32 has some reversing in it that CRC16 doesn't have.
I could be missing something obvious, but the algorithms are different.
A few things have changed in the last two weeks. Here is the latest. This will have to change just a little bit more to accommodate pixel blending:
Will these changes make the decoding of instructions simpler and thus synthesis less complicated?
The instruction decoding probably got a little more complicated in hardware, but this is simpler for us people to deal with.
You're welcome, and thanks for your ongoing support.
I would favour a simple CRC 1-bit instruction - as I described here
http://forums.parallax.com/showthread.php/150685-P2-Serial-Shift-Register-discussion?p=1216081&viewfull=1#post1216081
CRC16 crcbits, poly wc ' C --> (crcbits + (crcbits[16] XOR $A001) ) >> 1
Since we can set the initial value of "crcbits" and the value of "poly" this might/should/could??? work for other polynomial lengths from what I have seen.
The instruction could be simplified by using a register (eg ACCx and preset it with "poly" or else the crc initial value) such that the instruction was in the form
CRC D WC
With this method and using the REPx instruction, the CRC could be calculated in 16 clocks (plus setup) for 8 bits.
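To make the 1-bit idea concrete, here is a Python model of what such an instruction might do per step. The exact semantics were not pinned down in the post, so this is an assumption: the next data bit arrives in C, it is XORed with the low bit of crcbits, crcbits shifts right by one, and poly is XORed in when that feedback bit is 1. The function names are hypothetical.

```python
def crc_bit_step(crcbits, data_bit, poly=0xA001):
    """One assumed CRC step: consume a single data bit (the C flag)."""
    feedback = (crcbits ^ data_bit) & 1
    crcbits >>= 1
    if feedback:
        crcbits ^= poly
    return crcbits


def crc_byte(crcbits, byte, poly=0xA001):
    """Eight steps per byte, LSB first, as a REPx loop would issue them."""
    for _ in range(8):
        crcbits = crc_bit_step(crcbits, byte & 1, poly)
        byte >>= 1                       # shift the next data bit into place
    return crcbits


def crc_bytes(data, init=0, poly=0xA001):
    """Run the byte loop over a whole buffer."""
    crc = init
    for b in data:
        crc = crc_byte(crc, b, poly)
    return crc


if __name__ == "__main__":
    print(hex(crc_bytes(b"123456789")))                   # 0xbb3d
    print(hex(crc_bytes(b"123456789", poly=0x8408)))      # 0x2189
```

Because poly and the initial value are just operands, the same step gives CRC16-IBM with $A001 and the reversed CCITT form with $8408; with init $FFFF and a final inversion done in software, the $A001 case gives the USB CRC16.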
For further investigation.....
1. Can we use the same instruction for other polynomial lengths?
2. Can we use a constant for the polynomial and do the inversion differences in sw?
Should we start a new thread just to discuss the CRC generation requested?
http://outputlogic.com/?page_id=321
P2_Instruction_Set_20131102.zip
P2_Instruction_Set_20131102a.zip
Should not setrace be also part of always sensitivity list?
No, because setrace does not instigate the action.
For USB, even CRC5 is needed.