Propeller II update - BLOG - Page 96 — Parallax Forums

Propeller II update - BLOG

Comments

  • cgraceycgracey Posts: 14,155
    edited 2013-10-31 22:42
    Yanomani wrote: »
    How can SETRACE operate, at the clock cycle level, @ 160 MHz?

    SETRACE can output to the XCH pins, which are completely internal and will work fine at the max frequency. All peripherals have been fully hooked into Port D now, so you could have the trace go to the XCH output so that another cog could view it.

    To make a tracer, you could have a cog view the target cog's 16-bit state through his XCH (Port D) window, while his XFR circuit reads those words every clock and stuffs them into the AUX RAM. He would use a WAITPEQ on Port D to capture the trigger event. Once WAITPEQ released, he would disable his XFR and do a GETSPA to discover where the buffer ended. Then he would have a history in the AUX RAM of what the target cog was doing before the breakpoint. Or, he could wait up to 512 clocks after WAITPEQ to get a post-trigger history before stopping XFR.
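
The capture scheme described above behaves like a circular buffer with a trigger. Here is a rough Python model of that idea, not of the XFR hardware itself. The 512-entry depth is an assumption taken from the "512 clocks" figure, the trigger is simplified to an exact match on the 16-bit word (WAITPEQ would also apply a mask), and `TraceBuffer` and `capture` are hypothetical names.

```python
class TraceBuffer:
    """Software model of a circular trace buffer (illustrative only)."""

    def __init__(self, depth=512):
        self.depth = depth
        self.words = [0] * depth
        self.write_ptr = 0   # plays the role of the pointer read back with GETSPA
        self.count = 0       # number of valid samples, capped at depth

    def sample(self, word):
        """Store one 16-bit trace word, overwriting the oldest when full."""
        self.words[self.write_ptr] = word & 0xFFFF
        self.write_ptr = (self.write_ptr + 1) % self.depth
        self.count = min(self.count + 1, self.depth)

    def history(self):
        """Return the stored words, oldest first."""
        start = (self.write_ptr - self.count) % self.depth
        return [self.words[(start + i) % self.depth] for i in range(self.count)]


def capture(stream, trigger, depth=512, post_trigger=0):
    """Sample `stream` until `trigger` is seen, then keep `post_trigger` more words."""
    buf = TraceBuffer(depth)
    remaining = None         # stays None until the trigger has been seen
    for word in stream:
        buf.sample(word)
        if remaining is None:
            if word == trigger:
                remaining = post_trigger
        else:
            remaining -= 1
        if remaining == 0:
            break
    return buf.history()
```

With `post_trigger=0` the returned list ends at the trigger word, which gives the pre-breakpoint history. A larger `post_trigger` trades some of that history for samples taken after the trigger.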

    Some of you are concerned about timeline and adding features. To see how innocuous something like SETRACE is, look at the Verilog code to implement it:
    // trace output
    
    reg [3:0] trace;
    
    always @(posedge clk or negedge ena)
    if (!ena)
    	trace <= 4'b0;
    else if (setrace)
    	trace <= d[3:0];
    
    wire [15:0] tracew	= {z, c, go, cond, v, t[1:0], p[8:0]};
    
    wire [127:0] trace_outp	= {	trace == 4'b1_111 ? tracew : 16'b0,
    				trace == 4'b1_110 ? tracew : 16'b0,
    				trace == 4'b1_101 ? tracew : 16'b0,
    				trace == 4'b1_100 ? tracew : 16'b0,
    				trace == 4'b1_011 ? tracew : 16'b0,
    				trace == 4'b1_010 ? tracew : 16'b0,
    				trace == 4'b1_001 ? tracew : 16'b0,
    				trace == 4'b1_000 ? tracew : 16'b0	};
    


    'Setrace' goes high when the SETRACE instruction executes. The 4 LSBs of D are captured into 'trace', which was cleared to %0000 on cog start. If bit 3 is high, bits 2..0 determine which word of output pins gets the 16 bits of trace data. 'Trace_outp' gets OR'd into the 128 output pin signals.

    Anyway, you can see that something like SETRACE is very simple.
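
For readers who do not read Verilog, the selection logic above can be mirrored in a few lines of Python. This is only a behavioural sketch of the `tracew` packing and the `trace_outp` multiplexer as posted. The function names follow the Verilog wires, and nothing is assumed about what the individual fields mean beyond their widths.

```python
def trace_word(z, c, go, cond, v, t, p):
    """Pack the fields into 16 bits in the same order as the Verilog `tracew`."""
    return ((z & 1) << 15 | (c & 1) << 14 | (go & 1) << 13 | (cond & 1) << 12
            | (v & 1) << 11 | (t & 0x3) << 9 | (p & 0x1FF))


def trace_outp(trace, tracew):
    """Return the 128-bit pin contribution for a 4-bit `trace` setting."""
    if not trace & 0b1000:        # bit 3 clear: tracing is off, nothing is driven
        return 0
    word_index = trace & 0b0111   # bits 2..0 pick one of eight 16-bit words
    return (tracew & 0xFFFF) << (16 * word_index)
```

For example, `trace_outp(0b1000, w)` places `w` on bits 15..0 and `trace_outp(0b1111, w)` places it on bits 127..112, matching the last and first terms of the Verilog concatenation.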
  • pedwardpedward Posts: 1,642
    edited 2013-10-31 22:52
    I've been tremendously busy with my new job, so I'll just throw out a couple thoughts here.

    I already told Chip that I thought CRC instructions were compulsory for doing the stuff he wants; I couldn't see any way to really get a routine below 8 clocks per accumulate.

    I've seen Verilog for doing HW CRC. It's not difficult to implement; there is even a web page that will take a polynomial and spit out static Verilog that does it all.

    I want 2 CRC modes: a 16-bit CRC and a 32-bit CRC. The polynomial is user defined, so any number of CRC modes can be emulated.

    Example:

    SETCRC16 D/# ; set the polynomial for CRC16
    CRC16 D, S ; accumulate S into D using polynomial

    The proposal would then be 4 new instructions:

    SETCRC16 ; set CRC16 polynomial
    CRC16 ; CRC16 accumulate instruction
    SETCRC32 ; set CRC32 polynomial
    CRC32 ; CRC32 accumulate instruction
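
As a point of reference, here is a minimal Python sketch of what a byte-wide "accumulate S into D" step with a user-set polynomial could compute. The post does not say how many bits the proposed instruction would consume or in which bit order, so the MSB-first, one-byte-per-step behaviour and the function names are assumptions.

```python
def crc16_accumulate(crc, byte, poly):
    """Fold one byte into a 16-bit CRC, MSB first, using the given polynomial."""
    crc ^= (byte & 0xFF) << 8
    for _ in range(8):
        if crc & 0x8000:
            crc = ((crc << 1) ^ poly) & 0xFFFF
        else:
            crc = (crc << 1) & 0xFFFF
    return crc


def crc16(data, poly=0x1021, init=0x0000):
    """Run the accumulate step over a byte string."""
    crc = init
    for b in data:
        crc = crc16_accumulate(crc, b, poly)
    return crc
```

With the polynomial $1021 and a zero initial value, `crc16(b"123456789")` gives 0x31C3, the usual check value for the XMODEM variant. The eight-iteration inner loop is the part that costs cycles in software and that a dedicated instruction would collapse.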
  • Cluso99Cluso99 Posts: 18,069
    edited 2013-10-31 23:50
    pedward wrote: »
    I've been tremendously busy with my new job, so I'll just throw out a couple thoughts here.

    I already told Chip that I thought CRC instructions were compulsory for doing the stuff he wants; I couldn't see any way to really get a routine below 8 clocks per accumulate.

    I've seen Verilog for doing HW CRC. It's not difficult to implement; there is even a web page that will take a polynomial and spit out static Verilog that does it all.

    I want 2 CRC modes: a 16-bit CRC and a 32-bit CRC. The polynomial is user defined, so any number of CRC modes can be emulated.

    Example:

    SETCRC16 D/# ; set the polynomial for CRC16
    CRC16 D, S ; accumulate S into D using polynomial

    The proposal would then be 4 new instructions:

    SETCRC16 ; set CRC16 polynomial
    CRC16 ; CRC16 accumulate instruction
    SETCRC32 ; set CRC32 polynomial
    CRC32 ; CRC32 accumulate instruction

    From looking at the various CRC implementations, it would seem that any length polynomial could be done without requiring different CRC16 and CRC32 instruction sets. At the end, the upper bits would just be cleared if the length was less than CRC32. This would then make only 2 instructions necessary.

    How many clocks would be required by the CRC16/32 instruction? I presume you are feeding it a byte to be accumulated?

    A lookup table in hub does work fairly well though. Of course memory is used. I think the CLUT is more valuable for other things than a lookup table.
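
The lookup-table approach mentioned above can be sketched as follows. This is a generic byte-at-a-time table method in Python, not code from the thread. The MSB-first bit order and the function names are assumptions, and the table is an ordinary list rather than anything placed in hub RAM or the CLUT.

```python
def make_crc16_table(poly):
    """Precompute the CRC of every possible byte value (256 entries of 16 bits)."""
    table = []
    for byte in range(256):
        crc = byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
        table.append(crc)
    return table


def crc16_table_driven(data, table, init=0x0000):
    """Accumulate a CRC using one table lookup per byte instead of eight shifts."""
    crc = init
    for b in data:
        crc = ((crc << 8) & 0xFFFF) ^ table[(crc >> 8) ^ b]
    return crc
```

The trade-off is the one noted in the post: the per-byte work shrinks to a shift, an XOR and a lookup, at the cost of 256 16-bit entries (512 bytes) per polynomial.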
  • Cluso99Cluso99 Posts: 18,069
    edited 2013-10-31 23:57
    Chip,

    Thanks for posting. I didn't think SETRACE was much Verilog.

    BTW Couldn't reply with quote because the (percent sign) causes editing problems.
  • Bill HenningBill Henning Posts: 6,445
    edited 2013-11-01 06:56
    Thanks for posting it Chip - nice and short!
  • David BetzDavid Betz Posts: 14,516
    edited 2013-11-01 07:00
    Chip: Could you post your revised instruction list? I'd like to start updating PropGCC and I've been waiting for the changes to settle down a bit. It sounds like you've pretty much finished revising the instruction encodings. I don't mind if a few new instructions are added but I'd like to get the bulk of the work done now. Could you post your most recent instruction list with the bit encodings?

    Thanks!
    David

    Edit: I just went back to your earlier post of an instruction set dated 10/16. If that is still accurate then I guess there is no need to repost it. Can you confirm?
  • pedwardpedward Posts: 1,642
    edited 2013-11-01 08:27
    Cluso99 wrote: »
    From looking at the various CRC implementations, it would seem that any length polynomial could be done without requiring different CRC16 and CRC32 instruction sets. At the end, the upper bits would just be cleared if the length was less than CRC32. This would then make only 2 instructions necessary.

    How many clocks would be required by the CRC16/32 instruction? I presume you are feeding it a byte to be accumulated?

    A lookup table in hub does work fairly well though. Of course memory is used. I think the CLUT is more valuable for other things than a lookup table.

    CRC16 and CRC32 are different algorithms and aren't implemented the same way. CRC32 has some reversing in it that CRC16 doesn't have.

    I could be missing something obvious, but the algorithms are different.
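
One way to look at the "reversing": in the widely used parameterised description of CRCs, the standard CRC-32 reflects each input byte and the final value, while the XMODEM flavour of CRC-16 does not. The Python sketch below treats width, polynomial and reflection as parameters of a single shift-and-XOR loop. It is only a software model under that description (widths of 8 bits or more, hypothetical function names) and says nothing about how a hardware instruction would be built.

```python
def reflect(value, width):
    """Reverse the low `width` bits of `value`."""
    out = 0
    for _ in range(width):
        out = (out << 1) | (value & 1)
        value >>= 1
    return out


def crc_generic(data, width, poly, init, refin, refout, xorout):
    """Bitwise CRC parameterised by width, polynomial and reflection settings."""
    top = 1 << (width - 1)
    mask = (1 << width) - 1
    crc = init
    for b in data:
        if refin:
            b = reflect(b, 8)          # the "reversing" applied to input bytes
        crc ^= b << (width - 8)
        for _ in range(8):
            crc = ((crc << 1) ^ poly) if crc & top else (crc << 1)
            crc &= mask
    if refout:
        crc = reflect(crc, width)      # and to the final register value
    return crc ^ xorout
```

With these parameters, `crc_generic(b"123456789", 16, 0x1021, 0, False, False, 0)` gives 0x31C3 (CRC-16/XMODEM) and `crc_generic(b"123456789", 32, 0x04C11DB7, 0xFFFFFFFF, True, True, 0xFFFFFFFF)` gives 0xCBF43926 (the common CRC-32). Some 16-bit variants, such as CRC-16/ARC with polynomial 0x8005, use reflection as well.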
  • ElectrodudeElectrodude Posts: 1,658
    edited 2013-11-01 11:21
    Just wondering, will there be any way to get the current PC of a thread other than the current one? I want to do preemptive multitasking in a few cogs for a project I'm designing that will use the P2. I plan on doing it by having one thread (#0) that only gets a turn every 16 instructions, via SETTASK %%1111111111111110 (with the constant obviously in another register). This thread would sit in a djnz i, $ loop for a while and then SETTASK #0, so only it gets turns, and then record thread #1's PC and flags, swap thread 1 out to hub RAM, load in a new thread, and put the task register back to only give thread 0 a turn every 16 clocks.

    The only thing I think is missing is an instruction to get a thread's PC and flags. This instruction wasn't necessary before threads were added because you could just use $ to get the current PC, but that only works with the current thread. Is there a way to read other threads' PCs and flags, or could you please add one?
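
To make the time-slot arithmetic concrete, here is a small Python sketch that decodes a SETTASK constant written in `%%` (base-4) notation. It assumes, as the post implies, that the value holds sixteen 2-bit task IDs. Reading the lowest field as the first slot is a guess, and the function names are hypothetical.

```python
def task_slots(settask_value):
    """Split a 32-bit SETTASK value into sixteen 2-bit task IDs, lowest field first."""
    return [(settask_value >> (2 * i)) & 0b11 for i in range(16)]


def slot_share(settask_value):
    """Count how many of the 16 time slots each of the four tasks receives."""
    counts = [0, 0, 0, 0]
    for task in task_slots(settask_value):
        counts[task] += 1
    return counts


# %%1111111111111110 in base-4 notation
pattern = int("1111111111111110", 4)
```

For this pattern, `slot_share(pattern)` returns `[1, 15, 0, 0]`: task 0 gets one slot in sixteen and task 1 gets the rest, which matches the scheduling described above regardless of the slot order.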
  • cgraceycgracey Posts: 14,155
    edited 2013-11-01 11:35
    David Betz wrote: »
    Chip: Could you post your revised instruction list? I'd like to start updating PropGCC and I've been waiting for the changes to settle down a bit. It sounds like you've pretty much finished revising the instruction encodings. I don't mind if a few new instructions are added but I'd like to get the bulk of the work done now. Could you post your most recent instruction list with the bit encodings?

    Thanks!
    David

    Edit: I just went back to your earlier post of an instruction set dated 10/16. If that is still accurate then I guess there is no need to repost it. Can you confirm?


    A few things have changed in the last two weeks. Here is the latest. This will have to change just a little bit more to accommodate pixel blending:
    Propeller II Instructions as of 11/01/2013
    
    ZCDS (for D column: W=write, M=modify, R=read, L=read/immediate)
    ---------------------------------------------------------------------------------------------------------------------------------------------------------------
    ZCWS		0000000 ZC I CCCC DDDDDDDDD SSSSSSSSS		RDBYTE	D,S/PTRx		(waits for hub)
    ZCWS		0000001 ZC I CCCC DDDDDDDDD SSSSSSSSS		RDBYTEC	D,S/PTRx		(waits for hub if cache miss)
    ZCWS		0000010 ZC I CCCC DDDDDDDDD SSSSSSSSS		RDWORD	D,S/PTRx		(waits for hub)
    ZCWS		0000011 ZC I CCCC DDDDDDDDD SSSSSSSSS		RDWORDC	D,S/PTRx		(waits for hub if cache miss)
    ZCWS		0000100 ZC I CCCC DDDDDDDDD SSSSSSSSS		RDLONG	D,S/PTRx		(waits for hub)
    ZCWS		0000101 ZC I CCCC DDDDDDDDD SSSSSSSSS		RDLONGC	D,S/PTRx		(waits for hub if cache miss)
    ZCWS		0000110 ZC I CCCC DDDDDDDDD SSSSSSSSS		RDAUX	D,S/#0..$FF/SPx
    ZCWS		0000111 ZC I CCCC DDDDDDDDD SSSSSSSSS		RDAUXR	D,S/#0..$FF/SPx
    
    ZCMS		0001000 ZC I CCCC DDDDDDDDD SSSSSSSSS		ISOB	D,S
    ZCMS		0001001 ZC I CCCC DDDDDDDDD SSSSSSSSS		NOTB	D,S
    ZCMS		0001010 ZC I CCCC DDDDDDDDD SSSSSSSSS		CLRB	D,S
    ZCMS		0001011 ZC I CCCC DDDDDDDDD SSSSSSSSS		SETB	D,S
    ZCMS		0001100 ZC I CCCC DDDDDDDDD SSSSSSSSS		SETBC	D,S
    ZCMS		0001101 ZC I CCCC DDDDDDDDD SSSSSSSSS		SETBNC	D,S
    ZCMS		0001110 ZC I CCCC DDDDDDDDD SSSSSSSSS		SETBZ	D,S
    ZCMS		0001111 ZC I CCCC DDDDDDDDD SSSSSSSSS		SETBNZ	D,S
    
    ZCMS		0010000 ZC I CCCC DDDDDDDDD SSSSSSSSS		ANDN	D,S
    ZCMS		0010001 ZC I CCCC DDDDDDDDD SSSSSSSSS		AND	D,S
    ZCMS		0010010 ZC I CCCC DDDDDDDDD SSSSSSSSS		OR	D,S
    ZCMS		0010011 ZC I CCCC DDDDDDDDD SSSSSSSSS		XOR	D,S
    ZCMS		0010100 ZC I CCCC DDDDDDDDD SSSSSSSSS		MUXC	D,S
    ZCMS		0010101 ZC I CCCC DDDDDDDDD SSSSSSSSS		MUXNC	D,S
    ZCMS		0010110 ZC I CCCC DDDDDDDDD SSSSSSSSS		MUXZ	D,S
    ZCMS		0010111 ZC I CCCC DDDDDDDDD SSSSSSSSS		MUXNZ	D,S
    
    ZCMS		0011000 ZC I CCCC DDDDDDDDD SSSSSSSSS		ROR	D,S
    ZCMS		0011001 ZC I CCCC DDDDDDDDD SSSSSSSSS		ROL	D,S
    ZCMS		0011010 ZC I CCCC DDDDDDDDD SSSSSSSSS		SHR	D,S
    ZCMS		0011011 ZC I CCCC DDDDDDDDD SSSSSSSSS		SHL	D,S
    ZCMS		0011100 ZC I CCCC DDDDDDDDD SSSSSSSSS		RCR	D,S
    ZCMS		0011101 ZC I CCCC DDDDDDDDD SSSSSSSSS		RCL	D,S
    ZCMS		0011110 ZC I CCCC DDDDDDDDD SSSSSSSSS		SAR	D,S
    ZCMS		0011111 ZC I CCCC DDDDDDDDD SSSSSSSSS		REV	D,S
    
    ZCWS		0100000 ZC I CCCC DDDDDDDDD SSSSSSSSS		MOV	D,S
    ZCWS		0100001 ZC I CCCC DDDDDDDDD SSSSSSSSS		NOT	D,S
    ZCWS		0100010 ZC I CCCC DDDDDDDDD SSSSSSSSS		ABS	D,S
    ZCWS		0100011 ZC I CCCC DDDDDDDDD SSSSSSSSS		NEG	D,S
    ZCWS		0100100 ZC I CCCC DDDDDDDDD SSSSSSSSS		NEGC	D,S
    ZCWS		0100101 ZC I CCCC DDDDDDDDD SSSSSSSSS		NEGNC	D,S
    ZCWS		0100110 ZC I CCCC DDDDDDDDD SSSSSSSSS		NEGZ	D,S
    ZCWS		0100111 ZC I CCCC DDDDDDDDD SSSSSSSSS		NEGNZ	D,S
    
    ZCMS		0101000 ZC I CCCC DDDDDDDDD SSSSSSSSS		ADD	D,S
    ZCMS		0101001 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUB	D,S
    ZCMS		0101010 ZC I CCCC DDDDDDDDD SSSSSSSSS		ADDX	D,S
    ZCMS		0101011 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUBX	D,S
    ZCMS		0101100 ZC I CCCC DDDDDDDDD SSSSSSSSS		ADDS	D,S
    ZCMS		0101101 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUBS	D,S
    ZCMS		0101110 ZC I CCCC DDDDDDDDD SSSSSSSSS		ADDSX	D,S
    ZCMS		0101111 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUBSX	D,S
    
    ZCMS		0110000 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUMC	D,S
    ZCMS		0110001 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUMNC	D,S
    ZCMS		0110010 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUMZ	D,S
    ZCMS		0110011 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUMNZ	D,S
    ZCMS		0110100 ZC I CCCC DDDDDDDDD SSSSSSSSS		MIN	D,S
    ZCMS		0110101 ZC I CCCC DDDDDDDDD SSSSSSSSS		MAX	D,S
    ZCMS		0110110 ZC I CCCC DDDDDDDDD SSSSSSSSS		MINS	D,S
    ZCMS		0110111 ZC I CCCC DDDDDDDDD SSSSSSSSS		MAXS	D,S
    
    ZCMS		0111000 ZC I CCCC DDDDDDDDD SSSSSSSSS		ADDABS	D,S
    ZCMS		0111001 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUBABS	D,S
    ZCMS		0111010 ZC I CCCC DDDDDDDDD SSSSSSSSS		INCMOD	D,S
    ZCMS		0111011 ZC I CCCC DDDDDDDDD SSSSSSSSS		DECMOD	D,S
    ZCMS		0111100 ZC I CCCC DDDDDDDDD SSSSSSSSS		CMPSUB	D,S
    ZCMS		0111101 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUBR	D,S
    ZCMS		0111110 ZC I CCCC DDDDDDDDD SSSSSSSSS		MUL	D,S			(waits one clock)
    ZCMS		0111111 ZC I CCCC DDDDDDDDD SSSSSSSSS		SCL	D,S			(waits one clock)
    
    ZCWS		1000000 ZC I CCCC DDDDDDDDD SSSSSSSSS		DECOD3	D,S
    ZCWS		1000001 ZC I CCCC DDDDDDDDD SSSSSSSSS		DECOD4	D,S
    ZCWS		1000010 ZC I CCCC DDDDDDDDD SSSSSSSSS		DECOD5	D,S
    Z-WS		1000011 Z0 I CCCC DDDDDDDDD SSSSSSSSS		ENCOD	D,S
    Z-WS		1000011 Z1 I CCCC DDDDDDDDD SSSSSSSSS		BLMASK	D,S
    Z-WS		1000100 Z0 I CCCC DDDDDDDDD SSSSSSSSS		ONECNT	D,S			(waits one clock)
    Z-WS		1000100 Z1 I CCCC DDDDDDDDD SSSSSSSSS		ZERCNT	D,S			(waits one clock)
    -CWS		1000101 0C I CCCC DDDDDDDDD SSSSSSSSS		INCPAT	D,S			(waits three clocks)
    -CWS		1000101 1C I CCCC DDDDDDDDD SSSSSSSSS		DECPAT	D,S			(waits three clocks)
    --WS		1000110 00 I CCCC DDDDDDDDD SSSSSSSSS		BINGRY	D,S
    --WS		1000110 01 I CCCC DDDDDDDDD SSSSSSSSS		GRYBIN	D,S			(waits one clock)
    --WS		1000110 10 I CCCC DDDDDDDDD SSSSSSSSS		SPLITB	D,S
    --WS		1000110 11 I CCCC DDDDDDDDD SSSSSSSSS		MERGEB	D,S
    --WS		1000111 00 I CCCC DDDDDDDDD SSSSSSSSS		SPLITW	D,S
    --WS		1000111 01 I CCCC DDDDDDDDD SSSSSSSSS		MERGEW	D,S
    --WS		1000111 10 I CCCC DDDDDDDDD SSSSSSSSS		ESWAP4	D,S
    --WS		1000111 11 I CCCC DDDDDDDDD SSSSSSSSS		ESWAP8	D,S
    
    --MS		10010nn n0 I CCCC DDDDDDDDD SSSSSSSSS		GETNIB	D,S,#0..7
    --MS		10010nn n1 I CCCC DDDDDDDDD SSSSSSSSS		SETNIB	D,S,#0..7
    --MS		1001100 n0 I CCCC DDDDDDDDD SSSSSSSSS		GETWORD	D,S,#0..1
    --MS		1001100 n1 I CCCC DDDDDDDDD SSSSSSSSS		SETWORD	D,S,#0..1
    --MS		1001101 00 I CCCC DDDDDDDDD SSSSSSSSS		SWBYTES	D,S			(switch/copy bytes in D, S = %11_10_01_00 = D same)
    --MS		1001101 01 I CCCC DDDDDDDDD SSSSSSSSS		ROLNIB	D,S
    --MS		1001101 10 I CCCC DDDDDDDDD SSSSSSSSS		ROLBYTE	D,S
    --MS		1001101 11 I CCCC DDDDDDDDD SSSSSSSSS		ROLWORD	D,S
    --MS		1001110 00 I CCCC DDDDDDDDD SSSSSSSSS		SETS	D,S
    --MS		1001110 01 I CCCC DDDDDDDDD SSSSSSSSS		SETD	D,S
    --MS		1001110 10 I CCCC DDDDDDDDD SSSSSSSSS		SETX	D,S
    --MS		1001110 11 I CCCC DDDDDDDDD SSSSSSSSS		SETI	D,S
    --MS		1001111 00 I CCCC DDDDDDDDD SSSSSSSSS		PACKRGB	D,S			(8:8:8 -> 5:5:5 << 16 | D >> 16)
    --MS		1001111 01 I CCCC DDDDDDDDD SSSSSSSSS		UNPKRGB	D,S			(5:5:5 -> 8:8:8)
    --WS		1001111 10 I CCCC DDDDDDDDD SSSSSSSSS		SEUSSF	D,S
    --WS		1001111 11 I CCCC DDDDDDDDD SSSSSSSSS		SEUSSR	D,S
    
    --MS		101000n n0 I CCCC DDDDDDDDD SSSSSSSSS		GETBYTE	D,S,#0..3
    --MS		101000n n1 I CCCC DDDDDDDDD SSSSSSSSS		SETBYTE	D,S,#0..3
    -CMS		1010010 0C I CCCC DDDDDDDDD SSSSSSSSS		COGNEW	D,S			(waits for hub)
    -CMS		1010010 1C I CCCC DDDDDDDDD SSSSSSSSS		WAITCNT	D,S			(waits for CNT, +CNTX if WC)
    --MS		1010011 00 I CCCC DDDDDDDDD SSSSSSSSS		PIXBLND	D,S			(waits two clocks)
    --MS		1010011 01 I CCCC DDDDDDDDD SSSSSSSSS		PIXMUL1	D,S			(waits two clocks)
    --MS		1010011 10 I CCCC DDDDDDDDD SSSSSSSSS		PIXMUL2	D,S			(waits two clocks)
    --MS		1010011 11 I CCCC DDDDDDDDD SSSSSSSSS		PIXADD	D,S			(waits two clocks)
    
    ZCWS		1010100 ZC I CCCC DDDDDDDDD SSSSSSSSS		JMPRET	D,S			(set D to %1_1111_01xx for JMP/RET)
    ZCWS		1010101 ZC I CCCC DDDDDDDDD SSSSSSSSS		JMPRETD	D,S 			(set D to %1_1111_01xx for JMP/RET)
    --MS		1010110 00 I CCCC DDDDDDDDD SSSSSSSSS		IJZ	D,S
    --MS		1010110 01 I CCCC DDDDDDDDD SSSSSSSSS		IJZD	D,S
    --MS		1010110 10 I CCCC DDDDDDDDD SSSSSSSSS		IJNZ	D,S
    --MS		1010110 11 I CCCC DDDDDDDDD SSSSSSSSS		IJNZD	D,S
    --MS		1010111 00 I CCCC DDDDDDDDD SSSSSSSSS		DJZ	D,S
    --MS		1010111 01 I CCCC DDDDDDDDD SSSSSSSSS		DJZD	D,S
    --MS		1010111 10 I CCCC DDDDDDDDD SSSSSSSSS		DJNZ	D,S
    --MS		1010111 11 I CCCC DDDDDDDDD SSSSSSSSS		DJNZD	D,S
    
    ZCRS		1011000 ZC I CCCC DDDDDDDDD SSSSSSSSS		TESTB	D,S
    ZCRS		1011001 ZC I CCCC DDDDDDDDD SSSSSSSSS		TESTN	D,S
    ZCRS		1011010 ZC I CCCC DDDDDDDDD SSSSSSSSS		TEST	D,S
    ZCRS		1011011 ZC I CCCC DDDDDDDDD SSSSSSSSS		CMP	D,S
    ZCRS		1011100 ZC I CCCC DDDDDDDDD SSSSSSSSS		CMPX	D,S
    ZCRS		1011101 ZC I CCCC DDDDDDDDD SSSSSSSSS		CMPS	D,S
    ZCRS		1011110 ZC I CCCC DDDDDDDDD SSSSSSSSS		CMPSX	D,S
    ZCRS		1011111 ZC I CCCC DDDDDDDDD SSSSSSSSS		CMPR	D,S
    
    --RS		11000nn n0 I CCCC DDDDDDDDD SSSSSSSSS		COGINIT	D,S,#0..7		(waits for hub) (SETNIB :coginit,cog,#6)
    ---S		11000nn n1 I CCCC nnnnnnnnn SSSSSSSSS		WAITVID	#0..$DFF,S		(waits for vid if single-task, loops if multi-task)
    --RS		1100011 11 I CCCC DDDDDDDDD SSSSSSSSS		WAITVID	D,S			(waits for vid if single-task, loops if multi-task)
    -CRS		110010n nC I CCCC DDDDDDDDD SSSSSSSSS		WAITPEQ	D,S,#0..3		(waits for pins, +CNT if WC)
    -CRS		110011n nC I CCCC DDDDDDDDD SSSSSSSSS		WAITPNE	D,S,#0..3		(waits for pins, +CNT if WC)
    
    --LS		1101000 0L I CCCC DDDDDDDDD SSSSSSSSS		WRBYTE	D,S/PTR			(waits for hub)
    --LS		1101000 1L I CCCC DDDDDDDDD SSSSSSSSS		WRWORD	D,S/PTR			(waits for hub)
    --LS		1101001 0L I CCCC DDDDDDDDD SSSSSSSSS		WRLONG	D,S/PTR			(waits for hub)
    --LS		1101001 1L I CCCC DDDDDDDDD SSSSSSSSS		FRAC	D,S
    --LS		1101010 0L I CCCC DDDDDDDDD SSSSSSSSS		WRAUX	D,S/#0..$FF/SPx
    --LS		1101010 1L I CCCC DDDDDDDDD SSSSSSSSS		WRAUXR	D,S/#0..$FF/SPx
    --LS		1101011 0L I CCCC DDDDDDDDD SSSSSSSSS		SETACCA	D,S
    --LS		1101011 1L I CCCC DDDDDDDDD SSSSSSSSS		SETACCB	D,S
    --LS		1101100 0L I CCCC DDDDDDDDD SSSSSSSSS		MACA	D,S
    --LS		1101100 1L I CCCC DDDDDDDDD SSSSSSSSS		MACB	D,S
    --LS		1101101 0L I CCCC DDDDDDDDD SSSSSSSSS		MUL32	D,S
    --LS		1101101 1L I CCCC DDDDDDDDD SSSSSSSSS		MUL32U	D,S
    --LS		1101110 0L I CCCC DDDDDDDDD SSSSSSSSS		DIV32	D,S
    --LS		1101110 1L I CCCC DDDDDDDDD SSSSSSSSS		DIV32U	D,S
    --LS		1101111 0L I CCCC DDDDDDDDD SSSSSSSSS		DIV64	D,S
    --LS		1101111 1L I CCCC DDDDDDDDD SSSSSSSSS		DIV64U	D,S
    
    --LS		1110000 0L I CCCC DDDDDDDDD SSSSSSSSS		SQRT64	D,S
    --LS		1110000 1L I CCCC DDDDDDDDD SSSSSSSSS		QSINCOS	D,S
    --LS		1110001 0L I CCCC DDDDDDDDD SSSSSSSSS		QARCTAN	D,S
    --LS		1110001 1L I CCCC DDDDDDDDD SSSSSSSSS		QROTATE	D,S
    --LS		1110010 0L I CCCC DDDDDDDDD SSSSSSSSS		SETSERA	D,S			(config,baud)
    --LS		1110010 1L I CCCC DDDDDDDDD SSSSSSSSS		SETSERB	D,S			(config,baud)
    --LS		1110011 0L I CCCC DDDDDDDDD SSSSSSSSS		SETCTRS	D,S			(ctrb,ctra)
    --LS		1110011 1L I CCCC DDDDDDDDD SSSSSSSSS		SETWAVS	D,S			(ctrb,ctra)
    --LS		1110100 0L I CCCC DDDDDDDDD SSSSSSSSS		SETFRQS	D,S			(ctrb,ctra)
    --LS		1110100 1L I CCCC DDDDDDDDD SSSSSSSSS		SETPHSS	D,S			(ctrb,ctra)
    --LS		1110101 0L I CCCC DDDDDDDDD SSSSSSSSS		ADDPHSS	D,S			(ctrb,ctra)
    --LS		1110101 1L I CCCC DDDDDDDDD SSSSSSSSS		SUBPHSS	D,S			(ctrb,ctra)
    --LS		1110110 0L I CCCC DDDDDDDDD SSSSSSSSS		SETPIX0	D,S			(config, Z)
    --LS		1110110 1L I CCCC DDDDDDDDD SSSSSSSSS		SETPIX1	D,S			(U, V)
    --LS		1110111 0L I CCCC DDDDDDDDD SSSSSSSSS		SETPIX2	D,S			(A, R)
    --LS		1110111 1L I CCCC DDDDDDDDD SSSSSSSSS		SETPIX3	D,S			(G, B)
    
    --LS		111100n nL I CCCC DDDDDDDDD SSSSSSSSS		CFGPINS	D,S,#0..2		(waits for alt)
    --LS		1111001 1L I CCCC DDDDDDDDD SSSSSSSSS		JMPTASK	D,S			(mask,address)
    --LS		1111010 0L I CCCC DDDDDDDDD SSSSSSSSS		JP	D,S
    --LS		1111010 1L I CCCC DDDDDDDDD SSSSSSSSS		JPD	D,S
    --LS		1111011 0L I CCCC DDDDDDDDD SSSSSSSSS		JNP	D,S
    --LS		1111011 1L I CCCC DDDDDDDDD SSSSSSSSS		JNPD	D,S
    
    --RS		1111100 00 I CCCC DDDDDDDDD SSSSSSSSS		TJZ	D,S
    --RS		1111100 01 I CCCC DDDDDDDDD SSSSSSSSS		TJZD	D,S
    --RS		1111100 10 I CCCC DDDDDDDDD SSSSSSSSS		TJNZ	D,S
    --RS		1111100 11 I CCCC DDDDDDDDD SSSSSSSSS		TJNZD	D,S
    --RS		1111101 00 I CCCC DDDDDDDDD SSSSSSSSS		TJP	D,S
    --RS		1111101 01 I CCCC DDDDDDDDD SSSSSSSSS		TJPD	D,S
    --RS		1111101 10 I CCCC DDDDDDDDD SSSSSSSSS		TJN	D,S
    --RS		1111101 11 I CCCC DDDDDDDDD SSSSSSSSS		TJND	D,S
    
    ----		1111110 0n n nnnn nnnnnnnnn nnniiiiii		REPS	#1..$40000,#1..64
    
    ----		1111110 10 x BBAA ddddddddd sssssssss		SETINDA #s    / SETINDB #d    / SETINDS #d,#s
    ----		1111110 11 x 0B0A ddddddddd sssssssss		FIXINDA	#d,#s / FIXINDB	#d,#s / FIXINDS	#d,#s
    
    ZCW-		1111111 ZC x CCCC DDDDDDDDD x00000000		COGID	D			(waits for hub) (doesn't write if WC)
    ZCW-		1111111 ZC x CCCC DDDDDDDDD x00000001		LOCKNEW	D			(waits for hub)
    ZCW-		1111111 ZC x CCCC DDDDDDDDD x00000010		GETCNT	D
    ZCW-		1111111 ZC x CCCC DDDDDDDDD x00000011		GETCNTX	D
    ZCW-		1111111 ZC x CCCC DDDDDDDDD x00000100		GETLFSR	D
    ZCW-		1111111 ZC x CCCC DDDDDDDDD x00000101		GETTOPS	D
    ZCW-		1111111 ZC x CCCC DDDDDDDDD x00000110		GETACAL	D			(waits for mac)
    ZCW-		1111111 ZC x CCCC DDDDDDDDD x00000111		GETACAH	D			(waits for mac)
    ZCW-		1111111 ZC x CCCC DDDDDDDDD x00001000		GETACBL	D			(waits for mac)
    ZCW-		1111111 ZC x CCCC DDDDDDDDD x00001001		GETACBH	D			(waits for mac)
    ZCW-		1111111 ZC x CCCC DDDDDDDDD x00001010		GETPTRA	D
    ZCW-		1111111 ZC x CCCC DDDDDDDDD x00001011		GETPTRB	D
    ZCW-		1111111 ZC x CCCC DDDDDDDDD x00001100		GETSPA	D
    ZCW-		1111111 ZC x CCCC DDDDDDDDD x00001101		GETSPB	D
    ZCW-		1111111 ZC x CCCC DDDDDDDDD x00001110		SERINA	D			(waits for rx if single-task, loops if multi-task, releases if WC)
    ZCW-		1111111 ZC x CCCC DDDDDDDDD x00001111		SERINB	D			(waits for rx if single-task, loops if multi-task, releases if WC)
    ZCW-		1111111 ZC x CCCC DDDDDDDDD x00010000		GETMULL	D			(waits for mul if single-task, loops if multi-task)
    ZCW-		1111111 ZC x CCCC DDDDDDDDD x00010001		GETMULH	D			(waits for mul if single-task, loops if multi-task)
    ZCW-		1111111 ZC x CCCC DDDDDDDDD x00010010		GETDIVQ	D			(waits for div if single-task, loops if multi-task)
    ZCW-		1111111 ZC x CCCC DDDDDDDDD x00010011		GETDIVR	D			(waits for div if single-task, loops if multi-task)
    ZCW-		1111111 ZC x CCCC DDDDDDDDD x00010100		GETSQRT	D			(waits for sqrt if single-task, loops if multi-task)
    ZCW-		1111111 ZC x CCCC DDDDDDDDD x00010101		GETQX	D			(waits for cordic if single-task, loops if multi-task)
    ZCW-		1111111 ZC x CCCC DDDDDDDDD x00010110		GETQY	D			(waits for cordic if single-task, loops if multi-task)
    ZCW-		1111111 ZC x CCCC DDDDDDDDD x00010111		GETQZ	D			(waits for cordic if single-task, loops if multi-task)
    ZCW-		1111111 ZC x CCCC DDDDDDDDD x00011000		GETPHSA	D
    ZCW-		1111111 ZC x CCCC DDDDDDDDD x00011001		GETPHZA	D			(clears phsa)
    ZCW-		1111111 ZC x CCCC DDDDDDDDD x00011010		GETCOSA	D
    ZCW-		1111111 ZC x CCCC DDDDDDDDD x00011011		GETSINA	D
    ZCW-		1111111 ZC x CCCC DDDDDDDDD x00011100		GETPHSB	D
    ZCW-		1111111 ZC x CCCC DDDDDDDDD x00011101		GETPHZB	D			(clears phsb)
    ZCW-		1111111 ZC x CCCC DDDDDDDDD x00011110		GETCOSB	D
    ZCW-		1111111 ZC x CCCC DDDDDDDDD x00011111		GETSINB	D
    
    ZCM-		1111111 ZC x CCCC DDDDDDDDD x00100000		PUSHZC	D
    ZCM-		1111111 ZC x CCCC DDDDDDDDD x00100001		POPZC	D
    ZCM-		1111111 ZC x CCCC DDDDDDDDD x00100010		SUBCNT	D			(subtracts D from CNT, then CNTX if same thread)
    ZCM-		1111111 ZC x CCCC DDDDDDDDD x00100011		GETPIX	D			(waits two clocks, needs two clocks per two prior stages)
    --M-		1111111 xx x CCCC DDDDDDDDD x00100100		INCD	D			(D += $200)
    --M-		1111111 xx x CCCC DDDDDDDDD x00100101		DECD	D			(D -= $200)
    --M-		1111111 xx x CCCC DDDDDDDDD x00100110		INCDS	D			(D += $201)
    --M-		1111111 xx x CCCC DDDDDDDDD x00100111		DECDS	D			(D -= $201)
    
    --L-		1111111 xx L CCCC DDDDDDDDD x00101000		CLKSET	D			(waits for hub)
    --L-		1111111 xx L CCCC DDDDDDDDD x00101001		COGSTOP	D			(waits for hub)
    -CL-		1111111 xC L CCCC DDDDDDDDD x00101010		LOCKSET	D			(waits for hub)
    -CL-		1111111 xC L CCCC DDDDDDDDD x00101011		LOCKCLR	D			(waits for hub)
    --L-		1111111 xx L CCCC DDDDDDDDD x00101100		LOCKRET	D			(waits for hub)
    --L-		1111111 xx L CCCC DDDDDDDDD x00101101		RDQUADC	D/PTR			(waits for hub if cache miss)
    --L-		1111111 xx L CCCC DDDDDDDDD x00101110		RDQUAD	D/PTR			(waits for hub)
    --L-		1111111 xx L CCCC DDDDDDDDD x00101111		WRQUAD	D/PTR			(waits for hub)
    
    ZCL-		1111111 ZC L CCCC DDDDDDDDD x00110000		GETP	D			(pin into !Z/C via WZ/WC)
    ZCL-		1111111 ZC L CCCC DDDDDDDDD x00110001		GETNP	D			(pin into Z/!C via WZ/WC)
    -CL-		1111111 xC L CCCC DDDDDDDDD x00110010		SEROUTA	D			(waits for tx if single-task, loops if multi-task, releases if WC)
    -CL-		1111111 xC L CCCC DDDDDDDDD x00110011		SEROUTB	D			(waits for tx if single-task, loops if multi-task, releases if WC)
    -CL-		1111111 xC L CCCC DDDDDDDDD x00110100		CMPCNT	D			(subtracts D from CNT, then CNTX if same thread)
    -CL-		1111111 xC L CCCC DDDDDDDDD x00110101		WAITPX	D			(waits for any edge, +CNT if WC)
    -CL-		1111111 xC L CCCC DDDDDDDDD x00110110		WAITPR	D			(waits for pos edge, +CNT if WC)
    -CL-		1111111 xC L CCCC DDDDDDDDD x00110111		WAITPF	D			(waits for neg edge, +CNT if WC)
    
    ZCL-		1111111 ZC L CCCC DDDDDDDDD x00111000		SETZC	D			(D[1:0] into Z/C via WZ/WC)
    --L-		1111111 xx L CCCC DDDDDDDDD x00111001		SETTASK	D
    --L-		1111111 xx L CCCC DDDDDDDDD x00111010		SETMAP	D
    --L-		1111111 xx L CCCC DDDDDDDDD x00111011		SETXCH	D
    --L-		1111111 xx L CCCC DDDDDDDDD x00111100		SETXFR	D
    --L-		1111111 xx L CCCC DDDDDDDDD x00111101		SARACCA	D			(waits for mac)
    --L-		1111111 xx L CCCC DDDDDDDDD x00111110		SARACCB	D			(waits for mac)
    --L-		1111111 xx L CCCC DDDDDDDDD x00111111		SARACCS	D			(waits for mac)
    
    --L-		1111111 xx L CCCC DDDDDDDDD x01iiiiii		REPD	D,#1..64		(REPD $1FF,#1..64 = infinite repeat)
    
    --L-		1111111 xx L CCCC DDDDDDDDD x10000000		SETSPA	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10000001		SETSPB	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10000010		ADDSPA	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10000011		ADDSPB	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10000100		SUBSPA	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10000101		SUBSPB	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10000110		SETQUAD	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10000111		SETQUAZ	D
    
    --L-		1111111 xx L CCCC DDDDDDDDD x10001000		SETPTRA	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10001001		SETPTRB	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10001010		ADDPTRA	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10001011		ADDPTRB	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10001100		SUBPTRA	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10001101		SUBPTRB	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10001110		PASSCNT	D			(loops if (CNT - D) msb set)
    --L-		1111111 xx L CCCC DDDDDDDDD x10001111		WAIT	D			(waits)
    
    --L-		1111111 xx L CCCC DDDDDDDDD x10010000		CALLA	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10010001		CALLB	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10010010		CALLAR	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10010011		CALLBR	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10010100		CALLAD	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10010101		CALLBD	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10010110		CALLARD	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10010111		CALLBRD	D
    
    --L-		1111111 xx L CCCC DDDDDDDDD x10011000		OFFP	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10011001		NOTP	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10011010		CLRP	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10011011		SETP	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10011100		SETPC	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10011101		SETPNC	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10011110		SETPZ	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10011111		SETPNZ	D
    
    --L-		1111111 xx L CCCC DDDDDDDDD x10100000		DIV64D	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10100001		SQRT32	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10100010		QLOG	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10100011		QEXP	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10100100		SETQI	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10100101		SETQZ	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10100110		CFGDACS	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10100111		SETDACS	D
    
    --L-		1111111 xx L CCCC DDDDDDDDD x10101000		CFGDAC0	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10101001		CFGDAC1	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10101010		CFGDAC2	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10101011		CFGDAC3	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10101100		SETDAC0	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10101101		SETDAC1	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10101110		SETDAC2	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10101111		SETDAC3	D
    
    --L-		1111111 xx L CCCC DDDDDDDDD x10110000		SETCTRA	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10110001		SETWAVA	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10110010		SETFRQA	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10110011		SETPHSA	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10110100		ADDPHSA	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10110101		SUBPHSA	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10110110		SETVID	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10110111		SETVIDY	D
    
    --L-		1111111 xx L CCCC DDDDDDDDD x10111000		SETCTRB	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10111001		SETWAVB	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10111010		SETFRQB	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10111011		SETPHSB	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10111100		ADDPHSB	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10111101		SUBPHSB	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10111110		SETVIDI	D
    --L-		1111111 xx L CCCC DDDDDDDDD x10111111		SETVIDQ	D
    
    --L-		1111111 xx L CCCC DDDDDDDDD x11000000		SETPIX	D
    --L-		1111111 xx L CCCC DDDDDDDDD x11000001		SETPIXZ	D
    --L-		1111111 xx L CCCC DDDDDDDDD x11000010		SETPIXU	D
    --L-		1111111 xx L CCCC DDDDDDDDD x11000011		SETPIXV	D
    --L-		1111111 xx L CCCC DDDDDDDDD x11000100		SETPIXA	D
    --L-		1111111 xx L CCCC DDDDDDDDD x11000101		SETPIXR	D
    --L-		1111111 xx L CCCC DDDDDDDDD x11000110		SETPIXG	D
    --L-		1111111 xx L CCCC DDDDDDDDD x11000111		SETPIXB	D
    
    --L-		1111111 xx L CCCC DDDDDDDDD x11001000		SETPORA	D
    --L-		1111111 xx L CCCC DDDDDDDDD x11001001		SETPORB	D
    --L-		1111111 xx L CCCC DDDDDDDDD x11001010		SETPORC	D
    --L-		1111111 xx L CCCC DDDDDDDDD x11001011		SETPORD	D
    --L-		1111111 xx L CCCC DDDDDDDDD x11001100		SETRACE	D
    ----		1111111 xx x CCCC xxxxxxxxx x11001101		CLRACCA
    ----		1111111 xx x CCCC xxxxxxxxx x11001110		CLRACCB
    ----		1111111 xx x CCCC xxxxxxxxx x11001111		CLRACCS
    
    ZC--		1111111 ZC x CCCC xxxxxxxxx x11010000		RETA
    ZC--		1111111 ZC x CCCC xxxxxxxxx x11010001		RETB
    ZC--		1111111 ZC x CCCC xxxxxxxxx x11010010		RETAR
    ZC--		1111111 ZC x CCCC xxxxxxxxx x11010011		RETBR
    ZC--		1111111 ZC x CCCC xxxxxxxxx x11010100		RETAD
    ZC--		1111111 ZC x CCCC xxxxxxxxx x11010101		RETBD
    ZC--		1111111 ZC x CCCC xxxxxxxxx x11010110		RETARD
    ZC--		1111111 ZC x CCCC xxxxxxxxx x11010111		RETBRD
    
    ZC--		1111111 ZC x CCCC xxxxxxxxx x11011000		TESTSPA
    ZC--		1111111 ZC x CCCC xxxxxxxxx x11011001		TESTSPB
    ZC--		1111111 ZC x CCCC xxxxxxxxx x11011010		POLCTRA				(ctra-rollover into !Z/C)
    ZC--		1111111 ZC x CCCC xxxxxxxxx x11011011		POLCTRB				(ctrb-rollover into !Z/C)
    ZC--		1111111 ZC x CCCC xxxxxxxxx x11011100		POLVID				(vid-ready into !Z/C)
    ----		1111111 xx x CCCC xxxxxxxxx x11011101		CAPCTRA
    ----		1111111 xx x CCCC xxxxxxxxx x11011110		CAPCTRB
    ----		1111111 xx x CCCC xxxxxxxxx x11011111		CAPCTRS
    
    ----		1111111 xx x CCCC xxxxxxxxx x11100000		SYNCTRA				(waits for ctra if single-task, loops if multi-task)
    ----		1111111 xx x CCCC xxxxxxxxx x11100001		SYNCTRB				(waits for ctrb if single-task, loops if multi-task)
    ----		1111111 xx x CCCC xxxxxxxxx x11100010		CACHEX
    
    x = don't care, use 0
    ---------------------------------------------------------------------------------------------------------------------------------------------------------------
    
    
    ZC	effects
    ------------------------------------------------------------------------------------------------
    00	<none>
    01	wc
    10	wz
    11	wz, wc
    
    
    CCCC	condition	(easier-to-read list)
    -------------------------------------------------------------------------------------------------
    0000	never		1111	always		(default)
    0001	nc  &  nz	1100	if_c				if_b
    0010	nc  &  z	0011	if_nc				if_ae
    0011	nc		1010	if_z				if_e
    0100	 c  &  nz	0101	if_nz				if_ne
    0101	nz		1000	if_c_and_z	if_z_and_c
    0110	 c  <> z	0100	if_c_and_nz	if_nz_and_c
    0111	nc  |  nz	0010	if_nc_and_z	if_z_and_nc
    1000	 c  &  z	0001	if_nc_and_nz	if_nz_and_nc	if_a
    1001	 c  =  z	1110	if_c_or_z	if_z_or_c	if_be
    1010	 z		1101	if_c_or_nz	if_nz_or_c
    1011	nc  |  z	1011	if_nc_or_z	if_z_or_nc
    1100	 c		0111	if_nc_or_nz	if_nz_or_nc
    1101	 c  |  nz	1001	if_c_eq_z	if_z_eq_c
    1110	 c  |  z	0110	if_c_ne_z	if_z_ne_c
    1111	always		0000	never
    
    CCCC	inda/indb - CCCC=1111 after first stage of pipeline if inda/indb used (indx=inda/indb)
    -------------------------------------------------------------------------------------------------
    xx00	source indx
    xx01	source indx++
    xx10	source indx--
    xx11	source ++indx
    
    00xx	destination indx
    01xx	destination indx++
    10xx	destination indx--
    11xx	destination ++indx
    
    
    I	SSSSSSSSS	source operand
    -------------------------------------------------------------------------------------------------
    0/na	SSSSSSSSS	register
    1	#SSSSSSSSS	immediate, zero-extended
    
    
    L	DDDDDDDDD	destination operand
    -------------------------------------------------------------------------------------------------
    0/na	DDDDDDDDD	register
    1	#DDDDDDDDD	immediate, zero-extended
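    For readers following the listing above, here is a minimal Python sketch of how the fields and the CCCC condition could be modelled. It assumes the fields are packed MSB-first in the order the listing prints them (7-bit opcode, ZC, L, CCCC, D, then the 9-bit low field), and it reads CCCC as a 4-entry truth table indexed by (C, Z), which is what the condition table implies. The function names are hypothetical and not part of any Parallax tool.

```python
def encode(opcode, zc, l, cccc, d, s):
    """Pack the fields in the left-to-right order used by the listing."""
    return (opcode << 25) | (zc << 23) | (l << 22) | (cccc << 18) | (d << 9) | s


def decode(word):
    """Split a 32-bit instruction word back into its fields."""
    return {
        "opcode": (word >> 25) & 0x7F,  # %1111111 for this instruction group
        "zc": (word >> 23) & 0x3,       # %00 none, %01 wc, %10 wz, %11 wz+wc
        "l": (word >> 22) & 0x1,        # 1 = D field is an immediate
        "cccc": (word >> 18) & 0xF,     # condition field
        "d": (word >> 9) & 0x1FF,       # destination field
        "s": word & 0x1FF,              # low field (sub-opcode in this group)
    }


def condition_passes(cccc, c, z):
    """Treat CCCC as a truth table: bit number (C*2 + Z) decides execution."""
    return bool((cccc >> ((c << 1) | z)) & 1)


# Example: SETRACE #%1111 with the default "always" condition, don't-care bits 0.
setrace_word = encode(0x7F, 0b00, 1, 0b1111, 0b1111, 0b011001100)
```

    Reading CCCC as a truth table explains why `if_c_ne_z` is %0110 and `if_c_eq_z` is %1001: each is just the set of (C, Z) combinations for which the instruction runs.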
    
  • David BetzDavid Betz Posts: 14,516
    edited 2013-11-01 11:39
    cgracey wrote: »
    A few things have changed in the last two weeks. Here is the latest. This will have to change just a little bit more to accommodate pixel blending:
    Thanks Chip! A few more changes won't make a big difference. At least this gives me something to get started on!
  • pedwardpedward Posts: 1,642
    edited 2013-11-01 11:41
    cgracey wrote: »
    A few things have changed in the last two weeks. Here is the latest. This will have to change just a little bit more to accommodate pixel blending:

    Will these changes make the decoding of instructions simpler and thus synthesis less complicated?
  • cgraceycgracey Posts: 14,155
    edited 2013-11-01 11:57
    pedward wrote: »
    Will these changes make the decoding of instructions simpler and thus synthesis less complicated?

    The instruction decoding probably got a little more complicated in hardware, but this is simpler for us people to deal with.
  • cgraceycgracey Posts: 14,155
    edited 2013-11-01 11:58
    David Betz wrote: »
    Thanks Chip! A few more changes won't make a big difference. At least this gives me something to get started on!

    You're welcome, and thanks for your ongoing support.
  • David BetzDavid Betz Posts: 14,516
    edited 2013-11-01 12:09
    cgracey wrote: »
    You're welcome, and thanks for your ongoing support.
    No problem! It's fun being on the leading edge of this. So, I guess I'll press my luck and ask when we might see a new FPGA configuration that matches this new instruction set?
  • Jeff MartinJeff Martin Posts: 758
    edited 2013-11-01 12:51
    pedward wrote: »
    CRC16 and CRC32 are different algorithms and aren't implemented the same way. CRC32 has some reversing in it that CRC16 doesn't have.

    I could be missing something obvious, but the algorithms are different.

    pedward, I'm pretty sure you have more (and better) experience in this area than me, but from what I found during some research I did, I was led to the conclusion that the reversals (and inversions) in CRC32 are just a convention and not really an integral part of the algorithm. I mean, the reversals and inversions are employed in some implementations and not in others for the same reasons that different polynomials are employed in different situations based on the needs, nature of the data, nature of the noise possibilities, etc.

    Am I wrong?

    On a related note:

    For what it's worth, I've been wanting instructions in the Propeller 2 for CRC as well and feel stupid for not thinking of it earlier on in the Propeller 2 development. By the time I had suggested it, Chip said it was too late to put it in. Things have changed now, but I'm not sure if he will be implementing it. My motivation is that they are incredibly handy for communications over untrustworthy mediums and if we had a very fast way to calculate them (even a limited set of key CRCs) we could easily harden communication protocols implemented with the Propeller.

    Many years ago, I studied CRC-16-CCITT at a customer's prompting because he wanted to use a BS2-IC to parse and validate the CRC present in a device's output. Making it reasonably fast but also small, in code size, was also a priority. After heavily studying it, I had a eureka moment and boiled it down to this PBASIC code (essentially two lines of executable code to implement CRC-16-CCITT's polynomial x^16+x^12+x^5+1):
    'CRC must be cleared to 0 before start.
    'Each byte must be put in CValue, then CalcCRC should be called.
    'CRC is then equal to current CRC value for all CValue bytes processed.
    
    CRC     VAR WORD              'Calculated CRC value
    CRCL    VAR CRC.LOWBYTE       'Low byte of calculated CRC value
    CRCH    VAR CRC.HIGHBYTE      'High byte of calculated CRC value
    CValue  VAR BYTE              'Temporary holder of value for CRC calculation
    
    
    '-------------------------------- CRC Checksum Calculation Routine -------------------------------
    
    'PBASIC evaluates strictly left to right, so the first line below
    'computes ((CRCH^CValue)>>4)^(CRCH^CValue).
    CalcCRC:
      CValue= CRCH^CValue>>4^(CRCH^CValue)
      CRC = CValue^(CValue<<5)^(CValue<<12)^(CRC << 8)
    RETURN
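
    As a cross-check of the two-line routine above, here is a Python sketch that mirrors it and compares it against a plain bit-at-a-time loop for the same polynomial ($1021, MSB first, CRC cleared to 0 before start). The function names are made up for this illustration.

```python
def crc16_ccitt_update(crc, byte):
    """Byte-wise update mirroring the two PBASIC lines."""
    x = ((crc >> 8) ^ byte) & 0xFF       # CRCH ^ CValue
    x ^= x >> 4                          # fold the high nibble into the low one
    return ((crc << 8) ^ (x << 12) ^ (x << 5) ^ x) & 0xFFFF


def crc16_ccitt_bitwise(crc, byte):
    """Reference: shift one bit at a time, XOR $1021 when the top bit falls out."""
    crc ^= byte << 8
    for _ in range(8):
        if crc & 0x8000:
            crc = ((crc << 1) ^ 0x1021) & 0xFFFF
        else:
            crc = (crc << 1) & 0xFFFF
    return crc


def crc16(data, update=crc16_ccitt_update, init=0):
    """Run an update function over a bytes object."""
    crc = init
    for b in data:
        crc = update(crc, b)
    return crc


print(hex(crc16(b"123456789")))  # 0x31c3, the usual check value for this variant
```

    The shifts by 12 and 5 plus the unshifted term correspond to the x^12, x^5 and 1 terms of the polynomial, which is why the trick is tied to this particular polynomial.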
    

    In the years since, I hadn't seen anyone else implement it this way and I was never fond of other implementations. Also, I lost my notes (and knowledge) of how the heck I ever came to the conclusion that I could boil it down to that.

    Well I finally dove back into studying this last year and relearned what I had figured out that led me to that conclusion, then I applied it to CRC-32. (Incidentally, I did run across someone else implementing it this way since then, but haven't seen it formally documented anywhere).

    My goal was to come up with something that could be implemented in Propeller 2 hardware very quickly, and I think I found it. I sent some of this information to Chip for his consideration, but as I said, it was too late at the time.

    Without his feedback, I still don't know how few clock cycles it could really have been implemented in, and because it's limited to pre-defined polynomials, it's probably not the method everyone would agree with using anyway... but I thought I would share it here now.

    (See Attached)
    • The CRC.exe was just a Win64 program I wrote to test out a few things and verify that my algorithm worked as I studied CRC-16-CCITT and CRC-32.
    • The CRC-32.spin code was what I created to test out my CRC-32 algorithm and wrap it to the smallest I could make it.
    • The two Word documents (very wide format pages) "CRC-16-CCITT 8-bit Data (For Chip).doc" and "CRC-32, 8-bit Data (For Chip).doc" were a summary of the results of my studying this in order to show Chip the patterns that could be exploited in hardware for the given polynomials.
    • The \Other Info\... docs are the versions where I documented things a bit more fully as I studied the patterns.
    • The "Sample Delphi Code.txt" is an excerpt of what's used in the CRC.exe as I tested things.

    The result of the CRC-32.spin is this:
    CON
      _clkmode = xtal1 + pll16x                             ' Crystal and PLL settings.
      _xinfreq = 5_000_000                                  ' 5 MHz crystal (5 MHz x 16 = 80 MHz).
    
    VAR
      long CRC
      
    OBJ
      pst    : "Parallax Serial Terminal"                   ' Serial communication object
    
    PUB go | value                                  
      pst.Start(115200)                                     ' Start the Parallax Serial Terminal cog
    
      CRC := $FFFF_FFFF    'Invert Initial CRC
    
      CalcCRC("A")
    '  CalcCRC("B")
    '  CalcCRC("C")
    '  CalcCRC("D")
    
      CRC ><= 32           'Reverse CRC Pre-Final
      CRC ^= $FFFF_FFFF    'Invert Final CRC
      
      pst.Hex(CRC, 8)
    
    PUB CalcCRC(DataByte) | B
    
      DataByte ><= 8      'Reverse Data Bytes
      CRC ^= (B:=((CRC^=((CRC:=(CRC<-8)^DataByte)>>6)&$3)&$FF)<<1)^B<<1^B<<3^B<<4^B<<6^B<<7^B<<9^B<<10^B<<11^B<<15^B<<21^B<<22^B<<25
    
    ...where the real work is done by the single expression CRC ^=... at the very bottom of the code.
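
    The wrapper structure of the Spin listing above (invert the initial CRC, reverse each data byte, run an MSB-first core, then reverse and invert the result) can be checked in Python against the standard library's CRC-32. Note the assumption: the core below is an ordinary bit-at-a-time loop for polynomial $04C11DB7, not the single-expression form from the Spin code, so this only illustrates that the reversals and inversions are conventions wrapped around the core.

```python
import zlib


def reverse_bits(value, width):
    """Reverse the low `width` bits of value (like Spin's >< operator)."""
    out = 0
    for _ in range(width):
        out = (out << 1) | (value & 1)
        value >>= 1
    return out


def crc32_core(crc, byte):
    """MSB-first CRC-32 core with no reversal and no inversion."""
    crc ^= byte << 24
    for _ in range(8):
        if crc & 0x80000000:
            crc = (crc << 1) ^ 0x04C11DB7
        else:
            crc <<= 1
        crc &= 0xFFFFFFFF
    return crc


def crc32_with_conventions(data):
    """Apply the same optional steps as the Spin wrapper around the core."""
    crc = 0xFFFFFFFF                                  # invert initial CRC
    for b in data:
        crc = crc32_core(crc, reverse_bits(b, 8))     # reverse each data byte
    return reverse_bits(crc, 32) ^ 0xFFFFFFFF         # reverse, then invert


print(crc32_with_conventions(b"A") == zlib.crc32(b"A"))  # True
```

    Dropping any of the three wrapper steps gives a different but equally valid CRC, which fits the idea of leaving them to separate instructions.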

    In the end, I figured if we could make a single-instruction-cycle instruction to do the core of the CRC calculation for a couple specific CRC polynomials, the reversing and inversion (options) sometimes applied could be handled by other instructions (leaving them "optional" and applied only as needed) without much loss and still a huge gain.

    But nothing became of it, so I don't know if this was all sane or not.
  • KyeKye Posts: 2,200
    edited 2013-11-01 13:31
    I recently coded up a fast CRC algorithm on the prop. Since I use it to just validate large blocks of data it works great.
    ' /////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
    
    
    ' ///////////////////// CRC-16-CCITT Subroutine ///////////////////////////////////////////////////////////////////////
    
    
    crcTable                long    $0000, $1081, $2102, $3183
                            long    $4204, $5285, $6306, $7387
                            long    $8408, $9489, $a50a, $b58b
                            long    $c60c, $d68d, $e70e, $f78f
    
    
    h0fff                   long    $0fff
    hffff                   long    $ffff
    
    
    crcChecksumBegin        mov     crc,                hffff
    crcChecksumBegin_ret    ret
    
    
    crcChecksum             mov     i,                  #8
    crcChecksumLoop         mov     j,                  crc
    
    
                            xor     crc,                c
                            and     crc,                #15
                            movs    crcChecksumLookup,  #crcTable
                            add     crcChecksumLookup,  crc
    
    
                            shr     j,                  #4
                            and     j,                  h0fff
    
    
    crcChecksumLookup       mov     crc,                crcTable
                            xor     crc,                j
    
    
                            shr     c,                  #4
                            djnz    i,                  #crcChecksumLoop
    crcChecksum_ret         ret
    
    
    crcChecksumEnd          xor     crc,                hffff
                            and     crc,                hffff
    crcChecksumEnd_ret      ret
    
    i                       res     1
    j                       res     1
    c                       res     1
    crc                     res     1
    

    Start with calling the begin function. Put the 32-bit value in the c var and call the checksum function. Repeat for any other longs. You can self-modify the crcChecksum label to #4 to work on words and #2 to work on bytes. After you're done call the end function.
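
    For comparison, a Python sketch of the same nibble-at-a-time scheme. It assumes the 16-entry table is the one generated by the bit-reversed CCITT polynomial $8408 (the generated values match crcTable above), and the begin/checksum/end split follows the PASM routine. The Python names are hypothetical.

```python
POLY_REFLECTED = 0x8408  # x^16+x^12+x^5+1 with the bit order reversed


def make_nibble_table():
    """Build the 16-entry table by running 4 LSB-first shifts per index."""
    table = []
    for n in range(16):
        crc = n
        for _ in range(4):
            crc = (crc >> 1) ^ POLY_REFLECTED if crc & 1 else crc >> 1
        table.append(crc)
    return table


CRC_TABLE = make_nibble_table()


def crc_begin():
    return 0xFFFF


def crc_checksum(crc, value, nibbles=8):
    """Fold `nibbles` 4-bit groups of value into crc, low nibble first."""
    for _ in range(nibbles):
        index = (crc ^ value) & 0xF
        crc = CRC_TABLE[index] ^ (crc >> 4)
        value >>= 4
    return crc


def crc_end(crc):
    return (crc ^ 0xFFFF) & 0xFFFF


crc = crc_begin()
for b in b"123456789":
    crc = crc_checksum(crc, b, nibbles=2)   # 2 nibbles per byte
print(hex(crc_end(crc)))  # 0x906e
```

    Passing nibbles=8 with four bytes packed little-endian into one value gives the same result as four byte-sized calls, which matches the long/word/byte options described above.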
  • Dave HeinDave Hein Posts: 6,347
    edited 2013-11-01 14:22
    There's always the table-lookup method if you're willing to use 512 bytes for CRC16 and 1K for CRC32. Here's a CRC16 routine in Spin. It would be quite a bit faster in PASM.
    dat
      fcstab word
      word $0000,$1021,$2042,$3063,$4084,$50a5,$60c6,$70e7
      word $8108,$9129,$a14a,$b16b,$c18c,$d1ad,$e1ce,$f1ef
      word $1231,$0210,$3273,$2252,$52b5,$4294,$72f7,$62d6
      word $9339,$8318,$b37b,$a35a,$d3bd,$c39c,$f3ff,$e3de
      word $2462,$3443,$0420,$1401,$64e6,$74c7,$44a4,$5485
      word $a56a,$b54b,$8528,$9509,$e5ee,$f5cf,$c5ac,$d58d
      word $3653,$2672,$1611,$0630,$76d7,$66f6,$5695,$46b4
      word $b75b,$a77a,$9719,$8738,$f7df,$e7fe,$d79d,$c7bc
      word $48c4,$58e5,$6886,$78a7,$0840,$1861,$2802,$3823
      word $c9cc,$d9ed,$e98e,$f9af,$8948,$9969,$a90a,$b92b
      word $5af5,$4ad4,$7ab7,$6a96,$1a71,$0a50,$3a33,$2a12
      word $dbfd,$cbdc,$fbbf,$eb9e,$9b79,$8b58,$bb3b,$ab1a
      word $6ca6,$7c87,$4ce4,$5cc5,$2c22,$3c03,$0c60,$1c41
      word $edae,$fd8f,$cdec,$ddcd,$ad2a,$bd0b,$8d68,$9d49
      word $7e97,$6eb6,$5ed5,$4ef4,$3e13,$2e32,$1e51,$0e70
      word $ff9f,$efbe,$dfdd,$cffc,$bf1b,$af3a,$9f59,$8f78
      word $9188,$81a9,$b1ca,$a1eb,$d10c,$c12d,$f14e,$e16f
      word $1080,$00a1,$30c2,$20e3,$5004,$4025,$7046,$6067
      word $83b9,$9398,$a3fb,$b3da,$c33d,$d31c,$e37f,$f35e
      word $02b1,$1290,$22f3,$32d2,$4235,$5214,$6277,$7256
      word $b5ea,$a5cb,$95a8,$8589,$f56e,$e54f,$d52c,$c50d
      word $34e2,$24c3,$14a0,$0481,$7466,$6447,$5424,$4405
      word $a7db,$b7fa,$8799,$97b8,$e75f,$f77e,$c71d,$d73c
      word $26d3,$36f2,$0691,$16b0,$6657,$7676,$4615,$5634
      word $d94c,$c96d,$f90e,$e92f,$99c8,$89e9,$b98a,$a9ab
      word $5844,$4865,$7806,$6827,$18c0,$08e1,$3882,$28a3
      word $cb7d,$db5c,$eb3f,$fb1e,$8bf9,$9bd8,$abbb,$bb9a
      word $4a75,$5a54,$6a37,$7a16,$0af1,$1ad0,$2ab3,$3a92
      word $fd2e,$ed0f,$dd6c,$cd4d,$bdaa,$ad8b,$9de8,$8dc9
      word $7c26,$6c07,$5c64,$4c45,$3ca2,$2c83,$1ce0,$0cc1
      word $ef1f,$ff3e,$cf5d,$df7c,$af9b,$bfba,$8fd9,$9ff8
      word $6e17,$7e36,$4e55,$5e74,$2e93,$3eb2,$0ed1,$1ef0
    
    PUB ComputeCRC(ptr, num)
      repeat num
        result := ((result << 8) & $ffff) ^ fcstab[(result >> 8) ^ byte[ptr++]]
    
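
    If typing in 256 words is unappealing, the table can be generated. The Python sketch below builds it from polynomial $1021 (MSB first) and applies the same update expression as ComputeCRC. It assumes a starting CRC of 0, as in the Spin method where result starts at zero. The names make_fcstab and compute_crc are hypothetical.

```python
def make_fcstab(poly=0x1021):
    """Generate the 256-entry table: entry i is byte i pushed through 8 shifts."""
    table = []
    for i in range(256):
        crc = i << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = (crc << 1) ^ poly
            else:
                crc <<= 1
            crc &= 0xFFFF
        table.append(crc)
    return table


FCSTAB = make_fcstab()


def compute_crc(data, crc=0):
    """Same update as the Spin ComputeCRC: one table lookup per byte."""
    for b in data:
        crc = ((crc << 8) & 0xFFFF) ^ FCSTAB[(crc >> 8) ^ b]
    return crc


print(hex(compute_crc(b"123456789")))  # 0x31c3
```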
  • Cluso99Cluso99 Posts: 18,069
    edited 2013-11-01 16:31
    I coded CRC16-IBM (x^16+x^15+x^2+1) in the early 80's. This is different to the CRC16-CCITT (x^16+x^12+x^5+1). However, there are also variants that invert the polynomial, and also reverse the CRC16 bits (likely a misinterpretation at the time, e.g. xmodem). The initial value varies too (e.g. CRC16-IBM, at least when used with USB, starts with $FFFF and the poly $A001 is the XOR value). CRC16-CCITT seems to start with $FFFF and uses XOR $8408, whereas CRC16-CCITT/XMODEM uses XOR $1021 (because xmodem sends the crc16 in reverse order).

    CRC5 is also used in USB but much of this can be precalculated.

    I have seen a generalised implementation block of a variable length CRC generator. I will try to find it again. Beware however, there are some errors on the internet about implementations which are incorrect.

    I would favour a simple CRC 1-bit instruction - as I described here
    http://forums.parallax.com/showthread.php/150685-P2-Serial-Shift-Register-discussion?p=1216081&viewfull=1#post1216081

    CRC16 crcbits, poly wc ' C --> (crcbits + (crcbits[16] XOR $A001) ) >> 1

    Since we can set the initial value of "crcbits" and the value of "poly" this might/should/could??? work for other polynomial lengths from what I have seen.
    The instruction could be simplified by using a register (eg ACCx and preset it with "poly" or else the crc initial value) such that the instruction was in the form
    CRC D WC

    With this method and using the REPx instruction, the CRC could be calculated in 16 clocks (plus setup) for 8 bits.
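
    One plausible reading of such a 1-bit CRC step, sketched in Python: shift the register right by one and XOR in the polynomial when the outgoing bit differs from the incoming data bit. This is an assumption about the intended behaviour, not the proposed instruction's definition. The polynomial and initial value are parameters, as the post suggests, and the names are hypothetical.

```python
def crc_bit_step(crc, data_bit, poly=0xA001):
    """One LSB-first CRC step: the feedback bit decides whether poly is XORed in."""
    feedback = (crc ^ data_bit) & 1
    crc >>= 1
    if feedback:
        crc ^= poly
    return crc


def crc16_bitserial(data, init=0x0000, poly=0xA001):
    """Feed each byte LSB first, one step per bit (8 steps per byte)."""
    crc = init
    for byte in data:
        for i in range(8):
            crc = crc_bit_step(crc, (byte >> i) & 1, poly)
    return crc


print(hex(crc16_bitserial(b"123456789")))               # 0xbb3d (init $0000)
print(hex(crc16_bitserial(b"123456789", init=0xFFFF)))  # 0x4b37 (init $FFFF)
```

    With poly=$8408 the same step produces the reversed CCITT variants, so a single step of this shape with a programmable polynomial would cover both 16-bit families mentioned above.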

    For further investigation.....
    1. Can we use the same instruction for other polynomial lengths?
    2. Can we use a constant for the polynomial and do the inversion differences in sw?

    Should we start a new thread just to discuss the CRC generation requested?
  • pedwardpedward Posts: 1,642
    edited 2013-11-01 18:48
  • Cluso99Cluso99 Posts: 18,069
    edited 2013-11-01 23:49
    Here is the latest instruction set as posted by Chip today (in Excel format)
    P2_Instruction_Set_20131102.zip
  • Cluso99Cluso99 Posts: 18,069
    edited 2013-11-02 04:43
    Here is an update to the instruction set (Excel format) in my previous post (includes the [#] options for D & S and the WC/WZ flags)
    P2_Instruction_Set_20131102a.zip
  • hinvhinv Posts: 1,255
    edited 2013-11-02 10:46
    I have been out of the loop for a while, but it seems now that the instruction set is changing for the P2?

    When then is the P2 expected to be in production?

    Thanks,
    Doug
  • David BetzDavid Betz Posts: 14,516
    edited 2013-11-02 15:00
    hinv wrote: »
    I have been out of the loop for a while, but it seems now that the instruction set is changing for the P2?

    When then is the P2 expected to be in production?

    Thanks,
    Doug
    I think these changes are being made during downtime that was required anyway before another spin of the chip. Supposedly, they don't delay production any further than the change in fabrication and the need to resynthesize already did.
  • RamonRamon Posts: 484
    edited 2013-11-02 20:18
    cgracey wrote: »
    
    always @(posedge clk or negedge ena)
    if (!ena)
    	trace <= 4'b0;
    else if (setrace)
    	trace <= d[3:0];
    
    	
    

    Should not setrace be also part of always sensitivity list?
  • cgraceycgracey Posts: 14,155
    edited 2013-11-02 22:49
    Ramon wrote: »
    Should not setrace be also part of always sensitivity list?

    No, because setrace does not instigate the action.
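
    A small behavioural model in Python may make the point concrete: the register only changes on a clock edge (or when ena drops), and setrace is merely sampled at that edge. This is a rough sketch of the quoted Verilog plus the output selection described earlier in the thread, with hypothetical class and method names.

```python
class TraceReg:
    """Models 'always @(posedge clk or negedge ena)' for the 4-bit trace register."""

    def __init__(self):
        self.trace = 0

    def negedge_ena(self):
        # Asynchronous clear: happens as soon as ena falls, no clock needed.
        self.trace = 0

    def posedge_clk(self, ena, setrace, d):
        # setrace is only a condition evaluated at the clock edge.
        if not ena:
            self.trace = 0
        elif setrace:
            self.trace = d & 0xF

    def trace_outp(self, tracew):
        """If bit 3 is set, bits 2..0 pick which 16-bit word carries tracew."""
        words = [0] * 8
        if self.trace & 0b1000:
            words[self.trace & 0b111] = tracew & 0xFFFF
        return words


reg = TraceReg()
reg.posedge_clk(ena=True, setrace=True, d=0b1010)   # captured at the edge
reg.posedge_clk(ena=True, setrace=False, d=0b0101)  # ignored, setrace is low
```

    Between clock edges, nothing in the model reacts to setrace changing, which is why it does not belong in the sensitivity list.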
  • SapiehaSapieha Posts: 2,964
    edited 2013-11-04 06:07
    Hi pedward.

    For USB, even CRC5 is needed.

    pedward wrote: »
    I've been tremendously busy with my new job, so I'll just throw out a couple thoughts here.

    I already told Chip that I thought CRC instructions were compulsory for doing the stuff he wants, I couldn't see any way to really get a routine below 8 clocks per accumulate.

    I've seen Verilog for doing HW CRC, it's not difficult to implement, there is even a web page that will take a polynomial and spit out static Verilog that does it all.

    I want 2 CRC modes, a 16 bit CRC and 32 bit. The polynomial is user defined, so any number of CRC modes can be emulated.

    Example:

    SETCRC16 D/# ; set the polynomial for CRC16
    CRC16 D, S ; accumulate S into D using polynomial

    The proposal would then be 4 new instructions:

    SETCRC16 ; set CRC16 polynomial
    CRC16 ; CRC16 accumulate instruction
    SETCRC32 ; set CRC32 polynomial
    CRC32 ; CRC32 accumulate instruction
  • pedwardpedward Posts: 1,642
    edited 2013-11-04 08:20
    Perhaps a dedicated USB CRC5 instruction could be made? Is the CRC5 used as part of the fast path of USB comms, or just uses that have low performance requirements?

    I see CRC5 is used in RFID too, but that's hardly a high speed communication protocol.
  • Cluso99Cluso99 Posts: 18,069
    edited 2013-11-04 14:49
    CRC5 in USB can mostly be precalculated because once you know your allocated USB address you can precalculate most of the CRC5. This is how BradC did USB LS in P1.
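
    The precalculation idea can be sketched in Python as follows. Assumptions, none of which come from this thread: the USB token CRC5 uses polynomial x^5+x^2+1 with an all-ones start value and an inverted result, and it covers the 7-bit address followed by the 4-bit endpoint, least significant bit first. The function names are hypothetical, and this is not BradC's actual code.

```python
def crc5_step(crc, bit):
    """One LSB-first step; 0x14 is x^5+x^2+1 with its 5 low bits reversed."""
    feedback = (crc ^ bit) & 1
    crc >>= 1
    if feedback:
        crc ^= 0x14
    return crc


def crc5_bits(value, nbits, crc=0x1F):
    """Run nbits of value (LSB first) through the CRC, starting from crc."""
    for i in range(nbits):
        crc = crc5_step(crc, (value >> i) & 1)
    return crc


def token_crc5(addr, endp):
    """Direct calculation over all 11 bits of a token."""
    return crc5_bits(addr | (endp << 7), 11) ^ 0x1F


def precompute_for_address(addr):
    """Once the address is known, keep the CRC state after its 7 bits and
    finish the remaining 4 endpoint bits for all 16 endpoints in advance."""
    state = crc5_bits(addr, 7)
    return [crc5_bits(endp, 4, state) ^ 0x1F for endp in range(16)]


table = precompute_for_address(0x15)   # 16 ready-made CRC5 values for one device
```

    At run time the token CRC then becomes a 16-entry lookup by endpoint number, with no bit-level work in the fast path.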
  • CircuitsoftCircuitsoft Posts: 1,166
    edited 2013-11-07 20:30
    Back on the SETRACE topic, I think I already saw this suggested, but when SETRACE is enabled, could you add an input pin for clock gating so that you can halt a cog by pulling it high or low? Also, if when the cog was halted, the cog and stack ram got mapped to an unused hub region, that would enable us to emulate every debug feature that any other processor/controller has, that I'm aware of.
  • Cluso99Cluso99 Posts: 18,069
    edited 2013-11-07 21:33
    Back on the SETRACE topic, I think I already saw this suggested, but when SETRACE is enabled, could you add an input pin for clock gating so that you can halt a cog by pulling it high or low?
    Yes, I have asked for this with an enable instruction. Would be fantastic to be able to slowly step through some buggy sw.
    Also, if when the cog was halted, the cog and stack ram got mapped to an unused hub region, that would enable us to emulate every debug feature that any other processor/controller has, that I'm aware of.
    I don't think this is possible as the blocks are completely separate to the hub ram. These blocks have been hand laid and are not part of the current changes, which only involve the synthesis sections (the cog instruction sets and counters etc).