PROPELLER 2 MEMORY
------------------

In the Propeller 2, there are two primary types of memory:

HUB MEMORY

    256K bytes of 1-port main memory shared by all cogs
        - cogs launch from this memory
        - cogs can read/write this memory as bytes, words, longs, and wides (8 longs)
        - $00000..$00DFF is ROM - contains Booter, SHA-256/HMAC, and Monitor
        - $00E00..$3FFFF is RAM - for application usage

COG MEMORY (8 instances)

    512 longs of 4-port register RAM for code and data usage
        - simultaneous instruction, source, and destination reading, plus destination writing
        - $000..$1F1 = RAM
        - $1F2 = INDA, indirect window
        - $1F3 = INDB, indirect window
        - $1F4..$1F7 = PINA..PIND, pin input (read-only)
        - $1F8..$1FB = OUTA..OUTD, pin output state control
        - $1FC..$1FF = DIRA..DIRD, pin output drive control

    256 longs of 2-port auxiliary RAM for data and video usage
        - readable and writeable via instructions or free-running pin-transfer circuit
        - video circuit can read pixel data asynchronously from second port

    4 longs x 4 tasks' worth of LIFO stacks for CALL/RET/PUSH/POP instructions

    8 longs x 1 line of data cache for RDBYTEC/RDWORDC/RDLONGC/RDWIDEC instructions

    8 longs x 4 lines of instruction cache for executing from hub memory


INSTRUCTION ENCODING
--------------------

Cog instructions are 32 bits long and composed of several bit fields.

There are two main types of instructions: dual-operand and single-operand. Dual-operand instructions specify both a D register for read/write access or an immediate D value, and an S register for read access or an immediate S value. Single-operand instructions specify only a D register or immediate value.
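
As an illustration only, the dual-operand layout given in this section (TTTTTTT ZC I CCCC DDDDDDDDD SSSSSSSSS) can be unpacked with shifts and masks. The sketch below is a Python model, not part of any Propeller tool, and the function names are hypothetical. It also relies on one observation drawn from the condition table in this section, namely that CCCC appears to behave as a 4-entry truth table indexed by the C and Z flags; treat that reading as an assumption.

```python
def decode_dual_operand(instr):
    """Split a 32-bit instruction word into the dual-operand bit fields."""
    return {
        "T": (instr >> 25) & 0x7F,    # TTTTTTT, bits 31..25
        "Z": (instr >> 24) & 1,       # write-Z control
        "C": (instr >> 23) & 1,       # write-C control
        "I": (instr >> 22) & 1,       # 0 = S is a register, 1 = S is immediate
        "CCCC": (instr >> 18) & 0xF,  # execution condition
        "D": (instr >> 9) & 0x1FF,    # destination field
        "S": instr & 0x1FF,           # source field
    }


def condition_met(cccc, c, z):
    """Assumed reading of the table: bit (2*C + Z) of CCCC enables execution."""
    return bool((cccc >> ((c << 1) | z)) & 1)


# 'RDBYTE D,PTRA' from the hub memory examples, with D = 5 chosen arbitrarily
word = int("0000000" "00" "1" "1111" "000000101" "000000000", 2)
fields = decode_dual_operand(word)
```

With this model, `condition_met(0b1010, c, 1)` is true for either value of C, which matches the IF_Z row of the table.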

Dual-operand encoding:

    TTTTTTT ZC I CCCC DDDDDDDDD SSSSSSSSS       IF_x MNEM D/#,S/# WZ,WC

Single-operand encoding:

    1111111 ZC x CCCC DDDDDDDDD TTTTTTTTT       IF_x MNEM D/# WZ,WC


TTTTTTT   = Instruction according to MNEM

Z         = Z flag write control: 0=don't write Z, 1=write Z
            Defaults to 0, but may be set to 1 by adding WZ (Write Z) after operand(s)
            Unless specified otherwise, the value written to Z is the NOR of the 32-bit D result
            (Z=1 when the result is zero).

C         = C flag write control: 0=don't write C, 1=write C
            Defaults to 0, but may be set to 1 by adding WC (Write C) after operand(s)

I         = SSSSSSSSS is register or immediate, 0=register address (S), 1=immediate (#n)

CCCC      = Execution condition (expressed by IF_x prefix)
            Determines Z/C flag conditions upon which the instruction will execute

            CCCC    condition       CCCC    mnemonic prefixes (in easy-to-read order)
            ---------------------------------------------------------------------
            0000    never           1111    IF_ALWAYS (default)
            0001    nc  &  nz       1100    IF_C            IF_B
            0010    nc  &  z        0011    IF_NC           IF_AE
            0011    nc              1010    IF_Z            IF_E
            0100    c   &  nz       0101    IF_NZ           IF_NE
            0101    nz              1000    IF_C_AND_Z      IF_Z_AND_C
            0110    c   <> z        0100    IF_C_AND_NZ     IF_NZ_AND_C
            0111    nc  |  nz       0010    IF_NC_AND_Z     IF_Z_AND_NC
            1000    c   &  z        0001    IF_NC_AND_NZ    IF_NZ_AND_NC    IF_A
            1001    c   =  z        1110    IF_C_OR_Z       IF_Z_OR_C       IF_BE
            1010    z               1101    IF_C_OR_NZ      IF_NZ_OR_C
            1011    nc  |  z        1011    IF_NC_OR_Z      IF_Z_OR_NC
            1100    c               0111    IF_NC_OR_NZ     IF_NZ_OR_NC
            1101    c   |  nz       1001    IF_C_EQ_Z       IF_Z_EQ_C
            1110    c   |  z        0110    IF_C_NE_Z       IF_Z_NE_C
            1111    always          0000    IF_NEVER

DDDDDDDDD = Destination register address (D) or zero-extended immediate value (#n)

SSSSSSSSS = Source register address (S) or zero-extended immediate value (#n)


HUB MEMORY INSTRUCTIONS
-----------------------

These instructions read and write hub memory. All instructions use D as the data conduit, except RDWIDE/RDWIDEC/WRWIDE, which use the eight WIDE registers. The WIDEs can be mapped into cog register space using the SETWIDE instruction or kept hidden, in which case they are still useful as a data conduit and as a read cache.
If mapped, the WIDEs overlay eight contiguous cog registers and can be read or written like any other registers, though they cannot be executed from. Any write via D to the WIDE registers, when mapped, will affect the underlying cog registers as well. A RDWIDE/RDWIDEC will affect the WIDE registers, but not the underlying cog registers.

The cached reads RDBYTEC/RDWORDC/RDLONGC/RDWIDEC will do a RDWIDE if the current read address is outside of the 8-long hub window of the prior RDWIDE. Otherwise, they will immediately return cached data. The DCACHEX instruction invalidates the cache, forcing a fresh RDWIDE the next time a cached read executes.

Hub memory instructions must wait for their cog's hub cycle, which comes once every 8 clocks. The timing relationship between a cog's instruction stream and its hub cycle is generally indeterminate, causing these instructions to take varying numbers of clocks. Timing can be made deterministic, though, by intentionally spacing these instructions apart so that after the first in a series executes, the subsequent hub memory instructions fall on hub cycles, making them take the minimal number of clocks. The trick is to write useful code to go in between them.

WRBYTE/WRWORD/WRLONG/WRWIDE/RDWIDE complete on the hub cycle, making them take 1..8 clocks.

RDBYTE/RDWORD/RDLONG complete on the 2nd clock after the hub cycle, making them take 3..10 clocks.

RDBYTEC/RDWORDC/RDLONGC take only 1 clock if data is already cached, otherwise 3..10 clocks.

RDWIDEC takes only 1 clock if data is cached, otherwise 1..8 clocks.
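
The clock counts above can be tabulated with a small model. This is a Python sketch under stated assumptions, not a cycle-accurate simulator: `wait` is a hypothetical parameter meaning the number of clocks (0..7) between the instruction starting and its cog's hub cycle arriving, and the cache window is assumed to be the aligned 32-byte (8-long) block containing the last RDWIDE address.

```python
HUB_PERIOD = 8  # a cog's hub cycle comes once every 8 clocks


def write_clocks(wait):
    """WRBYTE/WRWORD/WRLONG/WRWIDE/RDWIDE: complete on the hub cycle (1..8)."""
    if not 0 <= wait < HUB_PERIOD:
        raise ValueError("wait must be 0..7")
    return wait + 1


def read_clocks(wait):
    """RDBYTE/RDWORD/RDLONG: complete 2 clocks after the hub cycle (3..10)."""
    return write_clocks(wait) + 2


def same_window(addr_a, addr_b):
    """Assumption: two byte addresses share a cache line if bits 17..5 match."""
    return (addr_a >> 5) == (addr_b >> 5)


def cached_read_clocks(addr, cached_addr, wait):
    """RDBYTEC/RDWORDC/RDLONGC: 1 clock on a hit, otherwise a normal read."""
    if cached_addr is not None and same_window(addr, cached_addr):
        return 1
    return read_clocks(wait)
```

For example, `cached_read_clocks(0x105, 0x11F, 7)` models a hit because both addresses fall in the same 32-byte block, while an address of `0x120` would fall in the next block and cost a full read.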
After a RDWIDE, mapped WIDE registers are accessible via D and S after three clocks: RDWIDE hubaddress 'read a wide into the WIDE registers mapped at wide0..wide7 NOP 'do something for at least 3 clocks to allow WIDEs to update NOP NOP CMP wide0,wide1 'mapped WIDEs are now accessible via D and S After a SETWIDE/SETWIDZ, mapped WIDE registers are writable immediately at their new address, but their contents only become readable via D and S after 2 instructions: SETWIDE #wide0 'map WIDEs to wide0..wide7 (three LSB of address must be %000) NOP 'do at least two instructions to queue up WIDEs NOP CMP wide0,wide1 'mapped WIDEs are now accessible via D and S On cog startup, the WIDE registers are hidden and cleared to 0's. instructions (PTRx = PTRA/PTRB) clocks --------------------------------------------------------------------------------------------------------- 0000000 ZC 0 CCCC DDDDDDDDD SSSSSSSSS RDBYTE D,S 'read byte at S into D 3..10 0000000 ZC 1 CCCC DDDDDDDDD SUPNNNNNN RDBYTE D,PTRx 'read byte at PTRx into D 3..10 0000001 ZC 0 CCCC DDDDDDDDD SSSSSSSSS RDBYTEC D,S 'read cached byte at S into D 1, 3..10 0000001 ZC 1 CCCC DDDDDDDDD SUPNNNNNN RDBYTEC D,PTRx 'read cached byte at PTRx into D 1, 3..10 1101000 00 0 CCCC DDDDDDDDD SSSSSSSSS WRBYTE D,S 'write lower byte in D at S 1..8 1101000 00 1 CCCC DDDDDDDDD SUPNNNNNN WRBYTE D,PTRx 'write lower byte in D at PTRx 1..8 1101000 01 0 CCCC DDDDDDDDD SSSSSSSSS WRBYTE #D,S 'write immediate D at S 1..8 1101000 01 1 CCCC DDDDDDDDD SUPNNNNNN WRBYTE #D,PTRx 'write immediate D at PTRx 1..8 0000010 ZC 0 CCCC DDDDDDDDD SSSSSSSSS RDWORD D,S 'read word at S into D 3..10 0000010 ZC 1 CCCC DDDDDDDDD SUPNNNNNN RDWORD D,PTRx 'read word at PTRx into D 3..10 0000011 ZC 0 CCCC DDDDDDDDD SSSSSSSSS RDWORDC D,S 'read cached word at S into D 1, 3..10 0000011 ZC 1 CCCC DDDDDDDDD SUPNNNNNN RDWORDC D,PTRx 'read cached word at PTRx into D 1, 3..10 1101000 10 0 CCCC DDDDDDDDD SSSSSSSSS WRWORD D,S 'write lower word in D at S 1..8 1101000 10 1 CCCC 
DDDDDDDDD SUPNNNNNN WRWORD D,PTRx 'write lower word in D at PTRx 1..8 1101000 11 0 CCCC DDDDDDDDD SSSSSSSSS WRWORD #D,S 'write immediate D at S 1..8 1101000 11 1 CCCC DDDDDDDDD SUPNNNNNN WRWORD #D,PTRx 'write immediate D at PTRx 1..8 0000100 ZC 0 CCCC DDDDDDDDD SSSSSSSSS RDLONG D,S 'read long at S into D 3..10 0000100 ZC 1 CCCC DDDDDDDDD SUPNNNNNN RDLONG D,PTRx 'read long at PTRx into D 3..10 0000101 ZC 0 CCCC DDDDDDDDD SSSSSSSSS RDLONGC D,S 'read cached long at S into D 1, 3..10 0000101 ZC 1 CCCC DDDDDDDDD SUPNNNNNN RDLONGC D,PTRx 'read cached long at PTRx into D 1, 3..10 1101001 00 0 CCCC DDDDDDDDD SSSSSSSSS WRLONG D,S 'write long in D at S 1..8 1101001 00 1 CCCC DDDDDDDDD SUPNNNNNN WRLONG D,PTRx 'write long in D at PTRx 1..8 1101001 01 0 CCCC DDDDDDDDD SSSSSSSSS WRLONG #D,S 'write immediate D at S 1..8 1101001 01 1 CCCC DDDDDDDDD SUPNNNNNN WRLONG #D,PTRx 'write immediate D at PTRx 1..8 1111111 00 0 CCCC DDDDDDDDD 000101101 RDWIDEC D 'read cached wide at D into WIDEs 1, 1..8 1111111 00 1 CCCC SUPNNNNNN 000101101 RDWIDEC PTRx 'read cached wide at PTRx into WIDEs 1, 1..8 1111111 00 0 CCCC DDDDDDDDD 000101110 RDWIDE D 'read wide at D into WIDEs 1..8 1111111 00 1 CCCC SUPNNNNNN 000101110 RDWIDE PTRx 'read wide at PTRx into WIDEs 1..8 1111111 00 0 CCCC DDDDDDDDD 000101111 WRWIDE D 'write WIDEs at D 1..8 1111111 00 1 CCCC SUPNNNNNN 000101111 WRWIDE PTRx 'write WIDEs at PTRx 1..8 --------------------------------------------------------------------------------------------------------- PTRx expressions: INDEX = -32..+31 for simple offsets, 0..31 for ++'s, or 0..32 for --'s SCALE = 1 for byte, 2 for word, 4 for long, or 32 for wide S = 0 for PTRA, 1 for PTRB U = 0 to keep PTRx same, 1 to update PTRx P = 0 to use PTRx + INDEX*SCALE, 1 to use PTRx (post-modify) NNNNNN = INDEX nnnnnn = -INDEX SUPNNNNNN PTR expression ----------------------------------------------------------------------------- 000000000 PTRA 'use PTRA 100000000 PTRB 'use PTRB 011000001 PTRA++ 'use PTRA, PTRA 
+= SCALE 111000001 PTRB++ 'use PTRB, PTRB += SCALE 011111111 PTRA-- 'use PTRA, PTRA -= SCALE 111111111 PTRB-- 'use PTRB, PTRB -= SCALE 010000001 ++PTRA 'use PTRA + SCALE, PTRA += SCALE 110000001 ++PTRB 'use PTRB + SCALE, PTRB += SCALE 010111111 --PTRA 'use PTRA - SCALE, PTRA -= SCALE 110111111 --PTRB 'use PTRB - SCALE, PTRB -= SCALE 000NNNNNN PTRA[INDEX] 'use PTRA + INDEX*SCALE 100NNNNNN PTRB[INDEX] 'use PTRB + INDEX*SCALE 011NNNNNN PTRA++[INDEX] 'use PTRA, PTRA += INDEX*SCALE 111NNNNNN PTRB++[INDEX] 'use PTRB, PTRB += INDEX*SCALE 011nnnnnn PTRA--[INDEX] 'use PTRA, PTRA -= INDEX*SCALE 111nnnnnn PTRB--[INDEX] 'use PTRB, PTRB -= INDEX*SCALE 010NNNNNN ++PTRA[INDEX] 'use PTRA + INDEX*SCALE, PTRA += INDEX*SCALE 110NNNNNN ++PTRB[INDEX] 'use PTRB + INDEX*SCALE, PTRB += INDEX*SCALE 010nnnnnn --PTRA[INDEX] 'use PTRA - INDEX*SCALE, PTRA -= INDEX*SCALE 110nnnnnn --PTRB[INDEX] 'use PTRB - INDEX*SCALE, PTRB -= INDEX*SCALE Examples: 0000000 00 1 1111 DDDDDDDDD 000000000 RDBYTE D,PTRA 'read byte at PTRA into D 1101000 10 1 1111 DDDDDDDDD 111000001 WRWORD D,PTRB++ 'write lower word in D at PTRB, PTRB += 1*2 0000100 00 1 1111 DDDDDDDDD 011111111 RDLONG D,PTRA-- 'read long at PTRA into D, PTRA -= 1*4 1111111 00 1 1111 110000001 000101110 RDWIDE ++PTRB 'read wide at PTRB+32 into WIDEs, PTRB += 1*32 1101000 00 1 1111 DDDDDDDDD 010111111 WRBYTE D,--PTRA 'write lower byte in D at PTRA-1, PTRA -= 1*1 1101000 10 1 1111 DDDDDDDDD 100000111 WRWORD D,PTRB[7] 'write lower word in D to PTRB+7*2 0000101 00 1 1111 DDDDDDDDD 011011111 RDLONGC D,PTRA++[31] 'read cached long at PTRA into D, PTRA += 31*4 1111111 00 1 1111 111111101 000101111 WRWIDE PTRB--[3] 'write WIDEs at PTRB, PTRB -= 3*32 1101000 00 1 1111 DDDDDDDDD 010000110 WRBYTE D,++PTRA[6] 'write lower byte in D to PTRA+6*1, PTRA += 6*1 0000010 00 1 1111 DDDDDDDDD 110110110 RDWORD D,--PTRB[10] 'read word at PTRB-10*2 into D, PTRB -= 10*2 Bytes, words, longs, and wides are addressed as follows: for RDBYTE/RDBYTEC/WRBYTE, address = 
%XXXXXXXXXXXXXXXXX (bits 17..0 are used) for RDWORD/RDWORDC/WRWORD, address = %XXXXXXXXXXXXXXXX- (bits 17..1 are used) for RDLONG/RDLONGC/WRLONG, address = %XXXXXXXXXXXXXXX-- (bits 17..2 are used) for RDWIDE/RDWIDEC/WRWIDE, address = %XXXXXXXXXXXX----- (bits 17..5 are used) address byte word long wide ------------------------------------------------------------------- 00000- 50 *7250 *706F7250 *0D7CC6010C7CC6010CFCB6E30DC1FE450C7CCC030C7C200020302E32706F7250 00001- 72 7250 706F7250 0D7CC6010C7CC6010CFCB6E30DC1FE450C7CCC030C7C200020302E32706F7250 00002- 6F *706F 706F7250 0D7CC6010C7CC6010CFCB6E30DC1FE450C7CCC030C7C200020302E32706F7250 00003- 70 706F 706F7250 0D7CC6010C7CC6010CFCB6E30DC1FE450C7CCC030C7C200020302E32706F7250 00004- 32 *2E32 *20302E32 0D7CC6010C7CC6010CFCB6E30DC1FE450C7CCC030C7C200020302E32706F7250 00005- 2E 2E32 20302E32 0D7CC6010C7CC6010CFCB6E30DC1FE450C7CCC030C7C200020302E32706F7250 00006- 30 *2030 20302E32 0D7CC6010C7CC6010CFCB6E30DC1FE450C7CCC030C7C200020302E32706F7250 00007- 20 2030 20302E32 0D7CC6010C7CC6010CFCB6E30DC1FE450C7CCC030C7C200020302E32706F7250 00008- 00 *2000 *0C7C2000 0D7CC6010C7CC6010CFCB6E30DC1FE450C7CCC030C7C200020302E32706F7250 00009- 20 2000 0C7C2000 0D7CC6010C7CC6010CFCB6E30DC1FE450C7CCC030C7C200020302E32706F7250 0000A- 7C *0C7C 0C7C2000 0D7CC6010C7CC6010CFCB6E30DC1FE450C7CCC030C7C200020302E32706F7250 0000B- 0C 0C7C 0C7C2000 0D7CC6010C7CC6010CFCB6E30DC1FE450C7CCC030C7C200020302E32706F7250 0000C- 03 *CC03 *0C7CCC03 0D7CC6010C7CC6010CFCB6E30DC1FE450C7CCC030C7C200020302E32706F7250 0000D- CC CC03 0C7CCC03 0D7CC6010C7CC6010CFCB6E30DC1FE450C7CCC030C7C200020302E32706F7250 0000E- 7C *0C7C 0C7CCC03 0D7CC6010C7CC6010CFCB6E30DC1FE450C7CCC030C7C200020302E32706F7250 0000F- 0C 0C7C 0C7CCC03 0D7CC6010C7CC6010CFCB6E30DC1FE450C7CCC030C7C200020302E32706F7250 00010- 45 *FE45 *0DC1FE45 0D7CC6010C7CC6010CFCB6E30DC1FE450C7CCC030C7C200020302E32706F7250 00011- FE FE45 0DC1FE45 0D7CC6010C7CC6010CFCB6E30DC1FE450C7CCC030C7C200020302E32706F7250 00012- C1 
*0DC1 0DC1FE45 0D7CC6010C7CC6010CFCB6E30DC1FE450C7CCC030C7C200020302E32706F7250 00013- 0D 0DC1 0DC1FE45 0D7CC6010C7CC6010CFCB6E30DC1FE450C7CCC030C7C200020302E32706F7250 00014- E3 *B6E3 *0CFCB6E3 0D7CC6010C7CC6010CFCB6E30DC1FE450C7CCC030C7C200020302E32706F7250 00015- B6 B6E3 0CFCB6E3 0D7CC6010C7CC6010CFCB6E30DC1FE450C7CCC030C7C200020302E32706F7250 00016- FC *0CFC 0CFCB6E3 0D7CC6010C7CC6010CFCB6E30DC1FE450C7CCC030C7C200020302E32706F7250 00017- 0C 0CFC 0CFCB6E3 0D7CC6010C7CC6010CFCB6E30DC1FE450C7CCC030C7C200020302E32706F7250 00018- 01 *C601 *0C7CC601 0D7CC6010C7CC6010CFCB6E30DC1FE450C7CCC030C7C200020302E32706F7250 00019- C6 C601 0C7CC601 0D7CC6010C7CC6010CFCB6E30DC1FE450C7CCC030C7C200020302E32706F7250 0001A- 7C *0C7C 0C7CC601 0D7CC6010C7CC6010CFCB6E30DC1FE450C7CCC030C7C200020302E32706F7250 0001B- 0C 0C7C 0C7CC601 0D7CC6010C7CC6010CFCB6E30DC1FE450C7CCC030C7C200020302E32706F7250 0001C- 01 *C601 *0D7CC601 0D7CC6010C7CC6010CFCB6E30DC1FE450C7CCC030C7C200020302E32706F7250 0001D- C6 C601 0D7CC601 0D7CC6010C7CC6010CFCB6E30DC1FE450C7CCC030C7C200020302E32706F7250 0001E- 7C *0D7C 0D7CC601 0D7CC6010C7CC6010CFCB6E30DC1FE450C7CCC030C7C200020302E32706F7250 0001F- 0D 0D7C 0D7CC601 0D7CC6010C7CC6010CFCB6E30DC1FE450C7CCC030C7C200020302E32706F7250 * new word/long/wide PTRA/PTRB INSTRUCTIONS ---------------------- Each cog has two 18-bit pointers, PTRA and PTRB, which can be read, written, modified, and used to access hub memory. 
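
The SUPNNNNNN pointer expressions listed for the hub memory instructions combine these two pointers with a signed 6-bit index. The following Python sketch decodes such a field according to the S/U/P/NNNNNN description in the PTRx expression table; the function name is hypothetical, and wrapping the result to 18 bits is an assumption based on the stated pointer width.

```python
PTR_MASK = 0x3FFFF  # PTRA and PTRB are 18-bit pointers


def decode_ptr_expr(field, scale, ptra, ptrb):
    """Return (hub_address, new_ptra, new_ptrb) for a 9-bit SUPNNNNNN field.

    scale is 1 for byte, 2 for word, 4 for long, 32 for wide accesses.
    """
    s = (field >> 8) & 1   # 0 = PTRA, 1 = PTRB
    u = (field >> 7) & 1   # 1 = update the pointer
    p = (field >> 6) & 1   # 1 = use the pointer before modification
    index = field & 0x3F
    if index & 0x20:       # sign-extend the 6-bit index
        index -= 0x40
    base = ptrb if s else ptra
    offset = index * scale
    address = (base if p else base + offset) & PTR_MASK
    if u:
        base = (base + offset) & PTR_MASK
    return (address, ptra, base) if s else (address, base, ptrb)
```

Applied to the documented example `RDWIDE ++PTRB` (field %110000001, scale 32) with PTRB at $1000, the model gives a read address of $1020 and leaves PTRB at $1020.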
At cog startup, the PTRA and PTRB registers are initialized as follows: PTRA = %XX_XXXXXXXX_XXXXXXXX, data from launching cog, usually a pointer PTRB = %XX_XXXXXXXX_XXXXXX00, long address in hub where cog code was loaded from instructions clocks ------------------------------------------------------------------------------------------------- 1111111 ZC 0 CCCC DDDDDDDDD 000001010 GETPTRA D 'get PTRA into D 1 1111111 ZC 0 CCCC DDDDDDDDD 000001011 GETPTRB D 'get PTRB into D 1 1111111 00 0 CCCC DDDDDDDDD 010001000 SETPTRA D 'set PTRA to D 1 1111111 00 1 CCCC DDDDDDDDD 010001000 SETPTRA #D 'set PTRA to #D 1 1111111 00 0 CCCC DDDDDDDDD 010001001 SETPTRB D 'set PTRB to D 1 1111111 00 1 CCCC DDDDDDDDD 010001001 SETPTRB #D 'set PTRB to #D 1 1111111 00 0 CCCC DDDDDDDDD 010001010 ADDPTRA D 'add D into PTRA 1 1111111 00 1 CCCC DDDDDDDDD 010001010 ADDPTRA #D 'add #D into PTRA 1 1111111 00 0 CCCC DDDDDDDDD 010001011 ADDPTRB D 'add D into PTRB 1 1111111 00 1 CCCC DDDDDDDDD 010001011 ADDPTRB #D 'add #D into PTRB 1 1111111 00 0 CCCC DDDDDDDDD 010001100 SUBPTRA D 'subtract D from PTRA 1 1111111 00 1 CCCC DDDDDDDDD 010001100 SUBPTRA #D 'subtract #D from PTRA 1 1111111 00 0 CCCC DDDDDDDDD 010001101 SUBPTRB D 'subtract D from PTRB 1 1111111 00 1 CCCC DDDDDDDDD 010001101 SUBPTRB #D 'subtract #D from PTRB 1 ------------------------------------------------------------------------------------------------- WIDE-RELATED INSTRUCTIONS ------------------------- Each cog has eight WIDE registers which form a 256-bit conduit between the hub memory and the cog. This conduit can transfer eight longs every 8 clocks via the RDWIDE/WRWIDE instructions. It can also be used as an 8-long/16-word/32-byte read cache, by using RDBYTEC/RDWORDC/RDLONGC/RDWIDEC. Initially hidden and cleared to zero, the WIDE registers are mappable into cog register space by using the SETWIDE/SETWIDZ instructions to set an 8-even address range (%xxxxxx000) where the WIDE registers are to appear. 
If the three LSBs are not %000, the WIDEs will be hidden. SETWIDZ works just like SETWIDE, but also clears the eight WIDE registers.

instructions                                                                    clocks
-------------------------------------------------------------------------------------------------
1111111 00 0 CCCC DDDDDDDDD 010011110   SETWIDE D       'set WIDE base to D              1
1111111 00 1 CCCC DDDDDDDDD 010011110   SETWIDE #D      'set WIDE base to #D             1
1111111 00 0 CCCC DDDDDDDDD 010011111   SETWIDZ D       'set WIDE base to D, WIDEs=0     1
1111111 00 1 CCCC DDDDDDDDD 010011111   SETWIDZ #D      'set WIDE base to #D, WIDEs=0    1
1111111 00 0 CCCC 000000000 100011000   DCACHEX         'invalidate cache                1
-------------------------------------------------------------------------------------------------


HUB CONTROL INSTRUCTIONS
------------------------

These instructions are used to control hub circuits and cogs.

Hub instructions must wait for their cog's hub cycle, which comes once every 8 clocks. In cases where there is no result to wait for (no Z, C, or D), these instructions complete on the hub cycle, making them take 1..8 clocks, depending on where the hub cycle is in relation to the instruction. In cases where a result is anticipated (Z, C, or D), these instructions complete on the 1st clock after the hub cycle, making them take 2..9 clocks.


COGNEW D, S/#
--------------
COGNEW starts the lowest-numbered idle cog.

For COGNEW, D specifies a long address in hub memory that is the start of the program that is to be loaded into the idle cog, while S is an 18-bit parameter (usually an address) that will be conveyed to PTRA of that cog. PTRB of that cog will be set to the start address of its new program in hub memory, which is the same as the D value used in the COGNEW instruction, AND'd with $3FFFC to form a hub long address.

COGNEW will return the number of the started cog (0..7) into D, with C=0 indicating success or C=1 indicating failure, in which case no cog was idle and so D is invalid.
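
The COGNEW behavior described above can be summarized in a few lines. This Python sketch is a behavioral model only; the `running` list and the function name are hypothetical, and the model returns `None` where the text says D is invalid.

```python
def cognew(running, d, s):
    """Model COGNEW: start the lowest-numbered idle cog.

    running: list of 8 booleans, True where a cog is already running.
    Returns (cog_number, c_flag, ptra, ptrb) as seen after the launch.
    """
    for cog, busy in enumerate(running):
        if not busy:
            running[cog] = True
            ptra = s & 0x3FFFF   # 18-bit parameter conveyed to the new cog
            ptrb = d & 0x3FFFC   # program start, forced to a long address
            return cog, 0, ptra, ptrb
    return None, 1, None, None   # C=1: no idle cog, D is invalid


cogs = [True, True, False, False, False, False, False, False]
started = cognew(cogs, 0x1003, 0x12345)
```

Here `started` is `(2, 0, 0x12345, 0x1000)`: cog 2 is the lowest idle cog, and the two low bits of the D value are dropped when forming PTRB.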


COGINIT D, S/#, #0..7
---------------------
COGINIT is used to start a cog by its number (0..7). Any cog can be (re)started, whether it is idle or running. A cog can even execute a COGINIT to restart itself with a new program.

COGINIT uses D and S identically to COGNEW, but doesn't return anything in D or C, as its behavior is deterministic.

COGINIT uses a third operand to convey the number of the cog to be started (0..7). Those three bits, with a leading 0 bit, are located in nibble 6 of the COGINIT instruction. The SETNIB instruction can be used to make the cog number variable:

        COGID   x               'get my cog number into x
        SETNIB  :inst,x,#6      'install x into COGINIT
        NOP                     'must execute two instructions before modified code can execute
        NOP                     '(NOPs are not required in 4-way round-robin multitasking)
:inst   COGINIT pgm,ptr,#0      'restart me

When a cog is started, $1F4 contiguous longs are read from hub memory and written to cog registers $000..$1F3. The cog will then begin execution at $000. This process takes 1,017 clocks.
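
To make the SETNIB step in the COGINIT example above concrete, the following Python sketch shows the effect of replacing nibble 6 (bits 27..24) of an instruction word with a cog number. It models only the resulting bit pattern, using the `11000nn n0` opcode layout from the hub control instruction table; it does not claim to reproduce SETNIB's full semantics, and the helper names are hypothetical.

```python
def set_nibble(word, value, nibble):
    """Replace one 4-bit nibble (0 = least significant) of a 32-bit word."""
    shift = 4 * nibble
    cleared = word & ~(0xF << shift)
    return (cleared | ((value & 0xF) << shift)) & 0xFFFFFFFF


def patch_coginit(instr, cog):
    """Install a cog number (0..7) into nibble 6 with a leading 0 bit."""
    return set_nibble(instr, cog & 0x7, 6)


# COGINIT with n = 0, condition IF_ALWAYS, D and S fields left as zero
base = (0b11000 << 27) | (0b1111 << 18)
patched = patch_coginit(base, 5)
```

After patching, bits 27..24 of `patched` hold %0101 and every other bit of the instruction is unchanged.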


CLKSET D
---------
CLKSET writes the lower 9 bits of D to the hub clock register:

    %R_MMMM_XX_SS

    R    = 1 for hardware reset, 0 for continued operation

    MMMM = PLL mode:        %1111 for multiply XI by 16
                            %1110 for multiply XI by 15
                            %1101 for multiply XI by 14
                            %1100 for multiply XI by 13
                            %1011 for multiply XI by 12
                            %1010 for multiply XI by 11
                            %1001 for multiply XI by 10
                            %1000 for multiply XI by 9
                            %0111 for multiply XI by 8
                            %0110 for multiply XI by 7
                            %0101 for multiply XI by 6
                            %0100 for multiply XI by 5
                            %0011 for multiply XI by 4
                            %0010 for multiply XI by 3
                            %0001 for multiply XI by 2
                            %0000 for disabled, else XX must be set for XI input or XI/XO crystal oscillator

    XX   = XI/XO pin mode:  %11 for XI/XO crystal oscillator with 30pF internal loading and 1M-ohm feedback
                            %10 for XI/XO crystal oscillator with 15pF internal loading and 1M-ohm feedback
                            %01 for XI input, XO floats
                            %00 for XI reads low, XO floats

    SS   = Clock selector:  %11 for PLL
                            %10 for XTAL (10MHz-20MHz)
                            %01 for RCSLOW (~20kHz)
                            %00 for RCFAST (~20MHz)

Because the clock register is cleared to %0_0000_00_00 on reset, the chip starts up in RCFAST mode with both the crystal oscillator and the PLL disabled. Before switching to XTAL or PLL mode from RCFAST or RCSLOW, the crystal oscillator must be enabled and given 10ms to stabilize. The PLL stabilizes within 10us, so it can be enabled at the same time as the crystal oscillator. Once the crystal is stabilized, you can switch between XTAL and RCFAST/RCSLOW without any stability concerns. If the PLL is also enabled, you can switch freely among PLL, XTAL, and RCFAST/RCSLOW modes. You can change the PLL multiplier while in PLL mode, but beware that some frequency overshoot and undershoot will occur as the PLL settles to its new frequency. This only poses a hardware problem if you are switching upwards and the resulting overshoot might exceed the speed limit of the chip.


COGID D
---------
If WC is not specified, COGID returns the number of the cog (0..7) into D.
If WC is specified, COGID returns the state of cog D into C, where 0=idle / 1=running, without writing D.


COGSTOP D/#
-----------
COGSTOP stops the cog specified in D/# (0..7). The stopped cog will return to a reset state in which all of its output signals will be held low, cancelling any effects it was having on I/O pins.


LOCKNEW D
LOCKRET D/#
LOCKSET D/#
LOCKCLR D/#
-----------
There are eight semaphore locks available in the chip which can be borrowed with LOCKNEW, returned with LOCKRET, set with LOCKSET, and cleared with LOCKCLR.

While any cog can set or clear any lock without using LOCKNEW or LOCKRET, LOCKNEW and LOCKRET are provided so that cog programs have a dynamic and simple means of acquiring and relinquishing the locks at run-time.

When a lock is set with LOCKSET, its state is set to 1 and its prior state is returned in C. LOCKCLR works the same way, but clears the lock's state to 0. By having the hub perform the atomic operation of setting/clearing and reporting the prior state, cogs can utilize locks to ensure that only one cog has permission to do something at once. If a lock starts out cleared and multiple cogs vie for the lock by doing a 'LOCKSET locknum wc', the cog that gets C=0 back 'wins' and can have exclusive access to some shared resource, while the other cogs get C=1 back. When the winning cog is done, it can do a 'LOCKCLR locknum' to clear the lock and give another cog the opportunity to get C=0 back.

LOCKNEW returns the next available lock into D, with C=1 if no lock was free.

LOCKRET frees the lock in D so that it can be checked out again by LOCKNEW.

LOCKSET sets the lock in D and returns its prior state in C.

LOCKCLR clears the lock in D and returns its prior state in C.
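
The lock protocol above can be modeled in a few lines of Python. This is a sketch of the described behavior, not of the hub hardware: the class and method names are hypothetical, "next available lock" is assumed to mean the lowest-numbered free lock, and the hub's atomicity is taken for granted because the model is single-threaded.

```python
class HubLocks:
    """Behavioral model of the eight hub semaphore locks."""

    def __init__(self):
        self.state = [0] * 8            # current set/clear state of each lock
        self.checked_out = [False] * 8  # locks handed out by LOCKNEW

    def locknew(self):
        """Return (lock, c); c=1 means no lock was free."""
        for n in range(8):
            if not self.checked_out[n]:
                self.checked_out[n] = True
                return n, 0
        return None, 1

    def lockret(self, n):
        """Make lock n available to LOCKNEW again."""
        self.checked_out[n & 7] = False

    def lockset(self, n):
        """Set lock n and return its prior state (the C flag)."""
        prior = self.state[n & 7]
        self.state[n & 7] = 1
        return prior

    def lockclr(self, n):
        """Clear lock n and return its prior state (the C flag)."""
        prior = self.state[n & 7]
        self.state[n & 7] = 0
        return prior


hub = HubLocks()
lock, c = hub.locknew()
first_try = hub.lockset(lock)    # 0: this caller wins the lock
second_try = hub.lockset(lock)   # 1: the lock was already held
```

A cog that sees a prior state of 1 would typically retry LOCKSET until it gets 0 back, and the winner would call the LOCKCLR equivalent when finished.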

instructions                                                                              clocks
---------------------------------------------------------------------------------------------------
1001111 0C 0 CCCC DDDDDDDDD SSSSSSSSS   COGNEW  D,S       'launch new cog at D, cog PTRA = S      1..9
1001111 0C 1 CCCC DDDDDDDDD SSSSSSSSS   COGNEW  D,#S      'launch new cog at D, cog PTRA = #S     1..9
11000nn n0 0 CCCC DDDDDDDDD SSSSSSSSS   COGINIT D,S,#n    'launch cog n at D, cog PTRA = S        1..9
11000nn n0 1 CCCC DDDDDDDDD SSSSSSSSS   COGINIT D,#S,#n   'launch cog n at D, cog PTRA = #S       1..9

1111111 Z0 0 CCCC DDDDDDDDD 000000000   COGID   D         'get cog number into D                  2..9
1111111 Z1 0 CCCC DDDDDDDDD 000000000   COGID   D WC      'get cog D state, C = running           2..9
1111111 ZC 0 CCCC DDDDDDDDD 000000010   LOCKNEW D         'get new lock into D, C = busy          2..9

1111111 00 0 CCCC DDDDDDDDD 010000000   CLKSET  D         'set clock to D                         1..8
1111111 00 1 CCCC DDDDDDDDD 010000000   CLKSET  #D        'set clock to #D                        1..8
1111111 00 0 CCCC DDDDDDDDD 010000001   COGSTOP D         'stop cog D                             1..8
1111111 00 1 CCCC DDDDDDDDD 010000001   COGSTOP #D        'stop cog #D                            1..8
1111111 0C 0 CCCC DDDDDDDDD 010000010   LOCKSET D         'set lock D, C = prev state             1..9
1111111 0C 1 CCCC DDDDDDDDD 010000010   LOCKSET #D        'set lock #D, C = prev state            1..9
1111111 0C 0 CCCC DDDDDDDDD 010000011   LOCKCLR D         'clear lock D, C = prev state           1..9
1111111 0C 1 CCCC DDDDDDDDD 010000011   LOCKCLR #D        'clear lock #D, C = prev state          1..9
1111111 00 0 CCCC DDDDDDDDD 010000100   LOCKRET D         'return lock D                          1..8
1111111 00 1 CCCC DDDDDDDDD 010000100   LOCKRET #D        'return lock #D                         1..8
---------------------------------------------------------------------------------------------------


INDIRECT REGISTERS
------------------

Each cog has two indirect registers: INDA and INDB. They are located at $1F2 and $1F3. By using INDA or INDB for D or S, the register pointed at by INDA or INDB is addressed.

INDA and INDB each have three hidden 9-bit registers associated with them: the pointer, the bottom limit, and the top limit.
The bottom and top limits are inclusive values which set automatic wrapping boundaries for the pointer. This way, circular buffers can be established within cog RAM and accessed using simple INDA/INDB references.

FIXINDA/FIXINDB/FIXINDS sets the pointer(s) to an initial value, while setting the bottom limit(s) to the lower of the initial and terminal values and the top limit(s) to the higher.

SETINDA/SETINDB/SETINDS is used to set or adjust the pointer value(s) while forcing the associated bottom and top limit(s) to $000 and $1FF, respectively.

Because indirect addressing must occur in the 2nd stage of the pipeline, long before C and Z are valid for conditional execution in the 4th stage, all instructions which use indirect addressing are forced to always execute. This frees the conditional bit field (CCCC) for specifying indirect operations. The top two bits of CCCC are used for indirect D and the bottom two bits are used for indirect S. If only D or S is indirect, the other two bits in CCCC are ignored.
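
The pointer and limit behavior described above can be sketched as a small Python model. The class and method names are hypothetical, the starting values in the constructor are arbitrary (the text does not state the startup values), and the wrap rule for single steps (past the top goes to the bottom, and vice versa) is an assumption based on the limits being inclusive; larger deltas are deliberately not modeled.

```python
class IndirectPointer:
    """Model of one INDx pointer with its bottom and top limits."""

    def __init__(self):
        # arbitrary starting values for the model, not documented behavior
        self.pointer, self.bottom, self.top = 0x000, 0x000, 0x1FF

    def fixind(self, terminal, initial):
        """FIXINDx #terminal,#initial: set pointer and both limits."""
        self.pointer = initial & 0x1FF
        self.bottom = min(initial, terminal) & 0x1FF
        self.top = max(initial, terminal) & 0x1FF

    def setind(self, address):
        """SETINDx #addr: set pointer, force limits to $000 and $1FF."""
        self.pointer = address & 0x1FF
        self.bottom, self.top = 0x000, 0x1FF

    def step(self, delta):
        """Advance by +1 or -1 with assumed inclusive wraparound."""
        if delta == 1:
            self.pointer = self.bottom if self.pointer == self.top else self.pointer + 1
        elif delta == -1:
            self.pointer = self.top if self.pointer == self.bottom else self.pointer - 1
        else:
            raise ValueError("only single steps are modeled")
        return self.pointer


inda = IndirectPointer()
inda.fixind(15, 8)                       # circular buffer over registers 8..15
visited = [inda.step(+1) for _ in range(8)]
```

In this model the eight increments visit registers 9 through 15 and then wrap back to 8, which is how an 8-entry circular buffer would be walked.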

Here is the INDA/INDB usage scheme which repurposes the CCCC field:

TTTTTTT ZC I CCCC DDDDDDDDD SSSSSSSSS
-------------------------------------
xxxxxxx xx x 00xx 111110010 xxxxxxxxx   D = INDA        'use INDA
xxxxxxx xx x 00xx 111110011 xxxxxxxxx   D = INDB        'use INDB
xxxxxxx xx x 01xx 111110010 xxxxxxxxx   D = INDA++      'use INDA, INDA += 1
xxxxxxx xx x 01xx 111110011 xxxxxxxxx   D = INDB++      'use INDB, INDB += 1
xxxxxxx xx x 10xx 111110010 xxxxxxxxx   D = INDA--      'use INDA, INDA -= 1
xxxxxxx xx x 10xx 111110011 xxxxxxxxx   D = INDB--      'use INDB, INDB -= 1
xxxxxxx xx x 11xx 111110010 xxxxxxxxx   D = ++INDA      'use INDA+1, INDA += 1
xxxxxxx xx x 11xx 111110011 xxxxxxxxx   D = ++INDB      'use INDB+1, INDB += 1

xxxxxxx xx 0 xx00 xxxxxxxxx 111110010   S = INDA        'use INDA
xxxxxxx xx 0 xx00 xxxxxxxxx 111110011   S = INDB        'use INDB
xxxxxxx xx 0 xx01 xxxxxxxxx 111110010   S = INDA++      'use INDA, INDA += 1
xxxxxxx xx 0 xx01 xxxxxxxxx 111110011   S = INDB++      'use INDB, INDB += 1
xxxxxxx xx 0 xx10 xxxxxxxxx 111110010   S = INDA--      'use INDA, INDA -= 1
xxxxxxx xx 0 xx10 xxxxxxxxx 111110011   S = INDB--      'use INDB, INDB -= 1
xxxxxxx xx 0 xx11 xxxxxxxxx 111110010   S = ++INDA      'use INDA+1, INDA += 1
xxxxxxx xx 0 xx11 xxxxxxxxx 111110011   S = ++INDB      'use INDB+1, INDB += 1

If both D and S are the same indirect register, the two 2-bit fields in CCCC are OR'd together to get the post-modifier effect:

0100000 00 0 0011 111110010 111110010   MOV INDA,++INDA         'Move @INDA+1 into @INDA, INDA += 1
0101000 00 0 1100 111110011 111110011   ADD ++INDB,INDB         'Add @INDB into @INDB+1, INDB += 1

Note that only '++INDx,INDx'/'INDx,++INDx' combinations can address different registers from the same INDx.
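
The table above can be read mechanically: the D or S field selects INDA ($1F2) or INDB ($1F3), and the matching half of CCCC selects the modifier. The Python sketch below decodes which indirect forms an instruction word uses. It is a reading aid with hypothetical names, and it does not model how the pointer is updated when D and S name the same register.

```python
INDA, INDB = 0x1F2, 0x1F3
MODES = {0b00: "INDx", 0b01: "INDx++", 0b10: "INDx--", 0b11: "++INDx"}


def indirect_operands(instr):
    """Return the indirect forms used by D and/or S in a 32-bit instruction."""
    i = (instr >> 22) & 1          # 1 = S is an immediate, never indirect
    cccc = (instr >> 18) & 0xF
    d = (instr >> 9) & 0x1FF
    s = instr & 0x1FF
    found = {}
    if d in (INDA, INDB):
        # top two bits of CCCC describe indirect D
        found["D"] = MODES[cccc >> 2].replace("x", "A" if d == INDA else "B")
    if i == 0 and s in (INDA, INDB):
        # bottom two bits of CCCC describe indirect S
        found["S"] = MODES[cccc & 3].replace("x", "A" if s == INDA else "B")
    return found


# the two documented examples, assembled from their bit fields
mov_word = int("0100000" "00" "0" "0011" "111110010" "111110010", 2)
add_word = int("0101000" "00" "0" "1100" "111110011" "111110011", 2)
```

Decoding `mov_word` yields `{"D": "INDA", "S": "++INDA"}`, matching the `MOV INDA,++INDA` listing.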

Here are the instructions which are used to set the pointer and limit values for INDA and INDB:

instructions *                                                                  clocks
-------------------------------------------------------------------------------------------------
1111110 00 1 0001 TTTTTTTTT IIIIIIIII   FIXINDA #terminal,#initial              1
1111110 00 1 0100 TTTTTTTTT IIIIIIIII   FIXINDB #terminal,#initial              1
1111110 00 1 0101 TTTTTTTTT IIIIIIIII   FIXINDS #terminal,#initial              1

1111110 00 1 0010 000000000 AAAAAAAAA   SETINDA #addrA                          1
1111110 00 1 0011 000000000 AAAAAAAAA   SETINDA ++/--deltA                      1
1111110 00 1 1000 BBBBBBBBB 000000000   SETINDB #addrB                          1
1111110 00 1 1100 BBBBBBBBB 000000000   SETINDB ++/--deltB                      1
1111110 00 1 1010 BBBBBBBBB AAAAAAAAA   SETINDS #addrB,#addrA                   1
1111110 00 1 1011 BBBBBBBBB AAAAAAAAA   SETINDS #addrB,++/--deltA               1
1111110 00 1 1110 BBBBBBBBB AAAAAAAAA   SETINDS ++/--deltB,#addrA               1
1111110 00 1 1111 BBBBBBBBB AAAAAAAAA   SETINDS ++/--deltB,++/--deltA           1
-------------------------------------------------------------------------------------------------
* addrA/addrB/terminal/initial = register address (0..511), deltA/deltB = 9-bit signed delta --256..++255

Examples:

1111110 00 1 0010 000000000 000000101   SETINDA #5      'INDA = 5, bottom = 0, top = 511
1111110 00 1 0011 000000000 000000011   SETINDA ++3     'INDA += 3, bottom = 0, top = 511
1111110 00 1 1100 111111100 000000000   SETINDB --4     'INDB -= 4, bottom = 0, top = 511
1111110 00 1 1011 000000111 000001000   SETINDS #7,++8  'INDB = 7, INDA += 8, bottoms = 0, tops = 511

1111110 00 1 0001 000001111 000001000   FIXINDA #15,#8  'INDA = 8, bottom = 8, top = 15
1111110 00 1 0100 000010000 000011111   FIXINDB #16,#31 'INDB = 31, bottom = 16, top = 31
1111110 00 1 0101 001100011 000110010   FIXINDS #99,#50 'INDA/INDB = 50, bottoms = 50, tops = 99


AUXILIARY RAM
-------------

Each cog has a 256-long auxiliary RAM called AUX that can be used for data, call/return (Z,C,PC) stacks, and streaming buffers for video pixels and pin-transfers.

AUX's contents are not initialized at either reset or cog start.
So, at cog (re)start, it will contain whatever it happened to power up with, or whatever was last written to it.

There are two complementary sets of AUX read/write instructions. One set addresses AUX from 0..255, while the other addresses AUX in reverse order from 255..0. This scheme allows for simple operation of separate data/program stacks (LIFOs) which can grow towards each other.

There are also two 8-bit AUX pointer registers, PTRX and PTRY, which can be used in AUX addressing expressions.

Here are the forward-addressing (0..255) read/write instructions for AUX:

    RDAUX   D,S             read AUX[S] into D
    RDAUX   D,#0..255       read AUX[0..255] into D
    RDAUX   D,PTRX          read AUX[PTRX] into D, can update PTRX
    RDAUX   D,PTRY          read AUX[PTRY] into D, can update PTRY

    WRAUX   D/#,S           write D/# to AUX[S]
    WRAUX   D/#,#0..255     write D/# to AUX[0..255]
    WRAUX   D/#,PTRX        write D/# to AUX[PTRX], can update PTRX
    WRAUX   D/#,PTRY        write D/# to AUX[PTRY], can update PTRY

The reverse-addressing (255..0) read/write instructions for AUX are just like those above, except that they have an "R" in their mnemonics and apply a 1's-complement (!) to the apparent address:

    RDAUXR  D,S             read AUX[!S] into D
    RDAUXR  D,#0..255       read AUX[!0..255] into D
    RDAUXR  D,PTRX          read AUX[!PTRX] into D, can update PTRX
    RDAUXR  D,PTRY          read AUX[!PTRY] into D, can update PTRY

    WRAUXR  D/#,S           write D/# to AUX[!S]
    WRAUXR  D/#,#0..255     write D/# to AUX[!0..255]
    WRAUXR  D/#,PTRX        write D/# to AUX[!PTRX], can update PTRX
    WRAUXR  D/#,PTRY        write D/# to AUX[!PTRY], can update PTRY

There are also push/pop/call/ret instructions which use AUX.
Those using PTRX are forward-addressing and those using PTRY are reverse-addressing: PUSHX D/# alias for 'WRAUX D/#,PTRX++' PUSHY D/# alias for 'WRAUXR D/#,PTRY++' POPX D alias for 'RDAUX D,--PTRX' POPY D alias for 'RDAUXR D,--PTRY' CALLX D/#/@ write {Z,C,PC} to AUX[PTRX++], jump to D/#/@, cancel same-task pipelined instructions CALLY D/#/@ write {Z,C,PC} to AUX[!PTRY++], jump to D/#/@, cancel same-task pipelined instructions CALLXD D/#/@ write {Z,C,PC} to AUX[PTRX++], jump to D/#/@, don't cancel same-task pipelined instructions CALLYD D/#/@ write {Z,C,PC} to AUX[!PTRY++], jump to D/#/@, don't cancel same-task pipelined instructions RETX read {Z,C,PC} from AUX[--PTRX], cancel same-task pipelined instructions RETY read {Z,C,PC} from AUX[!--PTRY], cancel same-task pipelined instructions RETXD read {Z,C,PC} from AUX[--PTRX], don't cancel same-task pipelined instructions RETYD read {Z,C,PC} from AUX[!--PTRY], don't cancel same-task pipelined instructions PTRX and PTRY can be set, added to, subtracted from, read, or checked using the following instructions: SETPTRX D/# set PTRX to D/# SETPTRY D/# set PTRY to D/# ADDPTRX D/# add D/# to PTRX ADDPTRY D/# add D/# to PTRY SUBPTRX D/# subtract D/# from PTRX SUBPTRY D/# subtract D/# from PTRY GETPTRX D get PTRX into D, PTRX==0 into Z, PTRX.7 into C GETPTRY D get PTRY into D, PTRY==0 into Z, PTRY.7 into C CHKPTRX get PTRX==0 into Z, PTRX.7 into C CHKPTRY get PTRY==0 into Z, PTRY.7 into C PTRX/PTRY expressions for RDAUX/RDAUXR/WRAUX/WRAUXR: INDEX = -16..+15 for simple offsets, 0..15 for ++'s, or 0..16 for --'s X = 1 for PTRX/PTRY expression, 0 for constant in 8 LSBs S = 0 for PTRX, 1 for PTRY U = 0 to keep PTRX/PTRY same, 1 to update PTRX/PTRY P = 0 to use PTRX/PTRY + INDEX, 1 to use PTRX/PTRY (post-modify) NNNNN = INDEX nnnnn = -INDEX XSUPNNNNN SPx expression ---------------------------------------------------------------------- 100000000 PTRX 'use PTRX 110000000 PTRY 'use PTRY 101100001 PTRX++ 'use PTRX, PTRX += 1 111100001 
PTRY++ 'use PTRY, PTRY += 1 101111111 PTRX-- 'use PTRX, PTRX -= 1 111111111 PTRY-- 'use PTRY, PTRY -= 1 101000001 ++PTRX 'use PTRX + 1, PTRX += 1 111000001 ++PTRY 'use PTRY + 1, PTRY += 1 101011111 --PTRX 'use PTRX - 1, PTRX -= 1 111011111 --PTRY 'use PTRY - 1, PTRY -= 1 1000NNNNN PTRX[INDEX] 'use PTRX + INDEX 1100NNNNN PTRY[INDEX] 'use PTRY + INDEX 1011NNNNN PTRX++[INDEX] 'use PTRX, PTRX += INDEX 1111NNNNN PTRY++[INDEX] 'use PTRY, PTRY += INDEX 1011nnnnn PTRX--[INDEX] 'use PTRX, PTRX -= INDEX 1111nnnnn PTRY--[INDEX] 'use PTRY, PTRY -= INDEX 1010NNNNN ++PTRX[INDEX] 'use PTRX + INDEX, PTRX += INDEX 1110NNNNN ++PTRY[INDEX] 'use PTRY + INDEX, PTRY += INDEX 1010nnnnn --PTRX[INDEX] 'use PTRX - INDEX, PTRX -= INDEX 1110nnnnn --PTRY[INDEX] 'use PTRY - INDEX, PTRY -= INDEX Examples: 0000110 00 1 1111 DDDDDDDDD 100000000 RDAUX D,PTRX 'read AUX[PTRX] into D 0000111 00 1 1111 DDDDDDDDD 101111111 RDAUXR D,PTRX-- 'read AUX[!PTRX] into D, PTRX -= 1 1101010 00 1 1111 DDDDDDDDD 111000001 WRAUX D,++PTRY 'write D to AUX[PTRY+1], PTRY += 1 1101010 10 1 1111 DDDDDDDDD 110000111 WRAUXR D,PTRY[7] 'write D to AUX[!PTRY+7] 0000110 00 1 1111 DDDDDDDDD 101101111 RDAUX D,PTRX++[15] 'read AUX[PTRX] into D, PTRX += 15 1101010 00 1 1111 DDDDDDDDD 111010110 WRAUX D,--PTRY[10] 'write D to AUX[PTRY-10], PTRY -= 10 instructions clocks ------------------------------------------------------------------------------------------------------ 0000110 ZC 0 CCCC DDDDDDDDD SSSSSSSSS RDAUX D,S 'read AUX[S] into D 2 0000110 ZC 1 CCCC DDDDDDDDD 0SSSSSSSS RDAUX D,#S 'read AUX[#S] into D 1 0000110 ZC 1 CCCC DDDDDDDDD 1SUPNNNNN RDAUX D,PTRX/Y 'read AUX[PTRX/Y exp] into D 1 0000111 ZC 0 CCCC DDDDDDDDD SSSSSSSSS RDAUXR D,S 'read AUX[!S] into D 2 0000111 ZC 1 CCCC DDDDDDDDD 0SSSSSSSS RDAUXR D,#S 'read AUX[#!S] into D 1 0000111 ZC 1 CCCC DDDDDDDDD 1SUPNNNNN RDAUXR D,PTRX/Y 'read AUX[!PTRX/Y exp] into D 1 1101010 00 0 CCCC DDDDDDDDD SSSSSSSSS WRAUX D,S 'write D into AUX[S] 1 ** 1101010 00 1 CCCC DDDDDDDDD 0SSSSSSSS 
WRAUX D,#S 'write D into AUX[#S] 1 ** 1101010 00 1 CCCC DDDDDDDDD 1SUPNNNNN WRAUX D,PTRX/Y 'write D into AUX[PTRX/Y exp] 1 ** 1101010 01 0 CCCC DDDDDDDDD SSSSSSSSS WRAUX #D,S 'write #D into AUX[S] 1 ** 1101010 01 1 CCCC DDDDDDDDD 0SSSSSSSS WRAUX #D,#S 'write #D into AUX[#S] 1 ** 1101010 01 1 CCCC DDDDDDDDD 1SUPNNNNN WRAUX #D,PTRX/Y 'write #D into AUX[PTRX/Y exp] 1 ** 1101010 10 0 CCCC DDDDDDDDD SSSSSSSSS WRAUXR D,S 'write D into AUX[!S] 1 ** 1101010 10 1 CCCC DDDDDDDDD 0SSSSSSSS WRAUXR D,#S 'write D into AUX[#!S] 1 ** 1101010 10 1 CCCC DDDDDDDDD 1SUPNNNNN WRAUXR D,PTRX/Y 'write D into AUX[!PTRX/Y] 1 ** 1101010 11 0 CCCC DDDDDDDDD SSSSSSSSS WRAUXR #D,S 'write #D into AUX[!S] 1 ** 1101010 11 1 CCCC DDDDDDDDD 0SSSSSSSS WRAUXR #D,#S 'write #D into AUX[#!S] 1 ** 1101010 11 1 CCCC DDDDDDDDD 1SUPNNNNN WRAUXR #D,PTRX/Y 'write #D into AUX[!PTRX/Y] 1 ** 1111111 ZC 0 CCCC DDDDDDDDD 000001100 GETPTRX D 'get PTRX into D 1 1111111 ZC 0 CCCC DDDDDDDDD 000001101 GETPTRY D 'get PTRY into D 1 1111111 00 0 CCCC DDDDDDDDD 010000000 SETPTRX D 'set PTRX to D 1 1111111 00 1 CCCC DDDDDDDDD 010000000 SETPTRX #D 'set PTRX to #D 1 1111111 00 0 CCCC DDDDDDDDD 010000001 SETPTRY D 'set PTRY to D 1 1111111 00 1 CCCC DDDDDDDDD 010000001 SETPTRY #D 'set PTRY to #D 1 1111111 00 0 CCCC DDDDDDDDD 010000010 ADDPTRX D 'add D into PTRX 1 1111111 00 1 CCCC DDDDDDDDD 010000010 ADDPTRX #D 'add #D into PTRX 1 1111111 00 0 CCCC DDDDDDDDD 010000011 ADDPTRY D 'add D into PTRY 1 1111111 00 1 CCCC DDDDDDDDD 010000011 ADDPTRY #D 'add #D into PTRY 1 1111111 00 0 CCCC DDDDDDDDD 010000100 SUBPTRX D 'subtract D from PTRX 1 1111111 00 1 CCCC DDDDDDDDD 010000100 SUBPTRX #D 'subtract #D from PTRX 1 1111111 00 0 CCCC DDDDDDDDD 010000101 SUBPTRY D 'subtract D from PTRY 1 1111111 00 1 CCCC DDDDDDDDD 010000101 SUBPTRY #D 'subtract #D from PTRY 1 1111110 11 0 CCCC 00 nnnnnnnnnnnnnnnn CALLX #n 'write Z,C,PC* into [PTRX++], PC=n 4 ** 1111110 11 0 CCCC 01 nnnnnnnnnnnnnnnn CALLX @n 'write Z,C,PC* into [PTRX++], PC+=n 4 ** 
1111110 11 0 CCCC 10 nnnnnnnnnnnnnnnn CALLXD #n 'write Z,C,PC* into [PTRX++], PC=n 4 ** 1111110 11 0 CCCC 11 nnnnnnnnnnnnnnnn CALLXD @n 'write Z,C,PC* into [PTRX++], PC+=n 4 ** 1111110 11 1 CCCC 00 nnnnnnnnnnnnnnnn CALLY #n 'write Z,C,PC* into [PTRY++], PC=n 4 ** 1111110 11 1 CCCC 01 nnnnnnnnnnnnnnnn CALLY @n 'write Z,C,PC* into [PTRY++], PC+=n 4 ** 1111110 11 1 CCCC 10 nnnnnnnnnnnnnnnn CALLYD #n 'write Z,C,PC* into [PTRY++], PC=n 4 ** 1111110 11 1 CCCC 11 nnnnnnnnnnnnnnnn CALLYD @n 'write Z,C,PC* into [PTRY++], PC+=n 4 ** 1111111 ZC 0 CCCC xxxxxxxxx 100000100 RETX 'read Z,C,PC* from [--PTRX] 4 1111111 ZC 0 CCCC xxxxxxxxx 100000101 RETXD 'read Z,C,PC* from [--PTRX] 4 1111111 ZC 0 CCCC xxxxxxxxx 100000110 RETY 'read Z,C,PC* from [!--PTRY] 4 1111111 ZC 0 CCCC xxxxxxxxx 100000111 RETYD 'read Z,C,PC* from [!--PTRY] 4 ------------------------------------------------------------------------------------------------------ * bit 17 is Z, bit 16 is C, bits 15..0 are PC, upper bits are ignored (RETx) or cleared (CALLx) ** if followed by 'RDAUX/RDAUXR D,#S/PTRX/PTRY' or RETAx/RETBx, add one clock MULTI-TASKING ------------- Each cog has four sets of flags and program counters (Z/C/PC), constituting four unique tasks that can execute and switch on each instruction cycle. At cog startup, the tasks are initialized as follows, with only task 0 enabled: task Z C PC ---------------- 0 0 0 $0000 1 0 0 $0001 2 0 0 $0002 3 0 0 $0003 The SETTASK instruction is used to set the number of time slots and the sequence of tasks within those time slots. SETTASK's 32-bit operand consists of 16 bit pairs which declare tasks (0..3) from the bottom bit pair, upwards, with any leading %00 bit pairs declaring unused time slots. 
This way, simple task sequences can be established with immediate values:

        SETTASK #%%3210         'set repeating task sequence of 0-1-2-3

         4 time slots:     -  -  -  -  -  -  -  -  -  -  -  -  3  2  1  0
                           |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
        TASK register:   %00_00_00_00_00_00_00_00_00_00_00_00_11_10_01_00

        SETTASK #%%210          'set repeating task sequence of 0-1-2

         3 time slots:     -  -  -  -  -  -  -  -  -  -  -  -  -  2  1  0
                           |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
        TASK register:   %00_00_00_00_00_00_00_00_00_00_00_00_00_10_01_00

By providing a 32-bit value via D, up to 16 time slots can be defined:

        16 time slots:     3  0  1  0  2  0  1  0  2  0  1  0  2  0  1  0
                           |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
        TASK register:   %11_00_01_00_10_00_01_00_10_00_01_00_10_00_01_00

In the case above, task 0 gets 1/2 of the time slots, task 1 gets 1/4, task 2 gets 3/16 and task 3 gets 1/16.

It is generally a good idea to intermingle the tasks evenly so that I/O behavior is not lumpy in time. Below, tasks 0..3 get 1/3, 1/3, 1/6, and 1/6 of the time slots, all perfectly spaced:

         6 time slots:     -  -  -  -  -  -  -  -  -  -  3  1  0  2  1  0
                           |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
        TASK register:   %00_00_00_00_00_00_00_00_00_00_11_01_00_10_01_00

If you want task 0 to run most of the time, with task 1 running as seldom as possible:

        16 time slots:     1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
                           |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
        TASK register:   %01_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00

The task identified in the bottom two bits of the SETTASK operand will be at the execution stage on the 5th instruction cycle after SETTASK.

If a task is given no time slot, it doesn't execute and its flags and PC stay at initial values. If a task is given a time slot, it will execute and its Z/C/PC will be updated at every instruction cycle, or time slot, allotted to it. If an active task's time slots are all taken away, that task's Z/C/PC remain in the state where they left off, until it is given another time slot.

To immediately force any of the four tasks' PC's to a new address, JMPTASK can be used.
JMPTASK uses a 4-bit mask to select which PC's are going to be written. Mask bits 3..0 represent PC's 3..0. The mask value %1010 would write PC 3 and PC 1, while %0100 would write PC 2, only:

        JMPTASK D/#,S/#         force PC's in mask D/# to address S/#

For every task affected by a JMPTASK instruction, all affected-task instructions currently in the pipeline are cancelled. This ensures that after JMPTASK executes, the next instruction from each affected task will be from the new address. Also, instruction block repeating will be cancelled for any affected task that was using REPS/REPD.

Here is an example in which all four tasks are started and each task toggles an I/O pin at a different rate:

                ORG

                JMP     #task0          'task 0 begins here when the cog starts (this JMP takes 4 clocks)
                JMP     #task1          'task 1 begins here after task 0 executes SETTASK (this JMP takes 1 clock)
                JMP     #task2          'task 2 begins here after task 0 executes SETTASK (this JMP takes 1 clock)
                JMP     #task3          'task 3 begins here after task 0 executes SETTASK (this JMP takes 1 clock)

task0           SETTASK #%%3210         'enable all tasks in 0-1-2-3 round-robin sequence

:loop           NOTP    #0              'task 0, toggle pin 0 - loops every 8 clocks
                JMP     #:loop          '(this JMP takes 1 clock)

task1           NOTP    #1              'task 1, toggle pin 1 - loops every 12 clocks
                NOP
                JMP     #task1          '(this JMP takes 1 clock)

task2           NOTP    #2              'task 2, toggle pin 2 - loops every 16 clocks
                NOP
                NOP
                JMP     #task2          '(this JMP takes 1 clock)

task3           NOTP    #3              'task 3, toggle pin 3 - loops every 20 clocks
                NOP
                NOP
                NOP
                JMP     #task3          '(this JMP takes 1 clock)

------------------------------------------------------------------------------------------------------------
NOTE: When a normal branch instruction (JMP, CALL, RET, etc.) executes in the 4th and final stage of the pipeline, all instructions progressing through the lower three stages which belong to the same task as the branch instruction are cancelled. This inhibits execution of incidental data that was trailing the branch instruction.
The delayed branch instructions (JMPD, CALLD, RETD, etc.) don't do any pipeline instruction cancellation and exist to provide 1-clock branches, where three instructions belonging to the same task as the branch will execute before instructions begin executing from the location branched to. For single-task programs this is the natural consequence of allowing the three lower pipeline stages to advance to execution before the instructions from the new address start executing. For multi-task programs that may not have had three instructions in the pipeline from the branching task, the deficit of three instructions will be waited for before the new address takes effect. This way, all code may be written for optimal single-task execution, but it still works in all task modes. For normal (non-delayed) CALLs, PC+1 is stored as the return address. For delayed CALLs, PC+4 is stored, to accommodate three trailing instructions. For single-task programs, normal branches take 4 clocks: 1 clock for the branch and 3 clocks for the cancelled instructions to come through the pipeline before the new instruction stream begins to execute. 
------------------------------------------------------------------------------------------------------------ Tips for coding multi-tasking programs -------------------------------------- While all tasks in a multi-tasking program can execute atomic instructions without any inter-task conflict, remember that there's only one of each of the following cog resources and only one task can use it at a time: Singular resource Some related instructions that could cause conflicts ---------------------------------------------------------------------------------------------------------- WIDE registers RDBYTEC/RDWORDC/RDLONGC/RDWIDEC/RDWIDE/WRWIDE/SETWIDE/SETWIDZ INDA FIXINDA/FIXINDS/SETINDA/SETINDS / INDA modification via INDA usage INDB FIXINDB/FIXINDS/SETINDB/SETINDS / INDB modification via INDB usage PTRA SETPTRA/ADDPTRA/SUBPTRA / PTRA modification via RDxxxx/WRxxxx PTRB SETPTRB/ADDPTRB/SUBPTRB / PTRB modification via RDxxxx/WRxxxx PTRX SETPTRX/ADDPTRX/SUBPTRX/CALLX/RETX/PUSHX/POPX / PTRX modification via RDAUXx/WRAUXx PTRY SETPTRY/ADDPTRY/SUBPTRY/CALLY/RETY/PUSHY/POPY / PTRY modification via RDAUXx/WRAUXx ACCA SETACCA/SETACCS/MACA/SARACCA/SARACCS/CLRACCA/CLRACCS ACCB SETACCB/SETACCS/MACB/SARACCB/SARACCS/CLRACCB/CLRACCS 32x32 multiplier MUL32/MUL32U 64/32 divider FRAC/DIV32/DIV32U/DIV64/DIV64U/DIV64D 64-bit square rooter SQRT64/SQRT32 CORDIC computer QSINCOS/QARCTAN/QROTATE/QLOG/QEXP/SETQI/SETQZ SERA SETSERA/SERINA/SEROUTA SERB SETSERB/SERINB/SEROUTB XFR SETXFR VID WAITVID/SETVID/SETVIDY/SETVIDI/SETVIDQ/POLVID CTRA SETCTRA/SETWAVA/SETPHSA/ADDPHSA/SUBPHSA/GETPHZA/POLCTRA/CAPCTRA/SYNCTRA CTRB SETCTRB/SETWAVB/SETPHSB/ADDPHSB/SUBPHSB/GETPHZB/POLCTRB/CAPCTRB/SYNCTRB PIX (not usable in multi-tasking, requires single-task timing) When writing multi-task programs, be aware that any multi-clock instructions will stall the pipeline, creating ripple effects in other tasks' timing. 
This may be impossible to avoid, as some task will likely need to access hub memory, and hub instructions are mostly multi-clock. For absolutely deterministic timing, it may be necessary to write a single-task program. Some instructions which stall the pipeline during single-task execution will, instead, jump back to themselves during multi-task execution (JMP #$), until their release condition is met. This way they avoid stalling the pipeline, allowing other tasks to execute in the interstitial time slots: WAITVID D/#,S/# wait for VID to grab new data SERINA D wait for serial input on SERA SERINB D wait for serial input on SERB SEROUTA D/# wait to send serial output on SERA SEROUTB D/# wait to send serial output on SERB GETMULL D wait for lower multiplier result GETMULH D wait for upper multiplier result GETDIVQ D wait for divider quotient result GETDIVR D wait for divider remainder result GETSQRT D wait for square root result GETQX D wait for CORDIC X result GETQY D wait for CORDIC Y result GETQZ D wait for CORDIC Z result SYNCTRA wait for PHSA to roll over SYNCTRB wait for PHSB to roll over For the above instructions, multi-tasking is considered to be active when SETTASK D/# has written a mixture of tasks to the time slots. Remember that in multi-tasking, the above instructions behave as branches, and therefore cannot be used in REPD/REPS instruction-repeat blocks. Also, you should not use INDx++/INDx--/++INDx with these instructions during multi-tasking, as they will cause INDA/INDB to increment or decrement each time they loop back to themselves, before the release condition is met. 
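A SETTASK operand can be checked off-chip before it is used. The following Python sketch models the time-slot rules described in this section; it is not part of any Propeller 2 tool. The function names (decode_settask, task_shares, is_multitasking) are made up for illustration, and it assumes that "a mixture of tasks" means more than one distinct task appears in the used time slots.

```python
from fractions import Fraction


def decode_settask(operand):
    """Return the repeating task sequence held in a 32-bit SETTASK operand.

    Bit pairs are read from the bottom pair upward. Leading %00 pairs are
    treated as unused time slots, so they are trimmed from the top.
    """
    operand &= 0xFFFFFFFF
    pairs = [(operand >> (2 * slot)) & 0b11 for slot in range(16)]
    used = 16
    while used > 1 and pairs[used - 1] == 0:
        used -= 1
    return pairs[:used]


def task_shares(sequence):
    """Fraction of the time slots given to each task that appears."""
    return {task: Fraction(sequence.count(task), len(sequence))
            for task in sorted(set(sequence))}


def is_multitasking(sequence):
    """Assumption: multi-tasking means more than one distinct task is scheduled."""
    return len(set(sequence)) > 1


if __name__ == "__main__":
    operand = 0b11_00_01_00_10_00_01_00_10_00_01_00_10_00_01_00
    sequence = decode_settask(operand)
    print(sequence)
    print(task_shares(sequence))
    print(is_multitasking(sequence))
```

Running it on the 16-slot example operand reproduces the 1/2, 1/4, 3/16 and 1/16 split quoted earlier for tasks 0..3.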
To avoid excessively stalling the pipeline during multi-tasking, the WAITCNT/WAITPEQ/WAITPNE instructions can be substituted with non-stalling alternatives: PASSCNT D/# jumps to itself if some amount of time has not passed, use instead of WAITCNT JP/JNP D/#,S/# jumps based on pin states, use instead of WAITPEQ/WAITPNE The following instruction will not work in a multi-tasking program: GETPIX needs 3 clocks in stages 2 and 3, takes 3 clocks in stage 4 - single-task only instructions clocks ------------------------------------------------------------------------------------------------- 1111001 10 0 CCCC DDDDDDDDD SSSSSSSSS JMPTASK D,S 'Set PC's in mask D to S 1 1111001 10 1 CCCC DDDDDDDDD SSSSSSSSS JMPTASK D,#S 'Set PC's in mask D to #S 1 1111001 11 0 CCCC DDDDDDDDD SSSSSSSSS JMPTASK #D,S 'Set PC's in mask #D to S 1 1111001 11 1 CCCC DDDDDDDDD SSSSSSSSS JMPTASK #D,#S 'Set PC's in mask #D to #S 1 1111111 00 0 CCCC DDDDDDDDD 010010011 SETTASK D 'Set TASK to D 1 1111111 00 1 CCCC DDDDDDDDD 010010011 SETTASK #D 'Set TASK to #D 1 ------------------------------------------------------------------------------------------------- PIPELINE -------- Each cog has a 4-stage pipeline which all instructions progress through, in order to execute: 1st stage - Read instruction from cog register RAM 2nd stage - Determine any indirect or remapped D and S addresses within instruction - Update INDA and INDB 3rd stage - Read D and S from cog register RAM 4th stage - Execute instruction using D and S - Write any D result to cog register RAM - Update Z/C/PC and any other results On every clock cycle, the instruction data in each stage advances to the next stage, unless the instruction in the 4th stage is stalling the pipeline because it's waiting for something (i.e. WRBYTE waits for the hub). 
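The stage-by-stage advance and the effect of a stall can be pictured with a small model. This Python sketch is only an illustration of the rule stated above, not a cycle-accurate description of the hardware: it treats a program as a straight line of instructions, each with an assumed number of clocks spent in the 4th stage, and the 3-clock figure used for WRBYTE in the usage lines is a made-up value (the real hub wait varies).

```python
def run_pipeline(program):
    """Return [(clock, name)] for the clock on which each instruction completes.

    program is a list of (name, clocks_in_4th_stage) tuples, with clocks >= 1.
    While the 4th-stage instruction still owes clocks, no stage advances.
    """
    stages = [None] * 4          # stages[0] = 1st stage ... stages[3] = 4th stage
    fetch = iter(program)
    owed = 0                     # clocks the 4th-stage instruction still needs
    clock = 0
    finished = []
    pending = len(program)
    while pending:
        clock += 1
        if owed == 0:
            # Nothing is stalling: every stage hands its instruction onward.
            stages = [next(fetch, None)] + stages[:3]
            if stages[3] is not None:
                owed = stages[3][1]
        if stages[3] is not None:
            owed -= 1
            if owed == 0:
                finished.append((clock, stages[3][0]))
                pending -= 1
    return finished


if __name__ == "__main__":
    print(run_pipeline([("ADD", 1), ("SUB", 1)]))
    print(run_pipeline([("WRBYTE", 3), ("ADD", 1)]))   # 3 clocks is a made-up wait
```

In the second call the ADD completes two clocks later than it would behind a 1-clock instruction, which is the ripple effect a stalling 4th-stage instruction has on everything behind it.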
To keep D and S data current within the pipeline, the resultant D from the 4th stage is passed back to the 3rd stage to substitute for any obsoleted D or S data currently being read from the cog register RAM. The same is done for instruction data currently being read in the 1st stage, but this still leaves a two-stage gap between when a register is modified and when it can be executed:

        'single-task self-modifying code

                SETI    :inst,top9      '(initially 4th stage) modify instruction
                NOP                     '(initially 3rd stage) 1...
                NOP                     '(initially 2nd stage) 2... at least two instructions in-between
        :inst   ADD     A,B             '(initially 1st stage) modified instruction properly executes

Tasks that execute no more frequently than every 3rd time slot don't need to observe this 2-instruction spacer rule when executing self-modifying code, because their instructions will always be sufficiently spread apart in the pipeline by other tasks' instructions, enabling a just-modified instruction to be properly read and executed in that task's next time slot.

If fewer than two spacers are afforded to a modify-execute sequence in a single-task program, the old instruction will be read and executed, instead of the newly-modified one. This can be used to advantage for efficient overlapped modify-execute sequences.

When a branch instruction executes, that task's program counter is abruptly changed from what had been a steadily incrementing course, requiring that the pipeline be reloaded, beginning at the new program counter address. This can leave up to three instructions in the pipeline which were trailing the branch instruction and belong to the same task as the branch. Normally, these trailing instructions are incidental data which are not intended for execution, and therefore must be cancelled within the pipeline, so that they pass through without doing anything.
However, in the case of a single-task program, it may be desirable to allow those instructions to execute, without cancellation, to increase pipeline efficiency. This will result in the branch taking just 1 clock cycle, but three trailing instructions will be executed before the branch appears to take effect:

        'single-task delayed branch

                JMPD    #somewhere      '(initially 4th stage) do a delayed jmp, then toggle P0 and cycle P1
                NOTP    #0              '(initially 3rd stage)
                NOTP    #1              '(initially 2nd stage)
                NOTP    #1              '(initially 1st stage)

                next instruction is loaded from 'somewhere'

To accommodate both cancelling and non-cancelling branches, branch instructions have two versions. The ones that end in the letter 'D' for 'delayed' are non-cancelling and take only one clock. The branch instructions that don't end in the letter 'D' are what would be considered 'normal' branches, as they cancel any same-task instructions in the pipeline, so that the next instruction to execute after the branch would be the instruction which was branched to.

For code compatibility across all task modes, three trailing instructions from the same task as the delayed branch will always be executed before the delayed branch takes effect, regardless of whether a program is single- or multi-task.
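The difference between the two branch versions can be traced with a short model. This Python sketch assumes a single-task program in which every non-branch instruction takes 1 clock, a normal JMP costs 4 clocks, and a JMPD costs 1 clock and lets exactly three trailing instructions execute. The tuple format and the name trace are invented for the example, and a branch placed among the trailing instructions is not modelled.

```python
NORMAL_BRANCH_CLOCKS = 4     # 1 for the branch + 3 cancelled pipeline slots
DELAYED_BRANCH_CLOCKS = 1
TRAILING = 3                 # instructions that still execute after a JMPD


def trace(program, steps):
    """Execute `steps` instructions; return (addresses executed, clocks used).

    program is a list of (mnemonic, operand) tuples; for JMP and JMPD the
    operand is the index of the branch target within the list.
    """
    pc = 0
    clocks = 0
    executed = []
    trailing_left = None     # None means no delayed branch is in flight
    target = None
    for _ in range(steps):
        op, arg = program[pc]
        executed.append(pc)
        pc += 1
        if op == "JMP":
            clocks += NORMAL_BRANCH_CLOCKS
            pc = arg
        elif op == "JMPD":
            clocks += DELAYED_BRANCH_CLOCKS
            trailing_left, target = TRAILING, arg
        else:
            clocks += 1
            if trailing_left is not None:
                trailing_left -= 1
                if trailing_left == 0:
                    pc, trailing_left = target, None
    return executed, clocks


if __name__ == "__main__":
    body = [("NOTP", 0), ("NOTP", 1), ("NOTP", 1), ("NOP", None), ("NOTP", 2)]
    print(trace([("JMPD", 5)] + body, steps=5))
    print(trace([("JMP", 5)] + body, steps=2))
```

Under these assumptions both traces reach the target after 5 clocks, but the delayed version has executed three useful instructions in the slots that the normal branch spends on cancelled ones.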
Here are all the branching instructions: 'normal' 'delayed' cancelling non-cancelling ---------- -------------- JMP JMPD jump to address CALL CALLD call subroutine using task's 4-level stack RET RETD return from subroutine using task's 4-level stack CALLA CALLAD call subroutine using HUB[PTRA++] RETA RETAD return from subroutine using HUB[--PTRA] CALLB CALLBD call subroutine using HUB[PTRB++] RETB RETBD return from subroutine using HUB[--PTRB] CALLX CALLXD call subroutine using AUX[PTRX++] RETX RETXD return from subroutine using AUX[--PTRX] CALLY CALLYD call subroutine using AUX[!PTRY++] RETY RETYD return from subroutine using AUX[!--PTRY] JMPSW JMPSWD jmp/call with Z/C/PC store SWITCH SWITCHD switch between threads (JMPSW/JMPSWD INDB,++INDB) IJZ IJZD increment D and jump if result zero IJNZ IJNZD increment D and jump if result not zero DJZ DJZD decrement D and jump if result zero DJNZ DJNZD decrement D and jump if result not zero JP JPD jump if pin D reads high JNP JNPD jump if pin D reads low JZ JZD jump if D zero JNZ JNZD jump if D not zero JMPLIST jump to position in jump list JMPTASK jump selected tasks to address Here is an example of a delayed branch: loop MOV X,#100 'toggle P0/P1/P2 100 times, then toggle P3 (single-task) loop2 DJNZD X,@loop2 'loop, delayed branch executes 3 trailing instructions NOTP #0 'toggle P0 NOTP #1 'toggle P1 NOTP #2 'toggle P2 NOTP #3 'now toggle P3 JMP #loop 'do it again INSTRUCTION-BLOCK REPEATING --------------------------- Each task within a cog has an instruction-block repeater that can variably repeat up to 64 instructions without any clock-cycle overhead. REPS and REPD are used to initiate block repeats. 
These instructions specify how many times the trailing instruction block will be executed and how many instructions are in the block: REPS #n,#i - execute 1..64 instructions 1..65536 times, requires 1 spacer instruction REPD #i - execute 1..64 instructions infinitely, requires 3 spacer instructions REPD D,#i - execute 1..64 instructions D+1 times, requires 3 spacer instructions REPD #n,#i - execute 1..64 instructions 1..512 times, requires 3 spacer instructions REPS differs from REPD by executing at the 2nd stage of the pipeline, instead of the 4th. By executing two stages earlier, it needs only one spacer instruction. Because of its earliness, no conditional execution is possible, so it is forced to always execute, allowing the CCCC bits to be repurposed, affording a contiguous 16-bit constant for the repeat count. The instruction-block repeater will quit repeating the block if a branch instruction executes within the block, or if a JMPTASK instruction affects the task which is using the repeater. 
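To make the field packing concrete, here is a Python sketch that assembles REPS #n,#i and REPD #n,#i words from the bit patterns given in this section's instruction table, where the fields hold #n-1 and #i-1. The helper names are hypothetical. For REPS the sketch follows the 16 count bits shown in the table row (matching the "contiguous 16-bit constant" described above) rather than the longer pattern in the table legend; treat that choice as an assumption.

```python
def encode_reps(n, i):
    """REPS #n,#i: 1111101 01 1 nnnn nnnnnnnnn nnniiiiii (count-1 in bits 21..6)."""
    if not 1 <= n <= 65536:
        raise ValueError("REPS repeat count must be 1..65536")
    if not 1 <= i <= 64:
        raise ValueError("block length must be 1..64 instructions")
    return (0b1111101 << 25) | (0b01 << 23) | (1 << 22) | ((n - 1) << 6) | (i - 1)


def encode_repd_imm(n, i, cond=0b1111):
    """REPD #n,#i: 1111111 00 1 CCCC nnnnnnnnn 001iiiiii (cond defaults to 'always')."""
    if not 1 <= n <= 512:
        raise ValueError("REPD immediate repeat count must be 1..512")
    if not 1 <= i <= 64:
        raise ValueError("block length must be 1..64 instructions")
    if not 0 <= cond <= 0b1111:
        raise ValueError("condition field is 4 bits")
    return ((0b1111111 << 25) | (1 << 22) | (cond << 18)
            | ((n - 1) << 9) | (0b001 << 6) | (i - 1))


if __name__ == "__main__":
    print(format(encode_reps(20_000, 4), "032b"))
    print(format(encode_repd_imm(1, 1), "032b"))
```

The range checks mirror the limits stated above: REPS accepts 1..65536 repeats because it reuses the CCCC bits, while the immediate form of REPD is limited to 1..512 by its 9-bit D field.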
The following instructions potentially jump to themselves (JMP #$) and, by branching, will cancel the block repeater if executed within a repeat block:

        PASSCNT                                                         - always
        SERINA/SERINB/SEROUTA/SEROUTB                                   - only during multi-tasking
        GETMULL/GETMULH/GETDIVQ/GETDIVR/GETSQRT/GETQX/GETQY/GETQZ       - only during multi-tasking
        WAITVID/SYNCTRA/SYNCTRB                                         - only during multi-tasking

Example (1-task):

        REPD    D,#1            'execute 1 instruction D times (if D=0, same as D=1)
        NOP                     '3 spacer instructions needed (could do something useful)
        NOP
        NOP
        NOTP    #0              'toggle P0, block repeats every 1 clock

Example (1-task):

        REPS    #20_000,#4      'execute 4 instructions 20,000 times
        NOP                     '1 spacer instruction needed (make the most of it)
        NOTP    #0              'toggle P0
        NOTP    #1              'toggle P1
        NOTP    #2              'toggle P2
        NOTP    #3              'toggle P3, block repeats every 4 clocks

instructions (iiiiii = #i-1, n_nnnn_nnnnnnnnn_nnn/nnnnnnnnn = #n-1)                            clocks
-----------------------------------------------------------------------------------------------------
1111101 01 1 nnnn nnnnnnnnn nnniiiiii   REPS    #n,#i   'execute 1..64 inst's 1..65536 times        1
1111111 00 0 CCCC 111111111 001iiiiii   REPD    #i      'execute 1..64 inst's infinitely            1
1111111 00 0 CCCC DDDDDDDDD 001iiiiii   REPD    D,#i    'execute 1..64 inst's D times               1
1111111 00 1 CCCC nnnnnnnnn 001iiiiii   REPD    #n,#i   'execute 1..64 inst's 1..512 times          1
-----------------------------------------------------------------------------------------------------

HUB COUNTER
-----------

The hub contains a 64-bit counter called CNT that increments on each clock cycle. Each cog can use CNT to mark time in various ways. On chip reset, the ROM Booter initializes CNT to $00000000_00000000, from which point it begins incrementing.

Here are the instructions which relate to CNT:

GETCNT  D               Get CNT[31..0] into D.

GETCNTX D               Get CNT[63..32], delayed by 1 clock, into D. A single-task program executing a GETCNT, immediately followed by a GETCNTX, would get a 64-bit snapshot of CNT.

SUBCNT  D               Get CNT[31..0] minus D into D.
If another SUBCNT is executed in the next clock cycle by the same task, it gets CNT[63..32], delayed by 1 clock, minus D minus the carry from the previous SUBCNT (not the C flag) into D. In either case, the logical not of the MSB of the D result (not the carry) goes into C, indicating by C=1 if CNT[31..0] or CNT[63..0] has exceeded the original D value(s).

CMPCNT  D               Same as SUBCNT, but doesn't store the D result(s). Useful for periodic checking if a time target has been reached yet.

PASSCNT D               Jump to self if MSB of CNT[31..0] minus D is 1. In other words, loop until CNT[31..0] exceeds D. This is intended as a non-pipeline-stalling alternative to WAITCNT, for use in multi-task programs.

WAITCNT D,S/#           Wait for CNT[31..0] to be equal to D. Adds S/# into D.

WAITCNT D,S/# WC        Wait for CNT[63..0] to be equal to the concatenation of the last-written D value and the D expressed in the WAITCNT. Adds S/# into D. Carry from the addition goes into C. This instruction only works within single-task programs, as the last-written D needs to be from the same task.

WAITPEQ D,S/#,#port WC  Like WAITPEQ without WC, except the last-written D value becomes a CNT[31..0] timeout target, with C returning 0 if the WAITPEQ condition was met, or 1 if the timeout occurred first. This instruction only works within single-task programs, as the last-written D needs to be from the same task.

WAITPNE D,S/#,#port WC  Like WAITPNE without WC, except the last-written D value becomes a CNT[31..0] timeout target, with C returning 0 if the WAITPNE condition was met, or 1 if the timeout occurred first. This instruction only works within single-task programs, as the last-written D needs to be from the same task.
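The MSB rule used by SUBCNT, CMPCNT and PASSCNT is what keeps time comparisons working across a rollover of CNT[31..0]. The Python sketch below models only the 32-bit forms, following the wording above (C is the logical NOT of the MSB of CNT minus D). The function names are invented for illustration, and the 64-bit two-instruction form is not modelled.

```python
MASK32 = 0xFFFFFFFF


def subcnt(cnt, d):
    """Model of the 32-bit SUBCNT: return (new D, C).

    new D = CNT[31..0] - D, wrapped to 32 bits; C = NOT MSB of that result.
    """
    result = (cnt - d) & MASK32
    c = 0 if result >> 31 else 1
    return result, c


def cmpcnt(cnt, d):
    """Model of CMPCNT: same flag as SUBCNT, but D is left untouched."""
    return subcnt(cnt, d)[1]


def passcnt_loops(cnt, d):
    """True while PASSCNT would jump to itself (MSB of CNT - D is 1)."""
    return ((cnt - d) & MASK32) >> 31 == 1


if __name__ == "__main__":
    start = 0xFFFFFFF0                 # shortly before CNT[31..0] rolls over
    target = (start + 500) & MASK32    # target lies past the rollover
    print(passcnt_loops(start, target))                    # still waiting
    print(passcnt_loops((start + 500) & MASK32, target))   # target reached
```

Because the comparison looks at the sign of a wrapped difference instead of comparing magnitudes, a target that lies just past the rollover is still treated as being in the future. The scheme holds as long as the target is less than 2^31 clocks away.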
Examples: 'Measure time using lower 32 bits of CNT GETCNT ticks 'get CNT[31..0] into ticks 'execute some code SUBCNT ticks 'get CNT[31..0] minus ticks into ticks, took ticks-1 clocks 'Measure time using full 64 bits of CNT (single-task) GETCNT ticks_lo 'get CNT[63..0] into {ticks_hi, ticks_lo} GETCNTX ticks_hi 'execute some code SUBCNT ticks_lo 'get CNT[63..0] minus {ticks_hi, ticks_lo} into {ticks_hi, ticks_lo} SUBCNT ticks_hi ' took {ticks_hi, ticks_lo}-1 clocks 'Do something for some time GETCNT ticks 'get CNT[31..0] ADD ticks,#500 'add 500 loop 'execute some code CMPCNT ticks WC 'check if 500 clocks have elapsed yet if_nc JMP #loop 'if not, loop 'Do something every Nth clock (multi-task) GETCNT ticks 'get CNT[31..0] loop ADD ticks,#500 'add 500 PASSCNT ticks 'wait for next 500th clock 'execute some code jmp #loop 'loop 'Do something every Nth clock using CNT[31..0] (single-task) GETCNT ticks 'get CNT[31..0] ADD ticks,#500 'add initial 500 loop WAITCNT ticks,#500 'wait for next 500th clock, add next 500, jitter-free 'execute some code jmp #loop 'loop 'Do something every Nth clock using CNT[63..0] (single-task) GETCNT ticks_lo 'get CNT[63..0] into {ticks_hi, ticks_lo} GETCNTX ticks_hi loop ADD ticks_lo,lo WC 'add 64-bit clock offset ADDX ticks_hi,hi 'this last-written D value becomes the CNT[63..32] target WAITCNT ticks_lo,#0 WC 'wait for next {hi,lo}th clock, don't add here (easier above), jitter-free 'execute some code jmp #loop 'loop 'Wait for pins to equal a value, with time-out (single-task) GETCNT ticks 'get CNT[31..0] as timeout base ADD ticks,#200 'add timeout of 200 clocks, last-written D value is timeout target WAITPEQ value,mask,#0 WC 'wait for (PINA & mask) == value, with timeout if_c JMP #timeout 'if C=1 then timeout occurred, else pin condition was met instructions clocks --------------------------------------------------------------------------------------------------------------------------------------- 1111111 ZC 0 CCCC DDDDDDDDD 000000100 GETCNT 
D 'get CNT[31..0] into D, C=MSB 1
1111111 ZC 0 CCCC DDDDDDDDD 000000101   GETCNTX D               'get prior CNT[63..32] into D, C=MSB                                    1
1111111 ZC 0 CCCC DDDDDDDDD 000100010   SUBCNT  D               'get CNT[31..0] (then prior CNT[63..32]) minus D into D, C=passed       1
1111111 0C 0 CCCC DDDDDDDDD 010001100   CMPCNT  D               'compare CNT[31..0] (then prior CNT[63..32]) to D, C=passed             1
1111111 00 0 CCCC DDDDDDDDD 010100110   PASSCNT D               'loop until CNT[31..0] passes D                                         1*
1111111 00 1 CCCC DDDDDDDDD 010100110   PASSCNT #D              'loop until CNT[31..0] passes #D                                        1*
1001111 10 0 CCCC DDDDDDDDD SSSSSSSSS   WAITCNT D,S             'wait for CNT[31..0] == D, D += S                                       ?
1001111 10 1 CCCC DDDDDDDDD SSSSSSSSS   WAITCNT D,#S            'wait for CNT[31..0] == D, D += #S                                      ?
1001111 11 0 CCCC DDDDDDDDD SSSSSSSSS   WAITCNT D,S  WC         'wait for CNT[63..0] == {last D, D}, D += S, C=carry                    ?
1001111 11 1 CCCC DDDDDDDDD SSSSSSSSS   WAITCNT D,#S WC         'wait for CNT[63..0] == {last D, D}, D += #S, C=carry                   ?
110010n n1 0 CCCC DDDDDDDDD SSSSSSSSS   WAITPEQ D,S,#n  WC      'wait for (PINn & S) == D, or CNT[31..0] == last D, C=timeout           ?
110010n n1 1 CCCC DDDDDDDDD SSSSSSSSS   WAITPEQ D,#S,#n WC      'wait for (PINn & #S) == D, or CNT[31..0] == last D, C=timeout          ?
110011n n1 0 CCCC DDDDDDDDD SSSSSSSSS   WAITPNE D,S,#n  WC      'wait for (PINn & S) <> D, or CNT[31..0] == last D, C=timeout           ?
110011n n1 1 CCCC DDDDDDDDD SSSSSSSSS   WAITPNE D,#S,#n WC      'wait for (PINn & #S) <> D, or CNT[31..0] == last D, C=timeout          ?
---------------------------------------------------------------------------------------------------------------------------------------
* 1 + number of other instructions in the pipeline (0..3) which belong to the same task

HUB EXECUTION
-------------

When a cog is started, registers $000..$1F3 are loaded sequentially from hub memory and then execution commences at register $000. Executing code in this initial mode, from within the cog, is fastest and deterministic, though cog space is limited, with some of the registers invariably serving as data and variables, possibly limiting your code size.
Large programs, or programs which don't need to be deterministic and would like to free up the cog register space for data, may be executed from hub memory, instead. These programs address the 256K byte hub memory as 64k longs, ranging from $0000..$FFFF. To accommodate this, all cog program counters are 16-bit, and there are 16-bit-constant 'jump', 'call', and 'return' instructions. To execute from the hub, simply branch outside of the cog address space of $000..$1FF to the executable hub address space of $0200..$FFFF. You can jump, call, and return to and from any address. If an instruction's address is $000..$1FF, it is fetched from cog memory. If an instruction's address is $0200..$FFFF, it is fetched from hub memory. Each cog has four instruction cache lines of eight longs, each, which serve as intermediaries between the hub memory and instruction pipeline. Whenever an instruction is needed from the hub that is not currently cached, a cache line is loaded on the next hub cycle, temporarily stalling the pipeline. Cache lines are reloaded on a least-recently-used basis. A prefetch mode, enabled on cog start, allows straight-line code without hub instructions to execute at full-speed, as if it was running in the cog memory. Prefetch may be turned off to speed up programs which have multiple tasks executing from the hub, and would be hindered by irrelevant prefetches. It may also be turned off to allow a single-task program to cache four lines that can be looped within, without cache disruption. 
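The fetch rules and the least-recently-used reload policy can be explored with a small model. This Python sketch classifies an instruction address as cog or hub, converts a 16-bit long address to a hub byte address, and counts cache-line loads for a sequence of fetches. It assumes that cache lines are aligned to 8-long boundaries and it ignores prefetch entirely; both are simplifications made for the example, and the function names are hypothetical.

```python
from collections import OrderedDict

LINE_LONGS = 8     # longs per instruction cache line
NUM_LINES = 4      # instruction cache lines per cog


def fetch_source(pc):
    """'cog' for $000..$1FF, 'hub' for $0200..$FFFF."""
    return "cog" if (pc & 0xFFFF) <= 0x1FF else "hub"


def hub_byte_address(pc):
    """Hub memory is addressed as 64k longs, so the byte address is PC * 4."""
    return (pc & 0xFFFF) * 4


def count_line_loads(pcs):
    """Count cache-line loads for a sequence of instruction addresses (LRU)."""
    cache = OrderedDict()            # line tag -> None, least recently used first
    loads = 0
    for pc in pcs:
        if fetch_source(pc) == "cog":
            continue                 # cog fetches never touch the cache
        tag = (pc & 0xFFFF) // LINE_LONGS   # assumes 8-long aligned lines
        if tag in cache:
            cache.move_to_end(tag)   # mark as most recently used
        else:
            loads += 1
            if len(cache) == NUM_LINES:
                cache.popitem(last=False)   # evict the least recently used line
            cache[tag] = None
    return loads


if __name__ == "__main__":
    fits = list(range(0x200, 0x220)) * 3       # 32-long loop: exactly 4 lines
    too_big = list(range(0x200, 0x228)) * 2    # 40-long loop: 5 lines
    print(count_line_loads(fits), count_line_loads(too_big))
```

In this model a loop that fits in four lines is loaded once and then runs from the cache, while a loop spanning five lines reloads a line on every line crossing. That is the situation the text has in mind when it suggests keeping a single-task loop within four cached lines.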
Here are the instructions which govern the instruction cache:

        ICACHEX         'invalidate instruction cache, forces reloads on next hub instructions
        ICACHEP         'enable prefetch (this mode is enabled on cog start)
        ICACHEN         'disable prefetch

To help make hub execution practical, there are two instructions, AUGS and AUGD, which each provide 23 bits of data to extend 9-bit constants in subsequent instructions to 32 bits:

        AUGS    #longvalue >> 9
        MOV     reg,#longvalue & $1FF

        AUGD    #longvalue >> 9
        SETXCH  #longvalue & $1FF

        AUGS    #frq32a >> 9
        AUGD    #frq32b >> 9
        SETFRQS #frq32b & $1FF,#frq32a & $1FF

For simplicity, these can be coded as such:

        MOV     reg,##longvalue

        SETXCH  ##longvalue

        SETFRQS ##frq32b,##frq32a

AUGS is cancelled when a subsequent instruction expresses a constant S. AUGD is cancelled when a subsequent instruction expresses a constant D. There are separate AUGS/AUGD circuits for each of the four tasks within a cog. Remember that for every ##, you are generating an AUGS/AUGD instruction.

All 'jump' and 'call' instructions have 16-bit-constant and D-register variants (delayed '-D' versions omitted for brevity):

        JMP     #absolute16     'jump to 16-bit absolute address
        JMP     @relative16     'jump to 16-bit relative address
        JMP     D               'jump to D[15:0], WZ/WC load Z/C from D[31:30]

        CALL    #absolute16     'call to 16-bit absolute address, push {Z,C,PC+1} into task's 4-level stack
        CALL    @relative16     'call to 16-bit relative address, push {Z,C,PC+1} into task's 4-level stack
        CALL    D               'call to D[15:0], push {Z,C,PC+1} into task's 4-level stack, WZ/WC load Z/C from D[31:30]

        CALLA   #absolute16     'call to 16-bit absolute address, WRLONG {Z,C,PC+1},PTRA++
        CALLA   @relative16     'call to 16-bit relative address, WRLONG {Z,C,PC+1},PTRA++
        CALLA   D               'call to D[15:0], WRLONG {Z,C,PC+1},PTRA++, WZ/WC load Z/C from D[31:30]

        CALLB   #absolute16     'call to 16-bit absolute address, WRLONG {Z,C,PC+1},PTRB++
        CALLB   @relative16     'call to 16-bit relative address, WRLONG {Z,C,PC+1},PTRB++
        CALLB   D               'call to D[15:0], WRLONG {Z,C,PC+1},PTRB++, WZ/WC load Z/C from
D[31:30] CALLX #absolute16 'call to 16-bit absolute address, WRAUX {Z,C,PC+1},PTRX++ CALLX @relative16 'call to 16-bit relative address, WRAUX {Z,C,PC+1},PTRX++ CALLX D 'call to D[15:0], WRAUX {Z,C,PC+1},PTRX++, WZ/WC load Z/C from D[31:30] CALLY #absolute16 'call to 16-bit absolute address, WRAUXR {Z,C,PC+1},PTRY++ CALLY @relative16 'call to 16-bit relative address, WRAUXR {Z,C,PC+1},PTRY++ CALLY D 'call to D[15:0], WRAUXR {Z,C,PC+1},PTRY++, WZ/WC load Z/C from D[31:30] The 'return' instructions can use WZ/WC to restore Z/C to the caller's states: RET 'return, pop {Z,C,PC} from task's 4-level stack RETA 'return, RDLONG {Z,C,PC},--PTRA RETB 'return, RDLONG {Z,C,PC},--PTRB RETX 'return, RDAUX {Z,C,PC},--PTRX RETY 'return, RDAUXR {Z,C,PC},--PTRY The 'push' and 'pop' instructions: PUSH D/# 'push D/# into task's 4-level stack PUSHA D/# 'WRLONG D/#,PTRA++ PUSHB D/# 'WRLONG D/#,PTRB++ PUSHX D/# 'WRAUX D/#,PTRX++ PUSHY D/# 'WRAUXR D/#,PTRY++ POP D 'pop D from task's 4-level stack POPA D 'RDLONG D,--PTRA POPB D 'RDLONG D,--PTRB POPX D 'RDAUX D,--PTRX POPY D 'RDAUXR D,--PTRY The conditional jumps, which specify a register or a 9-bit constant for their branch address, all sign-extend their 9-bit constants for use as a relative address - unless AUGS is used to expresses a full 16-bit relative address: IJZ D,@relative9 'increment D and jump to 9-bit relative address if zero IJZ D,@@relative16 'increment D and jump to 16-bit relative address if zero IJZ D,S 'increment D and jump to S[15:0] if zero IJNZ D,@relative9 'increment D and jump to 9-bit relative address if not zero IJNZ D,@@relative16 'increment D and jump to 16-bit relative address if not zero IJNZ D,S 'increment D and jump to S[15:0] if not zero DJZ D,@relative9 'decrement D and jump to 9-bit relative address if zero DJZ D,@@relative16 'decrement D and jump to 16-bit relative address if zero DJZ D,S 'decrement D and jump to S[15:0] if zero DJNZ D,@relative9 'decrement D and jump to 9-bit relative address if not zero 
        DJNZ    D,@@relative16  'decrement D and jump to 16-bit relative address if not zero
        DJNZ    D,S             'decrement D and jump to S[15:0] if not zero

        JZ      D,@relative9    'test D and jump to 9-bit relative address if zero
        JZ      D,@@relative16  'test D and jump to 16-bit relative address if zero
        JZ      D,S             'test D and jump to S[15:0] if zero

        JNZ     D,@relative9    'test D and jump to 9-bit relative address if not zero
        JNZ     D,@@relative16  'test D and jump to 16-bit relative address if not zero
        JNZ     D,S             'test D and jump to S[15:0] if not zero

        JP      D/#,@relative9    'jump to 9-bit relative address if pin D/# reads high
        JP      D/#,@@relative16  'jump to 16-bit relative address if pin D/# reads high
        JP      D/#,S             'jump to S[15:0] if pin D/# reads high

        JNP     D/#,@relative9    'jump to 9-bit relative address if pin D/# reads low
        JNP     D/#,@@relative16  'jump to 16-bit relative address if pin D/# reads low
        JNP     D/#,S             'jump to S[15:0] if pin D/# reads low

JMPSW jumps according to the S field and stores {Z,C,PC} into D.
WZ and WC can be used to load {Z,C} from S[17:16]:

        JMPSW   D,@relative9    'jump to 9-bit relative address, store {Z,C,PC} into D
        JMPSW   D,@@relative16  'jump to 16-bit relative address, store {Z,C,PC} into D
        JMPSW   D,S             'jump to S[15:0], store {Z,C,PC} into D
        JMPSW   D,S WZ,WC       'jump to S[15:0], store {Z,C,PC} into D, Z=S[17], C=S[16]

        SWITCH                  'alias for 'JMPSW INDB,++INDB WZ,WC'
                                'For round-robin switching among threads
                                'Use FIXINDB to set up a loop of {Z,C,PC} registers for threads
                                'Can be used with register remapping for multiple program instances
                                'Instructions trailing SWITCHD are contextually in the next thread

JMPLIST jumps to a base address (S/@/@@) plus index (D).
        JMPLIST D,@relative9    'jump to D plus 9-bit relative address
        JMPLIST D,@@relative16  'jump to D plus 16-bit relative address
        JMPLIST D,S             'jump to D plus S

LOCBASE converts a 16-bit hub instruction address into a normal 18-bit hub
address for use with RDxxxx/WRxxxx instructions:

        LOCBASE D,@relative9    'get 18-bit hub address from 9-bit relative address into D
        LOCBASE D,@@relative16  'get 18-bit hub address from 16-bit relative address into D
        LOCBASE D,S             'get 18-bit hub address from S[15:0] into D

LOCBYTE/LOCWORD/LOCLONG are like LOCBASE, but use the initial D value as an
index which gets scaled and added to the normal 18-bit hub address:

        LOCBYTE D,@relative9    'get 18-bit byte-indexed hub address from 9-bit relative address into D
        LOCBYTE D,@@relative16  'get 18-bit byte-indexed hub address from 16-bit relative address into D
        LOCBYTE D,S             'get 18-bit byte-indexed hub address from S[15:0] into D

        LOCWORD D,@relative9    'get 18-bit word-indexed hub address from 9-bit relative address into D
        LOCWORD D,@@relative16  'get 18-bit word-indexed hub address from 16-bit relative address into D
        LOCWORD D,S             'get 18-bit word-indexed hub address from S[15:0] into D

        LOCLONG D,@relative9    'get 18-bit long-indexed hub address from 9-bit relative address into D
        LOCLONG D,@@relative16  'get 18-bit long-indexed hub address from 16-bit relative address into D
        LOCLONG D,S             'get 18-bit long-indexed hub address from S[15:0] into D

Remember that @@ is going to generate an AUGS instruction.
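The address arithmetic behind the LOCxxxx instructions and the AUGS prefix can be sketched in Python. This is a model under stated assumptions, not a description of the silicon: the helper names are hypothetical, the instruction address is assumed to convert to a byte address by multiplying by 4, and the index scaling is assumed to be 1, 2, and 4 bytes for LOCBYTE, LOCWORD, and LOCLONG. The AUGS split follows the `>> 9` and `& $1FF` expressions used earlier for `##` constants.

```python
HUB_MASK = 0x3FFFF  # 18-bit hub byte address

def locbase(addr16):
    """Model of LOCBASE: 16-bit instruction address -> 18-bit byte address (assumed x4)."""
    return ((addr16 & 0xFFFF) << 2) & HUB_MASK

def loc_indexed(addr16, index, size):
    """Model of LOCBYTE/LOCWORD/LOCLONG with size 1/2/4 (assumed scaling)."""
    return (locbase(addr16) + index * size) & HUB_MASK

def split_aug(value32):
    """Split a 32-bit constant into the 23-bit AUGS part and the 9-bit immediate."""
    value32 &= 0xFFFFFFFF
    return value32 >> 9, value32 & 0x1FF

print(hex(locbase(0x1400)))             # 0x5000
print(hex(loc_indexed(0x1400, 3, 4)))   # 0x500c
print([hex(part) for part in split_aug(0x12345678)])
```

Recombining the two halves with `(high << 9) | low` returns the original constant, which is what the instruction following AUGS effectively sees.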
LOCPTRA/LOCPTRB convert 16-bit constant hub instruction addresses into normal
18-bit hub addresses and then store them into PTRA/PTRB:

        LOCPTRA #absolute16     'get 18-bit hub address into PTRA from 16-bit absolute instruction address
        LOCPTRA @relative16     'get 18-bit hub address into PTRA from 16-bit relative instruction address

        LOCPTRB #absolute16     'get 18-bit hub address into PTRB from 16-bit absolute instruction address
        LOCPTRB @relative16     'get 18-bit hub address into PTRB from 16-bit relative instruction address

There are five assembler directives which are used to position instructions and
set cog vs hub assembly modes:

        ORGH    absolute16      'set 16-bit-address hub mode, advances to absolute16 and sets origin
        ORGH                    'set 16-bit-address hub mode, initial state in DAT block

        ORG     absolute9       'set 9-bit-address cog mode, sets origin to absolute9
        ORG                     'set 9-bit-address cog mode, sets origin to 0

        ORGF    absolute9       'advances to absolute9, must be in cog mode

        RES     regcount        'reserves regcount locations, must be in cog mode
        RES                     'reserves 0 locations, must be in cog mode

        FIT     address         'errors out if address exceeded, works in both modes
        FIT                     'if cog mode, error if origin > $1F2; if hub mode, error if origin > $10000

Here is an example PASM application (use F11 to download) which demonstrates
hub execution:

        orgh    $380            '$380 = 18-bit load address $E00

        org                     'internal cog code

        jmp     @go             'jump to hub memory

x       long    3               'cog register variable

        orgh    $1000           'some hub code at $1000

go      incmod  x,#3
        jmplist x,@@list

        orgh    $1400           'some hub code at $1400

list    jmp     @z0
        jmp     @z1
        jmp     @z2
        jmp     @z3

        orgh    $1800           'some hub code at $1800

z0      notp    #0
        jmp     @go
z1      notp    #1
        jmp     @go
z2      notp    #2
        jmp     @go
z3      notp    #3
        jmp     @go


COUNTERS - this section is not done yet!!!
--------

Each cog has two configurable counters. They are named CTRA and CTRB and are
accessed by thirteen instructions each. The instructions which end in "A" are
for CTRA and those that end in "B" are for CTRB.
For brevity, only CTRA instructions are used in the definitions and examples that follow. GETPHSA D - Get PHSA into D GETPHZA D - Get PHSA into D, simultaneously clear PHSA to 0 GETCOSA D - Get COSA into D GETSINA D - Get SINA into D SETCTRA D/# - Set CTRA configuration SETWAVA D/# - Set WAVA SETFRQA D/# - Set FRQA SETPHSA D/# - Set PHSA ADDPHSA D/# - Add to PHSA SUBPHSA D/# - Subtract from PHSA SYNCTRA - Wait for PHSA to roll over POLCTRA WC - Check if PHSA has rolled over (C=1 if rolled over) CAPCTRA - Capture CTRA accumulators into COSA and SINA Modes: (QDR = PHS[31] XNOR PHS[30], or PHS[31] delayed by 90 degrees) Off Mode ------------------------------------------------------------------------------- %00000 = Counter off (initial state after cog start) NCO Modes ------------------------------------------------------------------------------- %00001 = NCO output + video PLL mode, PLL output = PHS[31] (reference signal) %00010 = NCO output + video PLL mode, PLL output = PHS[31] times 8 divide by 32 %00011 = NCO output + video PLL mode, PLL output = PHS[31] times 8 divide by 16 %00100 = NCO output + video PLL mode, PLL output = PHS[31] times 8 divide by 8 %00101 = NCO output + video PLL mode, PLL output = PHS[31] times 8 divide by 4 %00110 = NCO output + video PLL mode, PLL output = PHS[31] times 8 divide by 2 %00111 = NCO output + video PLL mode, PLL output = PHS[31] times 8 divide by 1 %01000 = NCO output DUAL Modes ------------------------------------------------------------------------------- %000_01001 = dual NCO outputs + dual COUNT_LOWS inputs %001_01001 = dual NCO outputs + dual COUNT_HIGHS inputs %010_01001 = dual NCO outputs + dual COUNT_NEGATIVE_EDGES inputs %011_01001 = dual NCO outputs + dual COUNT_POSITIVE_EDGES inputs %100_01001 = dual NCO outputs + dual TIME_LOWS inputs %101_01001 = dual NCO outputs + dual TIME_HIGHS inputs %110_01001 = dual NCO outputs + dual TIME_NEGATIVE_EDGES inputs %111_01001 = dual NCO outputs + dual TIME_POSITIVE_EDGES inputs 
%000_01010 = dual DUTY outputs + dual COUNT_LOWS inputs %001_01010 = dual DUTY outputs + dual COUNT_HIGHS inputs %010_01010 = dual DUTY outputs + dual COUNT_NEGATIVE_EDGES inputs %011_01010 = dual DUTY outputs + dual COUNT_POSITIVE_EDGES inputs %100_01010 = dual DUTY outputs + dual TIME_LOWS inputs %101_01010 = dual DUTY outputs + dual TIME_HIGHS inputs %110_01010 = dual DUTY outputs + dual TIME_NEGATIVE_EDGES inputs %111_01010 = dual DUTY outputs + dual TIME_POSITIVE_EDGES inputs %000_01011 = dual PWM outputs + dual COUNT_LOWS inputs %001_01011 = dual PWM outputs + dual COUNT_HIGHS inputs %010_01011 = dual PWM outputs + dual COUNT_NEGATIVE_EDGES inputs %011_01011 = dual PWM outputs + dual COUNT_POSITIVE_EDGES inputs %100_01011 = dual PWM outputs + dual TIME_LOWS inputs %101_01011 = dual PWM outputs + dual TIME_HIGHS inputs %110_01011 = dual PWM outputs + dual TIME_NEGATIVE_EDGES inputs %111_01011 = dual PWM outputs + dual TIME_POSITIVE_EDGES inputs WAVE modes ------------------------------------------------------------------------------- %01100 = dual SQR_WAVE output + GOERTZEL input %01101 = dual SAW_WAVE output + GOERTZEL input %01110 = dual TRI_WAVE output + GOERTZEL input %01111 = dual SIN_WAVE output + GOERTZEL input In the WAVE modes, FRQ is added into PHS on every clock cycle. The top nine bits of PHS are used to drive sine and cosine lookup tables which are used for sine output functions and GOERTZEL computations. While the sine/cosine output functions are the most useful for signal processing, triangle-, sawtooth-, and square-wave output functions are also selectable, being derived from the top nine bits of PHS, as well. The WAVE modes output both parallel DAC signals and duty-modulated pin signals. All output signals are nine bits in base quality with an additional nine sub-bits of dithering to maintain base quality after attenuative scaling. 
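The phase-accumulator behavior of the WAVE modes can be illustrated with a short Python model. It is a sketch only: the function names and the clock frequency in the example are hypothetical, and the output-frequency relation FRQ * f_clk / 2^32 is the standard result for a 32-bit accumulator that adds FRQ on every clock.

```python
PHS_BITS = 32
PHS_MASK = (1 << PHS_BITS) - 1

def frq_for(f_out, f_clk):
    """FRQ value that makes PHS roll over f_out times per second at clock f_clk."""
    return round(f_out * (1 << PHS_BITS) / f_clk)

def table_index(phs):
    """Top nine bits of PHS, as used to address the sine/cosine lookup tables."""
    return (phs >> (PHS_BITS - 9)) & 0x1FF

def count_rollovers(frq, clocks):
    """Add FRQ into PHS once per clock and count how often PHS wraps."""
    phs = 0
    rollovers = 0
    for _ in range(clocks):
        total = phs + frq
        rollovers += total >> PHS_BITS   # 1 when the 32-bit sum overflows
        phs = total & PHS_MASK
    return rollovers

print(frq_for(1_000_000, 100_000_000))   # FRQ for 1 MHz from a hypothetical 100 MHz clock
print(table_index(0x80000000))           # 256, halfway through the 512-entry table
print(count_rollovers(1 << 30, 8))       # 2 rollovers in 8 clocks
```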
The dual outputs differ only in phase and are set up by the WAV register:

WAV register in WAVE modes (can be changed by SETWAVA/SETWAVB instruction)
-------------------------------------------------------------------------------
%PPPPPPPPP_xxxxx_TTTTTTTTT_AAAAAAAAA

        PPPPPPPPP = phase advance for OUTA (0 to 511/512 revolutions)
        xxxxx     = unused for WAVE modes
        TTTTTTTTT = offset for OUTA and OUTB
        AAAAAAAAA = amplitude for OUTA and OUTB

Initial value after cog start: %010000000_00000_100000000_111111111

        010000000 = 90-degree phase advance for GOERTZEL use (OUTA=cosine, OUTB=sine)
        00000     = unused
        100000000 = mid-point offset (allows maximum amplitude)
        111111111 = maximum amplitude

The GOERTZEL computation works as follows, on every clock:

        Nine-bit sine and cosine values are looked up using the top nine bits of PHS.
        The sine and cosine values are negated if INA is 0, else they remain the same.
        The sine and cosine values are added into separate sine and cosine accumulators.

This process measures the energy content of INA at the frequency of PHS
rollover. To make this work, the INA pin should be configured for delta-sigma
ADC mode, so that it streams back 1's and 0's that ratiometrically represent
the voltage of the I/O pin.

To make a GOERTZEL measurement:

- The top nine bits of WAV should be set to %010000000 for proper cosine lookup.
- FRQ must be set to generate the frequency of interest in PHS rollovers (SETFRQA).
- PHS and the accumulators should be cleared to 0 (SETPHSA #0, then CAPCTRA).
- Some number of complete PHS rollovers must be waited for (SYNCTRA/POLCTRA).
- The accumulators must be captured and read (CAPCTRA + GETCOSA + GETSINA).
- The hypotenuse of the accumulators will indicate signal strength and phase.
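The per-clock GOERTZEL steps listed above can be simulated in Python to see why the accumulators respond to one frequency. This is a software sketch, not the hardware algorithm verbatim: the table scaling (plus or minus 255), the lookup timing relative to the PHS update, and the 1-bit square-wave test input are all assumptions made for illustration, and `math.hypot`/`math.atan2` stand in for reading out magnitude and phase.

```python
import math

def goertzel(bits, frq):
    """Accumulate sine/cosine products for a stream of 1-bit INA samples."""
    phs = 0
    cos_acc = 0
    sin_acc = 0
    for bit in bits:
        phs = (phs + frq) & 0xFFFFFFFF
        angle = 2 * math.pi * (phs >> 23) / 512   # top nine bits of PHS
        c = round(255 * math.cos(angle))          # assumed nine-bit table scaling
        s = round(255 * math.sin(angle))
        if bit == 0:                              # values are negated when INA is 0
            c, s = -c, -s
        cos_acc += c
        sin_acc += s
    return cos_acc, sin_acc

def square_bits(period, count):
    """Hypothetical test input: a 1-bit square wave with the given period in clocks."""
    return [1 if (n % period) < period // 2 else 0 for n in range(count)]

frq = (1 << 32) // 64                    # PHS rolls over every 64 clocks
clocks = 64 * 16                         # sixteen complete rollovers
in_band = goertzel(square_bits(64, clocks), frq)
off_band = goertzel(square_bits(48, clocks), frq)

print("in-band magnitude :", math.hypot(*in_band))
print("off-band magnitude:", math.hypot(*off_band))
print("in-band phase (radians):", math.atan2(in_band[1], in_band[0]))
```

The input whose period matches the PHS rollover period produces a much larger hypotenuse than the mismatched input, which is the selectivity the measurement relies on.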
By making swept FRQ measurements in a closed loop, where OUTA is used to output
a reference frequency of known phase to stimulate a system, and INA receives a
signal back that is somehow coupled to OUTA, you can determine things such as
spectral response, resonant frequency, and frequency vs. phase of a system.

The more PHS rollovers in a measurement, the more selective the result will be.
For open-loop measurements, this means tighter bandwidth. For closed-loop
measurements, the angle of the hypotenuse becomes meaningful. The QARCTAN
instruction can translate the sine and cosine accumulations into power and
phase values.

LOGIC Modes
-------------------------------------------------------------------------------
%10000 = LOGIC_A_POSEDGE      input     INA & !INA previous
%10001 = LOGIC_NA_AND_NB      input     !INA & !INB
%10010 = LOGIC_A_AND_NB       input     INA & !INB
%10011 = LOGIC_NB             input     !INB
%10100 = LOGIC_NA_AND_B       input     !INA & INB
%10101 = LOGIC_NA             input     !INA
%10110 = LOGIC_A_NE_B         input     INA <> INB
%10111 = LOGIC_NA_OR_NB       input     !INA | !INB
%11000 = LOGIC_A_AND_B        input     INA & INB
%11001 = LOGIC_A_EQ_B         input     INA == INB
%11010 = LOGIC_A              input     INA
%11011 = LOGIC_A_OR_NB        input     INA | !INB
%11100 = LOGIC_B              input     INB
%11101 = LOGIC_NA_OR_B        input     !INA | INB
%11110 = LOGIC_A_OR_B         input     INA | INB
%11111 = LOGIC_ENCODER        input     INA, INB encoder

OUTA = ADD signal (condition met or LOGIC_ENCODER forward step)
OUTB = SUB signal (LOGIC_ENCODER reverse step)

In the LOGIC modes, FRQ is conditionally added to PHS on each clock cycle that
meets that mode's requirement. In the case of the LOGIC_ENCODER mode, FRQ may be
added to or subtracted from PHS when a half-step is registered. OUTA and OUTB
reflect the ADD and SUB states for each cycle, and are more likely to be useful
to other CTRs than to be sent to output pins.


DACS
----

Each cog outputs 4 channels of DAC data, named DAC0..DAC3. These DAC data
channels can be set to values or actively driven from CTRA, CTRB, or VID.
In all cases but VID, the source data is 18 bits and is dithered on every clock cycle for 9-bit DAC output. In the case of VID, the source data is just 9 bits, so no dithering is performed. Each I/O pin has a 75-ohm 9-bit DAC which can be configured using CFGPINS to output a fixed DAC channel from any cog. Every cog's DAC0..DAC3 are available, in that sequence, to P0..P3, then to P4..P7, then to the next four pins, and so on, as shown below: PortA PortB PortC DACx -------------------------------- P0 P32 P64 DAC0 from any cog P1 P33 P65 DAC1 from any cog P2 P34 P66 DAC2 from any cog P3 P35 P67 DAC3 from any cog P4 P36 P68 DAC0 from any cog P5 P37 P69 DAC1 from any cog P6 P38 P70 DAC2 from any cog P7 P39 P71 DAC3 from any cog P8 P40 P72 DAC0 from any cog P9 P41 P73 DAC1 from any cog P10 P42 P74 DAC2 from any cog P11 P43 P75 DAC3 from any cog P12 P44 P76 DAC0 from any cog P13 P45 P77 DAC1 from any cog P14 P46 P78 DAC2 from any cog P15 P47 P79 DAC3 from any cog P16 P48 P80 DAC0 from any cog P17 P49 P81 DAC1 from any cog P18 P50 P82 DAC2 from any cog P19 P51 P83 DAC3 from any cog P20 P52 P84 DAC0 from any cog P21 P53 P85 DAC1 from any cog P22 P54 P86 DAC2 from any cog P23 P55 P87 DAC3 from any cog P24 P56 P88 DAC0 from any cog P25 P57 P89 DAC1 from any cog P26 P58 P90 DAC2 from any cog P27 P59 P91 DAC3 from any cog P28 P60 P92 DAC0 from any cog P29 P61 P93 DAC1 from any cog P30 P62 P94 DAC2 from any cog P31 P63 P95 DAC3 from any cog Here are the instructions which configure DAC0..DAC3: CFGDAC0 D/# - Configure DAC0 %00 = Software controlled (default) %01 = CTRA SIGA %10 = CTRA SIGA + CTRB SIGA %11 = VID SIG0 CFGDAC1 D/# - Configure DAC1 %00 = Software controlled (default) %01 = CTRA SIGB %10 = CTRA SIGB + CTRB SIGB %11 = VID SIG1 CFGDAC2 D/# - Configure DAC2 %00 = Software controlled (default) %01 = CTRB SIGA %10 = CTRA SIGA + CTRB SIGA %11 = VID SIG2 CFGDAC3 D/# - Configure DAC3 %00 = Software controlled (default) %01 = CTRB SIGB %10 = CTRA SIGB + CTRB SIGB %11 = VID 
SIG3 CFGDACS D/# - Configure DAC3..DAC0 from four 2-bit fields: %33_22_11_00 For configurations %00..%10, the data sources are 18 bits wide, with the 9 lower bits being dithered by a 32-bit LFSR to realize more DAC resolution. This improves dynamic range, but introduces a white noise of one step in amplitude in the 9-bit DAC output. As dynamic signals get smaller in amplitude, they appear to sink into the dither noise, but actually remain very high-Q, as the dither noise is very low-Q. For configuration %11 (VID), the data is a straight 9 bits with no dithering. The dithering works by taking nine fixed bits from a 32-bit LFSR and sign-extending them to 18 bits. This yields a pseudo-random value ranging from %111111111_100000000 (negative) to %000000000_011111111 (positive) on every clock cycle. When added to the 18-bit source data, the lower 9 bits of source data are realized as a proportional toggling between two adjacent values in the top 9 bits of the sum, which form the DAC output data. It will take at least 512 (2^9) clocks for the DAC output to average to the intended 18-bit source value, assuming source data is static. 
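The effect of the dither addition can be checked numerically. The sketch below is an assumption-laden model: instead of the 32-bit LFSR (whose taps are not given here), it sweeps every sign-extended nine-bit dither value exactly once, which is what a uniform pseudo-random source averages to over time. The function names are hypothetical.

```python
def dither_values():
    """All 512 sign-extended nine-bit dither values, -256..+255."""
    return range(-256, 256)

def dac_output(source18, dither):
    """Top nine bits of the 18-bit sum of source data and dither."""
    return ((source18 + dither) >> 9) & 0x1FF

def average_output(source18):
    """Mean nine-bit DAC code over one full sweep of dither values."""
    outputs = [dac_output(source18, d) for d in dither_values()]
    return sum(outputs) / len(outputs)

midpoint = (100 << 9) | 0b100000000      # lower nine bits at %100000000
print(sorted({dac_output(midpoint, d) for d in dither_values()}))  # [100]
print(average_output(midpoint))                                    # 100.0
print(average_output((100 << 9) | 0b110000000))                    # 100.25
```

In this model the lower nine bits move the averaged output in steps of 1/512 of a DAC code around the %100000000 midpoint, where no toggling occurs at all.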
On cog start, all configurations are cleared to %00 and the source values are
set to %000000000_100000000, which is effectively zero, since dithering will
never cause an output step toggle when the nine lower source bits are %100000000:

          source data           %XXXXXXXXX_100000000
        + minimum dither        %111111111_100000000
          --------------------
        =                       %XXXXXXXXX_000000000    (top 9 bits are unchanged)

          source data           %XXXXXXXXX_100000000
        + maximum dither        %000000000_011111111
          --------------------
        =                       %XXXXXXXXX_111111111    (top 9 bits are unchanged)

Here are the instructions which set DAC0..DAC3 source values in software:

        SETDAC0 #n      - Set DAC0 to %nnnnnnnnn_100000000, force configuration to %00
        SETDAC0 D       - Set DAC0 to D[31..14], force configuration to %00 *

        SETDAC1 #n      - Set DAC1 to %nnnnnnnnn_100000000, force configuration to %00
        SETDAC1 D       - Set DAC1 to D[31..14], force configuration to %00 *

        SETDAC2 #n      - Set DAC2 to %nnnnnnnnn_100000000, force configuration to %00
        SETDAC2 D       - Set DAC2 to D[31..14], force configuration to %00 *

        SETDAC3 #n      - Set DAC3 to %nnnnnnnnn_100000000, force configuration to %00
        SETDAC3 D       - Set DAC3 to D[31..14], force configuration to %00 *

        SETDACS #n      - Set DAC3..DAC0 to %nnnnnnnnn_100000000
                          Force DAC3..DAC0 configurations to %00

        SETDACS D       - Set DAC3 to %dddddddd0_100000000, where dddddddd is D[31..24]
                          Set DAC2 to %dddddddd0_100000000, where dddddddd is D[23..16]
                          Set DAC1 to %dddddddd0_100000000, where dddddddd is D[15..8]
                          Set DAC0 to %dddddddd0_100000000, where dddddddd is D[7..0]
                          Force DAC3..DAC0 configurations to %00

* Be aware when using SETDACx D, that if D < $00400000 or D > $FFC03FFF,
  full-scale toggling will occur, as the dither addition will cause wrapping.
  For ground-based DAC output, you can add $00400000 to each output sample to
  prevent this from happening.


VIDEO
-----

Each cog has a video generator (VID) that can stream pixel data and perform
colorspace conversion and modulation, so that final video signals can be output
to the 75-ohm DACs on the I/O pins.
Pixel streaming, colorspace conversion, modulation, DAC channel driving, and
DAC pin updating are all performed in a pipelined fashion on each cycle of
VID's dot clock.

VID gets its dot clock from CTRA's PLL. CTRA must be configured for PLL
operation in order for VID to operate.

The DAC channel(s) must be configured for video output by using
CFGDAC0..CFGDAC3 or CFGDACS. To set all DAC channels to video, do
'CFGDACS #%11_11_11_11'. The I/O pins which will output the DAC channels must
be configured to do so via CFGPINS.

To turn on VID and configure its DAC channel outputs, the SETVID instruction is
used:

SETVID D/# - Set video configuration register (VCFG)

        %00xxxxx = off (default)                SIG3    SIG2    SIG1    SIG0
                                                ----------------------------
        %01xxxxx = SDTV/HDTV/VGA                Y_R     I_G     Q_B     SYN
        %10xxxxx = NTSC/PAL S-VIDEO             YIQ     YIQ     _IQ     Y__
        %11xxxxx = NTSC/PAL COMPOSITE           YIQ     YIQ     YIQ     YIQ

        %xx0xxxx = zero-extend Y/I/Q coefficients for VGA colorspace (allows +$80, or '+1.0')
        %xx1xxxx = sign-extend Y/I/Q coefficients for NTSC/PAL/SDTV/HDTV colorspace

        %xxx0xxx = no sync on Y_R (VGA)
        %xxx1xxx = sync on Y_R (SDTV/HDTV)

        %xxxx0xx = no sync on I_G (VGA)
        %xxxx1xx = sync on I_G (SDTV/HDTV)

        %xxxxx0x = no sync on Q_B (VGA)
        %xxxxx1x = sync on Q_B (SDTV/HDTV)

        %xxxxxx0 = positive sync on SYN (VGA)
        %xxxxxx1 = negative sync on SYN (VGA)

Before any meaningful video signals can be output, you must set the colorspace
coefficients and offset levels, which are each 8 bits:

        SETVIDY D/# - Set Y_R's offset level and RGB colorspace coefficients: $YO_YR_YG_YB
        SETVIDI D/# - Set I_G's offset level and RGB colorspace coefficients: $IO_IR_IG_IB
        SETVIDQ D/# - Set Q_B's offset level and RGB colorspace coefficients: $QO_QR_QG_QB

All pixels are internally handled by VID as 8:8:8 bit R:G:B data.
Colorspace conversion is performed as sum-of-products calculations on the R:G:B pixel data and the colorspace coefficients, yielding Y, I, and Q components: Where R, G, B are 8-bit pixel color components and Y, I, Q are 9-bit sums (MOD 512): Y = (R*YR + G*YG + B*YB)/64 Where YR, YG, YB are 8-bit Y coefficients I = (R*IR + G*IG + B*IB)/64 Where IR, IG, IB are 8-bit I coefficients Q = (R*QR + G*QG + B*QB)/64 Where QR, QG, QB are 8-bit Q coefficients For outputs Y_R, I_G, and Q_B, offset levels are added to the Y, I, and Q components to properly position the final signals for SDTV/HDTV. In the case of VGA outputs, the offset levels are set to 0, since they are ground-based. For modulated outputs YIQ and _IQ, the I and Q components, treated as (I,Q), are rotated around (0,0) by an angle that steps 1/16th of a revolution on each dot clock, yielding Q'. In the case of YIQ output, the Y component (luma) and Q' (chroma) are added to form a composite video signal. In the case of _IQ output, an offset level is added to Q' to form an s-video chroma signal. For Y__ output, the Y component (luma) is output alone to form an s-video luma signal. Below are some common colorspace coefficient sets. Note that these values are normalized to 1.0. In the sum-of-products calculations, 128 is equal to 1.0, so the values below should all be multiplied by 128 to get the proper 8-bit values for usage as coefficients. In practice, the values will need to be scaled down so that under 75-ohm load, they will peak at 1.0V (not 1.65V, which is 3.3V/2). This scaling will compromise DAC span by ~39%, leaving you with a still-sufficient ~8.3 bits of DAC resolution. However, if you'd like to keep DAC span maximal, you may leave the coefficients as originally computed and achieve the proper voltage under load by using an external voltage divider made from two resistors, being sure to maintain the 75 ohms source impedance. 
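The sum-of-products formula and the x128 coefficient scaling can be tried out in Python. This is a sketch under assumptions: the helper names are hypothetical, the division by 64 is modeled as a floor division, and offsets, sync, and modulation are left out.

```python
def to_coeff(value):
    """Scale a normalized coefficient so that 1.0 becomes 128."""
    return round(value * 128)

def component(rgb, coeffs):
    """One Y/I/Q component: sum of products divided by 64, kept to 9 bits (MOD 512)."""
    r, g, b = rgb
    cr, cg, cb = coeffs
    return ((r * cr + g * cg + b * cb) // 64) % 512

# HDTV luma and Pb rows, converted from their normalized values
y_coeffs = [to_coeff(v) for v in (0.213, 0.715, 0.072)]
pb_coeffs = [to_coeff(v) for v in (-0.115, -0.385, 0.500)]

print(y_coeffs)                                   # [27, 92, 9]
print(pb_coeffs)                                  # [-15, -49, 64]
print(component((255, 255, 255), y_coeffs))       # 510, full-scale luma for white
print(component((255, 255, 255), pb_coeffs))      # 0, no color difference for white
```

A coefficient row that sums to 128 maps full-scale white to 510, the top of the 9-bit range, while a row that sums to 0 yields no signal for any grey input.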
coefficient positions
-----------------------
YR      YG      YB
IR      IG      IB
QR      QG      QB
-----------------------

RGB (VGA)     VCFG[4]=0
-----------------------
1       0       0       R sums to 1
0       1       0       G sums to 1
0       0       1       B sums to 1
-----------------------

YPbPr (HDTV)  VCFG[4]=1                         x128
-----------------------                         -------------
+.213   +.715   +.072   Y sums to 1             +27  +92  +9
-.115   -.385   +.500   Pb sums to 0            -15  -49  +64
+.500   -.454   -.046   Pr sums to 0            +64  -58  -6
-----------------------

YPbPr (SDTV)  VCFG[4]=1
-----------------------
+.299   +.587   +.114   Y sums to 1
-.169   -.331   +.500   Pb sums to 0
+.500   -.419   -.081   Pr sums to 0
-----------------------

YIQ (NTSC)    VCFG[4]=1
-----------------------
+.299   +.587   +.114   Y sums to 1
+.596   -.274   -.322   I sums to 0 *
+.212   -.523   +.311   Q sums to 0 *
-----------------------

YUV (PAL)     VCFG[4]=1
-----------------------
+.299   +.587   +.114   Y sums to 1
-.147   -.289   +.436   U sums to 0 *
+.615   -.515   -.100   V sums to 0 *
-----------------------

* These sets of three coefficients must be scaled by 0.608 to pre-compensate
  for CORDIC rotator expansion which will occur in the video modulator.

Once VID is configured, WAITVID instructions are used to issue contiguous
commands to keep the pixel streamer busy:

        WAITVID --> pixel streamer --> colorspace/modulator --> DAC signals --> I/O pins

VID double-buffers WAITVID commands to relax WAITVID timing requirements. In
single-task mode (on cog start or after 'SETTASK zero'), WAITVID will stall the
pipeline as it waits for VID to take the command. In multi-task mode (after
'SETTASK nonzero'), WAITVID will keep jumping back to itself until VID takes the
command, in order to free up clock cycles for other tasks. In either case, the
POLVID instruction may be used to test whether or not VID is ready for another
command, in which case WAITVID will release immediately, taking only one clock.
POLVID WC - Check if VID ready for another WAITVID, C=1 if ready Here is the WAITVID instruction: WAITVID D/#,S/# - Wait for VID ready, then give next command via D and S When WAITVID executes, the D and S values are captured by VID and used for the duration of the command. The WAITVID instruction has special encoding so that immediate D values can range from 0 to 3583, or $DFF. These large immediate D values are helpful in reducing code size when issuing WAITVIDs that generate sync signals. The D operand of WAITVID has four fields: %AAAAAAAA_MMMM_PPPPPPP_CCCCCCCCCCCCC %AAAAAAAA = AUX base address for pixel lookup (0..255) %MMMM = pixel mode (0..15), elaborated below %PPPPPPP = number of dot clocks per pixel (1..127, 0 acts as 128) %CCCCCCCCCCCCC = number of dot clocks in WAITVID (1..8191, 0 acts as 8192) The D operand's %MMMM field determines which pixel mode will be used for the WAITVID and what the S operand will be used for: %0000 = LIT_RGBS32 - S is used as a literal 8:8:8:8 bit R:G:B:SYNC pixel. This is the only mode which can generate sync signals. In this mode, only the %CCCCCCCCCCCCC bits of D are used, so all other bits can be 0. %0001 = CLU1_RGB24 - 32 1-bit offsets in S lookup 8:8:8 pixel longs in AUX %0010 = CLU2_RGB24 - 16 2-bit offsets in S lookup 8:8:8 pixel longs in AUX %0011 = CLU4_RGB24 - 8 4-bit offsets in S lookup 8:8:8 pixel longs in AUX %0100 = CLU8_RGB24 - 4 8-bit offsets in S lookup 8:8:8 pixel longs in AUX %0101 = CLU8_RGB15 - 4 8-bit offsets in S lookup 5:5:5 pixel words in AUX %0110 = CLU8_RGB16 - 4 8-bit offsets in S lookup 5:6:5 pixel words in AUX The CLUx modes use the 1/2/4/8-bit fields of S, lowest field first, as offsets for looking up pixels in AUX, starting at %AAAAAAAA. Upon completion of each pixel, the next higher bit field is used, with the highest field repeating. For CLU1_RGB24..CLU8_RGB24, the 1/2/4/8-bit fields are used as long offsets into AUX, yielding 8:8:8 pixel data from AUX data bits 23..0. 
For CLU8_RGB15 and CLU8_RGB16, bits 7..1 of each 8-bit field are used as the long offset into AUX, while bit 0 selects the low or high word containing the 5:5:5 (LSB-justified) or 5:6:5 pixel data. %0111 = STR1_RGB9 - 1-bit pixels streamed from AUX select between 3:3:3 colors in S[17..9] and S[26..18]. The stream start address in AUX is %AAAAAAAA plus S[7..0], with S[31..27] selecting the starting bit. %1000 = STR4_RGBI4 - 4-bit pixels are streamed from AUX starting at %AAAAAAAA plus S[7..0], with S[31..29] selecting the starting nibble. The pixels are colored as: %0000 = black %0001 = dark grey %0010 = dark blue %0011 = bright blue %0100 = dark green %0101 = bright green %0110 = dark cyan %0111 = bright cyan %1000 = dark red %1001 = bright red %1010 = dark magenta %1011 = bright magenta %1100 = olive %1101 = yellow %1110 = light grey %1111 = white %1001 = STR4_LUMA4 - 4-bit pixels are streamed from AUX starting at %AAAAAAAA plus S[7..0], with S[31..29] selecting the starting nibble. The pixels are used as brightness values for colors determined by S[11..9]: %000 = black..orange %001 = black..blue %010 = black..green %011 = black..cyan %100 = black..red %101 = black..magenta %110 = black..yellow %111 = black..white %1010 = STR8_RGBI8 - 8-bit pixels are streamed from AUX starting at %AAAAAAAA plus S[7..0], with S[31..30] selecting the starting byte. The pixels are colored as: $00..$1F = black..orange $20..$3F = black..blue $40..$5F = black..green $60..$7F = black..cyan $80..$9F = black..red $A0..$BF = black..magenta $C0..$DF = black..yellow $E0..$FF = black..white %1011 = STR8_LUMA8 - 8-bit pixels are streamed from AUX starting at %AAAAAAAA plus S[7..0], with S[31..30] selecting the starting byte. 
The pixels are used as brightness values for colors determined by S[11..9]: %000 = black..orange %001 = black..blue %010 = black..green %011 = black..cyan %100 = black..red %101 = black..magenta %110 = black..yellow %111 = black..white %1100 = STR8_RGB8 - 8-bit 3:3:2 pixels are streamed from AUX starting at %AAAAAAAA plus S[7..0], with S[31..30] selecting the starting byte. %1101 = STR16_RGB15 - 15-bit 5:5:5 pixels are streamed from AUX starting at %AAAAAAAA plus S[7..0], with S[31] selecting the starting word. %1110 = STR16_RGB16 - 16-bit 5:6:5 pixels are streamed from AUX starting at %AAAAAAAA plus S[7..0], with S[31] selecting the starting word. %1111 = STR32_RGB24 - 24-bit 8:8:8 pixels are streamed from AUX starting at %AAAAAAAA plus S[7..0]. For outputting SYNC signals, the LIT_RGBS32 mode must be used. Because WAITVID's D can be an immediate value up to 3583, and because S values that generate sync all fit within 9 bits, any fixed sync pattern can be coded directly with a few 'WAITVID #D,#S' instructions. 
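The layout of WAITVID's D operand, %AAAAAAAA_MMMM_PPPPPPP_CCCCCCCCCCCCC, can be packed with a small Python helper. This is an illustrative sketch: `waitvid_d` and `fits_immediate` are hypothetical names, and the only facts used are the field widths, the "0 acts as 128/8192" rule, and the 0..3583 immediate range stated for WAITVID.

```python
def waitvid_d(aux_base, mode, clocks_per_pixel, clocks):
    """Pack the four WAITVID D fields into one 32-bit value."""
    if not 0 <= aux_base <= 255:
        raise ValueError("AUX base address must be 0..255")
    if not 0 <= mode <= 15:
        raise ValueError("pixel mode must be 0..15")
    if not 1 <= clocks_per_pixel <= 128:
        raise ValueError("dot clocks per pixel must be 1..128")
    if not 1 <= clocks <= 8192:
        raise ValueError("dot clocks in WAITVID must be 1..8192")
    p = clocks_per_pixel % 128     # 128 is encoded as 0
    c = clocks % 8192              # 8192 is encoded as 0
    return (aux_base << 24) | (mode << 20) | (p << 13) | c

def fits_immediate(d):
    """True if D can be written as an immediate (0..3583, or $DFF)."""
    return 0 <= d <= 0xDFF

CLU8_RGB24 = 0b0100
print(hex(waitvid_d(0x10, CLU8_RGB24, 2, 640)))   # 0x10404280
sync_d = waitvid_d(0, 0b0000, 128, 96)            # LIT_RGBS32 with only the count field set
print(sync_d, fits_immediate(sync_d))             # 96 True
```

Because LIT_RGBS32 uses only the count field, passing 128 clocks per pixel (encoded as 0) leaves D equal to the dot-clock count, which is what lets short sync intervals fit in an immediate.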
DAC channel outputs (9 bits each, MOD 512) according to S input using LIT_RGBS32 mode -------------------------------------------------------------------------------------------------------- Y_R %RRRRRRRR_GGGGGGGG_BBBBBBBB_xxxxxxxx = YO*2 + Y component/vga pixel VCFG[3] = 0 %RRRRRRRR_GGGGGGGG_BBBBBBBB_SSSSSSSS = YO*2 + Y + SSSSSSSS*2 component sync VCFG[3] = 1 I_G %RRRRRRRR_GGGGGGGG_BBBBBBBB_xxxxxxxx = IO*2 + I component/vga pixel VCFG[2] = 0 %RRRRRRRR_GGGGGGGG_BBBBBBBB_SSSSSSSS = IO*2 + I + SSSSSSSS*2 component sync VCFG[2] = 1 Q_B %RRRRRRRR_GGGGGGGG_BBBBBBBB_xxxxxxxx = QO*2 + Q component/vga pixel VCFG[1] = 0 %RRRRRRRR_GGGGGGGG_BBBBBBBB_SSSSSSSS = QO*2 + Q + SSSSSSSS*2 component sync VCFG[1] = 1 SYN %xxxxxxxx_xxxxxxxx_xxxxxxxx_xxxxxxx0 = VCFG[0]*511 vga sync unasserted %xxxxxxxx_xxxxxxxx_xxxxxxxx_xxxxxxx1 = !VCFG[0]*511 vga sync asserted Y__ %RRRRRRRR_GGGGGGGG_BBBBBBBB_xxxxxx00 = YO*2 + Y s-video luma pixel %xxxxxxxx_xxxxxxxx_xxxxxxxx_xxxxxx01 = IO*2 s-video luma sync high %xxxxxxxx_xxxxxxxx_xxxxxxxx_xxxxxx1x = 0 s-video luma sync low _IQ %xxxxxxxx_xxxxxxxx_xxxxxxxx_xxxxxxxx = QO*2 + Q' s-video chroma YIQ %RRRRRRRR_GGGGGGGG_BBBBBBBB_xxxxxx00 = YO*2 + Y + Q' composite pixel %xxxxxxxx_xxxxxxxx_xxxxxxxx_xxxxxx01 = IO*2 + Q' composite sync high %xxxxxxxx_xxxxxxxx_xxxxxxxx_xxxxxx1x = Q' composite sync low The following example programs display luma-graduated color bars in various output modes: simple_VGA_1280x1024.spin simple_VGA_800x600.spin simple_VGA_640x480.spin simple_HDTV_1920x1080p.spin simple_HDTV_1280x720p.spin simple_NTSC_256x192.spin TEXTURE MAPPER -------------- Each cog has a texture mapper (PIX) which can navigate a rectangular 2D texture with Z-perspective correction to locate a texture pixel, translate that texture pixel into A:R:G:B (Alpha:Red:Green:Blue) pixel data, perform discrete scaling on those A:R:G:B components, and then mix the resulting pixel with another pixel for multi-layered 3D effects. 
A texture is stored in register RAM as a sequence of 1/2/4/8-bit texture pixels which build from the bottom bits of an initial register, upwards, and then into subsequent registers. They are ordered, in contiguous sequence, from top-left to top-right down to bottom-left to bottom-right. These texture pixels get used as offsets into AUX to look up A:R:G:B pixel data which may be either 8:8:8:8 bits (long) or 1:5:5:5 bits (word). Texture width and height are individually settable to 1/2/4/8/16/32/64/128 pixel(s). To configure PIX, the SETPIX instruction is used: SETPIX D/# - Set PIX configuration to %WWW_HHH_PP_S_H_V_xxxx_AAAAAAAA_RRRRRRRRR %WWW = texture map width, %HHH = texture map height %000 = 1 pixel %001 = 2 pixels %010 = 4 pixels %011 = 8 pixels %100 = 16 pixels %101 = 32 pixels %110 = 64 pixels %111 = 128 pixels %PP = texture pixel size %00 = 1 bit %01 = 2 bits %10 = 4 bits %11 = 8 bits %S = AUX pixel data size %0 = long, 8:8:8:8 bit A:R:G:B data %1 = word, 1:5:5:5 bit A:R:G:B data (gets expanded to 8:8:8:8) %H = horizontal mirroring %0 = OFF, image repeats when U'[15] is 1 %1 = ON, image mirrors when U'[15] is 1 %V = vertical mirroring %0 = OFF, image repeats when V'[15] is 1 %1 = ON, image mirrors when V'[15] is 1 %AAAAAAAA = base address in AUX of A:R:G:B pixel data %RRRRRRRRR = base address in register RAM of texture pixels Aside from SETPIX, which configures PIX's base metrics, there are seven other instructions which establish initial values and deltas for the Z perspective, U/V texture coordinates, and A/R/G/B scalers. These instructions are likely to be used before every sequence of GETPIX instructions. 
They each set the value of their respective 16-bit parameter to the high word of the operand, while the low word sets the 16-bit delta which gets added to the parameter upon every GETPIX instruction: SETPIXZ D/# - Set {Z,DZ} to D/# SETPIXU D/# - Set {U,DU} to D/# SETPIXV D/# - Set {V,DV} to D/# SETPIXA D/# - Set {A,DA} to D/# SETPIXR D/# - Set {R,DR} to D/# SETPIXG D/# - Set {G,DG} to D/# SETPIXB D/# - Set {B,DB} to D/# These instructions can be used to establish two settings at a time: SETPIX0 D/#,S/# - Set config to D/# and {Z,DZ} to S/# SETPIX1 D/#,S/# - Set {U,DU} to D/# and {V,DV} to S/# SETPIX2 D/#,S/# - Set {A,DA} to D/# and {R,DR} to S/# SETPIX3 D/#,S/# - Set {G,DG} to D/# and {B,DB} to S/# Once PIX is configured and initial parameters are set, the GETPIX instruction may be used to look up the current texture pixel, scale its A/R/G/B components, mix it with a pixel in D, and update the U/V/Z/A/R/G/B parameters with their deltas. GETPIX only works in single-task programs, as it requires 3 clocks in pipeline stages 2 and 3: WAIT #3 'ready pipeline, GETPIX needs 3 clocks in pipeline stage 2 WAIT #3 'ready pipeline, GETPIX needs 3 clocks in pipeline stage 3 GETPIX pixel 'execute GETPIX, GETPIX takes 3 clocks in pipeline stage 4 To make GETPIX more efficient, it can be repeated using REPD to perform a sequence of pixel operations, taking only 3 clocks per pixel: REPD #64,#1 'render 64 texture pixels and blend them with 'pixels' SETINDA #pixels 'point INDA to pixels WAIT #3 'ready pipeline, 3 clocks in initial pipeline stage 2 WAIT #3 'ready pipeline, 3 clocks in initial pipeline stage 3 GETPIX INDA++ 'execute GETPIX, 3 clocks per repeating GETPIX As GETPIX executes, the following sequence occurs over three pipeline stages: In pipeline stage 2: Z-perspective correction * ------------------------ Z' = 256 - Z[31..24] U' = (U[31..16] / Z') MOD 256 V' = (V[31..16] / Z') MOD 256 A texture pixel is read from register RAM at texture location (U',V'), with the U' and V' 
top-most bits being used as coordinates. For example, if the texture size is 32x8, then the top 5 bits of U' and the top 3 bits of V' would be used to locate the texture pixel. parameter updating ------------------ Z = Z + DZ U = U + DU V = V + DV In pipeline stage 3: The texture pixel is used as an offset to look up A:R:G:B pixel data in AUX. If the AUX data is a word (1:5:5:5 bit A:R:G:B), the fields get expanded so that %A_BCDEF_GHIJK_LMNOP becomes %AAAAAAAA_BCDEFBCD_GHIJKGHI_LMNOPLMN. If the AUX data is a long (8:8:8:8 bit A:R:G:B), it is used directly. These expanded or direct 8:8:8:8 bit fields become TA:TR:TG:TB. In pipeline stage 4: pixel scaling ------------- A' = (TA * A[31..24] + 255) / 256 R' = (TR * R[31..24] + 255) / 256 G' = (TG * G[31..24] + 255) / 256 B' = (TB * B[31..24] + 255) / 256 parameter updating ** ------------------ A = A + DA R = R + DR G = G + DG B = B + DB pixel mixing ------------ A':R':G':B' is mixed with the pixel in D according to the MIX configuration If WC is used with GETPIX, C will return 1 if A' is not 0. * Note that if Z[31..24] = 0, no scaling occurs, or (U',V') = (U[31..24],V[31..24]). The bigger Z[31..24] gets, the more compressed the texture rendering becomes, until when Z[31..24] = 255, (U',V') = (U[23..16],V[23..16]). ** A/R/G/B are actually updated in pipeline stage 2, but their original values are propagated to pipeline stage 4. 
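The 1:5:5:5 expansion and the stage-4 scaling formulas above can be checked with a small software model. The Python sketch below only mirrors the arithmetic as written in this section; it is not a description of the hardware. The function names (`expand_5_to_8`, `expand_1555`, `scale`) are hypothetical, and the division by 256 is assumed to truncate.

```python
def expand_5_to_8(v):
    """Expand a 5-bit field %BCDEF to the 8-bit field %BCDEFBCD."""
    v &= 0x1F
    return ((v << 3) | (v >> 2)) & 0xFF


def expand_1555(word):
    """Expand a 1:5:5:5 A:R:G:B word to an 8:8:8:8 A:R:G:B long."""
    a = 0xFF if (word >> 15) & 1 else 0x00   # single A bit is replicated 8 times
    r = expand_5_to_8(word >> 10)
    g = expand_5_to_8(word >> 5)
    b = expand_5_to_8(word)
    return (a << 24) | (r << 16) | (g << 8) | b


def scale(t, s):
    """Scale component t by scaler s: (t * s + 255) / 256, truncating (assumed)."""
    return ((t & 0xFF) * (s & 0xFF) + 255) // 256


if __name__ == "__main__":
    print(hex(expand_1555(0b1_11111_00000_10000)))
    print(scale(255, 255), scale(128, 128), scale(255, 0))
```

In this model a scaler of $FF leaves a component unchanged, and the `+ 255` term means any non-zero product yields at least 1.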
The following program provides a simplistic example of how PIX is used: texture_NTSC_256x192.spin PIXEL MIXER ----------- Each cog has a pixel mixer called MIX that can combine two pixels in a sum-of-products operation, where: inputs: DA = D pixel A component (8 bits) DR = D pixel R component (8 bits) DG = D pixel G component (8 bits) DB = D pixel B component (8 bits) SA = S pixel A component or GETPIX A' component (8 bits) SR = S pixel R component or GETPIX R' component (8 bits) SG = S pixel G component or GETPIX G' component (8 bits) SB = S pixel B component or GETPIX B' component (8 bits) outputs: A' = ((DA * DAX + SA * SAX + 255) / 256) max 255 R' = ((DR * DRX + SR * SRX + 255) / 256) max 255 G' = ((DG * DGX + SG * SGX + 255) / 256) max 255 B' = ((DB * DBX + SB * SBX + 255) / 256) max 255 The DAX/DRX/DGX/DBX/SAX/SRX/SGX/SBX terms determine the type of mixing that will be done. The terms are configurable for the MIXPIX/GETPIX instructions, but fixed for the others: ADDPIX D,S/# - Add and clamp A:R:G:B components into D DAX = $FF SAX = $FF DRX = $FF SRX = $FF DGX = $FF SGX = $FF DBX = $FF SBX = $FF MULPIX D,S/# - Multiply A:R:G:B components into D DAX = SA SAX = $00 DRX = SR SRX = $00 DGX = SG SGX = $00 DBX = SB SBX = $00 BLNPIX D,S/# - Blend A:R:G:B components by SA into D DAX = !SA SAX = SA DRX = !SA SRX = SA DGX = !SA SGX = SA DBX = !SA SBX = SA Here is the general-purpose MIXPIX instruction: MIXPIX D,S/# - Mix A:R:G:B components according to SETMIX into D To configure for MIXPIX/GETPIX usage, the SETMIX instruction is used: SETMIX D/#,S/# - Set MIX configuration to D/#[8..0], S/#[31..0] D/#[8..0] sets M - initialized to $001 * S/#[31..24] sets DAB - initialized to $00 S/#[23..16] sets DCB - initialized to $00 S/#[15..8] sets SAB - initialized to $FF * S/#[7..0] sets SCB - initialized to $00 M[8] = 0 for long mode, where D and S pixels are 8:8:8:8 bit A:R:G:B M[8] = 1 for word mode, where D and S pixels are 1:5:5:5 bit A:R:G:B 1:5:5:5 pixels are expanded so 
that %A_BCDEF_GHIJK_LMNOP becomes %AAAAAAAA_BCDEFBCD_GHIJKGHI_LMNOPLMN for the mixing computation. When being packed back down to 1:5:5:5 bit A:R:G:B, the single A bit will be 1 if the resultant A was not 0, and the R:G:B fields will be set to the top 5 bits of the resultant R:G:B. In word mode, the low word in D will be operated on and the words will be swapped, leaving the mixed pixel in the new high word and the old high word in the new low word. Also, pixel data from S will be taken alternately from the low and high word with each operation, with SETMIX resetting the selector to the low word. Word mode affects all ADDPIX/MULPIX/BLNPIX/MIXPIX/GETPIX.

M field             000   001   010   011   100   101   110   111
--------------------------------------------------------------
M[7]      DAX  =    DAB   SA
M[6..4]   DRX  =    $00   $FF   SA    !SA   DA    !DA   DCB   SR
M[6..4]   DGX  =    $00   $FF   SA    !SA   DA    !DA   DCB   SG
M[6..4]   DBX  =    $00   $FF   SA    !SA   DA    !DA   DCB   SB
--------------------------------------------------------------
M[3]      SAX  =    SAB   DA
M[2..0]   SRX  =    $00   $FF   SA    !SA   DA    !DA   SCB   DR
M[2..0]   SGX  =    $00   $FF   SA    !SA   DA    !DA   SCB   DG
M[2..0]   SBX  =    $00   $FF   SA    !SA   DA    !DA   SCB   DB

* M and SAB are initialized on cog start so that GETPIX will return the scaled A:R:G:B texture pixel without any blending.

The ADDPIX/MULPIX/BLNPIX/MIXPIX instructions all take 2 clocks, while GETPIX takes 3 clocks.

PIN TRANSFER
------------

Each cog has a pin transfer (XFR) which can automatically move data between pins and WIDEs/AUX, in the background, while instructions execute normally.
XFR is configured with the SETXFR instruction:

SETXFR D/# - Set XFR configuration to %E_MMM_PPP

  %E = enable
    %0 = off (initial state after cog start)
    %1 = on

  %MMM = mode
    %000 = WIDEs_to_16_pins
    %001 = WIDEs_to_32_pins
    %010 = AUX_to_16_pins
    %011 = AUX_to_32_pins
    %100 = 16_pins_to_WIDEs
    %101 = 32_pins_to_WIDEs
    %110 = 16_pins_to_AUX
    %111 = 32_pins_to_AUX

  %PPP = pin group
    %000 = pins 15..0 for 16-pin modes, pins 31..0 for 32-pin modes
    %001 = pins 31..16 for 16-pin modes, pins 31..0 for 32-pin modes
    %010 = pins 47..32 for 16-pin modes, pins 63..32 for 32-pin modes
    %011 = pins 63..48 for 16-pin modes, pins 63..32 for 32-pin modes
    %100 = pins 79..64 for 16-pin modes, pins 95..64 for 32-pin modes
    %101 = pins 95..80 for 16-pin modes, pins 95..64 for 32-pin modes
    %110 = pins 111..96 for 16-pin modes, pins 127..96 for 32-pin modes
    %111 = pins 127..112 for 16-pin modes, pins 127..96 for 32-pin modes

For WIDEs_to_16_pins mode (%000), on the cycle after SETXFR is executed, the following 16-clock pattern begins and then repeats indefinitely:

  1st clock: WIDE0 low word is output to pins
  2nd clock: WIDE0 high word is output to pins
  3rd clock: WIDE1 low word is output to pins
  4th clock: WIDE1 high word is output to pins
  5th clock: WIDE2 low word is output to pins
  6th clock: WIDE2 high word is output to pins
  7th clock: WIDE3 low word is output to pins
  8th clock: WIDE3 high word is output to pins
  9th clock: WIDE4 low word is output to pins
  10th clock: WIDE4 high word is output to pins
  11th clock: WIDE5 low word is output to pins
  12th clock: WIDE5 high word is output to pins
  13th clock: WIDE6 low word is output to pins
  14th clock: WIDE6 high word is output to pins
  15th clock: WIDE7 low word is output to pins
  16th clock: WIDE7 high word is output to pins

For WIDEs_to_32_pins mode (%001), on the cycle after SETXFR is executed, the following 8-clock pattern begins and then repeats indefinitely: 1st clock: WIDE0 is output to pins 2nd clock: WIDE1 is output to pins 3rd clock: WIDE2 is output
to pins 4th clock: WIDE3 is output to pins 5th clock: WIDE4 is output to pins 6th clock: WIDE5 is output to pins 7th clock: WIDE6 is output to pins 8th clock: WIDE7 is output to pins For AUX_to_16_pins mode (%010), on the second cycle after SETXFR is executed, the following 2-clock pattern begins and then repeats indefinitely: 1st clock: AUX[SPB] low word is output to pins 2nd clock: AUX[SPB++] high word is output to pins For AUX_to_32_pins mode (%011), on the second cycle after SETXFR is executed, the following 1-clock pattern begins and then repeats indefinitely: 1st clock: AUX[SPB++] is output to pins For 16_pins_to_WIDEs mode (%100), on the cycle after SETXFR is executed, the following 16-clock pattern begins and then repeats indefinitely: 1st clock: pins are sampled into low word 2nd clock: pins are sampled into high word, long is written to WIDE0 3rd clock: pins are sampled into low word 4th clock: pins are sampled into high word, long is written to WIDE1 5th clock: pins are sampled into low word 6th clock: pins are sampled into high word, long is written to WIDE2 7th clock: pins are sampled into low word 8th clock: pins are sampled into high word, long is written to WIDE3 9th clock: pins are sampled into low word 10th clock: pins are sampled into high word, long is written to WIDE4 11th clock: pins are sampled into low word 12th clock: pins are sampled into high word, long is written to WIDE5 13th clock: pins are sampled into low word 14th clock: pins are sampled into high word, long is written to WIDE6 15th clock: pins are sampled into low word 16th clock: pins are sampled into high word, long is written to WIDE7 For 32_pins_to_WIDEs mode (%101), on the cycle after SETXFR is executed, the following 8-clock pattern begins and then repeats indefinitely: 1st clock: pins are sampled and written to WIDE0 2nd clock: pins are sampled and written to WIDE1 3rd clock: pins are sampled and written to WIDE2 4th clock: pins are sampled and written to WIDE3 5th clock: 
pins are sampled and written to WIDE4 6th clock: pins are sampled and written to WIDE5 7th clock: pins are sampled and written to WIDE6 8th clock: pins are sampled and written to WIDE7 For 16_pins_to_AUX mode (%110), on the cycle after SETXFR is executed, the following 2-clock pattern begins and then repeats indefinitely: 1st clock: pins are sampled into low word 2nd clock: pins are sampled into high word, long is written to AUX[SPB++] For 32_pins_to_AUX mode (%111), on the cycle after SETXFR is executed, the following 1-clock pattern begins and then repeats indefinitely: 1st clock: pins are sampled and written to AUX[SPB++] While an AUX_to_pins or pins_to_AUX mode is active, you should not read or write AUX or modify SPB, as such attempts will likely interfere with XFR operation and cause unexpected results. VID, however, has an asynchronous second port to AUX, so it can, for example, stream pixels out at the same time XFR streams them in. To stop XFR, execute 'SETXFR #0' on the last cycle of desired XFR operation. An example of XFR usage is in the following program: balls.spin BIG MULTIPLIER -------------- Aside from the 1-clock MACA/MACB instructions and the 2-clock MUL/SCL instructions which perform 20x20-bit signed multiplications, each cog has a separate, larger multiplier that can do 32x32-bit signed or unsigned multiplication while other instructions execute. To start a 32x32-bit multiply, execute one of the following: MUL32 D/#,S/# - Begin 32x32-bit signed multiply of D/# and S/# MUL32U D/#,S/# - Begin 32x32-bit unsigned multiply of D/# and S/# You'll have 17 clock cycles to execute other code, if you wish, before GETMULL/GETMULH will return the low/high long(s) of the result: GETMULL D - Get low long of result GETMULH D - Get high long of result In single-task mode, GETMULL/GETMULH will stall the pipeline until the result is ready. In multi-task mode, GETMULL/GETMULH will jump to themselves until the result is ready, freeing clocks for other tasks. 
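To make the signed/unsigned distinction and the low/high result longs concrete, here is a minimal Python model of the 64-bit product that GETMULL/GETMULH would be expected to return. It models only the arithmetic, not the timing or pipeline behavior, and the helper names (`to_signed32`, `mul32`) are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mul32(d, s, signed=True):
    """Return (low_long, high_long) of the 64-bit product of two 32-bit values.

    signed=True models MUL32, signed=False models MUL32U (assumed semantics).
    """
    if signed:
        product = to_signed32(d) * to_signed32(s)
    else:
        product = (d & MASK32) * (s & MASK32)
    product &= MASK64                 # wrap negative products to 64 bits
    return product & MASK32, product >> 32


if __name__ == "__main__":
    low, high = mul32(0xFFFFFFFF, 2)             # -1 * 2, signed
    print(f"${high:08X}_{low:08X}")
    low, high = mul32(0xFFFFFFFF, 2, signed=False)
    print(f"${high:08X}_{low:08X}")
```

The same operands give different high longs depending on the interpretation, which is why the signed and unsigned variants are separate instructions.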
BIG DIVIDER
-----------

Each cog has a 64-over-32-bit divider which can perform signed and unsigned divides, as well as calculate 32-bit fractions, while other instructions execute. For signed divides, the remainder result will have the sign of the numerator. Both the quotient and the remainder results are 32 bits.

To start a 32/32-bit divide, execute one of the following:

  DIV32 D/#,S/# - Begin 32/32-bit signed divide of D/# over S/#
  DIV32U D/#,S/# - Begin 32/32-bit unsigned divide of D/# over S/#

To start a 64/32-bit divide, first set the denominator:

  DIV64D D/# - Set the 32-bit denominator to D/#

Then execute one of the following:

  DIV64 D/#,S/# - Set the 64-bit numerator to {S/#,D/#} and begin signed divide
  DIV64U D/#,S/# - Set the 64-bit numerator to {S/#,D/#} and begin unsigned divide

To start a 32-bit fraction calculation, use FRAC:

  FRAC D/#,S/# - Begin calculating the unsigned fraction of D/# over S/#, where D/# and S/# are unsigned 32-bit values and D/# is less than S/#. Use GETDIVQ to get the result.

  Examples:

  FRAC #1,#2 yields $80000000 (1/2 of $1_00000000)
  FRAC #1,#3 yields $55555555 (1/3 of $1_00000000)
  FRAC #1,#4 yields $40000000 (1/4 of $1_00000000)
  FRAC #15,#16 yields $F0000000
  FRAC $80000000,$90000000 yields $E38E38E3
  FRAC 31_250,80_000_000 yields $00199999

After starting the divider, you'll have 17 clock cycles to execute other code, if you wish, before GETDIVQ/GETDIVR will return the quotient/remainder long(s) of the result:

  GETDIVQ D - Get quotient result
  GETDIVR D - Get remainder result

In single-task mode, GETDIVQ/GETDIVR will stall the pipeline until the result is ready. In multi-task mode, GETDIVQ/GETDIVR will jump to themselves until the result is ready, freeing clocks for other tasks.

SQUARE ROOTER
-------------

Each cog has a 32/64-bit square root calculator which can compute square roots from unsigned values, while other instructions execute.
To start a square root computation, execute one of the following: SQRT32 D/# - Begin computing square root of 32-bit unsigned D/# SQRT64 D/#,S/# - Begin computing square root of 64-bit unsigned {S/#,D/#} In the case of SQRT32, you'll have 16 clock cycles to execute other code, if you wish, or 32 clock cycles in the case of SQRT64, before GETSQRT will return the result: GETSQRT D - Get root result In single-task mode, GETSQRT will stall the pipeline until the result is ready. In multi-task mode, GETSQRT will jump to itself until the result is ready, freeing clocks for other tasks. CORDIC ENGINE ------------- Each cog has a CORDIC engine which can perform trigonometric, logarithmic, exponential, and hyperbolic functions while other instructions execute. Here are the instructions associated with the CORDIC engine: QLOG D/# - Compute logarithm of D/# (unsigned number -> log-base-2) QEXP D/# - Compute exponential of D/# (log-base-2 -> unsigned number) QSINCOS D/#,S/# - Compute sine and cosine of D/# with amplitude S/# (polar -> cartesian) QARCTAN D/#,S/# - Compute distance and angle of (D/#,S/#) to (0,0) (cartesian -> polar) SETQZ D/# - Set CORDIC Z to D/#, used to set angle before QROTATE QROTATE D/#,S/# - Rotate (D/#,S/#) around (0,0) by an angle GETQX D - Get CORDIC X result GETQY D - Get CORDIC Y result GETQZ D - Get CORDIC Z result SETQI D/# - Set CORDIC trigonometric/hyperbolic and iteration modes In single-task mode, GETQX/GETQY/GETQZ will stall the pipeline until the result is ready. In multi-task mode, GETQX/GETQY/GETQZ will jump to themselves until the result is ready, freeing clocks for other tasks. QLOG/QEXP usage: To convert between 32-bit unsigned numbers and 32-bit log values, use QLOG or QEXP to set the input term and begin the computation. Then do GETQZ to get the result. Log values are encoded with the whole exponent in the top 5 bits and the fractional exponent in the bottom 27 bits. 
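The log encoding just described (5-bit whole exponent, 27-bit fractional exponent) amounts to log2(n) scaled by 2^27. The Python sketch below is an idealized floating-point model of that encoding, assuming round-to-nearest; the CORDIC engine itself may differ in the low bits for some inputs, and the function names (`qlog_model`, `qexp_model`) are hypothetical.

```python
import math

FRAC_BITS = 27          # fractional exponent occupies the bottom 27 bits
MASK32 = 0xFFFFFFFF


def qlog_model(n):
    """Idealized number -> log value: round(log2(n) * 2^27), with 0 treated as 1."""
    n &= MASK32
    if n <= 1:
        return 0
    return min(round(math.log2(n) * (1 << FRAC_BITS)), MASK32)


def qexp_model(v):
    """Idealized log value -> number: round(2 ^ (v / 2^27))."""
    v &= MASK32
    return min(round(2.0 ** (v / (1 << FRAC_BITS))), MASK32)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 0x20000000, 0x80000000):
        v = qlog_model(n)
        print(f"${n:08X} -> ${v:08X} -> ${qexp_model(v):08X}")
```

Powers of two land exactly on multiples of $08000000, since only the top 5 bits (the whole exponent) are non-zero.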
Here are some examples of numbers converted to log values, then back to numbers again using QLOG and QEXP: number -> QLOG -> QEXP --------------------------------- $00000000 $00000000 $00000001 (0 same as 1) $00000001 $00000000 $00000001 $00000002 $08000000 $00000002 $00000003 $0CAE00D2 $00000003 $00000004 $10000000 $00000004 $00000005 $12934F09 $00000005 $07ADCBD8 $D786F595 $07ADCBD9 (first lossy bidirectional conversion, +1) $20000000 $E8000000 $20000000 $40000000 $F0000000 $40000000 $80000000 $F8000000 $80000000 $FFFFFFFF $FFFFFFFF $FFFFFFE9 (last lossy bidirectional conversion, -22) QSINCOS/QARCTAN/QROTATE usage: For the circular functions, angles are 32-bits and roll over at 360-degrees: $00000000 = 0 degrees (360 * $00000000 / $1_00000000) $00000001 = ~0.000000083819 degrees (360 * $00000001 / $1_00000000) $00B60B61 = ~1 degree (360 * $00B60B61 / $1_00000000) $20000000 = 45 degrees (360 * $20000000 / $1_00000000) $40000000 = 90 degrees (360 * $40000000 / $1_00000000) $80000000 = 180 degrees (360 * $80000000 / $1_00000000) $C0000000 = 270 degrees (360 * $C0000000 / $1_00000000) $FFFFFFFF = ~359.9999999162 degrees (360 * $FFFFFFFF / $1_00000000) The X and Y inputs to the circular functions are signed 30-bit values, ranging from -$2000_0000..+$1FFF_FFFF, conveyed by D and S (top two bits are ignored). No matter the sizes of X and Y, the pair is internally MSB-justified to achieve maximal precision during the CORDIC iterations, after which they are shifted back down and rounded to form the X and Y results. The circular functions will return X and Y results that are scaled by constant K, which is ~1.64676025812 for trigonometric mode or ~0.82815936096 for hyperbolic mode. This CORDIC scaling can be compensated for, if necessary, by pre- or post-scaling X and/or Y by 1/K. To compute sine and cosine simultaneously, the 'QSINCOS D/#,S/#' instruction can be used, with the angle supplied in D/# and the amplitude in S/#. 
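Since angles are expressed as 32-bit fractions of a full turn, converting to and from degrees is a single scale by $1_00000000 / 360. The following Python helpers are an illustrative sketch of that conversion only; the names `deg_to_angle` and `angle_to_deg` are hypothetical and do not correspond to any instruction.

```python
FULL_TURN = 1 << 32     # $1_00000000 represents 360 degrees
MASK32 = 0xFFFFFFFF


def deg_to_angle(deg):
    """Convert degrees to a 32-bit angle, wrapping at 360 degrees."""
    return round(deg * FULL_TURN / 360) & MASK32


def angle_to_deg(angle):
    """Convert a 32-bit angle back to degrees in the range [0, 360)."""
    return (angle & MASK32) * 360 / FULL_TURN


if __name__ == "__main__":
    for deg in (0, 1, 45, 90, 180, 270, -90, 360):
        print(f"{deg:5} degrees = ${deg_to_angle(deg):08X}")
```

Negative angles and angles of 360 degrees or more wrap naturally because only the low 32 bits are kept.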
Immediate values of S/# are special cases which produce the following amplitudes, where n is the immediate value: #$00..$1F produces +/- 2^(n[4..0]-1) #$20..$3F produces +/- 2^(n[4..0]-1) * 255/256 #$40..$5F produces +/- 2^(n[4..0]-1) * 7/8 #$60..$7F produces +/- 2^(n[4..0]-1) * 3/4 For example, #$09 will yield results ranging from -$100..$100 and #$29 will yield results ranging from -$FF..$FF. Use GETQX and GETQY to retrieve the cosine and sine results. To convert an (X,Y) coordinate into a distance and angle relative to (0,0), do 'QARCTAN D/#,S/#' with the X in D/# and the Y in S/#. Use GETQX to get the distance and GETQZ to get the angle. To rotate an (X,Y) coordinate around (0,0), first do SETQZ to set the rotation angle, then do 'QROTATE D/#,S/#', with the X in D/# and the Y in S/#. Use GETQX and GETQY to retrieve the rotated (X,Y) coordinate. CORDIC modes: The SETQI instruction is used to switch between trigonometric and hyperbolic modes, and to select between adaptive and fixed iterations: SETQI D/# - Set CORDIC configuration to %M_IIIII (%0_00000 on cog start) %M = mode %0 = trigonometric (K = ~1.64676025812) %1 = hyperbolic (K = ~0.82815936096) %IIIII = iterations %00000 = adaptive iterations (adaptive resolution, variable time) %00001..%11111 = 1..31 fixed iterations (fixed resolution, constant time) Hyperbolic mode changes the functionality of the QSINCOS/QARCTAN/QROTATE instructions so that hyperbolics can be computed. When in hyperbolic mode, the CORDIC engine uses different internal constants to track the angle, it skips the zeroth iteration, and the fourth and thirteenth iterations are repeated to ensure convergence. Hence, K differs between trigonometric and hyperbolic modes, as well as clock cycles. When %IIIII is %00000, the CORDIC engine selects an iteration count based on the magnitude of the X and Y inputs to ensure an efficient computation which preserves initial precision. 
For very exact QARCTAN computations, setting %IIIII to %11111 will ensure calculator-like precision, even though (X,Y) may be small. In some cases, you may want to fix the iteration count to ensure good-enough precision, but with budgeted or exact timing. CORDIC timing: Here is a table that shows how many free clocks are available for other instructions to execute between QLOG/QEXP/QSINCOS/QARCTAN/QROTATE and GETQX/GETQY/GETQZ: i = %IIIII i = 0 (adaptive) i = 1..31 (fixed) operation clocks free clocks free --------------------------------------------------------------------------- QLOG D/# 35 2 + i + h QEXP D/# 35 2 + i + h Trigonometric mode QSINCOS D/#,#n 2 + n 2 + i QSINCOS D/#,S 5 + mag(abs(D/#) | abs(S)) 3 + i QARCTAN D/#,S/# 5 + mag(abs(D/#) | abs(S/#)) 3 + i QROTATE D/#,S/# 5 + mag(abs(D/#) | abs(S/#)) 3 + i Hyperbolic mode QSINCOS D/#,#n 1 + n + j 1 + i + h QSINCOS D/#,S 4 + mag(abs(D/#) | abs(S)) + k 2 + i + h QARCTAN D/#,S/# 4 + mag(abs(D/#) | abs(S/#)) + k 2 + i + h QROTATE D/#,S/# 4 + mag(abs(D/#) | abs(S/#)) + k 2 + i + h -------------------------------------------------------------------------- h = 0 if i is 0..3 j = 0 if n is 1..3 k = 0 if mag is 0..1 1 if i is 4..12 1 if n is 4..12 1 if mag is 2..10 2 if i is 13..31 2 if n is 13..31 2 if mag is 11..30 MULTIPLY AND ACCUMULATE ----------------------- Each cog has two 64-bit accumulators, ACCA and ACCB, which accumulate products from the MACA/MACB instructions. The accumulators can also be cleared, set to arbitrary values, arithmetically shifted right, and read back. On cog start, ACCA and ACCB are both cleared to $00000000_00000000. 
The MACA/MACB instructions each perform a 20x20-bit signed multiply and then add the resultant 40-bit product into ACCA or ACCB in a single clock: MACA D/#,S/# - multiply D/#[19..0] by S/#[19..0] and accumulate into ACCA MACB D/#,S/# - multiply D/#[19..0] by S/#[19..0] and accumulate into ACCB By using MACA/MACB with indirect addressing in a REPS/REPD loop, tap-per-clock FIR filters can be realized in a few instructions: FIXINDA #buff+15,#buff 'set circular sample buffer FIXINDB #taps+15,#taps 'set circular tap buffer :loop REPS #16,#1 'ready for 16-tap FIR CLRACCA 'clear ACCA MACA INDB++,INDA++ 'multiply and accumulate buff and taps (16 clocks) GETACCA result 'get result ' 'use result ' 'get new sample MOV --INDA,sample 'enter new sample, buff scrolls against taps JMP #:loop 'loop The accumulators may be cleared by the following instructions: CLRACCA - clear ACCA to $00000000_00000000 CLRACCB - clear ACCB to $00000000_00000000 CLRACCS - clear ACCA and ACCB to $00000000_00000000 The accumulators may be set to arbitrary values by these instructions: SETACCA D/#,S/# - set ACCA to {S/#,D/#} SETACCB D/#,S/# - set ACCB to {S/#,D/#} To make post-MACA/MACB computations simpler, the SARACCA/SARACCB/SARACCS instructions can be used to arithmetically shift the accumulators downward, in order to consolidate their leading bits into the lower long. This shifting can be performed on ACCA and ACCB individually, or together. The SARACCA/SARACCB/SARACCS instructions take 1 clock, but won't execute until 2 clocks after MACA/MACB. So, if SARACCA immediately follows MACA, SARACCA will take 3 clocks: SARACCA D/# - arithmetically right-shift ACCA by D/#[5..0] (0..63) SARACCB D/# - arithmetically right-shift ACCB by D/#[5..0] (0..63) SARACCS D/# - arithmetically right-shift ACCA and ACCB by D/#[5..0] (0..63) To read back the contents of the accumulators, GETACAL/GETACAH/GETACBL/GETACBH instructions are used. 
These instructions take 1 clock, but won't execute until 2 clocks after MACA/MACB. So, if GETACAL immediately follows MACA, GETACAL will take three clocks: GETACAL D - get lower long of ACCA into D GETACAH D - get upper long of ACCA into D GETACBL D - get lower long of ACCB into D GETACBH D - get upper long of ACCB into D REGISTER REMAPPING ------------------ The SETMAP instruction is used to remap a 2^n-sized block of registers starting at $000, so that direct accesses to those registers will be redirected to a range of identically-sized blocks, which also build from $000. This feature allows a single program to run multiple instances of itself by having unique sets of statically-addressable registers which switch according to either INDB or the current task. When using remapping, you must locate your program code above the last used block of registers which the upper-most block of registers will be remapped to. For example, if you select 8 blocks of 16 registers, but are only using 6 of those blocks, your program code must not start below register 96 (6*16), to avoid encroaching into the registers which are going to be the recipients of remapping. Here is the SETMAP instruction: SETMAP D/# - Configure register remapping to %M_BBB_RRR %M = mode %0 = INDB selects the block %1 = task number selects the block %BBB = block count %000 = 1 block remapping disabled for %000 %001 = 2 blocks remapping enabled for %001..%111 %010 = 4 blocks %011 = 8 blocks %100 = 16 blocks %101 = 32 blocks %110 = 64 blocks %111 = 128 blocks %RRR = register count %000 = 1 register remap $000 %001 = 2 registers remap $000..$001 %010 = 4 registers remap $000..$003 %011 = 8 registers remap $000..$007 %100 = 16 registers remap $000..$00F %101 = 32 registers remap $000..$01F %110 = 64 registers remap $000..$03F %111 = 128 registers remap $000..$07F The new mapping scheme will be in effect on the third instruction after SETMAP. 
After that, changes to INDB or the task number will have an immediate effect on block selection. The remapping mechanism only works with hard-coded D and S addresses which range from $000 to the remapped-register-count minus 1 (see %RRR above), not via INDA and INDB accesses. Below is an elaboration of all uniquely-useful remapping schemes: S/D addresses %M_BBB_RRR blocks regs initial -> remapped block selector ----------------------------------------------------------------------------- %x_000_xxx 1 x %0_001_000 2 1 %000000000 -> %00000000P P = INDB[0] %0_001_001 2 2 %00000000X -> %0000000PX %0_001_010 2 4 %0000000XX -> %000000PXX (2 threads) %0_001_011 2 8 %000000XXX -> %00000PXXX %0_001_100 2 16 %00000XXXX -> %0000PXXXX %0_001_101 2 32 %0000XXXXX -> %000PXXXXX %0_001_110 2 64 %000XXXXXX -> %00PXXXXXX %0_001_111 2 128 %00XXXXXXX -> %0PXXXXXXX %0_010_000 4 1 %000000000 -> %0000000PP PP = INDB[1..0] %0_010_001 4 2 %00000000X -> %000000PPX %0_010_010 4 4 %0000000XX -> %00000PPXX (4 threads) %0_010_011 4 8 %000000XXX -> %0000PPXXX %0_010_100 4 16 %00000XXXX -> %000PPXXXX %0_010_101 4 32 %0000XXXXX -> %00PPXXXXX %0_010_110 4 64 %000XXXXXX -> %0PPXXXXXX %0_010_111 4 128 %00XXXXXXX -> %PPXXXXXXX %0_011_000 8 1 %000000000 -> %000000PPP PPP = INDB[2..0] %0_011_001 8 2 %00000000X -> %00000PPPX %0_011_010 8 4 %0000000XX -> %0000PPPXX (8 threads) %0_011_011 8 8 %000000XXX -> %000PPPXXX %0_011_100 8 16 %00000XXXX -> %00PPPXXXX %0_011_101 8 32 %0000XXXXX -> %0PPPXXXXX %0_011_110 8 64 %000XXXXXX -> %PPPXXXXXX %0_100_000 16 1 %000000000 -> %00000PPPP PPPP = INDB[3..0] %0_100_001 16 2 %00000000X -> %0000PPPPX %0_100_010 16 4 %0000000XX -> %000PPPPXX (16 threads) %0_100_011 16 8 %000000XXX -> %00PPPPXXX %0_100_100 16 16 %00000XXXX -> %0PPPPXXXX %0_100_101 16 32 %0000XXXXX -> %PPPPXXXXX %0_101_000 32 1 %000000000 -> %0000PPPPP PPPPP = INDB[4..0] %0_101_001 32 2 %00000000X -> %000PPPPPX %0_101_010 32 4 %0000000XX -> %00PPPPPXX (32 threads) %0_101_011 32 8 %000000XXX -> %0PPPPPXXX 
%0_101_100      32     16     %00000XXXX -> %PPPPPXXXX
%0_110_000      64      1     %000000000 -> %000PPPPPP     PPPPPP = INDB[5..0]
%0_110_001      64      2     %00000000X -> %00PPPPPPX
%0_110_010      64      4     %0000000XX -> %0PPPPPPXX     (64 threads)
%0_110_011      64      8     %000000XXX -> %PPPPPPXXX
%0_111_000     128      1     %000000000 -> %00PPPPPPP     PPPPPPP = INDB[6..0]
%0_111_001     128      2     %00000000X -> %0PPPPPPPX
%0_111_010     128      4     %0000000XX -> %PPPPPPPXX     (128 threads)
%1_001_000       2      1     %000000000 -> %00000000T     T = bit 0 of the task number
%1_001_001       2      2     %00000000X -> %0000000TX
%1_001_010       2      4     %0000000XX -> %000000TXX     (2 tasks)
%1_001_011       2      8     %000000XXX -> %00000TXXX
%1_001_100       2     16     %00000XXXX -> %0000TXXXX
%1_001_101       2     32     %0000XXXXX -> %000TXXXXX
%1_001_110       2     64     %000XXXXXX -> %00TXXXXXX
%1_001_111       2    128     %00XXXXXXX -> %0TXXXXXXX
%1_010_000       4      1     %000000000 -> %0000000TT     TT = task number
%1_010_001       4      2     %00000000X -> %000000TTX
%1_010_010       4      4     %0000000XX -> %00000TTXX     (4 tasks)
%1_010_011       4      8     %000000XXX -> %0000TTXXX
%1_010_100       4     16     %00000XXXX -> %000TTXXXX
%1_010_101       4     32     %0000XXXXX -> %00TTXXXXX
%1_010_110       4     64     %000XXXXXX -> %0TTXXXXXX
%1_010_111       4    128     %00XXXXXXX -> %TTXXXXXXX

Here is an example program which uses remapping with multi-threading:

DAT             org

period          long    2-1             '$000, thread 0 (20 longs initially execute as NOPs)
time            long    0               '$001, thread 0
pin_x           long    0               '$002, thread 0
pin_y           long    1               '$003, thread 0

                long    4-1             '$000, thread 1
                long    0               '$001, thread 1
                long    2               '$002, thread 1
                long    3               '$003, thread 1

                long    8-1             '$000, thread 2
                long    0               '$001, thread 2
                long    4               '$002, thread 2
                long    5               '$003, thread 2

                long    16-1            '$000, thread 3
                long    0               '$001, thread 3
                long    6               '$002, thread 3
                long    7               '$003, thread 3

pc              long    loop[4]         '$010..$013, all threads start at loop

                setmap  #%0_010_010     'remap 4 blocks of 4 regs by INDB[1..0]
                fixindb #pc+3,#pc       'set INDB to cycle through blocks and threads
                nop                     'allow SETMAP to take effect before 'switch'

loop            switch                  'switch to next thread
                incmod  time,period wc  'increment time and reset if period reached (C=1)
        if_c    notp    pin_x           'if period
reached, toggle pin_x setpc pin_y 'if period reached, pin_y high jmp #loop '(4 threads executing same code with unique variables) Here is an example program which uses remapping with multi-tasking: DAT org period long 2-1 '$000, task 0 (16 longs initally execute like NOPs) time long 0 '$001, task 0 pin_x long 0 '$002, task 0 pin_y long 1 '$003, task 0 long 4-1 '$000, task 1 long 0 '$001, task 1 long 2 '$002, task 1 long 3 '$003, task 1 long 8-1 '$000, task 2 long 0 '$001, task 2 long 4 '$002, task 2 long 5 '$003, task 2 long 16-1 '$000, task 3 long 0 '$001, task 3 long 6 '$002, task 3 long 7 '$003, task 3 setmap #%1_010_010 'remap 4 blocks of 4 regs by task settask #%%3210 'set all 4 tasks in motion jmptask #%1111,#loop 'herd tasks to loop loop incmod time,period wc 'increment time and reset if period reached (C=1) if_c notp pin_x 'if period reached, toggle pin_x setpc pin_y 'if period reached, pin_y high jmp #loop '(4 tasks executing same code with unique registers) PORT D INTER-COG EXCHANGE ------------------------- Port A, associated with PINA/OUTA/DIRA, connects to external pins 0..31. *** SAME Port B, associated with PINB/OUTB/DIRB, connects to external pins 32..63. *** SAME Port C, associated with PINC/OUTC/DIRC, connects to external pins 64..91. *** SAME Port D, associated with PIND/OUTD/DIRD, connects to internal pins 96..127. *** DIFFERENT!!! The internal pins of port D differ from the external pins of ports A/B/C in regard to both outputs and inputs: Each cog generates its port D outputs in the same pattern it generates its port A/B/C outputs: OUTD is OR'd with SERA/SERB/CTRA/CTRB/XFR/TRACE outputs 127..96, then those 32 bits get AND'd with DIRD to form the port D outputs. The difference is that all the cogs' port D outputs are not OR'd together before going to a set of 32 I/O pins. 
Instead, each cog's port D outputs are kept separated, and every cog can determine which other cogs' port D outputs it wants to see in its own PIND input, which also feeds SERA/SERB/CTRA/CTRB/XFR inputs 127..96. The SETXCH instruction is used to set the PIND input filter: SETXCH D/# - Set PIND input filter to %DDDDDDDD_CCCCCCCC_BBBBBBBB_AAAAAAAA %DDDDDDDD = filter for PIND[31..24] %xxxxxxx1 = cog 0's port D output [31..24] will be OR'd into PIND[31..24] input %xxxxxx1x = cog 1's port D output [31..24] will be OR'd into PIND[31..24] input %xxxxx1xx = cog 2's port D output [31..24] will be OR'd into PIND[31..24] input %xxxx1xxx = cog 3's port D output [31..24] will be OR'd into PIND[31..24] input %xxx1xxxx = cog 4's port D output [31..24] will be OR'd into PIND[31..24] input %xx1xxxxx = cog 5's port D output [31..24] will be OR'd into PIND[31..24] input %x1xxxxxx = cog 6's port D output [31..24] will be OR'd into PIND[31..24] input %1xxxxxxx = cog 7's port D output [31..24] will be OR'd into PIND[31..24] input %CCCCCCCC = filter for PIND[23..16] %xxxxxxx1 = cog 0's port D output [23..16] will be OR'd into PIND[23..16] input %xxxxxx1x = cog 1's port D output [23..16] will be OR'd into PIND[23..16] input %xxxxx1xx = cog 2's port D output [23..16] will be OR'd into PIND[23..16] input %xxxx1xxx = cog 3's port D output [23..16] will be OR'd into PIND[23..16] input %xxx1xxxx = cog 4's port D output [23..16] will be OR'd into PIND[23..16] input %xx1xxxxx = cog 5's port D output [23..16] will be OR'd into PIND[23..16] input %x1xxxxxx = cog 6's port D output [23..16] will be OR'd into PIND[23..16] input %1xxxxxxx = cog 7's port D output [23..16] will be OR'd into PIND[23..16] input %BBBBBBBB = filter for PIND[15..8] %xxxxxxx1 = cog 0's port D output [15..8] will be OR'd into PIND[15..8] input %xxxxxx1x = cog 1's port D output [15..8] will be OR'd into PIND[15..8] input %xxxxx1xx = cog 2's port D output [15..8] will be OR'd into PIND[15..8] input %xxxx1xxx = cog 3's port D 
output [15..8] will be OR'd into PIND[15..8] input
                %xxx1xxxx = cog 4's port D output [15..8] will be OR'd into PIND[15..8] input
                %xx1xxxxx = cog 5's port D output [15..8] will be OR'd into PIND[15..8] input
                %x1xxxxxx = cog 6's port D output [15..8] will be OR'd into PIND[15..8] input
                %1xxxxxxx = cog 7's port D output [15..8] will be OR'd into PIND[15..8] input

        %AAAAAAAA = filter for PIND[7..0]

                %xxxxxxx1 = cog 0's port D output [7..0] will be OR'd into PIND[7..0] input
                %xxxxxx1x = cog 1's port D output [7..0] will be OR'd into PIND[7..0] input
                %xxxxx1xx = cog 2's port D output [7..0] will be OR'd into PIND[7..0] input
                %xxxx1xxx = cog 3's port D output [7..0] will be OR'd into PIND[7..0] input
                %xxx1xxxx = cog 4's port D output [7..0] will be OR'd into PIND[7..0] input
                %xx1xxxxx = cog 5's port D output [7..0] will be OR'd into PIND[7..0] input
                %x1xxxxxx = cog 6's port D output [7..0] will be OR'd into PIND[7..0] input
                %1xxxxxxx = cog 7's port D output [7..0] will be OR'd into PIND[7..0] input

To input only cog 0's port D output into PIND, you would use the filter value $01_01_01_01.
To input the logical OR of cog 0's and cog 1's port D outputs into PIND, you would use $03_03_03_03.

In most cases, it may be desirable to just see one other cog's full port D output in a PIND
input, but many other arrangements are possible. SETBYTE and GETBYTE instructions can be used
to efficiently move bytes via OUTD/PIND windows.

After SETXCH, PIND can be read for newly-filtered data on the third clock:

        SETXCH  #$00000001      'change filter
        MOV     X,PIND          'data from old filter
        MOV     X,PIND          'data from old filter
        MOV     X,PIND          'data from new filter

Writes to an OUTD are readable from a PIND on the third clock, as well.



SERIAL TRANSCEIVERS
-------------------

Each cog has two asynchronous full-duplex serial transceivers, called SERA and SERB, which can
transmit and receive 8-bit and/or 32-bit data, with an optionally-appended 4-bit ID to enable
automatic data filtering on the receiver side.
To use SERA/SERB:

        - Configure the transceiver and set the baud rate(s) using SETSERA/SETSERB.
        - Make the TX pin an output if you are going to transmit.
        - Execute SEROUTA/SEROUTB instructions to transmit data.
        - Execute SERINA/SERINB instructions to receive data.

Baud rates are established in terms of clocks per bit, or by the bit period. Valid bit periods
range from 1..65535 (160Mbps..2441bps @160MHz). The practical minimum bit period between
same-frequency Propeller chips is 3, which yields 53.333Mbps @160MHz.

Before transmitting or receiving data, SERA/SERB must be configured:

SETSERA D/#,S/# - Set SERA configuration to %KKKK_NNNN_MMMM_R_T_DD_CCCCCCC_BB_AAAAAAA using D/#.
                  Set SERA transmit period to S/#[15..0].
                  Set SERA receive period to S/#[31..16], unless value is 0, in which case use S/#[15..0].

SETSERB D/#,S/# - Set SERB configuration to %KKKK_NNNN_MMMM_R_T_DD_CCCCCCC_BB_AAAAAAA using D/#.
                  Set SERB transmit period to S/#[15..0].
                  Set SERB receive period to S/#[31..16], unless value is 0, in which case use S/#[15..0].
        %KKKK = transmitter ID

        %NNNN = receiver ID target

        %MMMM = receiver ID mask

        %R = receiver ID mode

                %0 = receiver ID disabled, only 8 or 32 data bits will be received
                %1 = receiver ID enabled, four additional ID bits (%JJJJ) will be received,
                     received data will only be captured if (%JJJJ & %MMMM) = %NNNN

        %T = transmitter ID mode

                %0 = transmitter ID disabled, only 8 or 32 data bits will be transmitted
                %1 = transmitter ID enabled, %KKKK will be appended to the transmit data

        %DD = receiver mode

                %00 = receiver disabled
                %01 = 32-bit data, inverse RX polarity (STOP=L, START=H)
                %10 = 8-bit data, true RX polarity (STOP=H, START=L)
                %11 = 8-bit data, inverse RX polarity (STOP=L, START=H)

        %CCCCCCC = RX pin, 0..127

        %BB = transmitter mode

                %00 = transmitter disabled
                %01 = 32-bit data, inverse TX polarity (STOP=L, START=H)
                %10 = 8-bit data, true TX polarity (STOP=H, START=L)
                %11 = 8-bit data, inverse TX polarity (STOP=L, START=H)

        %AAAAAAA = TX pin, 0..127

The SERA/SERB configuration registers are initialized to $00000000 on cog start.

Once a transmitter is enabled, the following instructions may be used to transmit data:

SEROUTA D/#     - wait to transmit D/# on SERA
                - if single-task, stalls pipeline until D/# captured
                - if multi-task, loops until D/# captured (frees pipeline)

SEROUTA D/# WC  - try to transmit D/# on SERA, C=1 if D/# captured
                - always takes 1 clock

SEROUTB D/#     - wait to transmit D/# on SERB
                - if single-task, stalls pipeline until D/# captured
                - if multi-task, loops until D/# captured (frees pipeline)

SEROUTB D/# WC  - try to transmit D/# on SERB, C=1 if D/# captured
                - always takes 1 clock

The transmitters operate by capturing data from a SEROUTA/SEROUTB instruction, and then outputting timed states on TX.
First, a STOP state is output, then a START state, followed by the data bits (and optional ID
bits), LSB first. A final STOP state is then output, but it is not timed: at that point the
transmitter is no longer busy and is ready to accept more data from another SEROUTA/SEROUTB
instruction.

Once a receiver is enabled, the following instructions may be used to receive data:

SERINA D        - wait to receive data from SERA into D
                - if single-task, stalls pipeline until data captured
                - if multi-task, loops until data captured (frees pipeline)

SERINA D WC     - try to receive new data from SERA into D, C=1 if new data
                - always takes 1 clock

SERINB D        - wait to receive data from SERB into D
                - if single-task, stalls pipeline until data captured
                - if multi-task, loops until data captured (frees pipeline)

SERINB D WC     - try to receive new data from SERB into D, C=1 if new data
                - always takes 1 clock

The receivers wait for a STOP state on RX, then a START state, and then they sample the data
bits (and optional ID bits), LSB first, on the center of each bit period, until the last bit is
sampled. At that point, the received data is captured and made available via SERINA/SERINB,
and the receiver goes back to waiting for another STOP state.
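
The configuration word, the period operand and the receiver ID filter described above are plain bit-field arithmetic, so they can be checked on a host before being written into a program. The following Python sketch is only a model of those fields as this document lays them out. The function names (bit_period, period_operand, ser_config, id_accepted) are hypothetical and belong to no Propeller tool, and rounding the bit period to the nearest clock is an assumption.

```python
# Host-side model of the SETSERA/SETSERB operands (hypothetical helper names).

def bit_period(clock_hz: int, baud: int) -> int:
    """Clocks per bit, rounded to nearest (assumption); must fit 1..65535."""
    period = (clock_hz + baud // 2) // baud
    if not 1 <= period <= 0xFFFF:
        raise ValueError(f"bit period {period} outside 1..65535")
    return period


def period_operand(tx_period: int, rx_period: int = 0) -> int:
    """S operand: TX period in [15..0], RX period in [31..16].

    An RX field of 0 means the receiver uses the TX period.
    """
    if not (1 <= tx_period <= 0xFFFF and 0 <= rx_period <= 0xFFFF):
        raise ValueError("period out of range")
    return (rx_period << 16) | tx_period


def ser_config(tx_pin=0, tx_mode=0, rx_pin=0, rx_mode=0,
               tx_id_on=0, rx_id_on=0, rx_mask=0, rx_target=0, tx_id=0) -> int:
    """Pack %KKKK_NNNN_MMMM_R_T_DD_CCCCCCC_BB_AAAAAAA into one 32-bit value."""
    fields = [(tx_pin, 7), (tx_mode, 2), (rx_pin, 7), (rx_mode, 2),
              (tx_id_on, 1), (rx_id_on, 1), (rx_mask, 4), (rx_target, 4),
              (tx_id, 4)]
    value, shift = 0, 0
    for field, width in fields:          # fields listed from LSB to MSB
        if not 0 <= field < (1 << width):
            raise ValueError("field out of range")
        value |= field << shift
        shift += width
    return value


def id_accepted(rx_id: int, mask: int, target: int) -> bool:
    """Receiver ID rule: data is captured only if (JJJJ & MMMM) = NNNN."""
    return (rx_id & mask) == target
```

With these helpers, bit_period(80_000_000, 2_000_000) gives the 40 clocks/bit used for 2M baud at 80MHz, and ser_config(tx_pin=3, tx_mode=0b10) gives the same value as the assembler expression %10<<7 + 3. Note that Python gives + a higher precedence than <<, so such expressions need parentheses when copied into Python.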
To transmit "Hello" at 2M baud, if you're running at 80MHz:

        SETSERA #%10<<7 + 3,#40         'set SERA for 8-bit transmit on pin 3 at 40 clocks/bit
        CLRP    #3                      'make pin 3 an output, SERA drives it high
        SEROUTA #"H"                    'send message
        SEROUTA #"e"
        SEROUTA #"l"
        SEROUTA #"l"
        SEROUTA #"o"
        JMP     #$

Here is an example which receives 32-bit data and outputs it to pins 31..0:

        SETSERA _sera,#3                'set 32-bit data, pin 33, use fast bit period of 3
        NEG     DIRA,#1                 'make P31..P0 outputs

LOOP    SERINA  OUTA                    'receive 32 bits into P31..P0
        JMP     #LOOP                   'loop

_sera   LONG    %01<<16 + 33<<9         '32-bit data, pin 33

To do the same thing, but with filtering, just change _sera:

_sera   LONG    %0110_1110_1<<19 + %01<<16 + 33<<9      'only allow IDs %0110 and %0111



TRACE
-----

A cog can cause its execution state (from pipeline stage 4) to be output to pins on every clock
cycle by using the SETRACE instruction:

SETRACE D/#     - Set trace configuration to %TTTT

        %TTTT = trace configuration

                %0xx0 = trace output disabled (initial state on cog start)
                %0001 = output 32-bit trace to pins 31..0
                %0011 = output 32-bit trace to pins 63..32
                %0101 = output 32-bit trace to pins 95..64
                %0111 = output 32-bit trace to pins 127..96
                %1000 = output 16-bit trace to pins 15..0
                %1001 = output 16-bit trace to pins 31..16
                %1010 = output 16-bit trace to pins 47..32
                %1011 = output 16-bit trace to pins 63..48
                %1100 = output 16-bit trace to pins 79..64
                %1101 = output 16-bit trace to pins 95..80 (pins 95..92 don't exist)
                %1110 = output 16-bit trace to pins 111..96
                %1111 = output 16-bit trace to pins 127..112

The 32-bit trace output is comprised of the following signals, from MSB to LSB:

        TASK[1..0]      - the executing task, 0..3
        HUB             - hub cycle that comes once every 8 clocks
        FETCH           - pipeline stall due to hub instruction fetch
        GO              - pipeline not stalled and instruction done
        COND            - execution condition
        JUMP            - a jump is executing
        VID_ACK         - WAITVID able to execute
        CTRA_SYNC       - CTRA is rolling over
        CTRB_SYNC       - CTRB is rolling over
        SERA_RX_RDY     - SERA's receive buffer is full, ready
for SERINA
        SERA_TX_RDY     - SERA's transmit buffer is empty, ready for SEROUTA
        SERB_RX_RDY     - SERB's receive buffer is full, ready for SERINB
        SERB_TX_RDY     - SERB's transmit buffer is empty, ready for SEROUTB
        PC[15..0]       - full 16 bits of the program counter

The 16-bit trace output is comprised of the following signals, from MSB to LSB:

        TASK[1..0]      - the executing task, 0..3
        HUB             - hub cycle that comes once every 8 clocks
        FETCH           - pipeline stall due to hub instruction fetch
        GO              - pipeline not stalled and instruction done
        COND            - execution condition
        JUMP            - a jump is executing
        PC[8..0]        - lower 9 bits of the program counter

For the output to appear, the DIR bits corresponding to the trace pins must be set.

Idea: By outputting trace data to the internal port D pins (%TTTT = %0111 or %111x), and having
another cog trigger using WAITPEQ before logging trace data, a trace debugger could be made.



INSTRUCTION LIST
----------------

ZCDS (for D column: W=write, M=modify, R=read, L=read/immediate)
----------------------------------------------------------------------------------------------------------------------
ZCWS 0000000 ZC I CCCC DDDDDDDDD SSSSSSSSS    RDBYTE  D,S/PTRA/PTRB  (waits for hub)
ZCWS 0000001 ZC I CCCC DDDDDDDDD SSSSSSSSS    RDBYTEC D,S/PTRA/PTRB  (waits for hub if dcache miss)
ZCWS 0000010 ZC I CCCC DDDDDDDDD SSSSSSSSS    RDWORD  D,S/PTRA/PTRB  (waits for hub)
ZCWS 0000011 ZC I CCCC DDDDDDDDD SSSSSSSSS    RDWORDC D,S/PTRA/PTRB  (waits for hub if dcache miss)
ZCWS 0000100 ZC I CCCC DDDDDDDDD SSSSSSSSS    RDLONG  D,S/PTRA/PTRB  (waits for hub)
ZCWS 0000101 ZC I CCCC DDDDDDDDD SSSSSSSSS    RDLONGC D,S/PTRA/PTRB  (waits for hub if dcache miss)
ZCWS 0000110 ZC I CCCC DDDDDDDDD SSSSSSSSS    RDAUX   D,S/#0..$FF/PTRX/PTRY
ZCWS 0000111 ZC I CCCC DDDDDDDDD SSSSSSSSS    RDAUXR  D,S/#0..$FF/PTRX/PTRY
ZCMS 0001000 ZC I CCCC DDDDDDDDD SSSSSSSSS    ISOB    D,S/#
ZCMS 0001001 ZC I CCCC DDDDDDDDD SSSSSSSSS    NOTB    D,S/#
ZCMS 0001010 ZC I CCCC DDDDDDDDD SSSSSSSSS    CLRB    D,S/#
ZCMS 0001011 ZC I CCCC DDDDDDDDD SSSSSSSSS    SETB    D,S/#
ZCMS 0001100 ZC
I CCCC DDDDDDDDD SSSSSSSSS    SETBC   D,S/#
ZCMS 0001101 ZC I CCCC DDDDDDDDD SSSSSSSSS    SETBNC  D,S/#
ZCMS 0001110 ZC I CCCC DDDDDDDDD SSSSSSSSS    SETBZ   D,S/#
ZCMS 0001111 ZC I CCCC DDDDDDDDD SSSSSSSSS    SETBNZ  D,S/#
ZCMS 0010000 ZC I CCCC DDDDDDDDD SSSSSSSSS    ANDN    D,S/#
ZCMS 0010001 ZC I CCCC DDDDDDDDD SSSSSSSSS    AND     D,S/#
ZCMS 0010010 ZC I CCCC DDDDDDDDD SSSSSSSSS    OR      D,S/#
ZCMS 0010011 ZC I CCCC DDDDDDDDD SSSSSSSSS    XOR     D,S/#
ZCMS 0010100 ZC I CCCC DDDDDDDDD SSSSSSSSS    MUXC    D,S/#
ZCMS 0010101 ZC I CCCC DDDDDDDDD SSSSSSSSS    MUXNC   D,S/#
ZCMS 0010110 ZC I CCCC DDDDDDDDD SSSSSSSSS    MUXZ    D,S/#
ZCMS 0010111 ZC I CCCC DDDDDDDDD SSSSSSSSS    MUXNZ   D,S/#
ZCMS 0011000 ZC I CCCC DDDDDDDDD SSSSSSSSS    ROR     D,S/#
ZCMS 0011001 ZC I CCCC DDDDDDDDD SSSSSSSSS    ROL     D,S/#
ZCMS 0011010 ZC I CCCC DDDDDDDDD SSSSSSSSS    SHR     D,S/#
ZCMS 0011011 ZC I CCCC DDDDDDDDD SSSSSSSSS    SHL     D,S/#
ZCMS 0011100 ZC I CCCC DDDDDDDDD SSSSSSSSS    RCR     D,S/#
ZCMS 0011101 ZC I CCCC DDDDDDDDD SSSSSSSSS    RCL     D,S/#
ZCMS 0011110 ZC I CCCC DDDDDDDDD SSSSSSSSS    SAR     D,S/#
ZCMS 0011111 ZC I CCCC DDDDDDDDD SSSSSSSSS    REV     D,S/#
ZCWS 0100000 ZC I CCCC DDDDDDDDD SSSSSSSSS    MOV     D,S/#
ZCWS 0100001 ZC I CCCC DDDDDDDDD SSSSSSSSS    NOT     D,S/#
ZCWS 0100010 ZC I CCCC DDDDDDDDD SSSSSSSSS    ABS     D,S/#
ZCWS 0100011 ZC I CCCC DDDDDDDDD SSSSSSSSS    NEG     D,S/#
ZCWS 0100100 ZC I CCCC DDDDDDDDD SSSSSSSSS    NEGC    D,S/#
ZCWS 0100101 ZC I CCCC DDDDDDDDD SSSSSSSSS    NEGNC   D,S/#
ZCWS 0100110 ZC I CCCC DDDDDDDDD SSSSSSSSS    NEGZ    D,S/#
ZCWS 0100111 ZC I CCCC DDDDDDDDD SSSSSSSSS    NEGNZ   D,S/#
ZCMS 0101000 ZC I CCCC DDDDDDDDD SSSSSSSSS    ADD     D,S/#
ZCMS 0101001 ZC I CCCC DDDDDDDDD SSSSSSSSS    SUB     D,S/#
ZCMS 0101010 ZC I CCCC DDDDDDDDD SSSSSSSSS    ADDX    D,S/#
ZCMS 0101011 ZC I CCCC DDDDDDDDD SSSSSSSSS    SUBX    D,S/#
ZCMS 0101100 ZC I CCCC DDDDDDDDD SSSSSSSSS    ADDS    D,S/#
ZCMS 0101101 ZC I CCCC DDDDDDDDD SSSSSSSSS    SUBS    D,S/#
ZCMS 0101110 ZC I CCCC DDDDDDDDD SSSSSSSSS    ADDSX   D,S/#
ZCMS 0101111 ZC I CCCC DDDDDDDDD SSSSSSSSS    SUBSX   D,S/#
ZCMS 0110000 ZC I CCCC DDDDDDDDD SSSSSSSSS    SUMC    D,S/#
ZCMS 0110001 ZC I CCCC
DDDDDDDDD SSSSSSSSS    SUMNC   D,S/#
ZCMS 0110010 ZC I CCCC DDDDDDDDD SSSSSSSSS    SUMZ    D,S/#
ZCMS 0110011 ZC I CCCC DDDDDDDDD SSSSSSSSS    SUMNZ   D,S/#
ZCMS 0110100 ZC I CCCC DDDDDDDDD SSSSSSSSS    MIN     D,S/#
ZCMS 0110101 ZC I CCCC DDDDDDDDD SSSSSSSSS    MAX     D,S/#
ZCMS 0110110 ZC I CCCC DDDDDDDDD SSSSSSSSS    MINS    D,S/#
ZCMS 0110111 ZC I CCCC DDDDDDDDD SSSSSSSSS    MAXS    D,S/#
ZCMS 0111000 ZC I CCCC DDDDDDDDD SSSSSSSSS    ADDABS  D,S/#
ZCMS 0111001 ZC I CCCC DDDDDDDDD SSSSSSSSS    SUBABS  D,S/#
ZCMS 0111010 ZC I CCCC DDDDDDDDD SSSSSSSSS    INCMOD  D,S/#
ZCMS 0111011 ZC I CCCC DDDDDDDDD SSSSSSSSS    DECMOD  D,S/#
ZCMS 0111100 ZC I CCCC DDDDDDDDD SSSSSSSSS    CMPSUB  D,S/#
ZCMS 0111101 ZC I CCCC DDDDDDDDD SSSSSSSSS    SUBR    D,S/#
ZCMS 0111110 ZC I CCCC DDDDDDDDD SSSSSSSSS    MUL     D,S/#  (waits one clock)
ZCMS 0111111 ZC I CCCC DDDDDDDDD SSSSSSSSS    SCL     D,S/#  (waits one clock)
ZCWS 1000000 ZC I CCCC DDDDDDDDD SSSSSSSSS    DECOD2  D,S/#
ZCWS 1000001 ZC I CCCC DDDDDDDDD SSSSSSSSS    DECOD3  D,S/#
ZCWS 1000010 ZC I CCCC DDDDDDDDD SSSSSSSSS    DECOD4  D,S/#
ZCWS 1000011 ZC I CCCC DDDDDDDDD SSSSSSSSS    DECOD5  D,S/#
Z-WS 1000100 Z0 I CCCC DDDDDDDDD SSSSSSSSS    ENCOD   D,S/#
Z-WS 1000100 Z1 I CCCC DDDDDDDDD SSSSSSSSS    BLMASK  D,S/#
Z-WS 1000101 Z0 I CCCC DDDDDDDDD SSSSSSSSS    ONECNT  D,S/#  (waits one clock)
Z-WS 1000101 Z1 I CCCC DDDDDDDDD SSSSSSSSS    ZERCNT  D,S/#  (waits one clock)
-CWS 1000110 0C I CCCC DDDDDDDDD SSSSSSSSS    INCPAT  D,S/#
-CWS 1000110 1C I CCCC DDDDDDDDD SSSSSSSSS    DECPAT  D,S/#
--WS 1000111 00 I CCCC DDDDDDDDD SSSSSSSSS    SPLITB  D,S/#  (also MERGEN)
--WS 1000111 01 I CCCC DDDDDDDDD SSSSSSSSS    MERGEB  D,S/#  (also SPLITN)
--WS 1000111 10 I CCCC DDDDDDDDD SSSSSSSSS    SPLITW  D,S/#
--WS 1000111 11 I CCCC DDDDDDDDD SSSSSSSSS    MERGEW  D,S/#
--MS 10010nn n0 I CCCC DDDDDDDDD SSSSSSSSS    GETNIB  D,S/#,#0..7
--MS 10010nn n1 I CCCC DDDDDDDDD SSSSSSSSS    SETNIB  D,S/#,#0..7
--MS 1001100 n0 I CCCC DDDDDDDDD SSSSSSSSS    GETWORD D,S/#,#0..1
--MS 1001100 n1 I CCCC DDDDDDDDD SSSSSSSSS    SETWORD D,S/#,#0..1
--MS 1001101 00 I CCCC DDDDDDDDD SSSSSSSSS    SETWRDS D,S/#
--MS 1001101
01 I CCCC DDDDDDDDD SSSSSSSSS    ROLNIB  D,S/#
--MS 1001101 10 I CCCC DDDDDDDDD SSSSSSSSS    ROLBYTE D,S/#
--MS 1001101 11 I CCCC DDDDDDDDD SSSSSSSSS    ROLWORD D,S/#
--MS 1001110 00 I CCCC DDDDDDDDD SSSSSSSSS    SETS    D,S/#
--MS 1001110 01 I CCCC DDDDDDDDD SSSSSSSSS    SETD    D,S/#
--MS 1001110 10 I CCCC DDDDDDDDD SSSSSSSSS    SETX    D,S/#
--MS 1001110 11 I CCCC DDDDDDDDD SSSSSSSSS    SETI    D,S/#
-CMS 1001111 0C I CCCC DDDDDDDDD SSSSSSSSS    COGNEW  D,S/#  (waits for hub)
-CMS 1001111 1C I CCCC DDDDDDDDD SSSSSSSSS    WAITCNT D,S/#  (waits for CNT, +CNTX if WC)
--MS 101000n n0 I CCCC DDDDDDDDD SSSSSSSSS    GETBYTE D,S/#,#0..3
--MS 101000n n1 I CCCC DDDDDDDDD SSSSSSSSS    SETBYTE D,S/#,#0..3
--WS 1010010 00 I CCCC DDDDDDDDD SSSSSSSSS    SETBYTS D,S/#
--MS 1010010 01 I CCCC DDDDDDDDD SSSSSSSSS    MOVBYTS D,S/#  (move bytes in D, S = %11_10_01_00 = D same)
--MS 1010010 10 I CCCC DDDDDDDDD SSSSSSSSS    PACKRGB D,S/#  (S 8:8:8 -> D 5:5:5 << 16 | D >> 16)
--WS 1010010 11 I CCCC DDDDDDDDD SSSSSSSSS    UNPKRGB D,S/#  (S 5:5:5 -> D 8:8:8)
--MS 1010011 00 I CCCC DDDDDDDDD SSSSSSSSS    ADDPIX  D,S/#  (waits one clock)
--MS 1010011 01 I CCCC DDDDDDDDD SSSSSSSSS    MULPIX  D,S/#  (waits one clock)
--MS 1010011 10 I CCCC DDDDDDDDD SSSSSSSSS    BLNPIX  D,S/#  (waits one clock)
--MS 1010011 11 I CCCC DDDDDDDDD SSSSSSSSS    MIXPIX  D,S/#  (waits one clock)
ZCMS 1010100 ZC I CCCC DDDDDDDDD SSSSSSSSS    JMPSW   D,S/@
ZCMS 1010101 ZC I CCCC DDDDDDDDD SSSSSSSSS    JMPSWD  D,S/@
--MS 1010110 00 I CCCC DDDDDDDDD SSSSSSSSS    IJZ     D,S/@
--MS 1010110 01 I CCCC DDDDDDDDD SSSSSSSSS    IJZD    D,S/@
--MS 1010110 10 I CCCC DDDDDDDDD SSSSSSSSS    IJNZ    D,S/@
--MS 1010110 11 I CCCC DDDDDDDDD SSSSSSSSS    IJNZD   D,S/@
--MS 1010111 00 I CCCC DDDDDDDDD SSSSSSSSS    DJZ     D,S/@
--MS 1010111 01 I CCCC DDDDDDDDD SSSSSSSSS    DJZD    D,S/@
--MS 1010111 10 I CCCC DDDDDDDDD SSSSSSSSS    DJNZ    D,S/@
--MS 1010111 11 I CCCC DDDDDDDDD SSSSSSSSS    DJNZD   D,S/@
ZCRS 1011000 ZC I CCCC DDDDDDDDD SSSSSSSSS    TESTB   D,S/#
ZCRS 1011001 ZC I CCCC DDDDDDDDD SSSSSSSSS    TESTN   D,S/#
ZCRS 1011010 ZC I CCCC DDDDDDDDD SSSSSSSSS    TEST    D,S/#
ZCRS
1011011 ZC I CCCC DDDDDDDDD SSSSSSSSS    CMP     D,S/#
ZCRS 1011100 ZC I CCCC DDDDDDDDD SSSSSSSSS    CMPX    D,S/#
ZCRS 1011101 ZC I CCCC DDDDDDDDD SSSSSSSSS    CMPS    D,S/#
ZCRS 1011110 ZC I CCCC DDDDDDDDD SSSSSSSSS    CMPSX   D,S/#
ZCRS 1011111 ZC I CCCC DDDDDDDDD SSSSSSSSS    CMPR    D,S/#
--RS 11000nn n0 I CCCC DDDDDDDDD SSSSSSSSS    COGINIT D,S/#,#0..7  (waits for hub) (use SETNIB :coginit,cog,#6 before)
---S 11000nn n1 I CCCC nnnnnnnnn SSSSSSSSS    WAITVID #0..$DFF,S/#  (waits for vid if single-task, loops if multi-task)
--RS 1100011 11 I CCCC DDDDDDDDD SSSSSSSSS    WAITVID D,S/#  (waits for vid if single-task, loops if multi-task)
-CRS 110010n nC I CCCC DDDDDDDDD SSSSSSSSS    WAITPEQ D,S/#,#0..3  (waits for pins, plus CNT if WC)
-CRS 110011n nC I CCCC DDDDDDDDD SSSSSSSSS    WAITPNE D,S/#,#0..3  (waits for pins, plus CNT if WC)
--LS 1101000 0L I CCCC DDDDDDDDD SSSSSSSSS    WRBYTE  D/#,S/PTRA/PTRB  (waits for hub)
--LS 1101000 1L I CCCC DDDDDDDDD SSSSSSSSS    WRWORD  D/#,S/PTRA/PTRB  (waits for hub)
--LS 1101001 0L I CCCC DDDDDDDDD SSSSSSSSS    WRLONG  D/#,S/PTRA/PTRB  (waits for hub)
--LS 1101001 1L I CCCC DDDDDDDDD SSSSSSSSS    FRAC    D/#,S/#
--LS 1101010 0L I CCCC DDDDDDDDD SSSSSSSSS    WRAUX   D/#,S/#0..$FF/PTRX/PTRY
--LS 1101010 1L I CCCC DDDDDDDDD SSSSSSSSS    WRAUXR  D/#,S/#0..$FF/PTRX/PTRY
--LS 1101011 0L I CCCC DDDDDDDDD SSSSSSSSS    SETACCA D/#,S/#
--LS 1101011 1L I CCCC DDDDDDDDD SSSSSSSSS    SETACCB D/#,S/#
--LS 1101100 0L I CCCC DDDDDDDDD SSSSSSSSS    MACA    D/#,S/#
--LS 1101100 1L I CCCC DDDDDDDDD SSSSSSSSS    MACB    D/#,S/#
--LS 1101101 0L I CCCC DDDDDDDDD SSSSSSSSS    MUL32   D/#,S/#
--LS 1101101 1L I CCCC DDDDDDDDD SSSSSSSSS    MUL32U  D/#,S/#
--LS 1101110 0L I CCCC DDDDDDDDD SSSSSSSSS    DIV32   D/#,S/#
--LS 1101110 1L I CCCC DDDDDDDDD SSSSSSSSS    DIV32U  D/#,S/#
--LS 1101111 0L I CCCC DDDDDDDDD SSSSSSSSS    DIV64   D/#,S/#
--LS 1101111 1L I CCCC DDDDDDDDD SSSSSSSSS    DIV64U  D/#,S/#
--LS 1110000 0L I CCCC DDDDDDDDD SSSSSSSSS    SQRT64  D/#,S/#
--LS 1110000 1L I CCCC DDDDDDDDD SSSSSSSSS    QSINCOS D/#,S/#
--LS 1110001 0L I CCCC DDDDDDDDD SSSSSSSSS    QARCTAN D/#,S/#
--LS 1110001 1L I CCCC DDDDDDDDD SSSSSSSSS    QROTATE D/#,S/#
--LS 1110010 0L I CCCC DDDDDDDDD SSSSSSSSS    SETSERA D/#,S/#  (config,baud)
--LS 1110010 1L I CCCC DDDDDDDDD SSSSSSSSS    SETSERB D/#,S/#  (config,baud)
--LS 1110011 0L I CCCC DDDDDDDDD SSSSSSSSS    SETCTRS D/#,S/#  (ctrb,ctra)
--LS 1110011 1L I CCCC DDDDDDDDD SSSSSSSSS    SETWAVS D/#,S/#  (ctrb,ctra)
--LS 1110100 0L I CCCC DDDDDDDDD SSSSSSSSS    SETFRQS D/#,S/#  (ctrb,ctra)
--LS 1110100 1L I CCCC DDDDDDDDD SSSSSSSSS    SETPHSS D/#,S/#  (ctrb,ctra)
--LS 1110101 0L I CCCC DDDDDDDDD SSSSSSSSS    ADDPHSS D/#,S/#  (ctrb,ctra)
--LS 1110101 1L I CCCC DDDDDDDDD SSSSSSSSS    SUBPHSS D/#,S/#  (ctrb,ctra)
--LS 1110110 0L I CCCC DDDDDDDDD SSSSSSSSS    JP      D/#,S/@
--LS 1110110 1L I CCCC DDDDDDDDD SSSSSSSSS    JPD     D/#,S/@
--LS 1110111 0L I CCCC DDDDDDDDD SSSSSSSSS    JNP     D/#,S/@
--LS 1110111 1L I CCCC DDDDDDDDD SSSSSSSSS    JNPD    D/#,S/@
--LS 111100n nL I CCCC DDDDDDDDD SSSSSSSSS    CFGPINS D/#,S/#,#0..2  (waits for alt)
--LS 1111001 1L I CCCC DDDDDDDDD SSSSSSSSS    JMPTASK D/#,S/#  (mask,address)
--LS 1111010 0L I CCCC DDDDDDDDD SSSSSSSSS    SETXFR  D/#,S/#
--LS 1111010 1L I CCCC DDDDDDDDD SSSSSSSSS    SETMIX  D/#,S/#
--RS 1111011 00 I CCCC DDDDDDDDD SSSSSSSSS    JZ      D,S/@
--RS 1111011 01 I CCCC DDDDDDDDD SSSSSSSSS    JZD     D,S/@
--RS 1111011 10 I CCCC DDDDDDDDD SSSSSSSSS    JNZ     D,S/@
--RS 1111011 11 I CCCC DDDDDDDDD SSSSSSSSS    JNZD    D,S/@
--WS 1111100 00 I CCCC DDDDDDDDD SSSSSSSSS    LOCBASE D,S/@  (if S: S<<2, if @S: (P+@S)<<2)
--MS 1111100 01 I CCCC DDDDDDDDD SSSSSSSSS    LOCBYTE D,S/@  (if S: D<<0 + S<<2, if @S: D<<0 + (P+@S)<<2)
--MS 1111100 10 I CCCC DDDDDDDDD SSSSSSSSS    LOCWORD D,S/@  (if S: D<<1 + S<<2, if @S: D<<1 + (P+@S)<<2)
--MS 1111100 11 I CCCC DDDDDDDDD SSSSSSSSS    LOCLONG D,S/@  (if S: D<<2 + S<<2, if @S: D<<2 + (P+@S)<<2)
--RS 1111101 00 I CCCC DDDDDDDDD SSSSSSSSS    JMPLIST D,S/@  (if S: D<<0 + S<<0, if @S: D<<0 + (P+@S)<<0)
--W- 1111101 01 0 CCCC DDDDDDDDD SSSSSSSSS    LOCINST D,@S  (P+@S)
---- 1111101 01 1 nnnn nnnnnnnnn nnniiiiii    REPS    #1..$10000,#1..64
---- 1111101 10 n nnnn nnnnnnnnn
nnnnnnnnn    AUGS    #23bits  (appends n to upper bits of next immediate S)
---- 1111101 11 n nnnn nnnnnnnnn nnnnnnnnn    AUGD    #23bits  (appends n to upper bits of next immediate D)
---- 1111110 00 0 BBAA ddddddddd sssssssss    FIXINDA #d,#s / FIXINDB #d,#s / FIXINDS #d,#s / SETINDA #s / SETINDB #d / SETINDS #d,#s
---- 1111110 00 1 CCCC 00 nnnnnnnnnnnnnnnn    LOCPTRA #abs
---- 1111110 00 1 CCCC 01 nnnnnnnnnnnnnnnn    LOCPTRA @rel
---- 1111110 00 1 CCCC 10 nnnnnnnnnnnnnnnn    LOCPTRB #abs
---- 1111110 00 1 CCCC 11 nnnnnnnnnnnnnnnn    LOCPTRB @rel
---- 1111110 01 0 CCCC 00 nnnnnnnnnnnnnnnn    JMP     #abs
---- 1111110 01 0 CCCC 01 nnnnnnnnnnnnnnnn    JMP     @rel
---- 1111110 01 0 CCCC 10 nnnnnnnnnnnnnnnn    JMPD    #abs
---- 1111110 01 0 CCCC 11 nnnnnnnnnnnnnnnn    JMPD    @rel
---- 1111110 01 1 CCCC 00 nnnnnnnnnnnnnnnn    CALL    #abs
---- 1111110 01 1 CCCC 01 nnnnnnnnnnnnnnnn    CALL    @rel
---- 1111110 01 1 CCCC 10 nnnnnnnnnnnnnnnn    CALLD   #abs
---- 1111110 01 1 CCCC 11 nnnnnnnnnnnnnnnn    CALLD   @rel
---- 1111110 10 0 CCCC 00 nnnnnnnnnnnnnnnn    CALLA   #abs
---- 1111110 10 0 CCCC 01 nnnnnnnnnnnnnnnn    CALLA   @rel
---- 1111110 10 0 CCCC 10 nnnnnnnnnnnnnnnn    CALLAD  #abs
---- 1111110 10 0 CCCC 11 nnnnnnnnnnnnnnnn    CALLAD  @rel
---- 1111110 10 1 CCCC 00 nnnnnnnnnnnnnnnn    CALLB   #abs
---- 1111110 10 1 CCCC 01 nnnnnnnnnnnnnnnn    CALLB   @rel
---- 1111110 10 1 CCCC 10 nnnnnnnnnnnnnnnn    CALLBD  #abs
---- 1111110 10 1 CCCC 11 nnnnnnnnnnnnnnnn    CALLBD  @rel
---- 1111110 11 0 CCCC 00 nnnnnnnnnnnnnnnn    CALLX   #abs
---- 1111110 11 0 CCCC 01 nnnnnnnnnnnnnnnn    CALLX   @rel
---- 1111110 11 0 CCCC 10 nnnnnnnnnnnnnnnn    CALLXD  #abs
---- 1111110 11 0 CCCC 11 nnnnnnnnnnnnnnnn    CALLXD  @rel
---- 1111110 11 1 CCCC 00 nnnnnnnnnnnnnnnn    CALLY   #abs
---- 1111110 11 1 CCCC 01 nnnnnnnnnnnnnnnn    CALLY   @rel
---- 1111110 11 1 CCCC 10 nnnnnnnnnnnnnnnn    CALLYD  #abs
---- 1111110 11 1 CCCC 11 nnnnnnnnnnnnnnnn    CALLYD  @rel
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000000000    COGID   D  (waits for hub) (doesn't write D if WC)
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000000001    TASKID  D
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD
000000010    LOCKNEW D  (waits for hub)
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000000011    GETLFSR D
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000000100    GETCNT  D
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000000101    GETCNTX D
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000000110    GETACAL D  (waits for mac)
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000000111    GETACAH D  (waits for mac)
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000001000    GETACBL D  (waits for mac)
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000001001    GETACBH D  (waits for mac)
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000001010    GETPTRA D
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000001011    GETPTRB D
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000001100    GETPTRX D
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000001101    GETPTRY D
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000001110    SERINA  D  (waits for rx if single-task, loops if multi-task, releases if WC)
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000001111    SERINB  D  (waits for rx if single-task, loops if multi-task, releases if WC)
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000010000    GETMULL D  (waits for mul if single-task, loops if multi-task)
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000010001    GETMULH D  (waits for mul if single-task, loops if multi-task)
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000010010    GETDIVQ D  (waits for div if single-task, loops if multi-task)
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000010011    GETDIVR D  (waits for div if single-task, loops if multi-task)
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000010100    GETSQRT D  (waits for sqrt if single-task, loops if multi-task)
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000010101    GETQX   D  (waits for cordic if single-task, loops if multi-task)
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000010110    GETQY   D  (waits for cordic if single-task, loops if multi-task)
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000010111    GETQZ   D  (waits for cordic if single-task, loops if multi-task)
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000011000    GETPHSA D
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000011001    GETPHZA D  (clears phsa)
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000011010    GETCOSA D
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000011011    GETSINA D
ZCW-
1111111 ZC 0 CCCC DDDDDDDDD 000011100    GETPHSB D
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000011101    GETPHZB D  (clears phsb)
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000011110    GETCOSB D
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000011111    GETSINB D
ZCM- 1111111 ZC 0 CCCC DDDDDDDDD 000100000    PUSHZC  D
ZCM- 1111111 ZC 0 CCCC DDDDDDDDD 000100001    POPZC   D
ZCM- 1111111 ZC 0 CCCC DDDDDDDDD 000100010    SUBCNT  D  (subtracts D from CNT, then CNTX if same thread)
ZCM- 1111111 ZC 0 CCCC DDDDDDDDD 000100011    GETPIX  D  (takes 3 clocks, needs 3 clocks in prior two stages, no condition allowed)
ZCM- 1111111 ZC 0 CCCC DDDDDDDDD 000100100    BINBCD  D
ZCM- 1111111 ZC 0 CCCC DDDDDDDDD 000100101    BCDBIN  D
ZCM- 1111111 ZC 0 CCCC DDDDDDDDD 000100110    BINGRY  D
ZCM- 1111111 ZC 0 CCCC DDDDDDDDD 000100111    GRYBIN  D  (waits one clock)
ZCM- 1111111 ZC 0 CCCC DDDDDDDDD 000101000    ESWAP4  D
ZCM- 1111111 ZC 0 CCCC DDDDDDDDD 000101001    ESWAP8  D
ZCM- 1111111 ZC 0 CCCC DDDDDDDDD 000101010    SEUSSF  D
ZCM- 1111111 ZC 0 CCCC DDDDDDDDD 000101011    SEUSSR  D
Z-M- 1111111 ZC 0 CCCC DDDDDDDDD 000101100    INCD    D  (D += $200)
Z-M- 1111111 ZC 0 CCCC DDDDDDDDD 000101101    DECD    D  (D -= $200)
Z-M- 1111111 ZC 0 CCCC DDDDDDDDD 000101110    INCDS   D  (D += $201)
Z-M- 1111111 ZC 0 CCCC DDDDDDDDD 000101111    DECDS   D  (D -= $201)
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000110000    POP     D  (pops from task's tiny stack)
--L- 1111111 00 L CCCC DDDDDDDDD 001iiiiii    REPD    D/#1..512,#1..64  (REPD $1FF,#1..64 = infinite repeat, can use REPD #i)
--L- 1111111 00 L CCCC DDDDDDDDD 010000000    CLKSET  D/#  (waits for hub)
--L- 1111111 00 L CCCC DDDDDDDDD 010000001    COGSTOP D/#  (waits for hub)
-CL- 1111111 0C L CCCC DDDDDDDDD 010000010    LOCKSET D/#  (waits for hub)
-CL- 1111111 0C L CCCC DDDDDDDDD 010000011    LOCKCLR D/#  (waits for hub)
--L- 1111111 00 L CCCC DDDDDDDDD 010000100    LOCKRET D/#  (waits for hub)
--L- 1111111 00 L CCCC DDDDDDDDD 010000101    RDWIDEC D/PTRA/PTRB  (waits for hub if dcache miss)
--L- 1111111 00 L CCCC DDDDDDDDD 010000110    RDWIDE  D/PTRA/PTRB  (waits for hub)
--L- 1111111 00 L CCCC DDDDDDDDD
010000111    WRWIDE  D/PTRA/PTRB  (waits for hub)
ZCL- 1111111 ZC L CCCC DDDDDDDDD 010001000    GETP    D/#  (pin into !Z/C via WZ/WC)
ZCL- 1111111 ZC L CCCC DDDDDDDDD 010001001    GETNP   D/#  (pin into Z/!C via WZ/WC)
-CL- 1111111 0C L CCCC DDDDDDDDD 010001010    SEROUTA D/#  (waits for tx if single-task, loops if multi-task, releases if WC)
-CL- 1111111 0C L CCCC DDDDDDDDD 010001011    SEROUTB D/#  (waits for tx if single-task, loops if multi-task, releases if WC)
-CL- 1111111 0C L CCCC DDDDDDDDD 010001100    CMPCNT  D/#  (subtracts D from CNT, then CNTX if same thread)
-CL- 1111111 0C L CCCC DDDDDDDDD 010001101    WAITPX  D/#  (waits for any edge, +CNT if WC)
-CL- 1111111 0C L CCCC DDDDDDDDD 010001110    WAITPR  D/#  (waits for pos edge, +CNT if WC)
-CL- 1111111 0C L CCCC DDDDDDDDD 010001111    WAITPF  D/#  (waits for neg edge, +CNT if WC)
ZCL- 1111111 ZC L CCCC DDDDDDDDD 010010000    SETZC   D/#  (D[1:0] into Z/C via WZ/WC)
--L- 1111111 00 L CCCC DDDDDDDDD 010010001    SETMAP  D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010010010    SETXCH  D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010010011    SETTASK D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010010100    SETRACE D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010010101    SARACCA D/#  (waits for mac)
--L- 1111111 00 L CCCC DDDDDDDDD 010010110    SARACCB D/#  (waits for mac)
--L- 1111111 00 L CCCC DDDDDDDDD 010010111    SARACCS D/#  (waits for mac)
--L- 1111111 00 L CCCC DDDDDDDDD 010011000    SETPTRA D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010011001    SETPTRB D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010011010    ADDPTRA D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010011011    ADDPTRB D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010011100    SUBPTRA D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010011101    SUBPTRB D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010011110    SETWIDE D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010011111    SETWIDZ D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010100000    SETPTRX D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010100001    SETPTRY D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010100010    ADDPTRX D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010100011    ADDPTRY D/#
--L- 1111111 00 L
CCCC DDDDDDDDD 010100100    SUBPTRX D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010100101    SUBPTRY D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010100110    PASSCNT D/#  (loops if (CNT - D) msb set)
--L- 1111111 00 L CCCC DDDDDDDDD 010100111    WAIT    D/#  (waits 1+ clocks, 0 same as 1)
--L- 1111111 00 L CCCC DDDDDDDDD 010101000    OFFP    D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010101001    NOTP    D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010101010    CLRP    D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010101011    SETP    D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010101100    SETPC   D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010101101    SETPNC  D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010101110    SETPZ   D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010101111    SETPNZ  D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010110000    DIV64D  D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010110001    SQRT32  D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010110010    QLOG    D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010110011    QEXP    D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010110100    SETQI   D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010110101    SETQZ   D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010110110    CFGDACS D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010110111    SETDACS D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010111000    CFGDAC0 D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010111001    CFGDAC1 D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010111010    CFGDAC2 D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010111011    CFGDAC3 D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010111100    SETDAC0 D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010111101    SETDAC1 D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010111110    SETDAC2 D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010111111    SETDAC3 D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011000000    SETCTRA D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011000001    SETWAVA D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011000010    SETFRQA D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011000011    SETPHSA D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011000100    ADDPHSA D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011000101    SUBPHSA D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011000110    SETVID  D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011000111    SETVIDY D/#
--L-
1111111 00 L CCCC DDDDDDDDD 011001000    SETCTRB D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011001001    SETWAVB D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011001010    SETFRQB D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011001011    SETPHSB D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011001100    ADDPHSB D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011001101    SUBPHSB D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011001110    SETVIDI D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011001111    SETVIDQ D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011010000    SETPIX  D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011010001    SETPIXZ D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011010010    SETPIXU D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011010011    SETPIXV D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011010100    SETPIXA D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011010101    SETPIXR D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011010110    SETPIXG D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011010111    SETPIXB D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011011000    SETPORA D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011011001    SETPORB D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011011010    SETPORC D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011011011    SETPORD D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011011100    PUSH    D/#  (pushes into task's 4-level stack)
--R- 1111111 ZC 0 CCCC DDDDDDDDD 011110100    JMP     D  (D[31:30] into Z/C via WZ/WC for JMP..CALLYD)
--R- 1111111 ZC 0 CCCC DDDDDDDDD 011110101    JMPD    D
--R- 1111111 ZC 0 CCCC DDDDDDDDD 011110110    CALL    D
--R- 1111111 ZC 0 CCCC DDDDDDDDD 011110111    CALLD   D
--R- 1111111 ZC 0 CCCC DDDDDDDDD 011111000    CALLA   D
--R- 1111111 ZC 0 CCCC DDDDDDDDD 011111001    CALLAD  D
--R- 1111111 ZC 0 CCCC DDDDDDDDD 011111010    CALLB   D
--R- 1111111 ZC 0 CCCC DDDDDDDDD 011111011    CALLBD  D
--R- 1111111 ZC 0 CCCC DDDDDDDDD 011111100    CALLX   D
--R- 1111111 ZC 0 CCCC DDDDDDDDD 011111101    CALLXD  D
--R- 1111111 ZC 0 CCCC DDDDDDDDD 011111110    CALLY   D
--R- 1111111 ZC 0 CCCC DDDDDDDDD 011111111    CALLYD  D
ZC-- 1111111 ZC x CCCC xxxxxxxxx 100000000    RETA
ZC-- 1111111 ZC x CCCC xxxxxxxxx 100000001    RETAD
ZC-- 1111111 ZC x CCCC xxxxxxxxx 100000010    RETB
ZC-- 1111111 ZC
x CCCC xxxxxxxxx 100000011    RETBD
ZC-- 1111111 ZC x CCCC xxxxxxxxx 100000100    RETX
ZC-- 1111111 ZC x CCCC xxxxxxxxx 100000101    RETXD
ZC-- 1111111 ZC x CCCC xxxxxxxxx 100000110    RETY
ZC-- 1111111 ZC x CCCC xxxxxxxxx 100000111    RETYD
ZC-- 1111111 ZC x CCCC xxxxxxxxx 100001000    RET
ZC-- 1111111 ZC x CCCC xxxxxxxxx 100001001    RETD
ZC-- 1111111 ZC x CCCC xxxxxxxxx 100001010    POLCTRA  (ctra-rollover into !Z/C)
ZC-- 1111111 ZC x CCCC xxxxxxxxx 100001011    POLCTRB  (ctrb-rollover into !Z/C)
ZC-- 1111111 ZC x CCCC xxxxxxxxx 100001100    POLVID  (vid-ready into !Z/C)
---- 1111111 00 x CCCC xxxxxxxxx 100001101    CAPCTRA
---- 1111111 00 x CCCC xxxxxxxxx 100001110    CAPCTRB
---- 1111111 00 x CCCC xxxxxxxxx 100001111    CAPCTRS
---- 1111111 00 x CCCC xxxxxxxxx 100010000    SETPIXW
---- 1111111 00 x CCCC xxxxxxxxx 100010001    CLRACCA
---- 1111111 00 x CCCC xxxxxxxxx 100010010    CLRACCB
---- 1111111 00 x CCCC xxxxxxxxx 100010011    CLRACCS
ZC-- 1111111 ZC x CCCC xxxxxxxxx 100010100    CHKPTRX
ZC-- 1111111 ZC x CCCC xxxxxxxxx 100010101    CHKPTRY
---- 1111111 00 x CCCC xxxxxxxxx 100010110    SYNCTRA  (waits for ctra if single-task, loops if multi-task)
---- 1111111 00 x CCCC xxxxxxxxx 100010111    SYNCTRB  (waits for ctrb if single-task, loops if multi-task)
---- 1111111 00 x CCCC xxxxxxxxx 100011000    DCACHEX
---- 1111111 00 x CCCC xxxxxxxxx 100011001    ICACHEX
---- 1111111 00 x CCCC xxxxxxxxx 100011010    ICACHEP
---- 1111111 00 x CCCC xxxxxxxxx 100011011    ICACHEN

x = don't care, use 0
----------------------------------------------------------------------------------------------------------------------


Z       effect
------------------------------------------------------------------------------------------
0
1       wz

C       effect
------------------------------------------------------------------------------------------
0
1       wc

L       DDDDDDDDD       destination operand
------------------------------------------------------------------------------------------
0/na    DDDDDDDDD       register
1       #DDDDDDDDD      immediate, zero-extended

I       SSSSSSSSS       source operand
------------------------------------------------------------------------------------------
0/na    SSSSSSSSS       register
1       #SSSSSSSSS      immediate, zero-extended

CCCC    condition       (easier-to-read list)
------------------------------------------------------------------------------------------
0000    never           1111    always (default)
0001    nc  &  nz       1100    if_c            if_b
0010    nc  &  z        0011    if_nc           if_ae
0011    nc              1010    if_z            if_e
0100    c   &  nz       0101    if_nz           if_ne
0101    nz              1000    if_c_and_z      if_z_and_c
0110    c   <> z        0100    if_c_and_nz     if_nz_and_c
0111    nc  |  nz       0010    if_nc_and_z     if_z_and_nc
1000    c   &  z        0001    if_nc_and_nz    if_nz_and_nc    if_a
1001    c   =  z        1110    if_c_or_z       if_z_or_c       if_be
1010    z               1101    if_c_or_nz      if_nz_or_c
1011    nc  |  z        1011    if_nc_or_z      if_z_or_nc
1100    c               0111    if_nc_or_nz     if_nz_or_nc
1101    c   |  nz       1001    if_c_eq_z       if_z_eq_c
1110    c   |  z        0110    if_c_ne_z       if_z_ne_c
1111    always          0000    never

CCCC    inda/indb - CCCC=1111 after stage 2 of pipeline if inda/indb used (indx=inda/indb)
------------------------------------------------------------------------------------------
xx00    source indx
xx01    source indx++
xx10    source indx--
xx11    source ++indx
00xx    destination indx
01xx    destination indx++
10xx    destination indx--
11xx    destination ++indx
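
Read row by row, the CCCC condition list behaves as a 4-entry truth table indexed by the C and Z flags: bit (C*2 + Z) of CCCC is 1 when the instruction executes. The Python sketch below models that reading, together with the dual-operand layout TTTTTTT ZC I CCCC DDDDDDDDD SSSSSSSSS given under INSTRUCTION ENCODING. It is a host-side illustration only. The names condition_met and encode_dual are hypothetical, and the bit positions are inferred from the field order shown in this document.

```python
# Host-side model of the CCCC field and the dual-operand encoding
# (hypothetical helper names, bit positions inferred from the field order).

def condition_met(cccc: int, c: int, z: int) -> bool:
    """True if an instruction with condition field cccc executes.

    The 4-bit field is treated as a truth table: bit index = (C << 1) | Z.
    """
    return bool((cccc >> ((c << 1) | z)) & 1)


def encode_dual(opcode: int, d: int, s: int, imm: bool = False,
                cccc: int = 0b1111, wz: bool = False, wc: bool = False) -> int:
    """Pack TTTTTTT ZC I CCCC DDDDDDDDD SSSSSSSSS into one 32-bit long."""
    if not (0 <= opcode < 128 and 0 <= cccc < 16
            and 0 <= d < 512 and 0 <= s < 512):
        raise ValueError("field out of range")
    return (opcode << 25 | int(wz) << 24 | int(wc) << 23 | int(imm) << 22
            | cccc << 18 | d << 9 | s)
```

For example, condition_met(0b1100, c, z) is true exactly when C is set, matching if_c, and encode_dual(0b0100000, 0x010, 5, imm=True) packs MOV $010,#5 with the default "always" condition.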