Even though it doesn't say so, I assume that all of these instructions wait for a hub slot. Is that correct? Or is there some sort of data cache between to allow them to continue without waiting?
CALLA, CALLB, RETA, and RETB must wait for the next hub slot to push or pop the address as a long. There are also PUSHA/PUSHB/POPA/POPB instructions, which are just aliases for WRLONG/RDLONG.
There are now 16-bit immediate jumps and calls that can be relative or absolute. The jumps and calls that end in an underscore ("_") toggle hub execution mode. If you are running in the cog, a CALL_ #address will jump to hub memory. When that routine does a RET, it will return to cog memory. It works the other way, too. A CALL or JMP without an underscore stays in the cog or hub.
The JMPSW/JMPSWD instruction can be used to switch among threads. It will store {hubmode,Z,C,PC} into D and load {hubmode,Z,C,PC} from S. So, it tracks threads wherever they are executing. All CALLs and RETs save and restore {hubmode,Z,C,PC}. PC is 16 bits so that it can span the entire 64K longs in the 256KB hub memory.
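To make the saved state concrete, here is a minimal Python sketch of packing and unpacking {hubmode,Z,C,PC}. The bit positions are an assumption read off the concatenation order (leftmost field in the highest bit, PC in bits 15:0), and the function names are made up for illustration; none of this is taken from the actual hardware description.

```python
# Hypothetical model of the saved state word. Assumed layout:
# bit 18 = hubmode, bit 17 = Z, bit 16 = C, bits 15:0 = PC, upper bits 0.
PC_BITS = 16
PC_MASK = (1 << PC_BITS) - 1

def pack_state(hubmode, z, c, pc):
    """Pack the flags and a 16-bit PC into one long; upper bits stay 0."""
    return ((hubmode & 1) << 18) | ((z & 1) << 17) | ((c & 1) << 16) | (pc & PC_MASK)

def unpack_state(value):
    """Split a saved long back into (hubmode, z, c, pc)."""
    return (value >> 18) & 1, (value >> 17) & 1, (value >> 16) & 1, value & PC_MASK

def jmpsw(current_state, s_value):
    """Model the swap: return (long written to D, state loaded from S)."""
    return pack_state(*current_state), unpack_state(s_value)

# A 16-bit PC counting longs covers 64K longs, which is 256KB of hub memory.
HUB_BYTES = (1 << PC_BITS) * 4
```

For example, a thread running in the hub at PC $1234 with C set and Z clear would be saved as $51234 under this assumed layout.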
Here is the list:
ZCDS (for D column: W=write, M=modify, R=read, L=read/immediate)
---------------------------------------------------------------------------------------------------------------------
ZCWS 0000000 ZC I CCCC DDDDDDDDD SSSSSSSSS RDBYTE D,S/PTRA/PTRB (waits for hub)
ZCWS 0000001 ZC I CCCC DDDDDDDDD SSSSSSSSS RDBYTEC D,S/PTRA/PTRB (waits for hub if cache miss)
ZCWS 0000010 ZC I CCCC DDDDDDDDD SSSSSSSSS RDWORD D,S/PTRA/PTRB (waits for hub)
ZCWS 0000011 ZC I CCCC DDDDDDDDD SSSSSSSSS RDWORDC D,S/PTRA/PTRB (waits for hub if cache miss)
ZCWS 0000100 ZC I CCCC DDDDDDDDD SSSSSSSSS RDLONG D,S/PTRA/PTRB (waits for hub)
ZCWS 0000101 ZC I CCCC DDDDDDDDD SSSSSSSSS RDLONGC D,S/PTRA/PTRB (waits for hub if cache miss)
ZCWS 0000110 ZC I CCCC DDDDDDDDD SSSSSSSSS RDAUX D,S/#0..$FF/PTRX/PTRY
ZCWS 0000111 ZC I CCCC DDDDDDDDD SSSSSSSSS RDAUXR D,S/#0..$FF/PTRX/PTRY
ZCMS 0001000 ZC I CCCC DDDDDDDDD SSSSSSSSS ISOB D,S/#
ZCMS 0001001 ZC I CCCC DDDDDDDDD SSSSSSSSS NOTB D,S/#
ZCMS 0001010 ZC I CCCC DDDDDDDDD SSSSSSSSS CLRB D,S/#
ZCMS 0001011 ZC I CCCC DDDDDDDDD SSSSSSSSS SETB D,S/#
ZCMS 0001100 ZC I CCCC DDDDDDDDD SSSSSSSSS SETBC D,S/#
ZCMS 0001101 ZC I CCCC DDDDDDDDD SSSSSSSSS SETBNC D,S/#
ZCMS 0001110 ZC I CCCC DDDDDDDDD SSSSSSSSS SETBZ D,S/#
ZCMS 0001111 ZC I CCCC DDDDDDDDD SSSSSSSSS SETBNZ D,S/#
ZCMS 0010000 ZC I CCCC DDDDDDDDD SSSSSSSSS ANDN D,S/#
ZCMS 0010001 ZC I CCCC DDDDDDDDD SSSSSSSSS AND D,S/#
ZCMS 0010010 ZC I CCCC DDDDDDDDD SSSSSSSSS OR D,S/#
ZCMS 0010011 ZC I CCCC DDDDDDDDD SSSSSSSSS XOR D,S/#
ZCMS 0010100 ZC I CCCC DDDDDDDDD SSSSSSSSS MUXC D,S/#
ZCMS 0010101 ZC I CCCC DDDDDDDDD SSSSSSSSS MUXNC D,S/#
ZCMS 0010110 ZC I CCCC DDDDDDDDD SSSSSSSSS MUXZ D,S/#
ZCMS 0010111 ZC I CCCC DDDDDDDDD SSSSSSSSS MUXNZ D,S/#
ZCMS 0011000 ZC I CCCC DDDDDDDDD SSSSSSSSS ROR D,S/#
ZCMS 0011001 ZC I CCCC DDDDDDDDD SSSSSSSSS ROL D,S/#
ZCMS 0011010 ZC I CCCC DDDDDDDDD SSSSSSSSS SHR D,S/#
ZCMS 0011011 ZC I CCCC DDDDDDDDD SSSSSSSSS SHL D,S/#
ZCMS 0011100 ZC I CCCC DDDDDDDDD SSSSSSSSS RCR D,S/#
ZCMS 0011101 ZC I CCCC DDDDDDDDD SSSSSSSSS RCL D,S/#
ZCMS 0011110 ZC I CCCC DDDDDDDDD SSSSSSSSS SAR D,S/#
ZCMS 0011111 ZC I CCCC DDDDDDDDD SSSSSSSSS REV D,S/#
ZCWS 0100000 ZC I CCCC DDDDDDDDD SSSSSSSSS MOV D,S/#
ZCWS 0100001 ZC I CCCC DDDDDDDDD SSSSSSSSS NOT D,S/#
ZCWS 0100010 ZC I CCCC DDDDDDDDD SSSSSSSSS ABS D,S/#
ZCWS 0100011 ZC I CCCC DDDDDDDDD SSSSSSSSS NEG D,S/#
ZCWS 0100100 ZC I CCCC DDDDDDDDD SSSSSSSSS NEGC D,S/#
ZCWS 0100101 ZC I CCCC DDDDDDDDD SSSSSSSSS NEGNC D,S/#
ZCWS 0100110 ZC I CCCC DDDDDDDDD SSSSSSSSS NEGZ D,S/#
ZCWS 0100111 ZC I CCCC DDDDDDDDD SSSSSSSSS NEGNZ D,S/#
ZCMS 0101000 ZC I CCCC DDDDDDDDD SSSSSSSSS ADD D,S/#
ZCMS 0101001 ZC I CCCC DDDDDDDDD SSSSSSSSS SUB D,S/#
ZCMS 0101010 ZC I CCCC DDDDDDDDD SSSSSSSSS ADDX D,S/#
ZCMS 0101011 ZC I CCCC DDDDDDDDD SSSSSSSSS SUBX D,S/#
ZCMS 0101100 ZC I CCCC DDDDDDDDD SSSSSSSSS ADDS D,S/#
ZCMS 0101101 ZC I CCCC DDDDDDDDD SSSSSSSSS SUBS D,S/#
ZCMS 0101110 ZC I CCCC DDDDDDDDD SSSSSSSSS ADDSX D,S/#
ZCMS 0101111 ZC I CCCC DDDDDDDDD SSSSSSSSS SUBSX D,S/#
ZCMS 0110000 ZC I CCCC DDDDDDDDD SSSSSSSSS SUMC D,S/#
ZCMS 0110001 ZC I CCCC DDDDDDDDD SSSSSSSSS SUMNC D,S/#
ZCMS 0110010 ZC I CCCC DDDDDDDDD SSSSSSSSS SUMZ D,S/#
ZCMS 0110011 ZC I CCCC DDDDDDDDD SSSSSSSSS SUMNZ D,S/#
ZCMS 0110100 ZC I CCCC DDDDDDDDD SSSSSSSSS MIN D,S/#
ZCMS 0110101 ZC I CCCC DDDDDDDDD SSSSSSSSS MAX D,S/#
ZCMS 0110110 ZC I CCCC DDDDDDDDD SSSSSSSSS MINS D,S/#
ZCMS 0110111 ZC I CCCC DDDDDDDDD SSSSSSSSS MAXS D,S/#
ZCMS 0111000 ZC I CCCC DDDDDDDDD SSSSSSSSS ADDABS D,S/#
ZCMS 0111001 ZC I CCCC DDDDDDDDD SSSSSSSSS SUBABS D,S/#
ZCMS 0111010 ZC I CCCC DDDDDDDDD SSSSSSSSS INCMOD D,S/#
ZCMS 0111011 ZC I CCCC DDDDDDDDD SSSSSSSSS DECMOD D,S/#
ZCMS 0111100 ZC I CCCC DDDDDDDDD SSSSSSSSS CMPSUB D,S/#
ZCMS 0111101 ZC I CCCC DDDDDDDDD SSSSSSSSS SUBR D,S/#
ZCMS 0111110 ZC I CCCC DDDDDDDDD SSSSSSSSS MUL D,S/# (waits one clock)
ZCMS 0111111 ZC I CCCC DDDDDDDDD SSSSSSSSS SCL D,S/# (waits one clock)
ZCWS 1000000 ZC I CCCC DDDDDDDDD SSSSSSSSS DECOD2 D,S/#
ZCWS 1000001 ZC I CCCC DDDDDDDDD SSSSSSSSS DECOD3 D,S/#
ZCWS 1000010 ZC I CCCC DDDDDDDDD SSSSSSSSS DECOD4 D,S/#
ZCWS 1000011 ZC I CCCC DDDDDDDDD SSSSSSSSS DECOD5 D,S/#
Z-WS 1000100 Z0 I CCCC DDDDDDDDD SSSSSSSSS ENCOD D,S/#
Z-WS 1000100 Z1 I CCCC DDDDDDDDD SSSSSSSSS BLMASK D,S/#
Z-WS 1000101 Z0 I CCCC DDDDDDDDD SSSSSSSSS ONECNT D,S/# (waits one clock)
Z-WS 1000101 Z1 I CCCC DDDDDDDDD SSSSSSSSS ZERCNT D,S/# (waits one clock)
-CWS 1000110 0C I CCCC DDDDDDDDD SSSSSSSSS INCPAT D,S/#
-CWS 1000110 1C I CCCC DDDDDDDDD SSSSSSSSS DECPAT D,S/#
--WS 1000111 00 I CCCC DDDDDDDDD SSSSSSSSS SPLITB D,S/# (also MERGEN)
--WS 1000111 01 I CCCC DDDDDDDDD SSSSSSSSS MERGEB D,S/# (also SPLITN)
--WS 1000111 10 I CCCC DDDDDDDDD SSSSSSSSS SPLITW D,S/#
--WS 1000111 11 I CCCC DDDDDDDDD SSSSSSSSS MERGEW D,S/#
--MS 10010nn n0 I CCCC DDDDDDDDD SSSSSSSSS GETNIB D,S/#,#0..7
--MS 10010nn n1 I CCCC DDDDDDDDD SSSSSSSSS SETNIB D,S/#,#0..7
--MS 1001100 n0 I CCCC DDDDDDDDD SSSSSSSSS GETWORD D,S/#,#0..1
--MS 1001100 n1 I CCCC DDDDDDDDD SSSSSSSSS SETWORD D,S/#,#0..1
--MS 1001101 00 I CCCC DDDDDDDDD SSSSSSSSS STWORDS D,S/#
--MS 1001101 01 I CCCC DDDDDDDDD SSSSSSSSS ROLNIB D,S/#
--MS 1001101 10 I CCCC DDDDDDDDD SSSSSSSSS ROLBYTE D,S/#
--MS 1001101 11 I CCCC DDDDDDDDD SSSSSSSSS ROLWORD D,S/#
--MS 1001110 00 I CCCC DDDDDDDDD SSSSSSSSS SETS D,S/#
--MS 1001110 01 I CCCC DDDDDDDDD SSSSSSSSS SETD D,S/#
--MS 1001110 10 I CCCC DDDDDDDDD SSSSSSSSS SETX D,S/#
--MS 1001110 11 I CCCC DDDDDDDDD SSSSSSSSS SETI D,S/#
-CMS 1001111 0C I CCCC DDDDDDDDD SSSSSSSSS COGNEW D,S/# (waits for hub)
-CMS 1001111 1C I CCCC DDDDDDDDD SSSSSSSSS WAITCNT D,S/# (waits for CNT, +CNTX if WC)
--MS 101000n n0 I CCCC DDDDDDDDD SSSSSSSSS GETBYTE D,S/#,#0..3
--MS 101000n n1 I CCCC DDDDDDDDD SSSSSSSSS SETBYTE D,S/#,#0..3
--WS 1010010 00 I CCCC DDDDDDDDD SSSSSSSSS STBYTES D,S/#
--MS 1010010 01 I CCCC DDDDDDDDD SSSSSSSSS SWBYTES D,S/# (switch/copy bytes in D, S = %11_10_01_00 = D same)
--MS 1010010 10 I CCCC DDDDDDDDD SSSSSSSSS PACKRGB D,S/# (S 8:8:8 -> D 5:5:5 << 16 | D >> 16)
--WS 1010010 11 I CCCC DDDDDDDDD SSSSSSSSS UNPKRGB D,S/# (S 5:5:5 -> D 8:8:8)
--MS 1010011 00 I CCCC DDDDDDDDD SSSSSSSSS ADDPIX D,S/# (waits one clock)
--MS 1010011 01 I CCCC DDDDDDDDD SSSSSSSSS MULPIX D,S/# (waits one clock)
--MS 1010011 10 I CCCC DDDDDDDDD SSSSSSSSS BLNPIX D,S/# (waits one clock)
--MS 1010011 11 I CCCC DDDDDDDDD SSSSSSSSS MIXPIX D,S/# (waits one clock)
ZCMS 1010100 ZC I CCCC DDDDDDDDD SSSSSSSSS JMPSW D,S/#
ZCMS 1010101 ZC I CCCC DDDDDDDDD SSSSSSSSS JMPSWD D,S/#
--MS 1010110 00 I CCCC DDDDDDDDD SSSSSSSSS IJZ D,S/#
--MS 1010110 01 I CCCC DDDDDDDDD SSSSSSSSS IJZD D,S/#
--MS 1010110 10 I CCCC DDDDDDDDD SSSSSSSSS IJNZ D,S/#
--MS 1010110 11 I CCCC DDDDDDDDD SSSSSSSSS IJNZD D,S/#
--MS 1010111 00 I CCCC DDDDDDDDD SSSSSSSSS DJZ D,S/#
--MS 1010111 01 I CCCC DDDDDDDDD SSSSSSSSS DJZD D,S/#
--MS 1010111 10 I CCCC DDDDDDDDD SSSSSSSSS DJNZ D,S/#
--MS 1010111 11 I CCCC DDDDDDDDD SSSSSSSSS DJNZD D,S/#
ZCRS 1011000 ZC I CCCC DDDDDDDDD SSSSSSSSS TESTB D,S/#
ZCRS 1011001 ZC I CCCC DDDDDDDDD SSSSSSSSS TESTN D,S/#
ZCRS 1011010 ZC I CCCC DDDDDDDDD SSSSSSSSS TEST D,S/#
ZCRS 1011011 ZC I CCCC DDDDDDDDD SSSSSSSSS CMP D,S/#
ZCRS 1011100 ZC I CCCC DDDDDDDDD SSSSSSSSS CMPX D,S/#
ZCRS 1011101 ZC I CCCC DDDDDDDDD SSSSSSSSS CMPS D,S/#
ZCRS 1011110 ZC I CCCC DDDDDDDDD SSSSSSSSS CMPSX D,S/#
ZCRS 1011111 ZC I CCCC DDDDDDDDD SSSSSSSSS CMPR D,S/#
--RS 11000nn n0 I CCCC DDDDDDDDD SSSSSSSSS COGINIT D,S/#,#0..7 (waits for hub) (SETNIB :coginit,cog,#6)
---S 11000nn n1 I CCCC nnnnnnnnn SSSSSSSSS WAITVID #0..$DFF,S/# (waits for vid if single-task, loops if multi-task)
--RS 1100011 11 I CCCC DDDDDDDDD SSSSSSSSS WAITVID D,S/# (waits for vid if single-task, loops if multi-task)
-CRS 110010n nC I CCCC DDDDDDDDD SSSSSSSSS WAITPEQ D,S/#,#0..3 (waits for pins, plus CNT if WC)
-CRS 110011n nC I CCCC DDDDDDDDD SSSSSSSSS WAITPNE D,S/#,#0..3 (waits for pins, plus CNT if WC)
--LS 1101000 0L I CCCC DDDDDDDDD SSSSSSSSS WRBYTE D/#,S/PTRA/PTRB (waits for hub)
--LS 1101000 1L I CCCC DDDDDDDDD SSSSSSSSS WRWORD D/#,S/PTRA/PTRB (waits for hub)
--LS 1101001 0L I CCCC DDDDDDDDD SSSSSSSSS WRLONG D/#,S/PTRA/PTRB (waits for hub)
--LS 1101001 1L I CCCC DDDDDDDDD SSSSSSSSS FRAC D/#,S/#
--LS 1101010 0L I CCCC DDDDDDDDD SSSSSSSSS WRAUX D/#,S/#0..$FF/PTRX/PTRY
--LS 1101010 1L I CCCC DDDDDDDDD SSSSSSSSS WRAUXR D/#,S/#0..$FF/PTRX/PTRY
--LS 1101011 0L I CCCC DDDDDDDDD SSSSSSSSS SETACCA D/#,S/#
--LS 1101011 1L I CCCC DDDDDDDDD SSSSSSSSS SETACCB D/#,S/#
--LS 1101100 0L I CCCC DDDDDDDDD SSSSSSSSS MACA D/#,S/#
--LS 1101100 1L I CCCC DDDDDDDDD SSSSSSSSS MACB D/#,S/#
--LS 1101101 0L I CCCC DDDDDDDDD SSSSSSSSS MUL32 D/#,S/#
--LS 1101101 1L I CCCC DDDDDDDDD SSSSSSSSS MUL32U D/#,S/#
--LS 1101110 0L I CCCC DDDDDDDDD SSSSSSSSS DIV32 D/#,S/#
--LS 1101110 1L I CCCC DDDDDDDDD SSSSSSSSS DIV32U D/#,S/#
--LS 1101111 0L I CCCC DDDDDDDDD SSSSSSSSS DIV64 D/#,S/#
--LS 1101111 1L I CCCC DDDDDDDDD SSSSSSSSS DIV64U D/#,S/#
--LS 1110000 0L I CCCC DDDDDDDDD SSSSSSSSS SQRT64 D/#,S/#
--LS 1110000 1L I CCCC DDDDDDDDD SSSSSSSSS QSINCOS D/#,S/#
--LS 1110001 0L I CCCC DDDDDDDDD SSSSSSSSS QARCTAN D/#,S/#
--LS 1110001 1L I CCCC DDDDDDDDD SSSSSSSSS QROTATE D/#,S/#
--LS 1110010 0L I CCCC DDDDDDDDD SSSSSSSSS SETSERA D/#,S/# (config,baud)
--LS 1110010 1L I CCCC DDDDDDDDD SSSSSSSSS SETSERB D/#,S/# (config,baud)
--LS 1110011 0L I CCCC DDDDDDDDD SSSSSSSSS SETCTRS D/#,S/# (ctrb,ctra)
--LS 1110011 1L I CCCC DDDDDDDDD SSSSSSSSS SETWAVS D/#,S/# (ctrb,ctra)
--LS 1110100 0L I CCCC DDDDDDDDD SSSSSSSSS SETFRQS D/#,S/# (ctrb,ctra)
--LS 1110100 1L I CCCC DDDDDDDDD SSSSSSSSS SETPHSS D/#,S/# (ctrb,ctra)
--LS 1110101 0L I CCCC DDDDDDDDD SSSSSSSSS ADDPHSS D/#,S/# (ctrb,ctra)
--LS 1110101 1L I CCCC DDDDDDDDD SSSSSSSSS SUBPHSS D/#,S/# (ctrb,ctra)
--LS 1110110 0L I CCCC DDDDDDDDD SSSSSSSSS JP D/#,S/#
--LS 1110110 1L I CCCC DDDDDDDDD SSSSSSSSS JPD D/#,S/#
--LS 1110111 0L I CCCC DDDDDDDDD SSSSSSSSS JNP D/#,S/#
--LS 1110111 1L I CCCC DDDDDDDDD SSSSSSSSS JNPD D/#,S/#
--LS 111100n nL I CCCC DDDDDDDDD SSSSSSSSS CFGPINS D/#,S/#,#0..2 (waits for alt)
--LS 1111001 1L I CCCC DDDDDDDDD SSSSSSSSS JMPTASK D/#,S/# (mode:mask,address)
--LS 1111010 0L I CCCC DDDDDDDDD SSSSSSSSS SETXFR D/#,S/#
--LS 1111010 1L I CCCC DDDDDDDDD SSSSSSSSS SETMIX D/#,S/#
--LS 1111011 0L I CCCC DDDDDDDDD SSSSSSSSS <empty> D/#,S/#
--LS 1111011 1L I CCCC DDDDDDDDD SSSSSSSSS <empty> D/#,S/#
--RS 1111100 00 I CCCC DDDDDDDDD SSSSSSSSS JZ D,S/#
--RS 1111100 01 I CCCC DDDDDDDDD SSSSSSSSS JZD D,S/#
--RS 1111100 10 I CCCC DDDDDDDDD SSSSSSSSS JNZ D,S/#
--RS 1111100 11 I CCCC DDDDDDDDD SSSSSSSSS JNZD D,S/#
---- 1111101 00 n nnnn nnnnnnnnn nnnnnnnnn AUGI #23bits (appends n to upper bits of next S or D immediate)
---- 1111101 01 0 nnnn nnnnnnnnn nnniiiiii REPS #1..$10000,#1..64
---- 1111101 01 1 BBAA ddddddddd sssssssss FIXINDA #d,#s / FIXINDB #d,#s / FIXINDS #d,#s / SETINDA #s / SETINDB #d / SETINDS #d,#s
---- 1111101 10 0 CCCC 00 nnnnnnnnnnnnnnnn JMP #abs
---- 1111101 10 0 CCCC 01 nnnnnnnnnnnnnnnn JMP_ #abs
---- 1111101 10 0 CCCC 10 nnnnnnnnnnnnnnnn JMP @rel
---- 1111101 10 0 CCCC 11 nnnnnnnnnnnnnnnn JMP_ @rel
---- 1111101 10 1 CCCC 00 nnnnnnnnnnnnnnnn JMPD #abs
---- 1111101 10 1 CCCC 01 nnnnnnnnnnnnnnnn JMPD_ #abs
---- 1111101 10 1 CCCC 10 nnnnnnnnnnnnnnnn JMPD @rel
---- 1111101 10 1 CCCC 11 nnnnnnnnnnnnnnnn JMPD_ @rel
---- 1111101 11 0 CCCC 00 nnnnnnnnnnnnnnnn CALL #abs
---- 1111101 11 0 CCCC 01 nnnnnnnnnnnnnnnn CALL_ #abs
---- 1111101 11 0 CCCC 10 nnnnnnnnnnnnnnnn CALL @rel
---- 1111101 11 0 CCCC 11 nnnnnnnnnnnnnnnn CALL_ @rel
---- 1111101 11 1 CCCC 00 nnnnnnnnnnnnnnnn CALLD #abs
---- 1111101 11 1 CCCC 01 nnnnnnnnnnnnnnnn CALLD_ #abs
---- 1111101 11 1 CCCC 10 nnnnnnnnnnnnnnnn CALLD @rel
---- 1111101 11 1 CCCC 11 nnnnnnnnnnnnnnnn CALLD_ @rel
---- 1111110 00 0 CCCC 00 nnnnnnnnnnnnnnnn CALLA #abs
---- 1111110 00 0 CCCC 01 nnnnnnnnnnnnnnnn CALLA_ #abs
---- 1111110 00 0 CCCC 10 nnnnnnnnnnnnnnnn CALLA @rel
---- 1111110 00 0 CCCC 11 nnnnnnnnnnnnnnnn CALLA_ @rel
---- 1111110 00 1 CCCC 00 nnnnnnnnnnnnnnnn CALLAD #abs
---- 1111110 00 1 CCCC 01 nnnnnnnnnnnnnnnn CALLAD_ #abs
---- 1111110 00 1 CCCC 10 nnnnnnnnnnnnnnnn CALLAD @rel
---- 1111110 00 1 CCCC 11 nnnnnnnnnnnnnnnn CALLAD_ @rel
---- 1111110 01 0 CCCC 00 nnnnnnnnnnnnnnnn CALLB #abs
---- 1111110 01 0 CCCC 01 nnnnnnnnnnnnnnnn CALLB_ #abs
---- 1111110 01 0 CCCC 10 nnnnnnnnnnnnnnnn CALLB @rel
---- 1111110 01 0 CCCC 11 nnnnnnnnnnnnnnnn CALLB_ @rel
---- 1111110 01 1 CCCC 00 nnnnnnnnnnnnnnnn CALLBD #abs
---- 1111110 01 1 CCCC 01 nnnnnnnnnnnnnnnn CALLBD_ #abs
---- 1111110 01 1 CCCC 10 nnnnnnnnnnnnnnnn CALLBD @rel
---- 1111110 01 1 CCCC 11 nnnnnnnnnnnnnnnn CALLBD_ @rel
---- 1111110 10 0 CCCC 00 nnnnnnnnnnnnnnnn CALLX #abs
---- 1111110 10 0 CCCC 01 nnnnnnnnnnnnnnnn CALLX_ #abs
---- 1111110 10 0 CCCC 10 nnnnnnnnnnnnnnnn CALLX @rel
---- 1111110 10 0 CCCC 11 nnnnnnnnnnnnnnnn CALLX_ @rel
---- 1111110 10 1 CCCC 00 nnnnnnnnnnnnnnnn CALLXD #abs
---- 1111110 10 1 CCCC 01 nnnnnnnnnnnnnnnn CALLXD_ #abs
---- 1111110 10 1 CCCC 10 nnnnnnnnnnnnnnnn CALLXD @rel
---- 1111110 10 1 CCCC 11 nnnnnnnnnnnnnnnn CALLXD_ @rel
---- 1111110 11 0 CCCC 00 nnnnnnnnnnnnnnnn CALLY #abs
---- 1111110 11 0 CCCC 01 nnnnnnnnnnnnnnnn CALLY_ #abs
---- 1111110 11 0 CCCC 10 nnnnnnnnnnnnnnnn CALLY @rel
---- 1111110 11 0 CCCC 11 nnnnnnnnnnnnnnnn CALLY_ @rel
---- 1111110 11 1 CCCC 00 nnnnnnnnnnnnnnnn CALLYD #abs
---- 1111110 11 1 CCCC 01 nnnnnnnnnnnnnnnn CALLYD_ #abs
---- 1111110 11 1 CCCC 10 nnnnnnnnnnnnnnnn CALLYD @rel
---- 1111110 11 1 CCCC 11 nnnnnnnnnnnnnnnn CALLYD_ @rel
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000000000 COGID D (waits for hub) (doesn't write D if WC)
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000000001 LOCKNEW D (waits for hub)
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000000010 GETPC D
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000000011 GETLFSR D
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000000100 GETCNT D
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000000101 GETCNTX D
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000000110 GETACAL D (waits for mac)
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000000111 GETACAH D (waits for mac)
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000001000 GETACBL D (waits for mac)
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000001001 GETACBH D (waits for mac)
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000001010 GETPTRA D
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000001011 GETPTRB D
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000001100 GETPTRX D
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000001101 GETPTRY D
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000001110 SERINA D (waits for rx if single-task, loops if multi-task, releases if WC)
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000001111 SERINB D (waits for rx if single-task, loops if multi-task, releases if WC)
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000010000 GETMULL D (waits for mul if single-task, loops if multi-task)
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000010001 GETMULH D (waits for mul if single-task, loops if multi-task)
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000010010 GETDIVQ D (waits for div if single-task, loops if multi-task)
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000010011 GETDIVR D (waits for div if single-task, loops if multi-task)
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000010100 GETSQRT D (waits for sqrt if single-task, loops if multi-task)
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000010101 GETQX D (waits for cordic if single-task, loops if multi-task)
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000010110 GETQY D (waits for cordic if single-task, loops if multi-task)
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000010111 GETQZ D (waits for cordic if single-task, loops if multi-task)
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000011000 GETPHSA D
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000011001 GETPHZA D (clears phsa)
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000011010 GETCOSA D
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000011011 GETSINA D
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000011100 GETPHSB D
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000011101 GETPHZB D (clears phsb)
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000011110 GETCOSB D
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000011111 GETSINB D
ZCM- 1111111 ZC 0 CCCC DDDDDDDDD 000100000 PUSHZC D
ZCM- 1111111 ZC 0 CCCC DDDDDDDDD 000100001 POPZC D
ZCM- 1111111 ZC 0 CCCC DDDDDDDDD 000100010 SUBCNT D (subtracts D from CNT, then CNTX if same thread)
ZCM- 1111111 ZC 0 CCCC DDDDDDDDD 000100011 GETPIX D (takes 3 clocks, needs 3 clocks per two prior stages, no condition allowed)
ZCM- 1111111 ZC 0 CCCC DDDDDDDDD 000100100 BINBCD D
ZCM- 1111111 ZC 0 CCCC DDDDDDDDD 000100101 BCDBIN D
ZCM- 1111111 ZC 0 CCCC DDDDDDDDD 000100110 BINGRY D
ZCM- 1111111 ZC 0 CCCC DDDDDDDDD 000100111 GRYBIN D (waits one clock)
ZCM- 1111111 ZC 0 CCCC DDDDDDDDD 000101000 ESWAP4 D
ZCM- 1111111 ZC 0 CCCC DDDDDDDDD 000101001 ESWAP8 D
ZCM- 1111111 ZC 0 CCCC DDDDDDDDD 000101010 SEUSSF D
ZCM- 1111111 ZC 0 CCCC DDDDDDDDD 000101011 SEUSSR D
Z-M- 1111111 ZC 0 CCCC DDDDDDDDD 000101100 INCD D (D += $200)
Z-M- 1111111 ZC 0 CCCC DDDDDDDDD 000101101 DECD D (D -= $200)
Z-M- 1111111 ZC 0 CCCC DDDDDDDDD 000101110 INCDS D (D += $201)
Z-M- 1111111 ZC 0 CCCC DDDDDDDDD 000101111 DECDS D (D -= $201)
ZCW- 1111111 ZC 0 CCCC DDDDDDDDD 000110000 POP D (pops from task's tiny stack)
--L- 1111111 00 L CCCC DDDDDDDDD 001iiiiii REPD D/#1..512,#1..64 (REPD $1FF,#1..64 = infinite repeat, can use REPD #i)
--L- 1111111 00 L CCCC DDDDDDDDD 010000000 CLKSET D/# (waits for hub)
--L- 1111111 00 L CCCC DDDDDDDDD 010000001 COGSTOP D/# (waits for hub)
-CL- 1111111 0C L CCCC DDDDDDDDD 010000010 LOCKSET D/# (waits for hub)
-CL- 1111111 0C L CCCC DDDDDDDDD 010000011 LOCKCLR D/# (waits for hub)
--L- 1111111 00 L CCCC DDDDDDDDD 010000100 LOCKRET D/# (waits for hub)
--L- 1111111 00 L CCCC DDDDDDDDD 010000101 RDWIDEC D/PTRA/PTRB (waits for hub if cache miss)
--L- 1111111 00 L CCCC DDDDDDDDD 010000110 RDWIDE D/PTRA/PTRB (waits for hub)
--L- 1111111 00 L CCCC DDDDDDDDD 010000111 WRWIDE D/PTRA/PTRB (waits for hub)
ZCL- 1111111 ZC L CCCC DDDDDDDDD 010001000 GETP D/# (pin into !Z/C via WZ/WC)
ZCL- 1111111 ZC L CCCC DDDDDDDDD 010001001 GETNP D/# (pin into Z/!C via WZ/WC)
-CL- 1111111 0C L CCCC DDDDDDDDD 010001010 SEROUTA D/# (waits for tx if single-task, loops if multi-task, releases if WC)
-CL- 1111111 0C L CCCC DDDDDDDDD 010001011 SEROUTB D/# (waits for tx if single-task, loops if multi-task, releases if WC)
-CL- 1111111 0C L CCCC DDDDDDDDD 010001100 CMPCNT D/# (subtracts D from CNT, then CNTX if same thread)
-CL- 1111111 0C L CCCC DDDDDDDDD 010001101 WAITPX D/# (waits for any edge, +CNT if WC)
-CL- 1111111 0C L CCCC DDDDDDDDD 010001110 WAITPR D/# (waits for pos edge, +CNT if WC)
-CL- 1111111 0C L CCCC DDDDDDDDD 010001111 WAITPF D/# (waits for neg edge, +CNT if WC)
ZCL- 1111111 ZC L CCCC DDDDDDDDD 010010000 SETZC D/# (D[1:0] into Z/C via WZ/WC)
--L- 1111111 00 L CCCC DDDDDDDDD 010010001 SETMAP D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010010010 SETXCH D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010010011 SETTASK D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010010100 SETRACE D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010010101 SARACCA D/# (waits for mac)
--L- 1111111 00 L CCCC DDDDDDDDD 010010110 SARACCB D/# (waits for mac)
--L- 1111111 00 L CCCC DDDDDDDDD 010010111 SARACCS D/# (waits for mac)
--L- 1111111 00 L CCCC DDDDDDDDD 010011000 SETPTRA D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010011001 SETPTRB D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010011010 ADDPTRA D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010011011 ADDPTRB D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010011100 SUBPTRA D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010011101 SUBPTRB D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010011110 SETWIDE D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010011111 SETWIDZ D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010100000 SETPTRX D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010100001 SETPTRY D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010100010 ADDPTRX D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010100011 ADDPTRY D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010100100 SUBPTRX D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010100101 SUBPTRY D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010100110 PASSCNT D/# (loops if (CNT - D) msb set)
--L- 1111111 00 L CCCC DDDDDDDDD 010100111 WAIT D/# (waits 1+ clocks, 0 same as 1)
--L- 1111111 00 L CCCC DDDDDDDDD 010101000 OFFP D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010101001 NOTP D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010101010 CLRP D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010101011 SETP D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010101100 SETPC D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010101101 SETPNC D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010101110 SETPZ D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010101111 SETPNZ D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010110000 DIV64D D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010110001 SQRT32 D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010110010 QLOG D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010110011 QEXP D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010110100 SETQI D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010110101 SETQZ D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010110110 CFGDACS D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010110111 SETDACS D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010111000 CFGDAC0 D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010111001 CFGDAC1 D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010111010 CFGDAC2 D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010111011 CFGDAC3 D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010111100 SETDAC0 D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010111101 SETDAC1 D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010111110 SETDAC2 D/#
--L- 1111111 00 L CCCC DDDDDDDDD 010111111 SETDAC3 D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011000000 SETCTRA D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011000001 SETWAVA D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011000010 SETFRQA D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011000011 SETPHSA D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011000100 ADDPHSA D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011000101 SUBPHSA D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011000110 SETVID D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011000111 SETVIDY D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011001000 SETCTRB D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011001001 SETWAVB D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011001010 SETFRQB D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011001011 SETPHSB D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011001100 ADDPHSB D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011001101 SUBPHSB D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011001110 SETVIDI D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011001111 SETVIDQ D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011010000 SETPIX D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011010001 SETPIXZ D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011010010 SETPIXU D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011010011 SETPIXV D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011010100 SETPIXA D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011010101 SETPIXR D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011010110 SETPIXG D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011010111 SETPIXB D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011011000 SETPORA D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011011001 SETPORB D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011011010 SETPORC D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011011011 SETPORD D/#
--L- 1111111 00 L CCCC DDDDDDDDD 011011100 PUSH D/# (pushes into task's tiny stack)
--R- 1111111 00 0 CCCC DDDDDDDDD 011100110 JMPREL D
--R- 1111111 00 0 CCCC DDDDDDDDD 011100111 JMPRELD D
--R- 1111111 00 0 CCCC DDDDDDDDD 011101000 JMP D
--R- 1111111 00 0 CCCC DDDDDDDDD 011101001 JMP_ D
--R- 1111111 00 0 CCCC DDDDDDDDD 011101010 JMPD D
--R- 1111111 00 0 CCCC DDDDDDDDD 011101011 JMPD_ D
--R- 1111111 00 0 CCCC DDDDDDDDD 011101100 CALL D
--R- 1111111 00 0 CCCC DDDDDDDDD 011101101 CALL_ D
--R- 1111111 00 0 CCCC DDDDDDDDD 011101110 CALLD D
--R- 1111111 00 0 CCCC DDDDDDDDD 011101111 CALLD_ D
--R- 1111111 00 0 CCCC DDDDDDDDD 011110000 CALLA D
--R- 1111111 00 0 CCCC DDDDDDDDD 011110001 CALLA_ D
--R- 1111111 00 0 CCCC DDDDDDDDD 011110010 CALLAD D
--R- 1111111 00 0 CCCC DDDDDDDDD 011110011 CALLAD_ D
--R- 1111111 00 0 CCCC DDDDDDDDD 011110100 CALLB D
--R- 1111111 00 0 CCCC DDDDDDDDD 011110101 CALLB_ D
--R- 1111111 00 0 CCCC DDDDDDDDD 011110110 CALLBD D
--R- 1111111 00 0 CCCC DDDDDDDDD 011110111 CALLBD_ D
--R- 1111111 00 0 CCCC DDDDDDDDD 011111000 CALLX D
--R- 1111111 00 0 CCCC DDDDDDDDD 011111001 CALLX_ D
--R- 1111111 00 0 CCCC DDDDDDDDD 011111010 CALLXD D
--R- 1111111 00 0 CCCC DDDDDDDDD 011111011 CALLXD_ D
--R- 1111111 00 0 CCCC DDDDDDDDD 011111100 CALLY D
--R- 1111111 00 0 CCCC DDDDDDDDD 011111101 CALLY_ D
--R- 1111111 00 0 CCCC DDDDDDDDD 011111110 CALLYD D
--R- 1111111 00 0 CCCC DDDDDDDDD 011111111 CALLYD_ D
ZC-- 1111111 ZC x CCCC xxxxxxxxx 100000000 RETA
ZC-- 1111111 ZC x CCCC xxxxxxxxx 100000001 RETAD
ZC-- 1111111 ZC x CCCC xxxxxxxxx 100000010 RETB
ZC-- 1111111 ZC x CCCC xxxxxxxxx 100000011 RETBD
ZC-- 1111111 ZC x CCCC xxxxxxxxx 100000100 RETX
ZC-- 1111111 ZC x CCCC xxxxxxxxx 100000101 RETXD
ZC-- 1111111 ZC x CCCC xxxxxxxxx 100000110 RETY
ZC-- 1111111 ZC x CCCC xxxxxxxxx 100000111 RETYD
ZC-- 1111111 ZC x CCCC xxxxxxxxx 100001000 RET
ZC-- 1111111 ZC x CCCC xxxxxxxxx 100001001 RETD
ZC-- 1111111 ZC x CCCC xxxxxxxxx 100001010 POLCTRA (ctra-rollover into !Z/C)
ZC-- 1111111 ZC x CCCC xxxxxxxxx 100001011 POLCTRB (ctrb-rollover into !Z/C)
ZC-- 1111111 ZC x CCCC xxxxxxxxx 100001100 POLVID (vid-ready into !Z/C)
---- 1111111 00 x CCCC xxxxxxxxx 100001101 CAPCTRA
---- 1111111 00 x CCCC xxxxxxxxx 100001110 CAPCTRB
---- 1111111 00 x CCCC xxxxxxxxx 100001111 CAPCTRS
---- 1111111 00 x CCCC xxxxxxxxx 100010000 CACHEX
---- 1111111 00 x CCCC xxxxxxxxx 100010001 CLRACCA
---- 1111111 00 x CCCC xxxxxxxxx 100010010 CLRACCB
---- 1111111 00 x CCCC xxxxxxxxx 100010011 CLRACCS
ZC-- 1111111 ZC x CCCC xxxxxxxxx 100010100 CHKPTRX
ZC-- 1111111 ZC x CCCC xxxxxxxxx 100010101 CHKPTRY
---- 1111111 00 x CCCC xxxxxxxxx 100010110 SYNCTRA (waits for ctra if single-task, loops if multi-task)
---- 1111111 00 x CCCC xxxxxxxxx 100010111 SYNCTRB (waits for ctrb if single-task, loops if multi-task)
---- 1111111 00 x CCCC xxxxxxxxx 100011000 SETPIXW
x = don't care, use 0
---------------------------------------------------------------------------------------------------------------------
Z effect
------------------------------------------------------------------------------------------
0 <none>
1 wz
C effect
------------------------------------------------------------------------------------------
0 <none>
1 wc
L DDDDDDDDD destination operand
------------------------------------------------------------------------------------------
0/na DDDDDDDDD register
1 #DDDDDDDDD immediate, zero-extended
I SSSSSSSSS source operand
------------------------------------------------------------------------------------------
0/na SSSSSSSSS register
1 #SSSSSSSSS immediate, zero-extended
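As a reading aid for the D,S rows in the list, the following Python sketch splits a 32-bit instruction word into the fields shown in the columns (7-bit opcode, Z, C, I, CCCC, D, S). It assumes the columns run from the most significant bit on the left to the least significant on the right; the helper names are hypothetical.

```python
def encode(opcode, z, c, i, cond, d, s):
    """Assemble a D,S-form instruction word from its fields (assumed MSB-first)."""
    return ((opcode & 0x7F) << 25 | (z & 1) << 24 | (c & 1) << 23 | (i & 1) << 22
            | (cond & 0xF) << 18 | (d & 0x1FF) << 9 | (s & 0x1FF))

def decode(instr):
    """Split a 32-bit instruction word into its named fields."""
    return {
        "opcode": (instr >> 25) & 0x7F,  # 7 bits
        "z": (instr >> 24) & 1,          # wz
        "c": (instr >> 23) & 1,          # wc
        "i": (instr >> 22) & 1,          # 1 = S is an immediate
        "cond": (instr >> 18) & 0xF,     # CCCC
        "d": (instr >> 9) & 0x1FF,       # 9-bit destination
        "s": instr & 0x1FF,              # 9-bit source
    }

# MOV (opcode 0100000) with an immediate source, unconditional (CCCC = 1111).
word = encode(0b0100000, 0, 0, 1, 0b1111, 5, 3)
fields = decode(word)
```

The field widths add up to 7 + 1 + 1 + 1 + 4 + 9 + 9 = 32 bits, which is why only 9 bits are available for an immediate.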
CCCC condition (easier-to-read list)
------------------------------------------------------------------------------------------
0000 never 1111 always (default)
0001 nc & nz 1100 if_c if_b
0010 nc & z 0011 if_nc if_ae
0011 nc 1010 if_z if_e
0100 c & nz 0101 if_nz if_ne
0101 nz 1000 if_c_and_z if_z_and_c
0110 c <> z 0100 if_c_and_nz if_nz_and_c
0111 nc | nz 0010 if_nc_and_z if_z_and_nc
1000 c & z 0001 if_nc_and_nz if_nz_and_nc if_a
1001 c = z 1110 if_c_or_z if_z_or_c if_be
1010 z 1101 if_c_or_nz if_nz_or_c
1011 nc | z 1011 if_nc_or_z if_z_or_nc
1100 c 0111 if_nc_or_nz if_nz_or_nc
1101 c | nz 1001 if_c_eq_z if_z_eq_c
1110 c | z 0110 if_c_ne_z if_z_ne_c
1111 always 0000 never
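One way to read the condition list is as a 4-entry truth table indexed by the flags: bit (2*C + Z) of CCCC says whether the instruction executes for that flag combination. The Python sketch below checks that reading against several names from the list; it is an interpretation of the table, not a description of how the hardware is built, and the function name is hypothetical.

```python
def condition_met(cccc, c, z):
    """Return True if the 4-bit condition field passes for flags c and z.

    Bit (2*c + z) of cccc is treated as the verdict for that flag pair.
    """
    return bool((cccc >> ((c << 1) | z)) & 1)

# A few encodings from the list, paired with the predicate each name describes.
EXAMPLES = {
    0b1111: ("always", lambda c, z: True),
    0b0000: ("never", lambda c, z: False),
    0b1100: ("if_c", lambda c, z: c == 1),
    0b1010: ("if_z", lambda c, z: z == 1),
    0b0001: ("if_nc_and_nz", lambda c, z: c == 0 and z == 0),
    0b1110: ("if_c_or_z", lambda c, z: c == 1 or z == 1),
    0b0110: ("if_c_ne_z", lambda c, z: c != z),
    0b1001: ("if_c_eq_z", lambda c, z: c == z),
}

# Verify every listed encoding against its predicate for all four flag pairs.
for cccc, (name, predicate) in EXAMPLES.items():
    for c in (0, 1):
        for z in (0, 1):
            assert condition_met(cccc, c, z) == predicate(c, z), name
```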
CCCC inda/indb - CCCC=1111 after stage 2 of pipeline if inda/indb used (indx=inda/indb)
------------------------------------------------------------------------------------------
xx00 source indx
xx01 source indx++
xx10 source indx--
xx11 source ++indx
00xx destination indx
01xx destination indx++
10xx destination indx--
11xx destination ++indx
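The AUGI entry in the list carries 23 bits, and a normal immediate carries 9, so together they cover a full 32-bit constant. A minimal Python sketch of that arithmetic follows, assuming the AUGI bits become bits 31:9 of the combined value as the "appends n to upper bits" note suggests. The helper names are hypothetical.

```python
def split_constant(value):
    """Split a 32-bit constant into (23-bit AUGI payload, 9-bit immediate)."""
    value &= 0xFFFFFFFF
    return value >> 9, value & 0x1FF

def augmented_immediate(augi_n, imm9):
    """Recombine the AUGI payload with the following 9-bit immediate."""
    return ((augi_n & 0x7FFFFF) << 9) | (imm9 & 0x1FF)

upper, lower = split_constant(0x12345678)
restored = augmented_immediate(upper, lower)
```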
I'm getting all these changes into PNut.exe now. It's taking a while because the assembler must be made to work in hub space, plus all the branches work differently now.
I've been thinking about all of the CALL / CALL_ pairs and am wondering if it might be better to just use the "_" bit to select directly between COG addresses and hub addresses. That way you don't need to know what mode you're in, you just need to know what kind of address you're calling. That seems easier to understand than a toggle.
Also, I notice you now have enough address space for up to 64k of COG memory! :-)
I started out thinking about different instructions for hub and cog branching and came to the notion that it was best just to toggle. And you don't need to worry about what mode you're in. It'll be obvious. If your cog code is running, you're in the cog. If your hub code is running, you're in the hub. Programs will be assembled differently, in the sense that cog code is limited to a map of 512 and hub code lives in a 64K instruction space.
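The two schemes being discussed can be contrasted with a small Python model. This is only a sketch: the function names are invented, and for the "select" proposal it assumes that a set "_" bit means a hub address, which the thread does not specify.

```python
def toggle_target(in_hub, underscore):
    """Scheme described in the thread: '_' flips the current execution mode."""
    return in_hub != underscore

def select_target(in_hub, underscore):
    """Proposed alternative; assumes '_' set means the target is in the hub."""
    return underscore

def call(stack, in_hub, return_pc, underscore, rule):
    """Save the caller's mode and PC, then return the callee's mode."""
    stack.append((in_hub, return_pc))
    return rule(in_hub, underscore)

def ret(stack):
    """Restore the caller's (mode, PC), whichever scheme made the call."""
    return stack.pop()

stack = []
# Running in the cog, CALL_ lands in the hub under either rule.
callee_in_hub = call(stack, False, 0x010, True, toggle_target)
# RET restores the saved mode, so execution comes back to the cog.
caller_in_hub, caller_pc = ret(stack)
```

The two rules only disagree when the caller is already in the hub: there, CALL_ goes to the cog under the toggle rule and stays in the hub under the select rule.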
The problem is that JMPSW only has a 9 bit S field and it only writes 9 PC bits into the S field of the corresponding RET instruction.
It's only 9 bits when immediate. If it's an S register, it has {hubmode,Z,C,PC[15:0]}, which is everything. The bugaboo is getting a 16-bit address called out efficiently.
By the way, JMPSW/JMPSWD writes the whole D long, not just the lower bits. The top bits are cleared to 0. So, the old subroutine_RET labeling is history.
Here's a good idea. You could make P2 into a 36 bit architecture like the old DEC PDP-10. That would give you some extra bits to play with! :-)
In case you do that, please add PUSHJ and POPJ as well as the highly intuitive opcodes HLRZ and JFCL.
In hardware ways, it would be easy to go to 36 bits. The trouble is that you'd need to make hub memory the same if you wanted hub execution. And that wouldn't be that hard, either, but then there's the strangeness of memory not being size compatible with just about everything else in the world. Consider SDRAM, for example.
I guess PIC chips have odd sized instructions. In any case, I wasn't really serious. The PDP-10 used 7 bit bytes normally for ASCII data and had funky variable-width byte pointer instructions where a "byte" could be anywhere between 1 bit and 36 bits. While I have fond memories of the PDP-10, I think it would be a poor choice for a modern chip. It worked well for Lisp though.
HLRZ (half word left to right and zero the other half word) == car
HRRZ (half word right to right and zero the other half word) == cdr
I don't understand HLRZ/HRRZ. Do you mean 'halve' the word? What does it do?
Memory in the PDP-10 was addressed in 36 bit words. Other than the funky byte pointers that I mentioned, there was no way to directly address anything smaller than 36 bits. The "half word" instructions manipulated the upper 18 bits of a word as the "left" half word and the lower 18 bits as the "right" half word. Since the address space was limited to 256K 36 bit words, an address fit in one of these half words so a single 36 bit word could hold two addresses. This is perfect for Lisp since a CONS consists of two parts, the CAR and the CDR. That meant that the primary Lisp data structure, the CONS, could be represented as a single PDP-10 36 bit word.
So, HLRZ would take the left 18 bits of the word at the source address, write them to the right 18 bits of the destination, and zero the destination's left 18 bits.
JFCL was really only used as a NOP although it stands for "Jump on Flag and Clear". The NOP happened when you didn't specify a flag.
Note: I don't recommend adding any of these instructions to the P2! :-)
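For anyone who wants to see the half-word idea in action, here is a small Python sketch of a CONS cell packed into one 36-bit word, with `hlrz` and `hrrz` modeled as plain functions on the word value. The function names and the `cons` helper are just for illustration; real PDP-10 instructions operate on an accumulator and a memory address rather than on bare integers.

```python
HALF_BITS = 18
HALF_MASK = (1 << HALF_BITS) - 1   # 18-bit half word, enough for a 256K-word address


def cons(car, cdr):
    """Pack two 18-bit addresses into one 36-bit word: CAR in the left half, CDR in the right."""
    return ((car & HALF_MASK) << HALF_BITS) | (cdr & HALF_MASK)


def hlrz(word):
    """Half word Left to Right, Zero the other half: yields the CAR."""
    return (word >> HALF_BITS) & HALF_MASK


def hrrz(word):
    """Half word Right to Right, Zero the other half: yields the CDR."""
    return word & HALF_MASK
```

Both results come back with the upper 18 bits zero, so they can be used directly as addresses.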
Thanks for the explanation. We could definitely do that in Prop3.
Actually, in thinking about it, something like PDP-10 byte pointers could be handy for parsing bit sequences where the fields don't align on byte boundaries. I'm not suggesting adding these instructions but they might be useful in a future chip. The PDP-10 byte pointer had a number of fields. The first was just the address of the currently addressed 36 bit word. Another field was the bit number in that 36 bit word that would be consumed next when the byte pointer was dereferenced. Lastly, there was a field indicating the number of bits in the byte being fetched. That let you walk through memory fetching bytes of an odd size like 6 or 7 bits without having to do complicated pointer arithmetic and shifting and masking to isolate the required bits. The problem was that it only worked well if all bytes in a string were the same size. I could see having a byte pointer concept where the pointer contains just the address of the next byte or long and the bit number of the next bit to consume but where the LOAD_BYTE instruction specifies the width of the byte to fetch. That way you could parse along a non-byte aligned sequence of fields and fetch fields of different widths without regard to how they are positioned in the byte or even if they cross byte boundaries.
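Here is a rough Python sketch of that last idea: a pointer that only tracks a bit position, with the field width supplied at each fetch. The class name `BitPointer`, the `load_byte` method and the MSB-first bit order are all assumptions made for the example; nothing like this exists in the P2 instruction list.

```python
class BitPointer:
    """Pointer into a byte buffer that remembers only the next bit to consume."""

    def __init__(self, data, bit_pos=0):
        self.data = data        # bytes-like buffer
        self.bit_pos = bit_pos  # absolute bit index, bit 0 = MSB of data[0] (assumed order)

    def load_byte(self, width):
        """Fetch the next `width` bits as an unsigned value and advance the pointer.

        Fields may straddle byte boundaries. Reading past the end of the
        buffer raises IndexError.
        """
        value = 0
        for _ in range(width):
            byte = self.data[self.bit_pos >> 3]
            bit = (byte >> (7 - (self.bit_pos & 7))) & 1
            value = (value << 1) | bit
            self.bit_pos += 1
        return value
```

With a 3-bit, a 7-bit and a 6-bit field packed back to back into two bytes, three `load_byte` calls with widths 3, 7 and 6 walk straight through them without any manual shifting or masking in the caller.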
<semi-serious>
Please, don't!
PIC = harvard
harvard = separate data spaces / can't exec "data"
can't exec data = much harder tricks to play overlay code, HUB as software controlled L2, etc
ergo...
OMG!!! once you take that path, it ultimately leads to... integrated flash, and one more "mission to bathroom" launch window at every F10 press! :frown:
harvard = EVIL!!!
</semi-serious>
Good point about separate instruction and data address spaces on the PIC. In any case, I wasn't seriously suggesting going to 36 bit instructions unless we also go to 36 bit data! :-)
Really, I wasn't serious at all. Just a silly response to Chip saying it was hard to find space for a 16 bit immediate field in 32 bit instructions. In fact, 4 bits more probably wouldn't be enough to help anyway.
Now I really need to stop yapping and get to work on implementing Chip's new instruction set in GCC!
I started out thinking about different instructions for hub and cog branching and came to the notion that it was best just to toggle. And you don't need to worry about what mode you're in. It'll be obvious. If your cog code is running, you're in the cog. If your hub code is running, you're in the hub. Programs will be assembled differently, in the sense that cog code is limited to a map of 512 and hub code lives in a 64K instruction space.
I'm not quite so sure about this. It seems like some automatic tools might not necessarily know which mode they're in when generating code. My feeling is that you're more likely to know where the destination is (in HUB or COG). Actually you need to know that anyway in order to use the toggle properly... so the advantage of the "CALLHUB/CALLCOG" versions is that you only need to know the destination mode, you don't need to know the current mode (whereas for the toggle version you need to know both).
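To illustrate the difference for a code generator, here is a tiny Python sketch. `toggle_call` picks between the plain and underscore forms and needs both the current and destination modes; `explicit_call` uses the CALLHUB/CALLCOG names suggested above (which are only a proposal, not instructions in the list) and needs the destination alone. Both function names are made up for the example.

```python
def toggle_call(current_mode, dest_mode):
    """Toggle scheme: the underscore form flips hub/cog mode, the plain form stays put.

    The tool must know where it is running AND where the target lives.
    """
    return "CALL_" if current_mode != dest_mode else "CALL"


def explicit_call(dest_mode):
    """Explicit scheme (hypothetical mnemonics): only the target's location matters."""
    return "CALLHUB" if dest_mode == "hub" else "CALLCOG"
```

A routine that might be placed in either memory can always be reached with the same explicit call, whereas with the toggle scheme the caller has to be re-assembled if it moves.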
I may have missed this somewhere. When in HUB mode, are all addresses referencing the HUB memory? Does this mean that access to PINx will require switching to COG mode?
I'm not positive about this but I think the hub vs. COG bit only applies to CALL/RET/JMP instructions, not to data references.
Lisp, Scheme, what's the difference? Just dialects of the same language. Actually, Ken Rose told me recently that he was thinking of porting Hedgehog Lisp to the P1.
Programs will be assembled differently, in the sense that cog code is limited to a map of 512 and hub code lives in a 64K instruction space.
So this would map to the bottom 256k of hub memory, for exec hub mode ?
Has the COG reach expanded > 512, or is that only on CALL ? ( If it is only on CALLs that would make any extended COG memory the most costly place to run code ?)
A better way to gain some 'free' COG memory ? :
Is there opcode room, to move the peripheral config and control registers (SFR), to > 512 to allow all 512 as VAR/CODE space ?
Peripheral config and control registers have less useful access modes than VAR/CODE memory, which makes them expensive in memory stealing terms.
If this 'fits' there is no source code change, the assembler just senses a SFR address, and sets whichever bit determines physical space access.
This hub exec mode has only to do with the tasks' PCs and where they're getting instructions from. All D/S register references are to within the cog that is executing. Just the program space gets bigger, and instructions can be fetched from the hub instead of only from the cog. All the 9-bit D and S fields within the opcodes still reference the cog's internal registers. There are no more address bits for expanding cog register space. Hub exec mode is just like running normal cog code, except the 32-bit instructions are coming from the hub via a caching system, instead of from the cog RAM.
I just realized that since all instructions during hub execution come from the hub, the cog RAM instruction fetching is still going on, but it's being ignored. We could stuff some other address in the instruction-read-address of the cog RAM and get any long out of cog RAM we want. I wonder if there is something useful that can be done by repurposing the cog's internal instruction fetch. It's a free cog RAM read on every hub exec instruction.
So in hubexec mode we could have ~500 registers available, is there any way to make this useful for optimization in GCC?
C.W.
Yes, we can increase the number of registers available to GCC and we can also use some of COG memory for library functions that we want to run really fast. It won't be wasted! :-)
...
It would need to go to some holding register, which takes the same opcode size/time to read, as any other register ?, so there seems little to be gained data-flow wise ?
But there may be DEBUG uses for this ?
If it is a read, that might include compare with PC-Val for two 'hardware' breakpoints ?
Two PC compare values would neatly fit into one 32 bit, with some simple choices of
* Break on either match
and maybe even
* Break if inside the Range
* Break if outside the Range
or a split of that 16,16 into one PC, and a pass-counter.
PC match incs the counter half, and a separate tiny thread manages almost-real-time debug.
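A quick Python sketch of how that packed compare long could behave. Which half holds which value, the mode names, and the helper functions `pack_breakpoints`, `should_break` and `count_pass` are all assumptions for illustration; this is a suggestion being explored, not a P2 feature.

```python
PC_MASK = 0xFFFF


def pack_breakpoints(pc_a, pc_b):
    """Two 16-bit PC compare values in one long: pc_a in the low half, pc_b in the high half."""
    return ((pc_b & PC_MASK) << 16) | (pc_a & PC_MASK)


def should_break(pc, packed, mode):
    """Evaluate the three compare choices listed above."""
    lo = packed & PC_MASK
    hi = (packed >> 16) & PC_MASK
    if mode == "either":
        return pc == lo or pc == hi
    if mode == "inside":
        return lo <= pc <= hi
    if mode == "outside":
        return not (lo <= pc <= hi)
    raise ValueError("unknown mode: " + mode)


def count_pass(pc, packed):
    """Alternative split: low half is one PC, high half is a pass counter.

    A PC match increments the counter half (wrapping at 16 bits);
    otherwise the long is returned unchanged.
    """
    if pc == (packed & PC_MASK):
        count = ((packed >> 16) + 1) & PC_MASK
        return (count << 16) | (packed & PC_MASK)
    return packed
```

In the pass-counter variant the debug thread would just poll the upper half to see how many times the watched address has been hit.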
I'll read the .txt file carefully today.
Also, I notice you now have enough address space for up to 64k of COG memory! :-)
Or do I just not get the real problem?
Andy
Sorry for all of the questions but did the "BIG" prefix/suffix go away?
Edit: Never mind. I see it's there as "AUG". Perfect!
Added: Dave, what do PUSHJ/POPJ/HLRZ/JFCL do?
Anyway, not a P2 feature!
Hmm... would not
Reads:
CAR = GETWORD D,S,#1
CDR = GETWORD D,S,#0
Writes:
CAR = SETWORD D,S,#1
CDR = SETWORD D,S,#0
work???
It would probably require a second SETWORD with source #0 to clear the other half, but pretty close...
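As a sanity check on that, here is a Python sketch of CAR/CDR on a 32-bit long using word-select operations. The semantics are assumed: `getword` returns the selected 16-bit word zero-extended, and `setword` replaces only the selected word and leaves the other half alone. The function names mirror the mnemonics but are plain Python helpers, not a statement of how the real instructions behave.

```python
LONG_MASK = 0xFFFFFFFF
WORD_MASK = 0xFFFF


def getword(s, n):
    """Assumed GETWORD D,S,#n: return word n (0 = low, 1 = high) of s, zero-extended."""
    return (s >> (16 * n)) & WORD_MASK


def setword(d, s, n):
    """Assumed SETWORD D,S,#n: replace word n of d with the low word of s."""
    shift = 16 * n
    cleared = d & ~(WORD_MASK << shift) & LONG_MASK
    return cleared | ((s & WORD_MASK) << shift)
```

Under these assumptions the reads already behave like HLRZ/HRRZ, and a write into a register holding stale data needs a second `setword` with a source of 0 to clear the other half, which matches the point above.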
I'm not quite so sure about this. It seems like some automatic tools might not necessarily know which mode they're in when generating code. My feeling is that you're more likely to know where the destination is (in HUB or COG). Actually you need to know that anyway in order to use the toggle properly... so the advantage of the "CALLHUB/CALLCOG" versions is that you only need to know the destination mode, you don't need to know the current mode (whereas for the toggle version you need to know both).
Eric
Lisp? I'm sure I heard you say Lisp.
I'd go for Scheme.
Alternate way of feeding the video engine?