The New 16-Cog, 512KB, 64 analog I/O Propeller Chip

Bill Henning · 2014-04-17 14:15

JMP uses one long, can jump to any hub address

JMPSW can only jump to a cog address without an AUGD, which would make it two longs and waste a lot of memory.
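The "two longs" cost comes from the AUGS/AUGD prefixes in Chip's instruction list, which supply 23 extra bits for the next 9-bit immediate. Here is a rough Python model of that arithmetic. It assumes the 23 prefix bits land in bits [31:9] of the final value; the function names are made up for illustration and are not part of any tool.

```python
# Sketch only: models how an AUGS/AUGD prefix could widen a 9-bit immediate.
# Assumption: the 23 prefix bits become bits [31:9] of the final 32-bit value.

AUG_BITS = 23   # width of the AUGS/AUGD payload, from the instruction list
IMM_BITS = 9    # width of the D or S field in a normal instruction


def split_immediate(value):
    """Return (prefix23, low9) for a 32-bit immediate."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("immediate must fit in 32 bits")
    return value >> IMM_BITS, value & ((1 << IMM_BITS) - 1)


def join_immediate(prefix23, low9):
    """Rebuild the 32-bit immediate from the AUG prefix and the 9-bit field."""
    prefix23 &= (1 << AUG_BITS) - 1
    low9 &= (1 << IMM_BITS) - 1
    return (prefix23 << IMM_BITS) | low9


def longs_needed(value):
    """One long if the value fits the 9-bit field, two if a prefix is needed."""
    return 1 if value < (1 << IMM_BITS) else 2
```

Any target above $1FF needs the prefix, so every such jump would cost two longs instead of one.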

Seairth wrote: »
That's what I figured. At which point, what is the difference between:
JMP reg WC WZ
and
JMPSW $1EF, reg WC WZ

Seairth · 2014-04-17 14:28

Bill Henning wrote: »

JMP uses one long, can jump to any hub address

JMPSW can only jump to a cog address without an AUGD, which would make it two longs and waste a lot of memory.

ZCWS		1011111 ZC I CCCC DDDDDDDDD SSSSSSSSS		JMPSW	D,S/@			(jump to S/@, store return address in D, WZ/WC to save/load flags)
ZCR- wr		1111111 ZC x CCCC DDDDDDDDD xxxx01000		JMP	D			(jump to D[16:0] and write {Z,C,P[16:0]} to $1EF)

It looks to me like JMP is getting its address out of the D register and JMPSW is getting its address out of the S register (ignoring the relative variant). And both write PC+1 to a register. And both use WZ/WC to save (and presumably restore) Z/C.
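To make the D-versus-S comparison concrete, here is a small Python sketch that packs and unpacks the fields of the two encodings quoted above. It assumes the leftmost bit in the listing is bit 31 and that the layout is 7 opcode bits, Z, C, I, 4 condition bits, 9 D bits and 9 S bits (32 in total). The function and key names are hypothetical.

```python
# Sketch only: field layout read off "OOOOOOO ZC I CCCC DDDDDDDDD SSSSSSSSS".
# Assumption: the leftmost bit of the listing is bit 31 of the long.

def encode(opcode, wz, wc, imm, cond, d, s):
    """Pack the fields into one 32-bit instruction long."""
    return ((opcode & 0x7F) << 25) | ((wz & 1) << 24) | ((wc & 1) << 23) \
        | ((imm & 1) << 22) | ((cond & 0xF) << 18) | ((d & 0x1FF) << 9) \
        | (s & 0x1FF)


def decode(word):
    """Split a 32-bit instruction long back into its fields."""
    return {
        "opcode": (word >> 25) & 0x7F,
        "wz": (word >> 24) & 1,
        "wc": (word >> 23) & 1,
        "imm": (word >> 22) & 1,
        "cond": (word >> 18) & 0xF,
        "d": (word >> 9) & 0x1FF,
        "s": word & 0x1FF,
    }


# JMPSW D,S: opcode 1011111, target taken from S, return address stored in D.
jmpsw = encode(0b1011111, 1, 1, 0, 0b1111, 0x1EF, 0x010)

# JMP D: opcode 1111111 with sub-opcode 01000 in the low S bits, target in D.
jmp_d = encode(0b1111111, 1, 1, 0, 0b1111, 0x010, 0b01000)
```

In this model JMPSW carries two register fields, while JMP D spends its S field on the sub-opcode and has $1EF as an implied destination.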

tonyp12 · 2014-04-17 14:50

>_RET labels at the end of subroutines, and any RET will return you.

Saves you from having to jump to the _RET label if you want to have multiple exit points in the subroutine.
Space-wise it would be the same; it speeds things up a little though.

David Betz · 2014-04-17 14:53

Looks good to me. Thanks Chip!

cgracey wrote: »

Okay. Here is the tentative plan for the new chip:

New Propeller Chip - 16 April 2014

	200MHz system clock
	16 cogs with 2-clock instructions, hub execution at 50% cog speed
	512KB hub memory with 8/16/32/128 bit cog transfers
	64 smart I/O pins
	100-pin 14x14mm TQFP with exposed thermal GND pad


-- addressable cog registers
--
--	addr		read		write		name		hidden
--	-----------------------------------------------------------------------
--
--	000-1EF		RAM		RAM
--
--	1F0		CNT		-		CNT		ICACHE0
--	1F1		RND		-		RND		ICACHE0
--	1F2		INA		-		INA		ICACHE0
--	1F3		INB		-		INB		ICACHE0
--	1F4		RAM		RAM+OUTA	OUTA
--	1F5		RAM		RAM+OUTB	OUTB
--	1F6		RAM		RAM+DIRA	DIRA
--	1F7		RAM		RAM+DIRB	DIRB
--	1F8		RAM		RAM+CTRA	CTRA
--	1F9		RAM		RAM+CTRB	CTRB
--	1FA		RAM		RAM+FRQA	FRQA
--	1FB		RAM		RAM+FRQB	FRQB
--	1FC		PHSA		PHSA		PHSA		ICACHE1
--	1FD		PHSB		PHSB		PHSB		ICACHE1
--	1FE		PTRA		PTRA		PTRA		ICACHE1
--	1FF		PTRB		PTRB		PTRB		ICACHE1



ZCDS (for D column: W=write, M=modify, R=read, L=read/immediate)
----------------------------------------------------------------------------------------------------------------------

ZCWS		0000000 ZC I CCCC DDDDDDDDD SSSSSSSSS		RDBYTE	D,S/PTRA/PTRB		(waits for hub)
ZCWS		0000001 ZC I CCCC DDDDDDDDD SSSSSSSSS		RDWORD	D,S/PTRA/PTRB		(waits for hub)
ZCWS		0000010 ZC I CCCC DDDDDDDDD SSSSSSSSS		RDLONG	D,S/PTRA/PTRB		(waits for hub)
ZCWS		0000011 ZC I CCCC DDDDDDDDD SSSSSSSSS		RDQUAD	D,S/PTRA/PTRB		(waits for hub)

ZCMS		0000100 ZC I CCCC DDDDDDDDD SSSSSSSSS		SYSOP	D,S/#			(waits for hub, S/# determines four write-long enables)

ZCWS		0000101 ZC I CCCC DDDDDDDDD SSSSSSSSS		MSGIN	D,S/#			(receives message on pin, C=timeout)

ZCMS		0000110 ZC I CCCC DDDDDDDDD SSSSSSSSS		MUL	D,S/#			multiplier	(16 x 16 unsigned multiply)
ZCMS		0000111 ZC I CCCC DDDDDDDDD SSSSSSSSS		MULS	D,S/#			multiplier	(16 x 16 signed multiply)

ZCMS		0001000 ZC I CCCC DDDDDDDDD SSSSSSSSS		ISOB	D,S/#			bitop
ZCMS		0001001 ZC I CCCC DDDDDDDDD SSSSSSSSS		NOTB	D,S/#			bitop
ZCMS		0001010 ZC I CCCC DDDDDDDDD SSSSSSSSS		CLRB	D,S/#			bitop
ZCMS		0001011 ZC I CCCC DDDDDDDDD SSSSSSSSS		SETB	D,S/#			bitop
ZCMS		0001100 ZC I CCCC DDDDDDDDD SSSSSSSSS		SETBC	D,S/#			bitop
ZCMS		0001101 ZC I CCCC DDDDDDDDD SSSSSSSSS		SETBNC	D,S/#			bitop
ZCMS		0001110 ZC I CCCC DDDDDDDDD SSSSSSSSS		SETBZ	D,S/#			bitop
ZCMS		0001111 ZC I CCCC DDDDDDDDD SSSSSSSSS		SETBNZ	D,S/#			bitop

ZCMS		0010000 ZC I CCCC DDDDDDDDD SSSSSSSSS		ANDN	D,S/#			logic
ZCMS		0010001 ZC I CCCC DDDDDDDDD SSSSSSSSS		AND	D,S/#			logic
ZCMS		0010010 ZC I CCCC DDDDDDDDD SSSSSSSSS		OR	D,S/#			logic
ZCMS		0010011 ZC I CCCC DDDDDDDDD SSSSSSSSS		XOR	D,S/#			logic
ZCMS		0010100 ZC I CCCC DDDDDDDDD SSSSSSSSS		MUXC	D,S/#			logic
ZCMS		0010101 ZC I CCCC DDDDDDDDD SSSSSSSSS		MUXNC	D,S/#			logic
ZCMS		0010110 ZC I CCCC DDDDDDDDD SSSSSSSSS		MUXZ	D,S/#			logic
ZCMS		0010111 ZC I CCCC DDDDDDDDD SSSSSSSSS		MUXNZ	D,S/#			logic

ZCMS		0011000 ZC I CCCC DDDDDDDDD SSSSSSSSS		ROR	D,S/#			rotator
ZCMS		0011001 ZC I CCCC DDDDDDDDD SSSSSSSSS		ROL	D,S/#			rotator
ZCMS		0011010 ZC I CCCC DDDDDDDDD SSSSSSSSS		SHR	D,S/#			rotator
ZCMS		0011011 ZC I CCCC DDDDDDDDD SSSSSSSSS		SHL	D,S/#			rotator
ZCMS		0011100 ZC I CCCC DDDDDDDDD SSSSSSSSS		RCR	D,S/#			rotator
ZCMS		0011101 ZC I CCCC DDDDDDDDD SSSSSSSSS		RCL	D,S/#			rotator
ZCMS		0011110 ZC I CCCC DDDDDDDDD SSSSSSSSS		SAR	D,S/#			rotator
ZCMS		0011111 ZC I CCCC DDDDDDDDD SSSSSSSSS		REV	D,S/#			rotator

ZCWS		0100000 ZC I CCCC DDDDDDDDD SSSSSSSSS		MOV	D,S/#			adder
ZCWS		0100001 ZC I CCCC DDDDDDDDD SSSSSSSSS		ABS	D,S/#			adder
ZCWS		0100010 ZC I CCCC DDDDDDDDD SSSSSSSSS		ABSNEG	D,S/#			adder
ZCWS		0100011 ZC I CCCC DDDDDDDDD SSSSSSSSS		NEG	D,S/#			adder
ZCWS		0100100 ZC I CCCC DDDDDDDDD SSSSSSSSS		NEGC	D,S/#			adder
ZCWS		0100101 ZC I CCCC DDDDDDDDD SSSSSSSSS		NEGNC	D,S/#			adder
ZCWS		0100110 ZC I CCCC DDDDDDDDD SSSSSSSSS		NEGZ	D,S/#			adder
ZCWS		0100111 ZC I CCCC DDDDDDDDD SSSSSSSSS		NEGNZ	D,S/#			adder

ZCMS		0101000 ZC I CCCC DDDDDDDDD SSSSSSSSS		MIN	D,S/#			adder
ZCMS		0101001 ZC I CCCC DDDDDDDDD SSSSSSSSS		MAX	D,S/#			adder
ZCMS		0101010 ZC I CCCC DDDDDDDDD SSSSSSSSS		MINS	D,S/#			adder
ZCMS		0101011 ZC I CCCC DDDDDDDDD SSSSSSSSS		MAXS	D,S/#			adder
ZCMS		0101100 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUMC	D,S/#			adder
ZCMS		0101101 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUMNC	D,S/#			adder
ZCMS		0101110 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUMZ	D,S/#			adder
ZCMS		0101111 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUMNZ	D,S/#			adder

ZCMS		0110000 ZC I CCCC DDDDDDDDD SSSSSSSSS		ADD	D,S/#			adder
ZCMS		0110001 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUB	D,S/#			adder
ZCMS		0110010 ZC I CCCC DDDDDDDDD SSSSSSSSS		ADDS	D,S/#			adder
ZCMS		0110011 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUBS	D,S/#			adder
ZCMS		0110100 ZC I CCCC DDDDDDDDD SSSSSSSSS		ADDX	D,S/#			adder
ZCMS		0110101 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUBX	D,S/#			adder
ZCMS		0110110 ZC I CCCC DDDDDDDDD SSSSSSSSS		ADDSX	D,S/#			adder
ZCMS		0110111 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUBSX	D,S/#			adder

ZCWS		0111000 ZC I CCCC DDDDDDDDD SSSSSSSSS		NOT	D,S/#			adder
ZCMS		0111001 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUBR	D,S/#			adder
ZCMS		0111010 ZC I CCCC DDDDDDDDD SSSSSSSSS		ADDABS	D,S/#			adder
ZCMS		0111011 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUBABS	D,S/#			adder
ZCMS		0111100 ZC I CCCC DDDDDDDDD SSSSSSSSS		INCMOD	D,S/#			adder
ZCMS		0111101 ZC I CCCC DDDDDDDDD SSSSSSSSS		DECMOD	D,S/#			adder
ZCMS		0111110 ZC I CCCC DDDDDDDDD SSSSSSSSS		CMPSUB	D,S/#			adder
ZCMS		0111111 ZC I CCCC DDDDDDDDD SSSSSSSSS		WAITCNT	D,S/#			adder

ZCMS		1000000 ZC I CCCC DDDDDDDDD SSSSSSSSS		SETS	D,S/#			muxer
ZCWS		1000001 ZC I CCCC DDDDDDDDD SSSSSSSSS		GETS	D,S/#			muxer
ZCMS		1000010 ZC I CCCC DDDDDDDDD SSSSSSSSS		SETD	D,S/#			muxer
ZCWS		1000011 ZC I CCCC DDDDDDDDD SSSSSSSSS		GETD	D,S/#			muxer
ZCMS		1000100 ZC I CCCC DDDDDDDDD SSSSSSSSS		SETCOND	D,S/#			muxer
ZCWS		1000101 ZC I CCCC DDDDDDDDD SSSSSSSSS		GETCOND	D,S/#			muxer
ZCMS		1000110 ZC I CCCC DDDDDDDDD SSSSSSSSS		SETI	D,S/#			muxer
ZCWS		1000111 ZC I CCCC DDDDDDDDD SSSSSSSSS		GETI	D,S/#			muxer

--MS		100100n nn I CCCC DDDDDDDDD SSSSSSSSS		RORNIBn	D,S/#			muxer
--MS		100101n nn I CCCC DDDDDDDDD SSSSSSSSS		ROLNIBn	D,S/#			muxer
--WS		100110n nn I CCCC DDDDDDDDD SSSSSSSSS		GETNIBn	D,S/#			muxer
--MS		100111n nn I CCCC DDDDDDDDD SSSSSSSSS		SETNIBn	D,S/#			muxer

--MS		1010000 nn I CCCC DDDDDDDDD SSSSSSSSS		RORBYTn	D,S/#			muxer
--MS		1010001 nn I CCCC DDDDDDDDD SSSSSSSSS		ROLBYTn	D,S/#			muxer
--WS		1010010 nn I CCCC DDDDDDDDD SSSSSSSSS		GETBYTn	D,S/#			muxer
--MS		1010011 nn I CCCC DDDDDDDDD SSSSSSSSS		SETBYTn	D,S/#			muxer

--MS		1010100 0n I CCCC DDDDDDDDD SSSSSSSSS		RORWRDn	D,S/#			muxer
--MS		1010100 1n I CCCC DDDDDDDDD SSSSSSSSS		ROLWRDn	D,S/#			muxer
--WS		1010101 0n I CCCC DDDDDDDDD SSSSSSSSS		GETWRDn	D,S/#			muxer
--MS		1010101 1n I CCCC DDDDDDDDD SSSSSSSSS		SETWRDn	D,S/#			muxer

ZCWS		1010110 ZC I CCCC DDDDDDDDD SSSSSSSSS		ESWAP4	D,S/#			muxer
ZCWS		1010111 ZC I CCCC DDDDDDDDD SSSSSSSSS		ESWAP8	D,S/#			muxer

ZCWS		1011000 ZC I CCCC DDDDDDDDD SSSSSSSSS		SPLITW	D,S/#			muxer
ZCWS		1011001 ZC I CCCC DDDDDDDDD SSSSSSSSS		MERGEW	D,S/#			muxer

ZCMS		1011010 ZC I CCCC DDDDDDDDD SSSSSSSSS		DJZ	D,S/@			adder
ZCMS		1011011 ZC I CCCC DDDDDDDDD SSSSSSSSS		DJNZ	D,S/@			adder

ZCWS		1011100 ZC I CCCC DDDDDDDDD SSSSSSSSS		TOPBIT	D,S/#			miscellaneous
ZCWS		1011101 ZC I CCCC DDDDDDDDD SSSSSSSSS		DECOD	D,S/#
ZCMS		1011110 ZC I CCCC DDDDDDDDD SSSSSSSSS		ALTDS	D,S/#			(set up redirection for result/D/S)
ZCWS		1011111 ZC I CCCC DDDDDDDDD SSSSSSSSS		JMPSW	D,S/@			(jump to S/@, store return address in D, WZ/WC to save/load flags)

ZCRS		1100000 ZC I CCCC DDDDDDDDD SSSSSSSSS		TESTB	D,S/#			bitop	tests and compares
ZCRS		1100001 ZC I CCCC DDDDDDDDD SSSSSSSSS		TESTN	D,S/#			logic
ZCRS		1100010 ZC I CCCC DDDDDDDDD SSSSSSSSS		TEST	D,S/#			logic
ZCRS		1100011 ZC I CCCC DDDDDDDDD SSSSSSSSS		CMP	D,S/#			adder
ZCRS		1100100 ZC I CCCC DDDDDDDDD SSSSSSSSS		CMPX	D,S/#			adder
ZCRS		1100101 ZC I CCCC DDDDDDDDD SSSSSSSSS		CMPS	D,S/#			adder
ZCRS		1100110 ZC I CCCC DDDDDDDDD SSSSSSSSS		CMPSX	D,S/#			adder
ZCRS		1100111 ZC I CCCC DDDDDDDDD SSSSSSSSS		CMPR	D,S/#			adder

ZCRS		1101000 ZC I CCCC DDDDDDDDD SSSSSSSSS		TJZ	D,S/@
ZCRS		1101001 ZC I CCCC DDDDDDDDD SSSSSSSSS		TJNZ	D,S/@
ZCRS		1101010 ZC I CCCC DDDDDDDDD SSSSSSSSS		TJS	D,S/@
ZCRS		1101011 ZC I CCCC DDDDDDDDD SSSSSSSSS		TJNS	D,S/@

ZCRS		1101100 ZC I CCCC DDDDDDDDD SSSSSSSSS		-	D,S/#
ZCRS		1101101 ZC I CCCC DDDDDDDDD SSSSSSSSS		-	D,S/#
ZCRS		1101110 ZC I CCCC DDDDDDDDD SSSSSSSSS		-	D,S/#
ZCRS		1101111 ZC I CCCC DDDDDDDDD SSSSSSSSS		-	D,S/#

--LS		1110000 0L I CCCC DDDDDDDDD SSSSSSSSS		WRBYTE	D/#,S/PTRA/PTRB		(waits for hub)
--LS		1110000 1L I CCCC DDDDDDDDD SSSSSSSSS		WRWORD	D/#,S/PTRA/PTRB		(waits for hub)
--LS		1110001 0L I CCCC DDDDDDDDD SSSSSSSSS		WRLONG	D/#,S/PTRA/PTRB		(waits for hub)
--LS		1110001 1L I CCCC DDDDDDDDD SSSSSSSSS		WRQUAD	D/#,S/PTRA/PTRB		(waits for hub, zero-extends #)

--LS		1110010 0L I CCCC DDDDDDDDD SSSSSSSSS		MSGOUTA	D/#,S/#			(send message to pin(s) on OUTA)
--LS		1110010 1L I CCCC DDDDDDDDD SSSSSSSSS		MSGOUTB	D/#,S/#			(send message to pin(s) on OUTB)
--LS		1110011 0L I CCCC DDDDDDDDD SSSSSSSSS		MSGDIRA	D/#,S/#			(send message to pin(s) on DIRA)
--LS		1110011 1L I CCCC DDDDDDDDD SSSSSSSSS		MSGDIRB	D/#,S/#			(send message to pin(s) on DIRB)

--LS		1110100 0L I CCCC DDDDDDDDD SSSSSSSSS		WAITPAE	D/#,S/#			(waits for INA)
--LS		1110100 1L I CCCC DDDDDDDDD SSSSSSSSS		WAITPAN	D/#,S/#			(waits for INA)
--LS		1110101 0L I CCCC DDDDDDDDD SSSSSSSSS		WAITPBE	D/#,S/#			(waits for INB)
--LS		1110101 1L I CCCC DDDDDDDDD SSSSSSSSS		WAITPBN	D/#,S/#			(waits for INB)

--LS		1110110 0L I CCCC DDDDDDDDD SSSSSSSSS		WAITVID	D/#,S/#			(waits for video)
--LS		1110110 1L I CCCC DDDDDDDDD SSSSSSSSS		PICKZC	D/#,S/#			(always writes Z/C)
--LS		1110111 0L I CCCC DDDDDDDDD SSSSSSSSS		JP	D/#,S/@			(jump if pin IN high, pins registered at beginning of ALU cycle)
--LS		1110111 1L I CCCC DDDDDDDDD SSSSSSSSS		JNP	D/#,S/@			(jump if pin IN low, pins registered at beginning of ALU cycle)

--LS		1111000 0L I CCCC DDDDDDDDD SSSSSSSSS		REP	D/#,S/#			(begin repeat block of size D/# with S/# iterations)
--LS		1111000 1L I CCCC DDDDDDDDD SSSSSSSSS		-	D/#,S/#
--LS		1111001 0L I CCCC DDDDDDDDD SSSSSSSSS		-	D/#,S/#
--LS		1111001 1L I CCCC DDDDDDDDD SSSSSSSSS		-	D/#,S/#

--LS		1111010 0L I CCCC DDDDDDDDD SSSSSSSSS		-	D/#,S/#
--LS		1111010 1L I CCCC DDDDDDDDD SSSSSSSSS		-	D/#,S/#
--LS		1111011 0L I CCCC DDDDDDDDD SSSSSSSSS		-	D/#,S/#
--LS		1111011 1L I CCCC DDDDDDDDD SSSSSSSSS		-	D/#,S/#

--LS		1111100 0L I CCCC DDDDDDDDD SSSSSSSSS		-	D/#,S/#
--LS		1111100 1L I CCCC DDDDDDDDD SSSSSSSSS		-	D/#,S/#

----		1111101 00 n nnnn nnnnnnnnn nnnnnnnnn		AUGS	#23bits			(appends n to upper bits of next immediate S in same task)
----		1111101 01 n nnnn nnnnnnnnn nnnnnnnnn		AUGD	#23bits			(appends n to upper bits of next immediate D in same task)

----		1111101 10 0 CCCC 0 nnnnnnnnnnnnnnnnn		LOC	#abs			(write 17-bit absolute address to $1EF)
----		1111101 10 0 CCCC 1 nnnnnnnnnnnnnnnnn		LOC	@rel			(write 17-bit relative address to $1EF)
---- wr		1111101 10 1 CCCC 0 nnnnnnnnnnnnnnnnn		JMP	#abs			(jump to 17-bit absolute address and write {Z,C,P[16:0]} to $1EF)
---- wr		1111101 10 1 CCCC 1 nnnnnnnnnnnnnnnnn		JMP	@rel			(jump to 17-bit relative address and write {Z,C,P[16:0]} to $1EF)
----		1111101 11 0 CCCC 0 nnnnnnnnnnnnnnnnn		CALL	#abs			(call to 17-bit absolute address using 4-level stack)
----		1111101 11 0 CCCC 1 nnnnnnnnnnnnnnnnn		CALL	@rel			(call to 17-bit relative address using 4-level stack)
----		1111101 11 1 CCCC 0 nnnnnnnnnnnnnnnnn		CALLA	#abs			(call to 17-bit absolute address using PTRA)
----		1111101 11 1 CCCC 1 nnnnnnnnnnnnnnnnn		CALLA	@rel			(call to 17-bit relative address using PTRA)

----		1111110 00 n CCCC n nnnnnnnnnnnnnnnnn		SETPTRA	#abs			(write 19-bit absolute address to PTRA)
----		1111110 01 n CCCC n nnnnnnnnnnnnnnnnn		SETPTRA	@rel			(write 19-bit relative address to PTRA)
----		1111110 10 n CCCC n nnnnnnnnnnnnnnnnn		SETPTRB	#abs			(write 19-bit absolute address to PTRB)
----		1111110 11 n CCCC n nnnnnnnnnnnnnnnnn		SETPTRB	@rel			(write 19-bit relative address to PTRB)

--L-		1111111 00 L CCCC DDDDDDDDD xxxx00000		WAIT	D/#			(wait for some number of clocks, 0 same as 1)
--L-		1111111 00 L CCCC DDDDDDDDD xxxx00001		WAITPX	D/#			(wait for any edge on pin D/#)
--L-		1111111 00 L CCCC DDDDDDDDD xxxx00010		WAITPR	D/#			(wait for pos edge on pin D/#)
--L-		1111111 00 L CCCC DDDDDDDDD xxxx00011		WAITPF	D/#			(wait for neg edge on pin D/#)
--L-		1111111 00 L CCCC DDDDDDDDD xxxx00100		PUSH	D/#			(push D/# into 4-level stack)
--L-		1111111 00 L CCCC DDDDDDDDD xxxx00101		SETVID	D/#			(set video mode)
--L-		1111111 00 L CCCC DDDDDDDDD xxxx00110		-	D/#
--L-		1111111 00 L CCCC DDDDDDDDD xxxx00111		-	D/#
												(D[18:17] into Z/C via WZ/WC for JMP/CALL/CALLA/POP D)
ZCR- wr		1111111 ZC x CCCC DDDDDDDDD xxxx01000		JMP	D			(jump to D[16:0] and write {Z,C,P[16:0]} to $1EF)
ZCR-		1111111 ZC x CCCC DDDDDDDDD xxxx01001		CALL	D			(call to D[16:0] using 4-level stack)
ZCR-		1111111 ZC x CCCC DDDDDDDDD xxxx01010		CALLA	D			(call to D[16:0] using PTRA stack)
ZCR-		1111111 ZC x CCCC DDDDDDDDD xxxx01011		-	D
ZCR-		1111111 ZC x CCCC DDDDDDDDD xxxx01100		-	D
--R-		1111111 00 x CCCC DDDDDDDDD xxxx01101		-	D
--R-		1111111 00 x CCCC DDDDDDDDD xxxx01110		-	D
--R-		1111111 00 x CCCC DDDDDDDDD xxxx01111		-	D

ZCW-		1111111 ZC x CCCC DDDDDDDDD xxxx10000		POP	D			(pop 4-level stack into D)
--W-		1111111 00 x CCCC DDDDDDDDD xxxx10001		-	D
--W-		1111111 00 x CCCC DDDDDDDDD xxxx10010		-	D
--W-		1111111 00 x CCCC DDDDDDDDD xxxx10011		-	D
--W-		1111111 00 x CCCC DDDDDDDDD xxxx10100		-	D
--W-		1111111 00 x CCCC DDDDDDDDD xxxx10101		-	D
--W-		1111111 00 x CCCC DDDDDDDDD xxxx10110		-	D
--W-		1111111 00 x CCCC DDDDDDDDD xxxx10111		-	D

ZC--		1111111 ZC x CCCC xxxxxxxxx xxxx11000		RET				(return using 4-level stack)
ZC--		1111111 ZC x CCCC xxxxxxxxx xxxx11001		RETA				(return using PTRA stack)
ZC--		1111111 ZC x CCCC xxxxxxxxx xxxx11010		POLVID				(C = ready for WAITVID)
-C--		1111111 0C x CCCC xxxxxxxxx xxxx11011		CACHEX				(invalidate instruction cache)
----		1111111 00 x CCCC xxxxxxxxx xxxx11100		-
----		1111111 00 x CCCC xxxxxxxxx xxxx11101		-
----		1111111 00 x CCCC xxxxxxxxx xxxx11110		-
----		1111111 00 x CCCC xxxxxxxxx xxxx11111		-

----		0000000 00 0 0000 000000000 000000000		NOP

Aliases for WRLONG/RDLONG: 	PUSHA/PUSHB/POPA/POPB

Note that the JMP instructions save a return address into $1EF, so these double as the old LINK instructions.
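As a rough illustration of the link word, here is a Python sketch of packing and unpacking {Z,C,P[16:0]}. It assumes Verilog-style concatenation, so Z is bit 18, C is bit 17 and the 17-bit address sits in bits [16:0], which agrees with the "D[18:17] into Z/C" note in the table. The function names are hypothetical.

```python
# Sketch only: models the 19-bit link word {Z, C, P[16:0]} written to $1EF.
# Assumption: Z is bit 18 and C is bit 17 (Verilog-style concatenation).

ADDR_MASK = 0x1FFFF  # 17 address bits


def pack_link(z, c, pc):
    """Combine the flags and the return address into one link word."""
    return ((z & 1) << 18) | ((c & 1) << 17) | (pc & ADDR_MASK)


def unpack_link(word):
    """Recover (z, c, pc) from a link word, as a JMP D with WZ/WC would."""
    return (word >> 18) & 1, (word >> 17) & 1, word & ADDR_MASK
```

A later JMP to the register holding this word would return to the saved address and, with WZ/WC, restore the saved flags.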

Ariba · 2014-04-17 15:04

Heater. wrote: »

...
What is the point of this little 4 entry stack?

You can't modify the RET instruction with hubexec. On Prop1 the CALL (jmpret) had to modify the S field of the RET (also a jmpret).

@Roy
Nested calls were possible in PASM1, just no recursion.

Andy

Heater. · 2014-04-17 15:10

@Chip & @Roy,

Thank you for the explanation. It's kind of what I was imagining.

Which leaves me with the question: WTF is it for?

With the good old JMPRET if I am calling a subroutine, which calls a subroutine, which calls a subroutine ...everything works just fine. Every subroutine call's return address gets stored at a unique address for that subroutine.

Having a stack in which to store return addresses only helps if I want to make recursive calls. Either the routine calls itself directly or something it calls calls back to it.
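The difference between nesting and recursion under the JMPRET scheme can be modelled in a few lines of Python. This is a toy model, not PASM: it assumes each subroutine owns exactly one return-address slot, and the class and function names are invented for the illustration.

```python
# Sketch only: one return-address slot per subroutine, as with P1 JMPRET.

class ReturnSlots:
    def __init__(self):
        self.slots = {}

    def call(self, sub, return_addr):
        # JMPRET overwrites the subroutine's single slot on every call.
        self.slots[sub] = return_addr

    def ret(self, sub):
        # Returning just reads the slot; it is not cleared or popped.
        return self.slots[sub]


def nested_returns():
    """a calls b calls c: three distinct slots, nothing is lost."""
    s = ReturnSlots()
    s.call("a", 10)
    s.call("b", 20)
    s.call("c", 30)
    return [s.ret("c"), s.ret("b"), s.ret("a")]


def recursive_returns():
    """f calls itself: the second call overwrites the outer return address."""
    s = ReturnSlots()
    s.call("f", 10)   # outer call, should eventually return to 10
    s.call("f", 50)   # recursive call from inside f
    return [s.ret("f"), s.ret("f")]
```

In this model the nested case unwinds to 30, 20, 10, while the recursive case returns to 50 twice and the outer address 10 is gone.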

Such recursive calls are a rare thing in micro-controller land.

When do we need this? Perhaps, for example, for the recursive FIBO benchmark. But then a depth of 4 is pretty much useless. Do we need the recursive FIBO benchmark?

How would a C compiler use this? How would it handle the stack overflow? How would it know if that might happen? And where do the parameters to such recursive calls go?

I'm not sure what HUB memory has to do with this. Surely a stack can be built in COG registers if need be by a PASM programmer?

So WTF is this tiny stack for actually?

Phil Pilgrim (PhiPi) · 2014-04-17 15:11

tonyp12 wrote:

Saves you from having to jump to the _RET label if you want to have multiple exit points in the subroutine.

You don't have to do that now. jmp sub_ret works just as well as, and one instruction faster than, jmp #sub_ret. In fact, if you use jmpret instead of call you can share return-address depositories with other non-nested routines, saving register space.

-Phil

Roy Eltham · 2014-04-17 15:31

Heater,
Yeah, I was thinking the wrong thing when I was trying to explain the 4 long stack. The stack based call/ret makes having multiple exit points from a function less of a hassle to worry about. I think the thinking was that having the call/ret work without having to do the self-modifying code approach of JMPRET was desirable, and it was made 4 deep because that was Chip's determination of being enough depth for cog sized pasm code.

Ariba · 2014-04-17 15:41

Heater. wrote: »

@Chip & @Roy,

Thank you for the explanation. It's kind of what I was imagining.

Which leaves me with the question: WTF is it for?

With the good old JMPRET if I am calling a subroutine, which calls a subroutine, which calls a subroutine ...everything works just fine. Every subroutine call's return address gets stored at a unique address for that subroutine.

Having a stack in which to store return addresses only helps if I want to make recursive calls. Either the routine calls itself directly or something it calls calls back to it.

Such recursive calls are a rare thing in micro-controller land.

When do we need this? Perhaps, for example, for the recursive FIBO benchmark. But then a depth of 4 is pretty much useless. Do we need the recursive FIBO benchmark?

How would a C compiler use this? How would it handle the stack overflow? How would it know if that might happen? And where do the parameters to such recursive calls go?

I'm not sure what HUB memory has to do with this. Surely a stack can be built in COG registers if need be by a PASM programmer?

So WTF is this tiny stack for actually?

Compilers will not use the 4 level stack. PropGCC will use the JMP with the link register at $1EF and push that on a stack only if it is a non-leaf function.
Other compilers will use CALLA and RETA, which handle a hub stack with PTRA as the stack pointer.

For simple calls in handcrafted PASM we have the 4 level fast stack. The JMPRET way is no longer possible if you also want to execute PASM from hub. The RET will then be in hub RAM and can't easily be modified by the JMPRET.
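A toy Python model of the 4 level stack shows why it suits code that may sit in either memory: the return address is kept outside the code, so RET never has to be patched. What the hardware does on overflow or underflow is not stated in this thread, so raising an error here is purely an assumption, and the class name is made up.

```python
# Sketch only: a 4-entry LIFO for return addresses, kept apart from code memory.
# Assumption: overflow and underflow raise errors; real hardware may differ.

class HardwareStack:
    DEPTH = 4

    def __init__(self):
        self._entries = []

    def call(self, return_addr):
        """CALL: remember where to come back to, then jump (jump not modelled)."""
        if len(self._entries) == self.DEPTH:
            raise OverflowError("more than 4 nested calls")
        self._entries.append(return_addr)

    def ret(self):
        """RET: pop the most recent return address."""
        if not self._entries:
            raise IndexError("RET without a matching CALL")
        return self._entries.pop()
```

The model takes no account of where the caller or callee lives, which is the property being argued for: the same CALL/RET pair works for cog and hub code.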

Andy

Heater. · 2014-04-17 15:43

And now Phil tells us we don't need a stack to make multiple quick exit points from a subroutine.

Which we should know already because there is P1 code around that does exactly that.

So, WTF is this tinsy, useless stack for?

Heater. · 2014-04-17 15:48

Ariba,

Thanks for that, I obviously have to think this through...

But before I do, how come LMM code in the P1 did not need this silly little stack but hubexec on the PII does?

Ariba · 2014-04-17 15:51

Heater
LMM has to handle the CALL and RET with cog-helper routines that do an indirect access with self modifying code - all very costly.
I think we don't want to do that in hubexec mode.

Andy

Ariba · 2014-04-17 15:59

Seairth wrote: »
ZCWS		1011111 ZC I CCCC DDDDDDDDD SSSSSSSSS		JMPSW	D,S/@			(jump to S/@, store return address in D, WZ/WC to save/load flags)
ZCR- wr		1111111 ZC x CCCC DDDDDDDDD xxxx01000		JMP	D			(jump to D[16:0] and write {Z,C,P[16:0]} to $1EF)
It looks to me like JMP is getting its address out of the D register and JMPSW is getting its address out of the S register (ignoring the relative variant). And both write PC+1 to a register. And both use WZ/WC to save (and presumably restore) Z/C.

It's a bit confusing because we have 3 versions of the JMP instruction: immediate17, relative17 and register indirect.
If I understand you correctly, you think we don't need the register indirect version, because it can be done with a JMPSW. I think this will work if we use a read-only register in the D field of JMPSW, for example the CNT or RND register.

Andy

Heater. · 2014-04-17 16:07

Ariba,

Compilers will not use the 4 level stack. PropGCC will use the JMP with the link register at $1EF and push that on a stack only if it is a non-leaf function.

and

LMM has to handle the CALL and RET with cog-helper routines that do an indirect access with self modifying code - all very costly.
I think we don't want to do that in hubexec mode.

So we need a four level stack ... but it's not going to be used.

So, tell me again, very slowly, why do we need it?

Bill Henning · 2014-04-17 16:23

If Chip leaves it in, it is great for cog code.

I'd use it.

I always intensely disliked the

sub blah
       blah
sub_ret ret

business.

The 4 level LIFO stack also allows generic subroutines that can be called regardless of living in cog or hub space.

You can use JMPSW as if it was JMPRET if you insist, but that only works for cog-only code.

Using the LR at $1EF for cog-only code is a pain, as you'd have to save it before calling another routine.
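The save-before-calling chore can be sketched in Python as follows. This is only a model of a single link register plus a software stack. The class, the method names and the addresses are hypothetical, and nothing here is claimed to be how PropGCC actually emits code.

```python
# Sketch only: a single link register (LR) that every JMP-with-link overwrites.

class LinkRegisterCpu:
    def __init__(self):
        self.lr = None     # stands in for $1EF
        self.stack = []    # software-managed stack (hypothetical)

    def jmp_link(self, return_addr):
        """JMP writes the return address into LR, clobbering the old value."""
        self.lr = return_addr

    def save_lr(self):
        """Prologue of a non-leaf routine: push LR before calling anything."""
        self.stack.append(self.lr)

    def restore_lr(self):
        """Epilogue of a non-leaf routine: pop LR back before returning."""
        self.lr = self.stack.pop()


def outer_return_address(save):
    """Call outer from 100; outer calls a leaf from 200; report where outer returns."""
    cpu = LinkRegisterCpu()
    cpu.jmp_link(100)        # caller -> outer
    if save:
        cpu.save_lr()
    cpu.jmp_link(200)        # outer -> leaf, overwrites LR
    leaf_returns_to = cpu.lr # leaf returns through LR, back into outer
    if save:
        cpu.restore_lr()
    return leaf_returns_to, cpu.lr
```

With the save, outer returns to 100 as intended. Without it, outer returns to 200, which is inside itself.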

Heater. wrote: »

Ariba,

Compilers will not use the 4 level stack. PropGCC will use the JMP with the link register at $1EF and push that on a stack only if it is a non-leaf function.

and

So we need a four level stack ... but it's not going to be used.

So, tell me again, very slowly, why do we need it?

Ariba · 2014-04-17 16:25

Heater. wrote: »

Ariba,

Compilers will not use the 4 level stack. PropGCC will use the JMP with the link register at $1EF and push that on a stack only if it is a non-leaf function.

and

So we need a four level stack ... but it's not going to be used.

So, tell me again, very slowly, why do we need it?

Okay slowly:

W e _ n e e d _ i t _ f o r _ f a s t _ c a l l s _ i n _ p u r e _ P A S M _ c o d e _ t h a t _ c a n _ r u n _ f r o m _ c o g r a m _ o r _ h u b r a m .

Perhaps you still think that hubexec will not get implemented on the P1+, but I think the current state is that Chip will implement it. Perhaps you don't understand that the P1+ uses exactly the same instructions for cogexec and hubexec, so you can run the same code from both memories. Or perhaps you just got up on the wrong side of the bed today.

Andy

Heater. · 2014-04-17 16:31

Bill,

I always intensely disliked the...

Well, yeah, but it's a minor inconvenience and has no impact on performance.

It also allows generic subroutines that can be called regardless of living in cog or hub space.

Well, yeah, but is propgcc or anybody ever going to use that?

I'm still not convinced this silly little 4 slot stack is worth anything yet.

evanh · 2014-04-17 16:33

It's looking like the hardware stack's real advantage is not that it has more than one level or that it is fast but that it is spare for handcrafted use because the compiler isn't using it.

jmg · 2014-04-17 16:37

Ariba wrote: »

Okay slowly:

W e _ n e e d _ i t _ f o r _ f a s t _ c a l l s _ i n _ p u r e _ P A S M _ c o d e _ t h a t _ c a n _ r u n _ f r o m _ c o g r a m _ o r _ h u b r a m .

Perhaps you still think that hubexec will not get implemented on the P1+, but I think the current state is that Chip will implement it. Perhaps you don't understand that the P1+ uses exactly the same instructions for cogexec and hubexec, so you can run the same code from both memories.

Makes good sense to me, and Chip has already decided it is worth the effort to include.

Of course, anyone who wants to, is free to ignore it.

ctwardell · 2014-04-17 16:38

Heater. wrote: »

I'm still not convinced this silly little 4 slot stack is worth anything yet.

I agree, Chip should expand it to at least 42 slots, otherwise any robots built with it might be prone to depression.

C.W.

jmg · 2014-04-17 16:40

evanh wrote: »

It's looking like the hardware stack's real advantage is not that it has more than one level or that it is fast but that it is spare for handcrafted use because the compiler isn't using it.

Compilers do not yet use it, and this sounds like a compelling enough case:
..uses exactly the same instructions for cogexec and hubexec, so you can run the same code from both memories.

Bill Henning · 2014-04-17 16:42

I tend to write some fairly cool pasm stuff, and I will gladly use it - FAR prefer it to jmpret.

Heater. wrote: »

Bill,

Well, yeah, but it's a minor inconvenience and has no impact on performance.

Well, yeah, but is propgcc or anybody ever going to use that?

I'm still not convinced this silly little 4 slot stack is worth anything yet.

evanh · 2014-04-17 16:43

jmg wrote: »

Compilers do not yet use it, and this sounds like a compelling enough case:
..uses exactly the same instructions for cogexec and hubexec, so you can run the same code from both memories.

The moment a compiler is allowed to use it, its "speed" advantage vanishes due to you having to conform to the compiler's stack management.

Heater. · 2014-04-17 16:47

Ariba,
You should know me by now. I get up on the wrong side of the bed every day.

W e _ n e e d _ i t _ f o r _ f a s t _ c a l l s _ i n _ p u r e _ P A S M _ c o d e _ t h a t _ c a n _ r u n _ f r o m _ c o g r a m _ o r _ h u b r a m .

and

Perhaps you don't understand that the P1+ uses exactly the same instructions for cogexec and hubexec, so you can run the same code from both memories.

OK. That sounds great...but...

When would you ever need to be able to place the exact same code in HUB or COG?

Why is it necessary for HUB resident code to look exactly like COG resident code?

Who is ever going to make use of that idea?

We have already determined that compilers will not.

My FFT in C can be compiled for the P1 with or without FCACHE. Its inner loops can be run from HUB or loaded to COG and run. I have no idea how the code is compiled differently in each case. Neither do I care. All I know is that when compiled with FCACHE it runs nearly as fast as my hand crafted PASM version!

So WhyTF do we need this little stack again?

Bill Henning · 2014-04-17 16:48

generic comment (not aimed at heater or anyone in particular)

Where is this "if it is not used by gcc turf it" attitude coming from?

Adding some capabilities to make life easier for some compiler is great, even if assembly language programmers don't need/want it.

Why is adding capabilities that make life easier/better/faster for assembly language programmers seen as evil by some?

I find it more than a bit hypocritical.

The best, tightest, most amazing pieces of code will be in assembly language.

You don't want to use a capability? Don't use it.

You don't need it? Don't use it.

Kindly don't try to tell the rest of us not to use it.

Same goes for helper instructions that reduce memory requirements (ie using a single long, instead of 2/3..) for compiler generated code.

512KB is not infinite, and I'd rather have more ram left for arrays, data, and display buffers than wasting it on two instructions - where a helper could use one.

Ditto for tasks if Chip adds them.

You find them too complicated? Don't use them. Don't try to stop me from using them (if Chip puts them in)

You prefer co-operative multi-tasking? Great! Go for it! But don't try to stop me from using tasks (if Chip puts them in)

Heater. · 2014-04-17 16:51

jmg,

Compilers do not yet use it,

I might be wrong but my bet is that compilers will never use it because they will already be managing a stack of their own.

Bill Henning · 2014-04-17 16:52

str*() functions, mem*() functions etc.

All small enough to run in a cog based library (flib) or from the hub if cog space is not available.

I love the idea of being able to have small routines callable regardless of where they live.

(as an aside, if we had fast aux or cog stacks, I would not resist getting rid of the LIFO)

Heater. wrote: »

Ariba,
You should know me by now. I get up on the wrong side of the bed every day.

and

OK. That sounds great...but...

When would you ever need to be able to place the exact same code in HUB or COG?

Why is it necessary for HUB resident code to look exactly like COG resident code?

Who is ever going to make use of that idea?

We have already determined that compilers will not.

My FFT in C can be compiled for the P1 with or without FCACHE. Its inner loops can be run from HUB or loaded to COG and run. I have no idea how the code is compiled differently in each case. Neither do I care. All I know is that when compiled with FCACHE it runs nearly as fast as my hand crafted PASM version!

So WhyTF do we need this little stack again?

evanh · 2014-04-17 16:59

Bill/Ariba, you've sold me. It took a while to come to terms with the fact that it's really the separation, an independent stack that doesn't have any software management, that is desired. A second separate LR would be almost as good.

Phil Pilgrim (PhiPi) · 2014-04-17 17:02

I guess I always envisioned an in-cog, stack-based call/return as something like this:

call == jmpret sp--,#dest
retn == jmp ++sp

That way the stack could start wherever you want and be as deep as you want it to be. sp may, by necessity, have to be a special register, since there are no extra bits to indicate post-decrement or pre-increment. And jmpret maintains its original flavor when the destination register is other than an SFR; similarly for jmp.
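The sp-- / ++sp idea can be sketched as a Python model of a stack living in ordinary cog registers. It assumes a post-decrement on call and a pre-increment on return, exactly as written in the two pseudo-instructions above. The class name and the register-file size are assumptions, and there is no bounds checking, so a deep enough call chain would grow down into whatever sits below the stack.

```python
# Sketch only: a return stack kept in the cog register file, growing downward.
# Assumption: 'call' writes at sp then decrements; 'retn' increments then reads.

class CogStack:
    def __init__(self, top, size=0x1F0):
        self.regs = [0] * size   # stands in for cog RAM $000-$1EF
        self.sp = top            # programmer chooses where the stack starts

    def call(self, return_addr):
        """call == jmpret sp--,#dest (the jump itself is not modelled)."""
        self.regs[self.sp] = return_addr
        self.sp -= 1

    def retn(self):
        """retn == jmp ++sp"""
        self.sp += 1
        return self.regs[self.sp]
```

Depth is limited only by how many registers are left free below the starting point, which is the flexibility being proposed over a fixed four entries.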

A fixed, four-deep stack does seem a bit on the stingy side. OTOH, I can't readily come up with an example from my own PASM code where it would have been inadequate. (And I'm not about to write a Perl script to do that kind of "in-depth" -- pardon the pun -- analysis.)

Disclaimer: I haven't followed this thread closely enough to know whether an idea like this has already been hashed out and discarded. If so, in the words of Emily Litella,

-Phil

jmg · 2014-04-17 17:02

Heater. wrote: »

OK. That sounds great...but...

When would you ever need to be able to place the exact same code in HUB or COG?

Why is it necessary for HUB resident code to look exactly like COG resident code?

Who is ever going to make use of that idea?

We have already determined that compilers will not.

?? Libraries are one obvious area where all the features you claim have no use, will be very useful.

I'm still missing the point of all your posts, Chip already has this in there - if it bothers you so much, just ignore it.
