The New 16-Cog, 512KB, 64 analog I/O Propeller Chip - Page 38 — Parallax Forums

The New 16-Cog, 512KB, 64 analog I/O Propeller Chip


Comments

  • Bill HenningBill Henning Posts: 6,445
    edited 2014-04-17 14:15
    JMP uses one long, can jump to any hub address

    JMPSW can only jump to a cog address without an AUGD, which would make it be two longs - which would waste a lot of memory.
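    The size argument above can be sketched with a small model. This is only an illustration, assuming the AUGS/AUGD behaviour given in the instruction list quoted further down this page (a 23-bit prefix supplies the upper bits of the next 9-bit immediate). The function names are hypothetical, and the rule that anything beyond 9 bits costs a second long is an assumption.

```python
# Illustrative model only: names and the 9-bit cutoff rule are assumptions
# drawn from the AUGS/AUGD entries in the instruction list, not a tool API.

IMM_BITS = 9          # width of the S or D immediate field
PREFIX_BITS = 23      # width of the AUGS/AUGD payload


def split_immediate(value32):
    """Split a 32-bit value into (23-bit prefix, 9-bit immediate)."""
    if not 0 <= value32 < (1 << 32):
        raise ValueError("value must fit in 32 bits")
    return value32 >> IMM_BITS, value32 & ((1 << IMM_BITS) - 1)


def augment(prefix23, imm9):
    """Rebuild the full value: the prefix becomes the upper 23 bits."""
    if not 0 <= prefix23 < (1 << PREFIX_BITS):
        raise ValueError("prefix must fit in 23 bits")
    if not 0 <= imm9 < (1 << IMM_BITS):
        raise ValueError("immediate must fit in 9 bits")
    return (prefix23 << IMM_BITS) | imm9


def longs_needed(target):
    """Longs spent when the target has to fit a 9-bit immediate field."""
    return 1 if target < (1 << IMM_BITS) else 2
```

    Under these assumptions a cog address such as $1EF costs one long, while a hub address such as $12345 needs the prefix and therefore two.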
    Seairth wrote: »
    That's what I figured. At which point, what is the difference between:
    JMP reg WC WZ
    

    and
    JMPSW $1EF, reg WC WZ
    
  • SeairthSeairth Posts: 2,474
    edited 2014-04-17 14:28
    JMP uses one long, can jump to any hub address

    JMPSW can only jump to a cog address without an AUGD, which would make it be two longs - which would waste a lot of memory.

    ZCWS		1011111 ZC I CCCC DDDDDDDDD SSSSSSSSS		JMPSW	D,S/@			(jump to S/@, store return address in D, WZ/WC to save/load flags)
    ZCR- wr		1111111 ZC x CCCC DDDDDDDDD xxxx01000		JMP	D			(jump to D[16:0] and write {Z,C,P[16:0]} to $1EF)
    

    It looks to me like JMP is getting its address out of the D register and JMPSW is getting its address out of the S register (ignoring the relative variant). And both write PC+1 to a register. And both use WZ/WC to save (and presumably restore) Z/C.
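    The comparison can be made concrete by pulling the fields out of the two encodings. The sketch below assumes the bit layout shown in the two listing lines (7 opcode bits, Z, C, I, 4 condition bits, 9-bit D, 9-bit S). The helper names are hypothetical, and the condition value 1111 meaning "always" is carried over from the P1 as an assumption.

```python
from typing import NamedTuple

JMPSW_OPCODE = 0b1011111   # from the JMPSW line of the listing
GROUP_OPCODE = 0b1111111   # opcode shared by the single-operand group
JMP_D_SUBOP = 0b01000      # low five bits of the S field select JMP D
LINK_REG = 0x1EF           # fixed link register used by JMP


class Fields(NamedTuple):
    opcode: int
    z: int
    c: int
    i: int
    cond: int
    d: int
    s: int


def encode(f):
    """Pack the fields into a 32-bit instruction word."""
    return ((f.opcode & 0x7F) << 25 | (f.z & 1) << 24 | (f.c & 1) << 23
            | (f.i & 1) << 22 | (f.cond & 0xF) << 18
            | (f.d & 0x1FF) << 9 | (f.s & 0x1FF))


def decode(word):
    """Unpack a 32-bit instruction word into its fields."""
    return Fields((word >> 25) & 0x7F, (word >> 24) & 1, (word >> 23) & 1,
                  (word >> 22) & 1, (word >> 18) & 0xF,
                  (word >> 9) & 0x1FF, word & 0x1FF)


def jump_operands(word):
    """Return (register holding the target, register receiving the link)."""
    f = decode(word)
    if f.opcode == JMPSW_OPCODE:
        return f.s, f.d                      # JMPSW D,S: target in S, link in D
    if f.opcode == GROUP_OPCODE and (f.s & 0x1F) == JMP_D_SUBOP:
        return f.d, LINK_REG                 # JMP D: target in D, link at $1EF
    raise ValueError("not a JMPSW or JMP D encoding")


# JMPSW $1EF,reg and JMP reg, both with WZ WC, for a register at $020
jmpsw = encode(Fields(JMPSW_OPCODE, 1, 1, 0, 0b1111, LINK_REG, 0x020))
jmp_d = encode(Fields(GROUP_OPCODE, 1, 1, 0, 0b1111, 0x020, JMP_D_SUBOP))
```

    In this model both words resolve to the same target register and the same link register, which is the equivalence the question is about. Whether the hardware treats them identically is not settled in the thread.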
  • tonyp12tonyp12 Posts: 1,951
    edited 2014-04-17 14:50
    >_RET labels at the end of subroutines, and any RET will return you.

    Saves you from having to jump to the _RET label if you want to have multiple exit points in the subroutine.
    Space-wise it would be the same; it speeds things up a little, though.
  • David BetzDavid Betz Posts: 14,516
    edited 2014-04-17 14:53
    Looks good to me. Thanks Chip!
    cgracey wrote: »
    Okay. Here is the tentative plan for the new chip:
    New Propeller Chip - 16 April 2014
    
    	200MHz system clock
    	16 cogs with 2-clock instructions, hub execution at 50% cog speed
    	512KB hub memory with 8/16/32/128 bit cog transfers
    	64 smart I/O pins
    	100-pin 14x14mm TQFP with exposed thermal GND pad
    
    
    -- addressable cog registers
    --
    --	addr		read		write		name		hidden
    --	-----------------------------------------------------------------------
    --
    --	000-1EF		RAM		RAM
    --
    --	1F0		CNT		-		CNT		ICACHE0
    --	1F1		RND		-		RND		ICACHE0
    --	1F2		INA		-		INA		ICACHE0
    --	1F3		INB		-		INB		ICACHE0
    --	1F4		RAM		RAM+OUTA	OUTA
    --	1F5		RAM		RAM+OUTB	OUTB
    --	1F6		RAM		RAM+DIRA	DIRA
    --	1F7		RAM		RAM+DIRB	DIRB
    --	1F8		RAM		RAM+CTRA	CTRA
    --	1F9		RAM		RAM+CTRB	CTRB
    --	1FA		RAM		RAM+FRQA	FRQA
    --	1FB		RAM		RAM+FRQB	FRQB
    --	1FC		PHSA		PHSA		PHSA		ICACHE1
    --	1FD		PHSB		PHSB		PHSB		ICACHE1
    --	1FE		PTRA		PTRA		PTRA		ICACHE1
    --	1FF		PTRB		PTRB		PTRB		ICACHE1
    
    
    
    ZCDS (for D column: W=write, M=modify, R=read, L=read/immediate)
    ----------------------------------------------------------------------------------------------------------------------
    
    ZCWS		0000000 ZC I CCCC DDDDDDDDD SSSSSSSSS		RDBYTE	D,S/PTRA/PTRB		(waits for hub)
    ZCWS		0000001 ZC I CCCC DDDDDDDDD SSSSSSSSS		RDWORD	D,S/PTRA/PTRB		(waits for hub)
    ZCWS		0000010 ZC I CCCC DDDDDDDDD SSSSSSSSS		RDLONG	D,S/PTRA/PTRB		(waits for hub)
    ZCWS		0000011 ZC I CCCC DDDDDDDDD SSSSSSSSS		RDQUAD	D,S/PTRA/PTRB		(waits for hub)
    
    ZCMS		0000100 ZC I CCCC DDDDDDDDD SSSSSSSSS		SYSOP	D,S/#			(waits for hub, S/# determines four write-long enables)
    
    ZCWS		0000101 ZC I CCCC DDDDDDDDD SSSSSSSSS		MSGIN	D,S/#			(receives message on pin, C=timeout)
    
    ZCMS		0000110 ZC I CCCC DDDDDDDDD SSSSSSSSS		MUL	D,S/#			multiplier	(16 x 16 unsigned multiply)
    ZCMS		0000111 ZC I CCCC DDDDDDDDD SSSSSSSSS		MULS	D,S/#			multiplier	(16 x 16 signed multiply)
    
    ZCMS		0001000 ZC I CCCC DDDDDDDDD SSSSSSSSS		ISOB	D,S/#			bitop
    ZCMS		0001001 ZC I CCCC DDDDDDDDD SSSSSSSSS		NOTB	D,S/#			bitop
    ZCMS		0001010 ZC I CCCC DDDDDDDDD SSSSSSSSS		CLRB	D,S/#			bitop
    ZCMS		0001011 ZC I CCCC DDDDDDDDD SSSSSSSSS		SETB	D,S/#			bitop
    ZCMS		0001100 ZC I CCCC DDDDDDDDD SSSSSSSSS		SETBC	D,S/#			bitop
    ZCMS		0001101 ZC I CCCC DDDDDDDDD SSSSSSSSS		SETBNC	D,S/#			bitop
    ZCMS		0001110 ZC I CCCC DDDDDDDDD SSSSSSSSS		SETBZ	D,S/#			bitop
    ZCMS		0001111 ZC I CCCC DDDDDDDDD SSSSSSSSS		SETBNZ	D,S/#			bitop
    
    ZCMS		0010000 ZC I CCCC DDDDDDDDD SSSSSSSSS		ANDN	D,S/#			logic
    ZCMS		0010001 ZC I CCCC DDDDDDDDD SSSSSSSSS		AND	D,S/#			logic
    ZCMS		0010010 ZC I CCCC DDDDDDDDD SSSSSSSSS		OR	D,S/#			logic
    ZCMS		0010011 ZC I CCCC DDDDDDDDD SSSSSSSSS		XOR	D,S/#			logic
    ZCMS		0010100 ZC I CCCC DDDDDDDDD SSSSSSSSS		MUXC	D,S/#			logic
    ZCMS		0010101 ZC I CCCC DDDDDDDDD SSSSSSSSS		MUXNC	D,S/#			logic
    ZCMS		0010110 ZC I CCCC DDDDDDDDD SSSSSSSSS		MUXZ	D,S/#			logic
    ZCMS		0010111 ZC I CCCC DDDDDDDDD SSSSSSSSS		MUXNZ	D,S/#			logic
    
    ZCMS		0011000 ZC I CCCC DDDDDDDDD SSSSSSSSS		ROR	D,S/#			rotator
    ZCMS		0011001 ZC I CCCC DDDDDDDDD SSSSSSSSS		ROL	D,S/#			rotator
    ZCMS		0011010 ZC I CCCC DDDDDDDDD SSSSSSSSS		SHR	D,S/#			rotator
    ZCMS		0011011 ZC I CCCC DDDDDDDDD SSSSSSSSS		SHL	D,S/#			rotator
    ZCMS		0011100 ZC I CCCC DDDDDDDDD SSSSSSSSS		RCR	D,S/#			rotator
    ZCMS		0011101 ZC I CCCC DDDDDDDDD SSSSSSSSS		RCL	D,S/#			rotator
    ZCMS		0011110 ZC I CCCC DDDDDDDDD SSSSSSSSS		SAR	D,S/#			rotator
    ZCMS		0011111 ZC I CCCC DDDDDDDDD SSSSSSSSS		REV	D,S/#			rotator
    
    ZCWS		0100000 ZC I CCCC DDDDDDDDD SSSSSSSSS		MOV	D,S/#			adder
    ZCWS		0100001 ZC I CCCC DDDDDDDDD SSSSSSSSS		ABS	D,S/#			adder
    ZCWS		0100010 ZC I CCCC DDDDDDDDD SSSSSSSSS		ABSNEG	D,S/#			adder
    ZCWS		0100011 ZC I CCCC DDDDDDDDD SSSSSSSSS		NEG	D,S/#			adder
    ZCWS		0100100 ZC I CCCC DDDDDDDDD SSSSSSSSS		NEGC	D,S/#			adder
    ZCWS		0100101 ZC I CCCC DDDDDDDDD SSSSSSSSS		NEGNC	D,S/#			adder
    ZCWS		0100110 ZC I CCCC DDDDDDDDD SSSSSSSSS		NEGZ	D,S/#			adder
    ZCWS		0100111 ZC I CCCC DDDDDDDDD SSSSSSSSS		NEGNZ	D,S/#			adder
    
    ZCMS		0101000 ZC I CCCC DDDDDDDDD SSSSSSSSS		MIN	D,S/#			adder
    ZCMS		0101001 ZC I CCCC DDDDDDDDD SSSSSSSSS		MAX	D,S/#			adder
    ZCMS		0101010 ZC I CCCC DDDDDDDDD SSSSSSSSS		MINS	D,S/#			adder
    ZCMS		0101011 ZC I CCCC DDDDDDDDD SSSSSSSSS		MAXS	D,S/#			adder
    ZCMS		0101100 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUMC	D,S/#			adder
    ZCMS		0101101 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUMNC	D,S/#			adder
    ZCMS		0101110 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUMZ	D,S/#			adder
    ZCMS		0101111 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUMNZ	D,S/#			adder
    
    ZCMS		0110000 ZC I CCCC DDDDDDDDD SSSSSSSSS		ADD	D,S/#			adder
    ZCMS		0110001 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUB	D,S/#			adder
    ZCMS		0110010 ZC I CCCC DDDDDDDDD SSSSSSSSS		ADDS	D,S/#			adder
    ZCMS		0110011 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUBS	D,S/#			adder
    ZCMS		0110100 ZC I CCCC DDDDDDDDD SSSSSSSSS		ADDX	D,S/#			adder
    ZCMS		0110101 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUBX	D,S/#			adder
    ZCMS		0110110 ZC I CCCC DDDDDDDDD SSSSSSSSS		ADDSX	D,S/#			adder
    ZCMS		0110111 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUBSX	D,S/#			adder
    
    ZCWS		0111000 ZC I CCCC DDDDDDDDD SSSSSSSSS		NOT	D,S/#			adder
    ZCMS		0111001 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUBR	D,S/#			adder
    ZCMS		0111010 ZC I CCCC DDDDDDDDD SSSSSSSSS		ADDABS	D,S/#			adder
    ZCMS		0111011 ZC I CCCC DDDDDDDDD SSSSSSSSS		SUBABS	D,S/#			adder
    ZCMS		0111100 ZC I CCCC DDDDDDDDD SSSSSSSSS		INCMOD	D,S/#			adder
    ZCMS		0111101 ZC I CCCC DDDDDDDDD SSSSSSSSS		DECMOD	D,S/#			adder
    ZCMS		0111110 ZC I CCCC DDDDDDDDD SSSSSSSSS		CMPSUB	D,S/#			adder
    ZCMS		0111111 ZC I CCCC DDDDDDDDD SSSSSSSSS		WAITCNT	D,S/#			adder
    
    ZCMS		1000000 ZC I CCCC DDDDDDDDD SSSSSSSSS		SETS	D,S/#			muxer
    ZCWS		1000001 ZC I CCCC DDDDDDDDD SSSSSSSSS		GETS	D,S/#			muxer
    ZCMS		1000010 ZC I CCCC DDDDDDDDD SSSSSSSSS		SETD	D,S/#			muxer
    ZCWS		1000011 ZC I CCCC DDDDDDDDD SSSSSSSSS		GETD	D,S/#			muxer
    ZCMS		1000100 ZC I CCCC DDDDDDDDD SSSSSSSSS		SETCOND	D,S/#			muxer
    ZCWS		1000101 ZC I CCCC DDDDDDDDD SSSSSSSSS		GETCOND	D,S/#			muxer
    ZCMS		1000110 ZC I CCCC DDDDDDDDD SSSSSSSSS		SETI	D,S/#			muxer
    ZCWS		1000111 ZC I CCCC DDDDDDDDD SSSSSSSSS		GETI	D,S/#			muxer
    
    --MS		100100n nn I CCCC DDDDDDDDD SSSSSSSSS		RORNIBn	D,S/#			muxer
    --MS		100101n nn I CCCC DDDDDDDDD SSSSSSSSS		ROLNIBn	D,S/#			muxer
    --WS		100110n nn I CCCC DDDDDDDDD SSSSSSSSS		GETNIBn	D,S/#			muxer
    --MS		100111n nn I CCCC DDDDDDDDD SSSSSSSSS		SETNIBn	D,S/#			muxer
    
    --MS		1010000 nn I CCCC DDDDDDDDD SSSSSSSSS		RORBYTn	D,S/#			muxer
    --MS		1010001 nn I CCCC DDDDDDDDD SSSSSSSSS		ROLBYTn	D,S/#			muxer
    --WS		1010010 nn I CCCC DDDDDDDDD SSSSSSSSS		GETBYTn	D,S/#			muxer
    --MS		1010011 nn I CCCC DDDDDDDDD SSSSSSSSS		SETBYTn	D,S/#			muxer
    
    --MS		1010100 0n I CCCC DDDDDDDDD SSSSSSSSS		RORWRDn	D,S/#			muxer
    --MS		1010100 1n I CCCC DDDDDDDDD SSSSSSSSS		ROLWRDn	D,S/#			muxer
    --WS		1010101 0n I CCCC DDDDDDDDD SSSSSSSSS		GETWRDn	D,S/#			muxer
    --MS		1010101 1n I CCCC DDDDDDDDD SSSSSSSSS		SETWRDn	D,S/#			muxer
    
    ZCWS		1010110 ZC I CCCC DDDDDDDDD SSSSSSSSS		ESWAP4	D,S/#			muxer
    ZCWS		1010111 ZC I CCCC DDDDDDDDD SSSSSSSSS		ESWAP8	D,S/#			muxer
    
    ZCWS		1011000 ZC I CCCC DDDDDDDDD SSSSSSSSS		SPLITW	D,S/#			muxer
    ZCWS		1011001 ZC I CCCC DDDDDDDDD SSSSSSSSS		MERGEW	D,S/#			muxer
    
    ZCMS		1011010 ZC I CCCC DDDDDDDDD SSSSSSSSS		DJZ	D,S/@			adder
    ZCMS		1011011 ZC I CCCC DDDDDDDDD SSSSSSSSS		DJNZ	D,S/@			adder
    
    ZCWS		1011100 ZC I CCCC DDDDDDDDD SSSSSSSSS		TOPBIT	D,S/#			miscellaneous
    ZCWS		1011101 ZC I CCCC DDDDDDDDD SSSSSSSSS		DECOD	D,S/#
    ZCMS		1011110 ZC I CCCC DDDDDDDDD SSSSSSSSS		ALTDS	D,S/#			(set up redirection for result/D/S)
    ZCWS		1011111 ZC I CCCC DDDDDDDDD SSSSSSSSS		JMPSW	D,S/@			(jump to S/@, store return address in D, WZ/WC to save/load flags)
    
    ZCRS		1100000 ZC I CCCC DDDDDDDDD SSSSSSSSS		TESTB	D,S/#			bitop	tests and compares
    ZCRS		1100001 ZC I CCCC DDDDDDDDD SSSSSSSSS		TESTN	D,S/#			logic
    ZCRS		1100010 ZC I CCCC DDDDDDDDD SSSSSSSSS		TEST	D,S/#			logic
    ZCRS		1100011 ZC I CCCC DDDDDDDDD SSSSSSSSS		CMP	D,S/#			adder
    ZCRS		1100100 ZC I CCCC DDDDDDDDD SSSSSSSSS		CMPX	D,S/#			adder
    ZCRS		1100101 ZC I CCCC DDDDDDDDD SSSSSSSSS		CMPS	D,S/#			adder
    ZCRS		1100110 ZC I CCCC DDDDDDDDD SSSSSSSSS		CMPSX	D,S/#			adder
    ZCRS		1100111 ZC I CCCC DDDDDDDDD SSSSSSSSS		CMPR	D,S/#			adder
    
    ZCRS		1101000 ZC I CCCC DDDDDDDDD SSSSSSSSS		TJZ	D,S/@
    ZCRS		1101001 ZC I CCCC DDDDDDDDD SSSSSSSSS		TJNZ	D,S/@
    ZCRS		1101010 ZC I CCCC DDDDDDDDD SSSSSSSSS		TJS	D,S/@
    ZCRS		1101011 ZC I CCCC DDDDDDDDD SSSSSSSSS		TJNS	D,S/@
    
    ZCRS		1101100 ZC I CCCC DDDDDDDDD SSSSSSSSS		-	D,S/#
    ZCRS		1101101 ZC I CCCC DDDDDDDDD SSSSSSSSS		-	D,S/#
    ZCRS		1101110 ZC I CCCC DDDDDDDDD SSSSSSSSS		-	D,S/#
    ZCRS		1101111 ZC I CCCC DDDDDDDDD SSSSSSSSS		-	D,S/#
    
    --LS		1110000 0L I CCCC DDDDDDDDD SSSSSSSSS		WRBYTE	D/#,S/PTRA/PTRB		(waits for hub)
    --LS		1110000 1L I CCCC DDDDDDDDD SSSSSSSSS		WRWORD	D/#,S/PTRA/PTRB		(waits for hub)
    --LS		1110001 0L I CCCC DDDDDDDDD SSSSSSSSS		WRLONG	D/#,S/PTRA/PTRB		(waits for hub)
    --LS		1110001 1L I CCCC DDDDDDDDD SSSSSSSSS		WRQUAD	D/#,S/PTRA/PTRB		(waits for hub, zero-extends #)
    
    --LS		1110010 0L I CCCC DDDDDDDDD SSSSSSSSS		MSGOUTA	D/#,S/#			(send message to pin(s) on OUTA)
    --LS		1110010 1L I CCCC DDDDDDDDD SSSSSSSSS		MSGOUTB	D/#,S/#			(send message to pin(s) on OUTB)
    --LS		1110011 0L I CCCC DDDDDDDDD SSSSSSSSS		MSGDIRA	D/#,S/#			(send message to pin(s) on DIRA)
    --LS		1110011 1L I CCCC DDDDDDDDD SSSSSSSSS		MSGDIRB	D/#,S/#			(send message to pin(s) on DIRB)
    
    --LS		1110100 0L I CCCC DDDDDDDDD SSSSSSSSS		WAITPAE	D/#,S/#			(waits for INA)
    --LS		1110100 1L I CCCC DDDDDDDDD SSSSSSSSS		WAITPAN	D/#,S/#			(waits for INA)
    --LS		1110101 0L I CCCC DDDDDDDDD SSSSSSSSS		WAITPBE	D/#,S/#			(waits for INB)
    --LS		1110101 1L I CCCC DDDDDDDDD SSSSSSSSS		WAITPBN	D/#,S/#			(waits for INB)
    
    --LS		1110110 0L I CCCC DDDDDDDDD SSSSSSSSS		WAITVID	D/#,S/#			(waits for video)
    --LS		1110110 1L I CCCC DDDDDDDDD SSSSSSSSS		PICKZC	D/#,S/#			(always writes Z/C)
    --LS		1110111 0L I CCCC DDDDDDDDD SSSSSSSSS		JP	D/#,S/@			(jump if pin IN high, pins registered at beginning of ALU cycle)
    --LS		1110111 1L I CCCC DDDDDDDDD SSSSSSSSS		JNP	D/#,S/@			(jump if pin IN low, pins registered at beginning of ALU cycle)
    
    --LS		1111000 0L I CCCC DDDDDDDDD SSSSSSSSS		REP	D/#,S/#			(begin repeat block of size D/# with S/# iterations)
    --LS		1111000 1L I CCCC DDDDDDDDD SSSSSSSSS		-	D/#,S/#
    --LS		1111001 0L I CCCC DDDDDDDDD SSSSSSSSS		-	D/#,S/#
    --LS		1111001 1L I CCCC DDDDDDDDD SSSSSSSSS		-	D/#,S/#
    
    --LS		1111010 0L I CCCC DDDDDDDDD SSSSSSSSS		-	D/#,S/#
    --LS		1111010 1L I CCCC DDDDDDDDD SSSSSSSSS		-	D/#,S/#
    --LS		1111011 0L I CCCC DDDDDDDDD SSSSSSSSS		-	D/#,S/#
    --LS		1111011 1L I CCCC DDDDDDDDD SSSSSSSSS		-	D/#,S/#
    
    --LS		1111100 0L I CCCC DDDDDDDDD SSSSSSSSS		-	D/#,S/#
    --LS		1111100 1L I CCCC DDDDDDDDD SSSSSSSSS		-	D/#,S/#
    
    ----		1111101 00 n nnnn nnnnnnnnn nnnnnnnnn		AUGS	#23bits			(appends n to upper bits of next immediate S in same task)
    ----		1111101 01 n nnnn nnnnnnnnn nnnnnnnnn		AUGD	#23bits			(appends n to upper bits of next immediate D in same task)
    
    ----		1111101 10 0 CCCC 0 nnnnnnnnnnnnnnnnn		LOC	#abs			(write 17-bit absolute address to $1EF)
    ----		1111101 10 0 CCCC 1 nnnnnnnnnnnnnnnnn		LOC	@rel			(write 17-bit relative address to $1EF)
    ---- wr		1111101 10 1 CCCC 0 nnnnnnnnnnnnnnnnn		JMP	#abs			(jump to 17-bit absolute address and write {Z,C,P[16:0]} to $1EF)
    ---- wr		1111101 10 1 CCCC 1 nnnnnnnnnnnnnnnnn		JMP	@rel			(jump to 17-bit relative address and write {Z,C,P[16:0]} to $1EF)
    ----		1111101 11 0 CCCC 0 nnnnnnnnnnnnnnnnn		CALL	#abs			(call to 17-bit absolute address using 4-level stack)
    ----		1111101 11 0 CCCC 1 nnnnnnnnnnnnnnnnn		CALL	@rel			(call to 17-bit relative address using 4-level stack)
    ----		1111101 11 1 CCCC 0 nnnnnnnnnnnnnnnnn		CALLA	#abs			(call to 17-bit absolute address using PTRA)
    ----		1111101 11 1 CCCC 1 nnnnnnnnnnnnnnnnn		CALLA	@rel			(call to 17-bit relative address using PTRA)
    
    ----		1111110 00 n CCCC n nnnnnnnnnnnnnnnnn		SETPTRA	#abs			(write 19-bit absolute address to PTRA)
    ----		1111110 01 n CCCC n nnnnnnnnnnnnnnnnn		SETPTRA	@rel			(write 19-bit relative address to PTRA)
    ----		1111110 10 n CCCC n nnnnnnnnnnnnnnnnn		SETPTRB	#abs			(write 19-bit absolute address to PTRB)
    ----		1111110 11 n CCCC n nnnnnnnnnnnnnnnnn		SETPTRB	@rel			(write 19-bit relative address to PTRB)
    
    --L-		1111111 00 L CCCC DDDDDDDDD xxxx00000		WAIT	D/#			(wait for some number of clocks, 0 same as 1)
    --L-		1111111 00 L CCCC DDDDDDDDD xxxx00001		WAITPX	D/#			(wait for any edge on pin D/#)
    --L-		1111111 00 L CCCC DDDDDDDDD xxxx00010		WAITPR	D/#			(wait for pos edge on pin D/#)
    --L-		1111111 00 L CCCC DDDDDDDDD xxxx00011		WAITPF	D/#			(wait for neg edge on pin D/#)
    --L-		1111111 00 L CCCC DDDDDDDDD xxxx00100		PUSH	D/#			(push D/# into 4-level stack)
    --L-		1111111 00 L CCCC DDDDDDDDD xxxx00101		SETVID	D/#			(set video mode)
    --L-		1111111 00 L CCCC DDDDDDDDD xxxx00110		-	D/#
    --L-		1111111 00 L CCCC DDDDDDDDD xxxx00111		-	D/#
    												(D[18:17] into Z/C via WZ/WC for JMP/CALL/CALLA/POP D)
    ZCR- wr		1111111 ZC x CCCC DDDDDDDDD xxxx01000		JMP	D			(jump to D[16:0] and write {Z,C,P[16:0]} to $1EF)
    ZCR-		1111111 ZC x CCCC DDDDDDDDD xxxx01001		CALL	D			(call to D[16:0] using 4-level stack)
    ZCR-		1111111 ZC x CCCC DDDDDDDDD xxxx01010		CALLA	D			(call to D[16:0] using PTRA stack)
    ZCR-		1111111 ZC x CCCC DDDDDDDDD xxxx01011		-	D
    ZCR-		1111111 ZC x CCCC DDDDDDDDD xxxx01100		-	D
    --R-		1111111 00 x CCCC DDDDDDDDD xxxx01101		-	D
    --R-		1111111 00 x CCCC DDDDDDDDD xxxx01110		-	D
    --R-		1111111 00 x CCCC DDDDDDDDD xxxx01111		-	D
    
    ZCW-		1111111 ZC x CCCC DDDDDDDDD xxxx10000		POP	D			(pop 4-level stack into D)
    --W-		1111111 00 x CCCC DDDDDDDDD xxxx10001		-	D
    --W-		1111111 00 x CCCC DDDDDDDDD xxxx10010		-	D
    --W-		1111111 00 x CCCC DDDDDDDDD xxxx10011		-	D
    --W-		1111111 00 x CCCC DDDDDDDDD xxxx10100		-	D
    --W-		1111111 00 x CCCC DDDDDDDDD xxxx10101		-	D
    --W-		1111111 00 x CCCC DDDDDDDDD xxxx10110		-	D
    --W-		1111111 00 x CCCC DDDDDDDDD xxxx10111		-	D
    
    ZC--		1111111 ZC x CCCC xxxxxxxxx xxxx11000		RET				(return using 4-level stack)
    ZC--		1111111 ZC x CCCC xxxxxxxxx xxxx11001		RETA				(return using PTRA stack)
    ZC--		1111111 ZC x CCCC xxxxxxxxx xxxx11010		POLVID				(C = ready for WAITVID)
    -C--		1111111 0C x CCCC xxxxxxxxx xxxx11011		CACHEX				(invalidate instruction cache)
    ----		1111111 00 x CCCC xxxxxxxxx xxxx11100		-
    ----		1111111 00 x CCCC xxxxxxxxx xxxx11101		-
    ----		1111111 00 x CCCC xxxxxxxxx xxxx11110		-
    ----		1111111 00 x CCCC xxxxxxxxx xxxx11111		-
    
    ----		0000000 00 0 0000 000000000 000000000		NOP
    
    Aliases for WRLONG/RDLONG: 	PUSHA/PUSHB/POPA/POPB 
    
    


    Note that the JMP instructions save a return address into $1EF, so these double as the old LINK instructions.
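    As a rough sketch of what the link value might look like, the snippet below packs {Z,C,P[16:0]} the way the listing describes and restores the flags from bits 18 and 17 when WZ/WC are given. Reading P as the return address is an assumption, and all function names are hypothetical.

```python
ADDR_BITS = 17
ADDR_MASK = (1 << ADDR_BITS) - 1


def pack_link(z, c, pc):
    """Build the {Z, C, P[16:0]} value that JMP is described as writing to $1EF."""
    return ((z & 1) << 18) | ((c & 1) << 17) | (pc & ADDR_MASK)


def unpack_link(value):
    """Split a link value back into (Z, C, address)."""
    return (value >> 18) & 1, (value >> 17) & 1, value & ADDR_MASK


def return_via_link(link, z, c, wz=False, wc=False):
    """Model JMP D used as a return: jump to D[16:0] and, if WZ/WC are
    specified, reload Z/C from D[18:17]; otherwise keep the current flags."""
    saved_z, saved_c, addr = unpack_link(link)
    return addr, (saved_z if wz else z), (saved_c if wc else c)


link = pack_link(1, 0, 0x00123)      # caller had Z=1, C=0
```

    Calling return_via_link(link, 0, 1, wz=True, wc=True) in this model jumps back to $00123 with the caller's flags restored; leaving out wz and wc keeps whatever flags the subroutine produced.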
  • AribaAriba Posts: 2,690
    edited 2014-04-17 15:04
    Heater. wrote: »
    ...
    What is the point of this little 4 entry stack?

    You can't modify the RET instruction with hubexec. On the Prop1, the CALL (a jmpret) had to modify the S field of the RET (also a jmpret).

    @Roy
    Nested calls were possible in PASM1, just not recursion.

    Andy
  • Heater.Heater. Posts: 21,230
    edited 2014-04-17 15:10
    @Chip & @Roy,

    Thank you for the explanation. It's kind of what I was imagining.

    Which leaves me with the question: WTF is it for?

    With the good old JMPRET if I am calling a subroutine, which calls a subroutine, which calls a subroutine ...everything works just fine. Every subroutine call's return address gets stored at a unique address for that subroutine.

    Having a stack in which to store return addresses only helps if I want to make recursive calls. Either the routine calls itself directly or something it calls calls back to it.

    Such recursive calls are a rare thing in micro-controller land.

    When do we need this? Perhaps, for example, for the recursive FIBO benchmark. But then a depth of 4 is pretty much useless. Do we need the recursive FIBO benchmark?

    How would a C compiler use this? How would it handle the stack overflow? How would it know if that might happen? And where do the parameters to such recursive calls go?

    I'm not sure what HUB memory has to do with this. Surely a stack can be built in COG registers if need be by a PASM programmer?

    So WTF is this tiny stack for actually?
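    The JMPRET behaviour described above can be modelled in a few lines. This is a minimal sketch with hypothetical names: each subroutine owns exactly one return-address slot, so nesting different routines works and re-entering the same routine overwrites the slot.

```python
class JmpretModel:
    """One return-address slot per subroutine, as with P1 JMPRET/CALL."""

    def __init__(self):
        self.slots = {}                  # subroutine name -> return address

    def call(self, sub, return_addr):
        self.slots[sub] = return_addr    # overwrites any earlier value

    def ret(self, sub):
        return self.slots[sub]


# Nested calls to different routines: every return address survives.
nested = JmpretModel()
nested.call("a", 10)
nested.call("b", 20)
nested.call("c", 30)
nested_returns = [nested.ret("c"), nested.ret("b"), nested.ret("a")]

# A routine that calls itself: the outer return address is lost.
recursive = JmpretModel()
recursive.call("fib", 10)                # outer call, should return to 10
recursive.call("fib", 50)                # inner call from inside fib
recursive_returns = [recursive.ret("fib"), recursive.ret("fib")]
```

    In the nested case the three returns come back as 30, 20, 10. In the recursive case both returns go to 50, so the outer caller at 10 is never reached.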
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2014-04-17 15:11
    tonyp12 wrote:
    Saves you from having to jump to the _RET label if you want to have multiple exit points in the subroutine.
    You don't have to do that now. jmp sub_ret works just as well as, and one instruction faster than, jmp #sub_ret. In fact, if you use jmpret instead of call you can share return-address depositories with other non-nested routines, saving register space.

    -Phil
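    A small model of the P1 mechanism behind this remark, as a sketch with hypothetical helper names. On the P1, CALL stores the return address in the low 9 bits (the S field) of the sub_ret register. An immediate jump to sub_ret then executes the RET there, while a register jump reads the same S field and goes straight to the caller.

```python
S_MASK = 0x1FF
RET_WORD = 0x5C7C0000      # P1 encoding of "jmp #0"; only the low 9 bits matter here


def call(cog, sub_ret_addr, return_addr):
    """CALL: patch the return address into the S field of the RET instruction."""
    cog[sub_ret_addr] = (cog[sub_ret_addr] & ~S_MASK) | (return_addr & S_MASK)


def exit_via_immediate(cog, sub_ret_addr):
    """jmp #sub_ret: jump to the RET instruction, then execute it."""
    pc, executed = sub_ret_addr, 1
    pc, executed = cog[pc] & S_MASK, executed + 1
    return pc, executed


def exit_via_register(cog, sub_ret_addr):
    """jmp sub_ret: use the register contents (its S field) as the target."""
    return cog[sub_ret_addr] & S_MASK, 1


cog = [0] * 512
cog[0x40] = RET_WORD       # sub_ret lives at $040 in this example
call(cog, 0x40, 0x12)      # caller resumes at $012
```

    Both exits land on $012 in this model; the register form gets there after one jump instead of two.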
  • Roy ElthamRoy Eltham Posts: 3,000
    edited 2014-04-17 15:31
    Heater,
    Yeah, I was thinking the wrong thing when I was trying to explain the 4 long stack. The stack based call/ret makes having multiple exit points from a function less of a hassle to worry about. I think the thinking was that having the call/ret work without having to do the self-modifying code approach of JMPRET was desirable, and it was made 4 deep because that was Chip's determination of being enough depth for cog sized pasm code.
  • AribaAriba Posts: 2,690
    edited 2014-04-17 15:41
    Heater. wrote: »
    @Chip & @Roy,

    Thank you for the explanation. It's kind of what I was imagining.

    Which leaves me with the question: WTF is it for?

    With the good old JMPRET if I am calling a subroutine, which calls a subroutine, which calls a subroutine ...everything works just fine. Every subroutine call's return address gets stored at a unique address for that subroutine.

    Having a stack in which to store return addresses only helps if I want to make recursive calls. Either the routine calls itself directly or something it calls calls back to it.

    Such recursive calls are a rare thing in micro-controller land.

    When do we need this? Perhaps, for example, for the recursive FIBO benchmark. But then a depth of 4 is pretty much useless. Do we need the recursive FIBO benchmark?

    How would a C compiler use this? How would it handle the stack overflow? How would it know if that might happen? And where do the parameters to such recursive calls go?

    I'm not sure what HUB memory has to do with this. Surely a stack can be built in COG registers if need be by a PASM programmer?

    So WTF is this tiny stack for actually?

    Compilers will not use the 4 level stack. PropGCC will use the JMP with the link register at $1EF and pushes that on a stack only if it is a non-leaf function.
    Other compilers will use CALLA and RETA, which handle a hub stack with PTRA as the stack pointer.

    For simple calls in handcrafted PASM we have the fast 4-level stack. The JMPRET way is no longer possible if you also want to execute PASM from hub. The RET will then be in hub RAM and can't easily be modified by the JMPRET.

    Andy
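    The link-register convention mentioned for PropGCC can be sketched as follows. This is an illustration of the general technique, not PropGCC's actual code generation; the class and method names are hypothetical, and `lr` simply stands in for the register at $1EF.

```python
class LinkRegisterCpu:
    """Calls write the return address to a link register; only non-leaf
    functions spill it to a software stack."""

    def __init__(self):
        self.lr = None           # stands in for $1EF
        self.stack = []          # software stack managed by the compiler
        self.stack_writes = 0

    def jmp_link(self, return_addr):
        self.lr = return_addr    # JMP writes the return address to the link register

    def prologue(self, is_leaf):
        if not is_leaf:          # a non-leaf function is about to clobber lr
            self.stack.append(self.lr)
            self.stack_writes += 1

    def epilogue(self, is_leaf):
        if not is_leaf:
            self.lr = self.stack.pop()
        return self.lr           # address to jump back to


cpu = LinkRegisterCpu()
cpu.jmp_link(100)                # main calls outer(), a non-leaf function
cpu.prologue(is_leaf=False)
cpu.jmp_link(200)                # outer() calls leaf()
cpu.prologue(is_leaf=True)
leaf_return = cpu.epilogue(is_leaf=True)
outer_return = cpu.epilogue(is_leaf=False)
```

    Two calls cost a single stack write here, because the leaf function returns straight through the link register.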
  • Heater.Heater. Posts: 21,230
    edited 2014-04-17 15:43
    And now Phil tells us we don't need a stack to make multiple quick exit points from a subroutine.

    Which we should know already because there is P1 code around that does exactly that.

    So, WTF is this tinsy, useless stack for?
  • Heater.Heater. Posts: 21,230
    edited 2014-04-17 15:48
    Ariba,

    Thanks for that, I obviously have to think this through...

    But before I do, how come LMM code on the P1 did not need this silly little stack but hubexec on the PII does?
  • AribaAriba Posts: 2,690
    edited 2014-04-17 15:51
    Heater
    LMM has to handle CALL and RET with cog helper routines that do an indirect access with self-modifying code, all very costly.
    I don't think we want to do that in hubexec mode.

    Andy
  • AribaAriba Posts: 2,690
    edited 2014-04-17 15:59
    Seairth wrote: »
    ZCWS		1011111 ZC I CCCC DDDDDDDDD SSSSSSSSS		JMPSW	D,S/@			(jump to S/@, store return address in D, WZ/WC to save/load flags)
    ZCR- wr		1111111 ZC x CCCC DDDDDDDDD xxxx01000		JMP	D			(jump to D[16:0] and write {Z,C,P[16:0]} to $1EF)
    

    It looks to me like JMP is getting its address out of the D register and JMPSW is getting its address out of the S register (ignoring the relative variant). And both write PC+1 to a register. And both use WZ/WC to save (and presumably restore) Z/C.

    It's a bit confusing because we have 3 versions of the JMP instruction: immediate17, relative17 and register indirect.
    If I understand you correctly, you think we don't need the register-indirect version, because this can be done with a JMPSW. I think this will work if we use a read-only register in the D field of JMPSW, for example the CNT or RND register.

    Andy
  • Heater.Heater. Posts: 21,230
    edited 2014-04-17 16:07
    Ariba,
    Compilers will not use the 4 level stack. PropGCC will use the JMP with the link register at $1EF and pushes that on a stack only if it is a non-leaf function.
    and
    LMM has to handle CALL and RET with cog helper routines that do an indirect access with self-modifying code, all very costly.
    I don't think we want to do that in hubexec mode.
    So we need a four level stack ... but it's not going to be used.


    So, tell me again, very slowly, why do we need it?
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-04-17 16:23
    If Chip leaves it in, it is great for cog code.

    I'd use it.

    I always intensely disliked the
    sub blah
           blah
    sub_ret ret
    

    business.

    The 4 level LIFO stack also allows generic subroutines that can be called regardless of living in cog or hub space.

    You can use JMPSW as if it was JMPRET if you insist, but that only works for cog-only code.

    Using the LR at $1EF for cog-only code is a pain, as you'd have to save it before calling another routine.
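    A minimal model of the 4-level LIFO, to show why the same subroutine can be called from cog or hub code: nothing in the subroutine is patched, so a RET at any exit point simply pops the most recent return address. The class name is hypothetical, and raising an error on a fifth nested call is an assumption, since the thread does not say what the hardware does on overflow.

```python
class HwStack:
    """Fixed-depth return stack shared by CALL and RET."""

    DEPTH = 4

    def __init__(self):
        self.items = []

    def call(self, return_addr):
        if len(self.items) == self.DEPTH:
            # Assumption for the model only; real overflow behaviour is unspecified.
            raise OverflowError("more than 4 nested calls")
        self.items.append(return_addr)

    def ret(self):
        return self.items.pop()      # works from any exit point, no sub_ret label


stack = HwStack()
for addr in (0x010, 0x01234, 0x020, 0x1ABCD):   # mix of cog and hub return addresses
    stack.call(addr)
unwound = [stack.ret() for _ in range(4)]
```

    The return addresses come back in reverse order regardless of whether they point into cog or hub space.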
    Heater. wrote: »
    Ariba,
    Compilers will not use the 4 level stack. PropGCC will use the JMP with the link register at $1EF and pushes that on a stack only if it is a non-leaf function.
    and

    So we need a four level stack ... but it's not going to be used.


    So, tell me again, very slowly, why do we need it?
  • AribaAriba Posts: 2,690
    edited 2014-04-17 16:25
    Heater. wrote: »
    Ariba,

    Compilers will not use the 4 level stack. PropGCC will use the JMP with the link register at $1EF and pushes that on a stack only if it is a non-leaf function.
    and

    So we need a four level stack ... but it's not going to be used.


    So, tell me again, very slowly, why do we need it?

    Okay slowly:

    W e _ n e e d _ i t _ f o r _ f a s t _ c a l l s _ i n _ p u r e _ P A S M _ c o d e _ t h a t _ c a n _ r u n _ f r o m _ c o g r a m _ o r _ h u b r a m .

    Perhaps you still think that hubexec will not get implemented on the P1+, but I think the current state is that Chip will implement it. Perhaps you don't understand that the P1+ uses exactly the same instructions for cogexec and hubexec, so you can run the same code from both memories. Or perhaps you just got up on the wrong side of the bed today. :smile:

    Andy
  • Heater.Heater. Posts: 21,230
    edited 2014-04-17 16:31
    Bill,
    I always intensely disliked the...
    Well, yeah, but it's a minor inconvenience and has no impact on performance.
    It also allows generic subroutines that can be called regardless of living in cog or hub space.
    Well, yeah, but is propgcc or anybody ever going to use that?


    I'm still not convinced this silly little 4 slot stack is any worth yet.
  • evanhevanh Posts: 15,862
    edited 2014-04-17 16:33
    It's looking like the hardware stack's real advantage is not that it has more than one level or that it is fast but that it is spare for handcrafted use because the compiler isn't using it.
  • jmgjmg Posts: 15,172
    edited 2014-04-17 16:37
    Ariba wrote: »
    Okay slowly:

    W e _ n e e d _ i t _ f o r _ f a s t _ c a l l s _ i n _ p u r e _ P A S M _ c o d e _ t h a t _ c a n _ r u n _ f r o m _ c o g r a m _ o r _ h u b r a m .

    Perhaps you still think that hubexec will not get implemented on the P1+, but I think the current state is that Chip will implement it. Perhaps you don't understand that the P1+ uses exactly the same instructions for cogexec and hubexec, so you can run the same code from both memories.

    Makes good sense to me, and Chip has already decided it is worth the effort to include.

    Of course, anyone who wants to, is free to ignore it.
  • ctwardellctwardell Posts: 1,716
    edited 2014-04-17 16:38
    Heater. wrote: »
    I'm still not convinced this silly little 4 slot stack is any worth yet.

    I agree, Chip should expand it to at least 42 slots, otherwise any robots built with it might be prone to depression.

    C.W.
  • jmgjmg Posts: 15,172
    edited 2014-04-17 16:40
    evanh wrote: »
    It's looking like the hardware stack's real advantage is not that it has more than one level or that it is fast but that it is spare for handcrafted use because the compiler isn't using it.

    Compilers do not yet use it, and this sounds a compelling enough case :
    ..uses exactly the same instructions for cogexec and hubexec, so you can run the same code from both memories.
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-04-17 16:42
    I tend to write some fairly cool pasm stuff, and I will gladly use it - FAR prefer it to jmpret.
    Heater. wrote: »
    Bill,

    Well, yeah, but it's a minor inconvenience and has no impact on performance.

    Well, yeah, but is propgcc or anybody ever going to use that?


    I'm still not convinced this silly little 4 slot stack is any worth yet.
  • evanhevanh Posts: 15,862
    edited 2014-04-17 16:43
    jmg wrote: »
    Compilers do not yet use it, and this sounds a compelling enough case :
    ..uses exactly the same instructions for cogexec and hubexec, so you can run the same code from both memories.

    The moment a compiler is allowed to use it, its "speed" advantage vanishes because you have to conform to the compiler's stack management.
  • Heater.Heater. Posts: 21,230
    edited 2014-04-17 16:47
    Ariba,
    You should know me by now. I get up on the wrong side of the bed every day :)
    W e _ n e e d _ i t _ f o r _ f a s t _ c a l l s _ i n _ p u r e _ P A S M _ c o d e _ t h a t _ c a n _ r u n _ f r o m _ c o g r a m _ o r _ h u b r a m .
    and
    Perhaps you don't understand that the P1+ uses exactly the same instructions for cogexec and hubexec, so you can run the same code from both memories.
    OK. That sounds great...but...

    When would you ever need to be able to place the exact same code in HUB or COG?

    Why is it necessary for HUB resident code to look exactly like COG resident code?

    Who is ever going to make use of that idea?

    We have already determined that compilers will not.

    My FFT in C can be compiled for the P1 with or without FCACHE. Its inner loops can be run from HUB or loaded to COG and run. I have no idea how the code is compiled differently in each case. Neither do I care. All I know is that when compiled with FCACHE it runs nearly as fast as my hand-crafted PASM version!

    So WhyTF do we need this little stack again?
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-04-17 16:48
    generic comment (not aimed at heater or anyone in particular)

    Where is this "if it is not used by gcc turf it" attitude coming from?

    Adding some capabilities to make life easier for some compiler is great, even if assembly language programmers don't need/want it.

    Why is adding capabilities that make life easier/better/faster for assembly language programmers seen as evil by some?

    I find it more than a bit hypocritical.

    The best, tightest, most amazing pieces of code will be in assembly language.

    You don't want to use a capability? Don't use it.

    You don't need it? Don't use it.

    Kindly don't try to tell the rest of us not to use it.

    Same goes for helper instructions that reduce memory requirements (ie using a single long, instead of 2/3..) for compiler generated code.

    512KB is not infinite, and I'd rather have more ram left for arrays, data, and display buffers than wasting it on two instructions - where a helper could use one.

    Ditto for tasks if Chip adds them.

    You find them too complicated? Don't use them. Don't try to stop me from using them (if Chip puts them in)

    You prefer co-operative multi-tasking? Great! Go for it! But don't try to stop me from using tasks (if Chip puts them in)
  • Heater.Heater. Posts: 21,230
    edited 2014-04-17 16:51
    jmg,
    Compilers do not yet use it,
    I might be wrong but my bet is that compilers will never use it because they will already be managing a stack of their own.
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-04-17 16:52
    str*() functions, mem*() functions etc.

    All small enough to run in a cog based library (flib) or from the hub if cog space is not available.

    I love the idea of being able to have small routines callable regardless of where they live.

    (as an aside, if we had fast aux or cog stacks, I would not resist getting rid of the LIFO)
    Heater. wrote: »
    Ariba,
    You should know me by now. I get up on the wrong side of the bed every day :)

    and

    OK. That sounds great...but...

    When would you ever need to be able to place the exact same code in HUB or COG?

    Why is it necessary for HUB resident code to look exactly like COG resident code?

    Who is ever going to make use of that idea?

    We have already determined that compilers will not.

    My FFT in C can be compiled for the P1 with or without FCACHE. Its inner loops can be run from HUB or loaded to COG and run. I have no idea how the code is compiled differently in each case. Neither do I care. All I know is that when compiled with FCACHE it runs nearly as fast as my hand-crafted PASM version!

    So WhyTF do we need this little stack again?
  • evanhevanh Posts: 15,862
    edited 2014-04-17 16:59
    Bill/Ariba, you've sold me. It took a while to come to terms with the fact that it's really the separation, an independent stack that doesn't have any software management, that is desired. A second separate LR would be almost as good.
  • Phil Pilgrim (PhiPi)Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2014-04-17 17:02
    I guess I always envisioned an in-cog, stack-based call/return as something like this:
    call == jmpret sp--,#dest
    retn == jmp ++sp

    That way the stack could start wherever you want and be as deep as you want it to be. sp may, by necessity, have to be a special register, since there are no extra bits to indicate post-decrement or pre-increment. And jmpret maintains its original flavor when the destination register is other than an SFR; similarly for jmp.
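    The two pseudo-instructions above can be modelled like this. It is a sketch of the idea only, with hypothetical names: the stack lives in ordinary cog registers, call stores the return address at sp and then decrements it, and retn increments sp first and jumps through the register it now points at.

```python
class CogStack:
    """Return stack kept in cog registers, addressed through sp."""

    def __init__(self, cog, top):
        self.cog = cog
        self.top = top
        self.sp = top

    def call(self, return_addr, dest):
        # call == jmpret sp--,#dest : write the return address, then post-decrement
        self.cog[self.sp] = return_addr
        self.sp -= 1
        return dest                      # new program counter

    def retn(self):
        # retn == jmp ++sp : pre-increment, then jump through the register
        self.sp += 1
        return self.cog[self.sp]


cog = [0] * 0x1F0
stack = CogStack(cog, top=0x1E0)
for depth in range(1, 7):                # six nested calls, deeper than a 4-level LIFO
    stack.call(return_addr=depth, dest=0x100 + depth)
returns = [stack.retn() for _ in range(6)]
```

    Depth is limited only by how many registers are set aside below the starting value of sp, which is the flexibility the proposal is after.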

    A fixed, four-deep stack does seem a bit on the stingy side. OTOH, I can't readily come up with an example from my own PASM code where it would have been inadequate. (And I'm not about to write a Perl script to do that kind of "in-depth" -- pardon the pun -- analysis.)

    Disclaimer: I haven't followed this thread closely enough to know whether an idea like this has already been hashed out and discarded. If so, in the words of Emily Litella,

    [Image: Emily Litella, "Never mind"]

    :)

    -Phil
  • jmgjmg Posts: 15,172
    edited 2014-04-17 17:02
    Heater. wrote: »
    OK. That sounds great...but...

    When would you ever need to be able to place the exact same code in HUB or COG?

    Why is it necessary for HUB resident code to look exactly like COG resident code?

    Who is ever going to make use of that idea?

    We have already determined that compilers will not.

    ?? Libraries are one obvious area where all the features you claim have no use, will be very useful.

    I'm still missing the point of all your posts, Chip already has this in there - if it bothers you so much, just ignore it.