New Hub Scheme For Next Chip

Phil Pilgrim (PhiPi) · 2014-06-14 17:58

jmg wrote:

The Block transfer opcode will stall the COG, as it needs 16 SysClks to fill the fifo ...

I guess, in that case, it's not really a FIFO -- maybe a RIFO (random-in first-out) or a BIFO (block-in first-out). It would be nice, I suppose, if reading from the "*IFO" could begin as soon as the first (lowest-hub-address) slot is filled, since further reads cannot keep up with subsequent filling. OTOH, that could affect determinism.

-Phil

jmg · 2014-06-14 18:19

Phil Pilgrim (PhiPi) wrote: »

I guess, in that case, it's not really a FIFO -- maybe a RIFO (random-in first-out) or a BIFO (block-in first-out).

I've rephrased that a little better, as I'm not sure the Block read even needs the FIFO. The FIFO is really there for video-DMA-style transfers, up to fSys/N: it fills when it can and empties at a fixed rate (and vice versa when streaming from pins into the hub).

Block[16] transfers always need 16 fSys clocks, but they do not have to use the FIFO. They do need the address/data lines to the hub, but in theory a Block Read could be time-shared with a FIFO running video at less than fSys/1.

evanh · 2014-06-14 20:42

jmg wrote: »

I've rephrased that a little better, as I'm not sure the Block read even needs the FIFO. The FIFO is really there for video-DMA-style transfers, up to fSys/N: it fills when it can and empties at a fixed rate (and vice versa when streaming from pins into the hub).

Block[16] transfers always need 16 fSys clocks, but they do not have to use the FIFO. They do need the address/data lines to the hub, but in theory a Block Read could be time-shared with a FIFO running video at less than fSys/1.

Hmm, I figured the FIFO was intended for Cog instructions to use. We'll just have to wait and see I guess.

Phil Pilgrim (PhiPi) · 2014-06-14 21:22

jmg wrote:

I've rephrased that a little better as I'm not sure if the Block read even needs the FIFO ...

I guess it could be done either way, with each having its advantages. We'll just have to wait to see how Chip implements it.

-Phil

jmg · 2014-06-14 21:39

evanh wrote: »

Hmm, I figured the FIFO was intended for Cog instructions to use. We'll just have to wait and see I guess.

You mean for HUB exec-style code fetch, or for general data flows?
HUB exec is still mentioned by Chip, but there are no details yet.

A FIFO could be useful for LMM-style code, but the block[16] read would be more deterministic, and that would free the FIFOs for data flows.

It may be possible to do a hardware-assisted form of LMM that behaves like HUB exec, with a speed between that of LMM and COG code.
Straight-line code could run at 66% of the speed of COG code, and a size-definable block would allow software to tune/optimise the block sizes (multiples of 16).
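A rough check of the 66% figure, as a minimal sketch under assumptions the thread does not state: the cog stalls while a block is read from hub at one long per SysClk, and each cog instruction takes 2 SysClks. Then 16 instructions take 32 clocks to execute plus 16 clocks to fetch, giving 32/48 = 2/3. The hypothetical `executed_longs` parameter shows why a tunable block size matters: a branch out of a block wastes the longs fetched after it.

```python
from fractions import Fraction


def block_fetch_speed(fetched_longs: int, executed_longs: int,
                      clocks_per_instr: int = 2,
                      fetch_clocks_per_long: int = 1) -> Fraction:
    """Speed relative to cog-resident code when code is fetched in blocks.

    Hypothetical model: the cog stalls while a whole block is read from hub
    (one long per SysClk), then executes `executed_longs` of those longs at
    `clocks_per_instr` each before it needs the next block.
    """
    exec_clocks = executed_longs * clocks_per_instr
    fetch_clocks = fetched_longs * fetch_clocks_per_long
    return Fraction(exec_clocks, exec_clocks + fetch_clocks)


# Straight-line code: every fetched long is executed.
print(block_fetch_speed(16, 16))  # 2/3
# A branch after 8 instructions wastes more of a larger block.
print(block_fetch_speed(16, 8), block_fetch_speed(32, 8))  # 1/2 1/3
```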

cgracey · 2014-06-15 13:26

evanh wrote: »

Chip, a question: presumably the FIFOs are single-ported. I'm going to guess that any Cog instruction that accesses its FIFO while the hub is accessing it will stall the Cog, right?

I ask, trying not to be too greedy, because a Cog working on the FIFO contents concurrently would seem the most streamlined ... Double buffering, anyone? 2x8 FIFO per Cog, maybe? /me ducks.

To reliably accommodate an automatic peripheral in the cog that is using the FIFO, the FIFO gets priority over RDxxxx/WRxxxx instructions.

The FIFO is 19 levels deep, maximally. This is to accommodate the worst case of a long being read or written on every clock. Most of the time, in practice, the FIFO will only be a few levels deep.
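A minimal sketch of the priority rule described above, assuming (hypothetically) that the cog's hub port serves one access per clock and ignoring any hub slot timing: a pending RDxxxx/WRxxxx is granted only on the first clock on which the FIFO does not need the port. The name `grant_cycle` and the `horizon` cutoff are illustrative only.

```python
from typing import Optional


def grant_cycle(fifo_busy: set, request_clock: int,
                horizon: int = 1000) -> Optional[int]:
    """Clock on which a waiting RDxxxx/WRxxxx finally gets this cog's hub port.

    Hypothetical model: one hub access per clock and the FIFO always wins,
    so the instruction waits for the first clock on which the FIFO is idle.
    Returns None if no idle clock occurs within `horizon` clocks.
    """
    for clock in range(request_clock, request_clock + horizon):
        if clock not in fifo_busy:
            return clock
    return None


busy = {3, 4, 5, 8}           # clocks on which the FIFO needs the hub
print(grant_cycle(busy, 3))   # 6: waits out the FIFO burst on clocks 3..5
print(grant_cycle(busy, 7))   # 7: FIFO idle, granted immediately
```

In this model a FIFO that needed the port on every clock (the worst case of a long per clock) would starve the instruction entirely, which is why the sketch has a `horizon`.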

Sapieha · 2014-06-15 13:42

Hi Chip,

Do you have write-before-read if the FIFOs point to the same destination?

cgracey wrote: »

To reliably accommodate an automatic peripheral in the cog that is using the FIFO, the FIFO gets priority over RDxxxx/WRxxxx instructions.

The FIFO is 19 levels deep, maximally. This is to accommodate the worst case of a long being read or written on every clock. Most of the time, in practice, the FIFO will only be a few levels deep.

evanh · 2014-06-15 15:27

Thanks Chip.

cgracey wrote: »

To reliably accommodate an automatic peripheral in the cog that is using the FIFO, the FIFO gets priority over RDxxxx/WRxxxx instructions.

What can be an automatic peripheral? What sort of flexibility is there?

The FIFO is 19 levels deep, maximally. This is to accommodate the worst case of a long being read or written on every clock. Most of the time, in practice, the FIFO will only be a few levels deep.

So the plan for the FIFO is mainly to have it pace its stream at the rate needed for the I/O. So, then, the Cog can still RDxxx/WRxxx in between the FIFO's hub accesses. Am I on to it now? That almost sounds a bit too luxurious for the Prop.

cgracey · 2014-06-15 16:37

evanh wrote: »

Thanks Chip.

What can be an automatic peripheral? What sort of flexibility is there?

So the plan for the FIFO is mainly to have it pace its stream at the rate needed for the I/O. So, then, the Cog can still RDxxx/WRxxx in between the FIFO's hub accesses. Am I on to it now? That almost sounds a bit too luxurious for the Prop.

That's right. RDxxxx/WRxxxx must wait for cycles when the FIFO is not needing to issue reads or writes.

There will be simple state machines to read pins (byte/word/long groups) and write them to hub RAM via the FIFO, at the rate set by the NCO. Data can also be read from hub RAM via the FIFO and written to pins or DACs at the rate set by the NCO. Video is a case of the latter.
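NCO pacing can be sketched as a phase accumulator, a common way to build an NCO; that the next chip's NCO works exactly like this is an assumption. Each SysClk the accumulator adds `freq`, and each carry out of the top bit triggers one transfer, so the average rate is SysClk * freq / 2^32.

```python
def nco_transfers(freq: int, clocks: int, bits: int = 32) -> list:
    """Clocks on which an NCO-paced streamer would move one sample.

    Assumed model: a `bits`-wide phase accumulator adds `freq` every SysClk,
    and each carry out of the top bit triggers one pin<->FIFO transfer, so the
    average rate is SysClk * freq / 2**bits (at most one transfer per clock).
    """
    modulus = 1 << bits
    if not 0 < freq <= modulus:
        raise ValueError("freq must be in 1..2**bits")
    phase = 0
    ticks = []
    for clk in range(clocks):
        phase += freq
        if phase >= modulus:      # carry out: time for one transfer
            phase -= modulus
            ticks.append(clk)
    return ticks


print(nco_transfers(2**30, 16))           # SysClk/4: [3, 7, 11, 15]
print(len(nco_transfers(3 * 2**30, 16)))  # 3/4 of SysClk: 12 transfers in 16 clocks
```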

cgracey · 2014-06-15 16:38

Sapieha wrote: »

Hi Chip,

Do you have write-before-read if the FIFOs point to the same destination?

No. Do you think it would be that helpful to have?

Bill Henning · 2014-06-15 17:16

So if I understand correctly: the NCO controls the pin read/write rate, and the FIFO buffers data for both reading and writing pins (in groups of 8/16/32 pins for byte/word/long hub reads/writes). Also, for writes, it can send them to DACs?

I have to ask...

1) How fast are the ADCs in the pins? The FIFO engine would be more symmetric if the ADCs could also write to the hub, but I suspect this is not needed, as I don't think the ADCs are fast enough to require DMA.

2) What is the maximum NCO frequency? 200 MHz (i.e., SysClk)?

Too bad mixing clock domains is not easy in the Altera software. It would be nice to allow an external clock input for use instead of an NCO frequency; a 165 MHz oscillator comes to mind.

cgracey wrote: »

That's right. RDxxxx/WRxxxx must wait for cycles when the FIFO is not needing to issue reads or writes.

There will be simple state machines to read pins (byte/word/long groups) and write them to hub RAM via the FIFO, at the rate set by the NCO. Data can also be read from hub RAM via the FIFO and written to pins or DACs at the rate set by the NCO. Video is a case of the latter.

jmg · 2014-06-15 17:42

Sapieha wrote: »

Do you have write-before-read if the FIFOs point to the same destination?

This would be rare: only in pin streaming -> hub, not for hub -> video/pins.
It would also be tricky, as any element in the FIFO might be due to replace what you are about to read in SW.
Checking all FIFO elements would require too much logic.

Most apps would first start the FIFO/DMA and then know whether they were ahead of or behind the write pointer
(usually SW would be slower and behind the pointer, but at low NCO speeds you may need to slow the SW).

There, and for burst cases, it could be useful to have some means to read/track the FIFO burst progress.
Could reading the registers used to configure this give that? This might already be planned?
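The idea of tracking the streamer's progress can be sketched as a circular-buffer reader that only consumes what the pin-to-hub streamer has already written. This assumes, hypothetically, that software can read a snapshot of the streamer's write pointer; `drain_available` is an illustrative name, not a planned instruction.

```python
def drain_available(buffer: list, read_ptr: int, write_ptr_snapshot: int):
    """Consume entries the streamer has already written, never overtaking it.

    Hypothetical model: the streamer fills `buffer` circularly, and software
    reads a snapshot of its write pointer before draining.
    Returns (samples, new_read_ptr).
    """
    size = len(buffer)
    if not (0 <= read_ptr < size and 0 <= write_ptr_snapshot < size):
        raise ValueError("pointers must index into the buffer")
    samples = []
    while read_ptr != write_ptr_snapshot:   # stay behind the write pointer
        samples.append(buffer[read_ptr])
        read_ptr = (read_ptr + 1) % size    # wrap around the ring
    return samples, read_ptr


ring = [10, 11, 12, 13, 14, 15, 16, 17]
print(drain_available(ring, 6, 2))   # ([16, 17, 10, 11], 2)
```

With this convention, equal pointers mean "no new data", so at most `size - 1` unread samples can be told apart; if the streamer laps the reader, data is silently lost.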

jmg · 2014-06-15 17:48

Bill Henning wrote: »

2) What is the maximum NCO frequency? 200 MHz (i.e., SysClk)?

Yes, SysClk/N, N >= 1.

Bill Henning wrote: »

Too bad mixing clock domains is not easy in the Altera software. It would be nice to allow an external clock input for use instead of an NCO frequency; a 165 MHz oscillator comes to mind.

Why not clock at 165 MHz in that case?

An external clock can be done, but it will always be sampled by SysClk, as on most microcontrollers.
I'd expect smart pin counters to have an External Edge option, and the sampling limits INC rates to < SysClk/2 (100 MHz).
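Why sampling caps the count rate, as an idealized sketch: a 50%-duty external clock is sampled once per SysClk, and rising edges are counted from the samples. Well below SysClk/2 the count tracks the real edges; above SysClk/2 the edges alias and are undercounted. The model is phase-locked and jitter-free, so a real asynchronous input would need some margin below SysClk/2.

```python
def count_sampled_rising_edges(period_in_clocks: float, n_samples: int) -> int:
    """Count rising edges of a 50%-duty external clock as seen by SysClk sampling.

    The external signal is high while (t mod period) < period/2; it is sampled
    once per SysClk at t = 0, 1, 2, ..., and a rising edge is counted whenever
    a 0 sample is followed by a 1 sample.
    """
    def level(t: float) -> int:
        return 1 if (t % period_in_clocks) < period_in_clocks / 2 else 0

    count = 0
    prev = level(0)
    for t in range(1, n_samples):
        cur = level(t)
        if prev == 0 and cur == 1:
            count += 1
        prev = cur
    return count


# SysClk/4: about 250 real edges in 1000 clocks, and about 250 are counted.
print(count_sampled_rising_edges(4, 1000))
# SysClk/1.5 (above SysClk/2): about 667 real edges, but only about half are seen.
print(count_sampled_rising_edges(1.5, 1000))
```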

Sapieha · 2014-06-15 18:40

Hi Chip.

As I don't know how the FIFOs you made work, I can't answer that.

But I would always want calculations to use the latest values, not old ones.

cgracey wrote: »

No. Do you think it would be that helpful to have?

evanh · 2014-06-16 04:17

Bill Henning wrote: »

So if I understand correctly: the NCO controls the pin read/write rate, and the FIFO buffers data for both reading and writing pins (in groups of 8/16/32 pins for byte/word/long hub reads/writes). Also, for writes, it can send them to DACs?

Yep, that's my understanding now.

1) How fast are the ADCs in the pins? The FIFO engine would be more symmetric if the ADCs could also write to the hub, but I suspect this is not needed, as I don't think the ADCs are fast enough to require DMA.

The counters are still in limbo, I think, so that part is still up in the air.

SERDES would be another candidate, but again not done yet.

cgracey · 2014-06-23 17:27

UPDATE:

I've got the cog working! The next step is to write the new boot loader. I'll need to update the assembler to do this. So far, I've been coding short test programs by hand and typing them into the memory files used by Quartus to compile the FPGA image.

Since we can't substitute ROM bit cells for RAM cells in the main memory (since we are using OnSemi's RAM), I had to instantiate a separate 4K*8 ROM (0.067 square mm) that is read via the CLKSET instruction. Each cog has a 5-unique-instruction program (that only cog0 uses) to load the ROM into the base of hub RAM at start-up. Doing it this way keeps the main memory simple, but it added a layer of complexity to the development. This has been slow-going, but once I get PNut.exe working with the downloader, things are going to really accelerate. That's when I'll add hub exec.
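The start-up copy can be modelled in a few lines, as a sketch only: it assumes each ROM read returns the next byte in sequence (the ROM is organised 4K x 8) and that bytes land at consecutive hub addresses starting at 0. `ByteStreamRom` and `copy_rom_to_hub` are hypothetical names; the real loop is a 5-unique-instruction cog program, not Python.

```python
class ByteStreamRom:
    """Hypothetical 8-bit-wide boot ROM: each read returns the next byte."""

    def __init__(self, contents: bytes) -> None:
        self._contents = contents
        self._addr = 0

    def next_byte(self) -> int:
        value = self._contents[self._addr]  # IndexError if read past the end
        self._addr += 1
        return value


def copy_rom_to_hub(rom: ByteStreamRom, hub: bytearray, size: int) -> None:
    """Copy `size` ROM bytes into hub RAM starting at address 0."""
    for addr in range(size):
        hub[addr] = rom.next_byte()


rom = ByteStreamRom(bytes(range(256)) * 16)   # 4KB of dummy ROM contents
hub = bytearray(512 * 1024)                   # 512KB hub RAM, initially zero
copy_rom_to_hub(rom, hub, 4 * 1024)
print(hub[:4], hub[4095], hub[4096])
```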

Thanks for your patience. I'm feeling confident about the direction of things. I think the chance of failure on the next chip will be very low.

jmg · 2014-06-23 17:37

cgracey wrote: »

... I had to instantiate a separate 4K*8 ROM (0.067 square mm) that is read via the CLKSET instruction. Each cog has a 5-unique-instruction program (that only cog0 uses) to load the ROM into the base of hub RAM at start-up.

Interesting. So this is a serial-like ROM, only accessible at power-on? (So the RAM is 100% RAM.)
How does that compare with the previous ROM size?

I noticed TI have released a ROM version of their Piccolo, with Motor control code/libraries in the ROM.

cgracey · 2014-06-23 17:55

jmg wrote: »

Interesting. So this is a serial-like ROM, only accessible at power-on? (So the RAM is 100% RAM.)
How does that compare with the previous ROM size?

I noticed TI have released a ROM version of their Piccolo, with Motor control code/libraries in the ROM.

You can always access it by using WC in the CLKSET instruction, so that D returns the next byte. Being only 8 bits wide, it doesn't cost a lot of gates and wires.

The previous ROM was 4KB, too. I might make this ROM 16K x 8, since it would still only take 0.25 sq mm and could later be changed to accommodate USB boot code, etc. I'd just need to make the 5-unique-instruction boot program read 16KB instead of 4KB. That just means changing two 0's into 1's.
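One plausible reading of "changing two 0's into 1's", assuming the boot loop's limit is stored as the last address (size - 1) rather than the size itself: 4KB - 1 is $0FFF and 16KB - 1 is $3FFF, which differ in exactly two bits, both 0 in the old value.

```python
OLD_SIZE = 4 * 1024    # current 4KB boot ROM
NEW_SIZE = 16 * 1024   # proposed 16KB boot ROM

old_last = OLD_SIZE - 1   # $0FFF: 12 one-bits
new_last = NEW_SIZE - 1   # $3FFF: 14 one-bits
changed = old_last ^ new_last

print(f"{old_last:016b}")   # 0000111111111111
print(f"{new_last:016b}")   # 0011111111111111
print(bin(changed).count("1"), changed & old_last == 0)   # 2 True
```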

David Betz · 2014-06-23 17:57

cgracey wrote: »

You can always access it by using WC in the CLKSET instruction, so that D returns the next byte. Being only 8 bits wide, it doesn't cost a lot of gates and wires.

The previous ROM was 4KB, too. I might make this ROM 16K x 8, since it would still only take 0.25 sq mm and could later be changed to accommodate USB boot code, etc. I'd just need to make the 5-unique-instruction boot program read 16KB instead of 4KB. That just means changing two 0's into 1's.

Sounds great! I like the idea of the full 512K being available as hub RAM.

Peter Jakacki · 2014-06-23 18:05

cgracey wrote: »

UPDATE:

I've got the cog working! ...

Congratulations, Chip! I know you must have been working hard on this, and it certainly is a milestone in the P16X64A development (is that the name for "Next Chip"?).

Yes, it is annoying having to implement the ROM in this manner, but no matter how good the tools are, and even with over 114,000 logic elements and matching memory to play with, engineers always have something to complain about.

David Betz · 2014-06-23 18:39

cgracey wrote: »

UPDATE:

I've got the cog working! ...

Congratulations on getting the COG working! Is it complete enough that you can post an instruction set?

Cluso99 · 2014-06-23 20:56

Fantastic work Chip!

With the full 512KB hub RAM, we are not constrained by the ROM. This can be replaced from flash with whatever the user requires. While putting ROM into the RAM was neat, OnSemi's cells make much more sense now. It's a shame about the wasted resources, but the result will be way better.

IMHO, the only ROM use that makes sense to me is to be able to boot and get running. If we can get USB working, then that makes sense as one of the boot alternatives. The monitor code also makes sense, to get a minimal user system running. And of course we have the security code too.

Does OnSemi have any fuse bit cells? I am presuming they don't have flash cells, or that these would add process steps?

cgracey · 2014-06-23 22:13

I compiled all 16 cogs and it took ~83k LEs, or 72% of the DE2-115's FPGA.
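For scale, assuming the DE2-115's Cyclone IV EP4CE115 has 114,480 logic elements: 83k LEs is about 72.5% of the device, and the average over 16 cogs is roughly 5.2k LEs each. That average also absorbs the shared hub logic, so it overstates the cost of a single cog.

```python
DE2_115_LES = 114_480   # logic elements in the Cyclone IV EP4CE115 (assumed figure)
USED_LES = 83_000       # "~83k LEs" for all 16 cogs (approximate)
COGS = 16

utilization = USED_LES / DE2_115_LES
per_cog = USED_LES / COGS   # upper bound: also includes hub/shared logic
print(f"utilization {utilization:.1%}, about {per_cog:,.0f} LEs per cog")
```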

Here's an updated instruction list:

5ns period
------------____________------------____________------------____________------------____________------------____________------------____________-

|                       |                       |                       |                       |                       |                       |
|-------+               |              rdRAM Ic |-------+               |              rdRAM Id |-------+               |              rdRAM Ie |
|       |               |                       |       |               |                       |       |               |                       |
|---+   +----> rdRAM Db |------------> latch Db |---+   +----> rdRAM Dc |------------> latch Dc |---+   +----> rdRAM Dd |------------> latch Dd |
|---+   +----> rdRAM Sb |------------> latch Sb |---+   +----> rdRAM Sc |------------> latch Sc |---+   +----> rdRAM Sd |------------> latch Sd |
|---+   +----> latch Ib |------------> latch Ib |---+   +----> latch Ic |------------> latch Ic |---+   +----> latch Id |------------> latch Id |
|   |                   |                       |   |                   |                       |   |                   |                       |
|   +------------------ALU-----------> wrRAM Ra |   +------------------ALU-----------> wrRAM Rb |   +------------------ALU-----------> wrRAM Rc |
|                       |                       |                       |                       |                       |                       |
|                       |       <wait a>        |                       |       <wait b>        |                       |       <wait c>        |
|                       |                       |                       |                       |                       |                       |
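Reading the diagram as two 5 ns clocks per instruction (D/S read and instruction latch, then D/S latch, ALU and result write, overlapped with the next instruction fetch), the peak throughput works out as below. This is an idealized figure that ignores any instruction that waits on the hub or other shared resources.

```python
SYSCLK_PERIOD_NS = 5.0
CLOCKS_PER_INSTRUCTION = 2   # per the two-clock pattern in the diagram
COGS = 16

sysclk_hz = 1e9 / SYSCLK_PERIOD_NS                   # 200 MHz
cog_mips = sysclk_hz / CLOCKS_PER_INSTRUCTION / 1e6  # 100 MIPS per cog
chip_mips = cog_mips * COGS                          # 1600 MIPS if all cogs run
print(sysclk_hz, cog_mips, chip_mips)
```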



	-- addressable registers
	--
	--	addr		read		write		name
	--	----------------------------------------------------
	--
	--	000-1F7		RAM		RAM
	--
	--	1F8		PTRA		RAM+PTRA	PTRA
	--	1F9		PTRB		RAM+PTRB	PTRB
	--	1FA		INA		RAM		INA
	--	1FB		INB		RAM		INB
	--	1FC		RAM		RAM+OUTA	OUTA
	--	1FD		RAM		RAM+OUTB	OUTB
	--	1FE		RAM		RAM+DIRA	DIRA
	--	1FF		RAM		RAM+DIRB	DIRB
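The special-register table can be expressed as a lookup, under one reading of its columns: "RAM+OUTA" in the write column means a write updates both the underlying RAM register and OUTA, while a read of $1FC returns the RAM copy. `REGISTER_MAP` and `decode_register` are illustrative names.

```python
# addr: (name, read source, write destinations) -- one reading of the table
REGISTER_MAP = {
    0x1F8: ("PTRA", "PTRA", ("RAM", "PTRA")),
    0x1F9: ("PTRB", "PTRB", ("RAM", "PTRB")),
    0x1FA: ("INA", "INA", ("RAM",)),
    0x1FB: ("INB", "INB", ("RAM",)),
    0x1FC: ("OUTA", "RAM", ("RAM", "OUTA")),
    0x1FD: ("OUTB", "RAM", ("RAM", "OUTB")),
    0x1FE: ("DIRA", "RAM", ("RAM", "DIRA")),
    0x1FF: ("DIRB", "RAM", ("RAM", "DIRB")),
}


def decode_register(addr: int):
    """Return (name, read source, write destinations) for a 9-bit cog address."""
    if not 0 <= addr <= 0x1FF:
        raise ValueError("cog addresses are 9 bits ($000-$1FF)")
    if addr <= 0x1F7:
        return ("RAM", "RAM", ("RAM",))   # plain cog RAM
    return REGISTER_MAP[addr]


print(decode_register(0x1FA))   # reads INA; writes only reach RAM
print(decode_register(0x1FC))   # reads RAM; writes reach RAM and OUTA
```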



Current Instructions:

ZCDS (for D column: W=write, M=modify, R=read, L=read/immediate)
----------------------------------------------------------------------------------------------------------------------

ZCMS		0000000 ZCI CCCC DDDDDDDDD SSSSSSSSS		ROR	D,S/#		rot
ZCMS		0000001 ZCI CCCC DDDDDDDDD SSSSSSSSS		ROL	D,S/#		rot
ZCMS		0000010 ZCI CCCC DDDDDDDDD SSSSSSSSS		SHR	D,S/#		rot
ZCMS		0000011 ZCI CCCC DDDDDDDDD SSSSSSSSS		SHL	D,S/#		rot
ZCMS		0000100 ZCI CCCC DDDDDDDDD SSSSSSSSS		RCR	D,S/#		rot
ZCMS		0000101 ZCI CCCC DDDDDDDDD SSSSSSSSS		RCL	D,S/#		rot
ZCMS		0000110 ZCI CCCC DDDDDDDDD SSSSSSSSS		SAR	D,S/#		rot
ZCMS		0000111 ZCI CCCC DDDDDDDDD SSSSSSSSS		SAL	D,S/#		rot

ZCMS		0001000 ZCI CCCC DDDDDDDDD SSSSSSSSS		ADD	D,S/#		add
ZCMS		0001001 ZCI CCCC DDDDDDDDD SSSSSSSSS		ADDX	D,S/#		add
ZCMS		0001010 ZCI CCCC DDDDDDDDD SSSSSSSSS		ADDS	D,S/#		add
ZCMS		0001011 ZCI CCCC DDDDDDDDD SSSSSSSSS		ADDSX	D,S/#		add

ZCMS		0001100 ZCI CCCC DDDDDDDDD SSSSSSSSS		SUB	D,S/#		add
ZCMS		0001101 ZCI CCCC DDDDDDDDD SSSSSSSSS		SUBX	D,S/#		add
ZCMS		0001110 ZCI CCCC DDDDDDDDD SSSSSSSSS		SUBS	D,S/#		add
ZCMS		0001111 ZCI CCCC DDDDDDDDD SSSSSSSSS		SUBSX	D,S/#		add

ZCRS		0010000 ZCI CCCC DDDDDDDDD SSSSSSSSS		CMP	D,S/#		add
ZCRS		0010001 ZCI CCCC DDDDDDDDD SSSSSSSSS		CMPX	D,S/#		add
ZCRS		0010010 ZCI CCCC DDDDDDDDD SSSSSSSSS		CMPS	D,S/#		add
ZCRS		0010011 ZCI CCCC DDDDDDDDD SSSSSSSSS		CMPSX	D,S/#		add

ZCRS		0010100 ZCI CCCC DDDDDDDDD SSSSSSSSS		CMPR	D,S/#		add
ZCRS		0010101 ZCI CCCC DDDDDDDDD SSSSSSSSS		CMPM	D,S/#		add
ZCMS		0010110 ZCI CCCC DDDDDDDDD SSSSSSSSS		SUBR	D,S/#		add
ZCMS		0010111 ZCI CCCC DDDDDDDDD SSSSSSSSS		CMPSUB	D,S/#		add

ZCMS		0011000 ZCI CCCC DDDDDDDDD SSSSSSSSS		MIN	D,S/#		add
ZCMS		0011001 ZCI CCCC DDDDDDDDD SSSSSSSSS		MAX	D,S/#		add
ZCMS		0011010 ZCI CCCC DDDDDDDDD SSSSSSSSS		MINS	D,S/#		add
ZCMS		0011011 ZCI CCCC DDDDDDDDD SSSSSSSSS		MAXS	D,S/#		add

ZCMS		0011100 ZCI CCCC DDDDDDDDD SSSSSSSSS		SUMC	D,S/#		add
ZCMS		0011101 ZCI CCCC DDDDDDDDD SSSSSSSSS		SUMNC	D,S/#		add
ZCMS		0011110 ZCI CCCC DDDDDDDDD SSSSSSSSS		SUMZ	D,S/#		add
ZCMS		0011111 ZCI CCCC DDDDDDDDD SSSSSSSSS		SUMNZ	D,S/#		add

ZCMS		0100000 ZCI CCCC DDDDDDDDD SSSSSSSSS		ISOB	D,S/#		log
ZCMS		0100001 ZCI CCCC DDDDDDDDD SSSSSSSSS		NOTB	D,S/#		log
ZCMS		0100010 ZCI CCCC DDDDDDDDD SSSSSSSSS		CLRB	D,S/#		log
ZCMS		0100011 ZCI CCCC DDDDDDDDD SSSSSSSSS		SETB	D,S/#		log

ZCMS		0100100 ZCI CCCC DDDDDDDDD SSSSSSSSS		SETBC	D,S/#		log
ZCMS		0100101 ZCI CCCC DDDDDDDDD SSSSSSSSS		SETBNC	D,S/#		log
ZCMS		0100110 ZCI CCCC DDDDDDDDD SSSSSSSSS		SETBZ	D,S/#		log
ZCMS		0100111 ZCI CCCC DDDDDDDDD SSSSSSSSS		SETBNZ	D,S/#		log

ZCMS		0101000 ZCI CCCC DDDDDDDDD SSSSSSSSS		ANDN	D,S/#		log
ZCMS		0101001 ZCI CCCC DDDDDDDDD SSSSSSSSS		AND	D,S/#		log
ZCMS		0101010 ZCI CCCC DDDDDDDDD SSSSSSSSS		OR	D,S/#		log
ZCMS		0101011 ZCI CCCC DDDDDDDDD SSSSSSSSS		XOR	D,S/#		log

ZCMS		0101100 ZCI CCCC DDDDDDDDD SSSSSSSSS		MUXC	D,S/#		log
ZCMS		0101101 ZCI CCCC DDDDDDDDD SSSSSSSSS		MUXNC	D,S/#		log
ZCMS		0101110 ZCI CCCC DDDDDDDDD SSSSSSSSS		MUXZ	D,S/#		log
ZCMS		0101111 ZCI CCCC DDDDDDDDD SSSSSSSSS		MUXNZ	D,S/#		log

ZCWS		0110000 ZCI CCCC DDDDDDDDD SSSSSSSSS		MOV	D,S/#		inc
ZCWS		0110001 ZCI CCCC DDDDDDDDD SSSSSSSSS		NOT	D,S/#		inc
ZCWS		0110010 ZCI CCCC DDDDDDDDD SSSSSSSSS		ABS	D,S/#		inc
ZCWS		0110011 ZCI CCCC DDDDDDDDD SSSSSSSSS		NEG	D,S/#		inc

ZCWS		0110100 ZCI CCCC DDDDDDDDD SSSSSSSSS		NEGC	D,S/#		inc
ZCWS		0110101 ZCI CCCC DDDDDDDDD SSSSSSSSS		NEGNC	D,S/#		inc
ZCWS		0110110 ZCI CCCC DDDDDDDDD SSSSSSSSS		NEGZ	D,S/#		inc
ZCWS		0110111 ZCI CCCC DDDDDDDDD SSSSSSSSS		NEGNZ	D,S/#		inc

ZCMS		0111000 ZCI CCCC DDDDDDDDD SSSSSSSSS		ALTDS	D,S/#		inc		(modify D/S fields in D, redirect D/S fields in next instruction)
ZCWS		0111001 ZCI CCCC DDDDDDDDD SSSSSSSSS		DECOD	D,S/#		inc
ZCWS		0111010 ZCI CCCC DDDDDDDDD SSSSSSSSS		TOPONE	D,S/#		inc
ZCWS		0111011 ZCI CCCC DDDDDDDDD SSSSSSSSS		BOTONE	D,S/#		inc

ZCMS		0111100 ZCI CCCC DDDDDDDDD SSSSSSSSS		INCMOD	D,S/#		inc
ZCMS		0111101 ZCI CCCC DDDDDDDDD SSSSSSSSS		DECMOD	D,S/#		inc

ZCMS		0111110 00I CCCC DDDDDDDDD SSSSSSSSS		DJZ	D,S/@		inc/adr
ZCMS		0111110 01I CCCC DDDDDDDDD SSSSSSSSS		DJNZ	D,S/@		inc/adr
ZCMS		0111110 10I CCCC DDDDDDDDD SSSSSSSSS		DJS	D,S/@		inc/adr
ZCMS		0111110 11I CCCC DDDDDDDDD SSSSSSSSS		DJNS	D,S/@		inc/adr

ZCRS		0111111 00I CCCC DDDDDDDDD SSSSSSSSS		JZ	D,S/@		adr
ZCRS		0111111 01I CCCC DDDDDDDDD SSSSSSSSS		JNZ	D,S/@		adr
ZCRS		0111111 10I CCCC DDDDDDDDD SSSSSSSSS		JS	D,S/@		adr
ZCRS		0111111 11I CCCC DDDDDDDDD SSSSSSSSS		JNS	D,S/@		adr

--MS		100000n nnI CCCC DDDDDDDDD SSSSSSSSS		SETNIBn	D,S/#		mux
--WS		100001n nnI CCCC DDDDDDDDD SSSSSSSSS		GETNIBn	D,S/#		mux
--MS		100010n nnI CCCC DDDDDDDDD SSSSSSSSS		ROLNIBn	D,S/#		mux
--MS		1000110 nnI CCCC DDDDDDDDD SSSSSSSSS		SETBYTn	D,S/#		mux
--WS		1000111 nnI CCCC DDDDDDDDD SSSSSSSSS		GETBYTn	D,S/#		mux
--MS		1001000 nnI CCCC DDDDDDDDD SSSSSSSSS		ROLBYTn	D,S/#		mux
--MS		1001001 0nI CCCC DDDDDDDDD SSSSSSSSS		SETWRDn	D,S/#		mux
--WS		1001001 1nI CCCC DDDDDDDDD SSSSSSSSS		GETWRDn	D,S/#		mux
--MS		1001010 0nI CCCC DDDDDDDDD SSSSSSSSS		ROLWRDn	D,S/#		mux
--WS		1001010 10I CCCC DDDDDDDDD SSSSSSSSS		SETBYTS	D,S/#		mux
--MS		1001010 11I CCCC DDDDDDDDD SSSSSSSSS		MOVBYTS	D,S/#		mux
--WS		1001011 00I CCCC DDDDDDDDD SSSSSSSSS		SPLITB	D,S/#		mux
--WS		1001011 01I CCCC DDDDDDDDD SSSSSSSSS		MERGEB	D,S/#		mux
--WS		1001011 10I CCCC DDDDDDDDD SSSSSSSSS		SPLITW	D,S/#		mux
--WS		1001011 11I CCCC DDDDDDDDD SSSSSSSSS		MERGEW	D,S/#		mux
--MS		1001100 00I CCCC DDDDDDDDD SSSSSSSSS		SETS	D,S/#		mux
--WS		1001100 01I CCCC DDDDDDDDD SSSSSSSSS		GETS	D,S/#		mux
--MS		1001100 10I CCCC DDDDDDDDD SSSSSSSSS		SETD	D,S/#		mux
--WS		1001100 11I CCCC DDDDDDDDD SSSSSSSSS		GETD	D,S/#		mux
--MS		1001101 00I CCCC DDDDDDDDD SSSSSSSSS		SETDS	D,S/#		mux
--MS		1001101 01I CCCC DDDDDDDDD SSSSSSSSS		SETCOND	D,S/#		mux
--MS		1001101 10I CCCC DDDDDDDDD SSSSSSSSS		SETI	D,S/#		mux
--WS		1001101 11I CCCC DDDDDDDDD SSSSSSSSS		REV	D,S/#		mux

ZCMS		1001110 ZCI CCCC DDDDDDDDD SSSSSSSSS		MUL	D,S/#		mul		(16 x 16 unsigned multiply)
ZCMS		1001111 ZCI CCCC DDDDDDDDD SSSSSSSSS		MULS	D,S/#		mul		(16 x 16 signed multiply)

ZCRS		1010000 ZCI CCCC DDDDDDDDD SSSSSSSSS		TESTN	D,S/#		log		(ANDN without write)
ZCRS		1010001 ZCI CCCC DDDDDDDDD SSSSSSSSS		TEST	D,S/#		log		(AND  without write)
ZCRS		1010010 ZCI CCCC DDDDDDDDD SSSSSSSSS		ANYB	D,S/#		log		(OR   without write)
ZCRS		1010011 ZCI CCCC DDDDDDDDD SSSSSSSSS		TESTB	D,S/#		log		(ISOB without write)

ZCMS		1010100 ZCI CCCC DDDDDDDDD SSSSSSSSS		WAITCNT	D,S/#		add
ZCMS		1010101 ZCI CCCC DDDDDDDDD SSSSSSSSS		LINK	D,S/@		adr		(jump to S/@, write {Z,C,P[18:0]} to D, 'LINK INA,S' = 'JMP S')

--LS		1100000 L0I CCCC DDDDDDDDD SSSSSSSSS		JP	D/#,S/@		adr		(jump if pin IN high, pins registered at beginning of ALU cycle)
--LS		1100000 L1I CCCC DDDDDDDDD SSSSSSSSS		JNP	D/#,S/@		adr		(jump if pin IN low,  pins registered at beginning of ALU cycle)

--LS		1100001 L0I CCCC DDDDDDDDD SSSSSSSSS		REP	D/#,S/#				(begin repeat block of size D/# with S/# iterations)

--LS		1100010 L0I CCCC DDDDDDDDD SSSSSSSSS		WRFAST	D/#,S/PTRx	hub		(waits for mem)
--LS		1100010 L1I CCCC DDDDDDDDD SSSSSSSSS		WRBYTE	D/#,S/PTRx	hub		(waits for mem)
--LS		1100011 L0I CCCC DDDDDDDDD SSSSSSSSS		WRWORD	D/#,S/PTRx	hub		(waits for mem)
--LS		1100011 L1I CCCC DDDDDDDDD SSSSSSSSS		WRLONG	D/#,S/PTRx	hub		(waits for mem)
--LS		1100100 LxI CCCC DDDDDDDDD SSSSSSSSS		RDFAST	D/#,S/PTRx	hub		(waits for mem)
ZCWS		1100101 ZCI CCCC DDDDDDDDD SSSSSSSSS		RDBYTE	D,S/PTRx	mem		(waits for mem)
ZCWS		1100110 ZCI CCCC DDDDDDDDD SSSSSSSSS		RDWORD	D,S/PTRx	mem		(waits for mem)
ZCWS		1100111 ZCI CCCC DDDDDDDDD SSSSSSSSS		RDLONG	D,S/PTRx	mem		(waits for mem)

--LS		1101000 L0I CCCC DDDDDDDDD SSSSSSSSS		QSINCOS	D/#,S/#		2	..2	(1000)	wait sys
--LS		1101000 L1I CCCC DDDDDDDDD SSSSSSSSS		QARCTAN	D/#,S/#		2	..2	(1001)	wait sys
--LS		1101001 L0I CCCC DDDDDDDDD SSSSSSSSS		QROTATE	D/#,S/#		3	..2	(1010)	wait sys
--LS		1101001 L1I CCCC DDDDDDDDD SSSSSSSSS		QMUL	D/#,S/#		2	..2	(1011)	wait sys
--LS		1101010 L0I CCCC DDDDDDDDD SSSSSSSSS		QDIV	D/#,S/#		3	..2	(1100)	wait sys
--LS		1101010 L1I CCCC DDDDDDDDD SSSSSSSSS		QSQRT	D/#,S/#		2	..1	(1101)	wait sys
-CLS wr if !L	1101011 LCI CCCC DDDDDDDDD SSSSSSSSS		COGINIT	D/#,S/#		2	1?	(0010)	wait sys + 2 if WC and reg


ZCL- wr if C	1101111 0CL CCCC DDDDDDDDD 000000000		CLKSET	D/#		1	0	(0000)	wait sys + 2 if WC
ZCL-		1101111 Z1L CCCC DDDDDDDDD 000000001		COGID	D/#	WC	1	0/C	(0001)	wait sys + 2
ZCL- wr		1101111 Z00 CCCC DDDDDDDDD 000000001		COGID	D		0	1	(0001)	wait sys + 2
ZCL-		1101111 00L CCCC DDDDDDDDD 000000011		COGSTOP	D/#		1	0	(0011)	wait sys
ZCL- wr		1101111 ZC0 CCCC DDDDDDDDD 000000100		LOCKNEW	D		0	1/C	(0100)	wait sys + 2
ZCL-		1101111 00L CCCC DDDDDDDDD 000000101		LOCKRET	D/#		1	0	(0101)	wait sys
ZCL-		1101111 0CL CCCC DDDDDDDDD 000000110		LOCKCLR	D/#		1	0/C	(0110)	wait sys + 2 if WC
ZCL-		1101111 0CL CCCC DDDDDDDDD 000000111		LOCKSET	D/#		1	0/C	(0111)	wait sys + 2 if WC
ZCL-		1101111 00L CCCC DDDDDDDDD 000001110		QLOG	D/#		1	..1	(1110)	wait sys
ZCL-		1101111 00L CCCC DDDDDDDDD 000001111		QEXP	D/#		1	..1	(1111)	wait sys

ZCL-		1101111 00L CCCC DDDDDDDDD 000000000		SETQ	D/#
ZCL- wr		1101111 ZC0 CCCC DDDDDDDDD 000010001		RFBYTE	D
ZCL- wr		1101111 ZC0 CCCC DDDDDDDDD 000010010		RFWORD	D
ZCL- wr		1101111 ZC0 CCCC DDDDDDDDD 000010011		RFLONG	D
ZCL-		1101111 00L CCCC DDDDDDDDD 000010100		WFBYTE	D/#
ZCL-		1101111 00L CCCC DDDDDDDDD 000011000		WFWORD	D/#
ZCL-		1101111 00L CCCC DDDDDDDDD 000011100		WFLONG	D/#


----		11110nn nnn CCCC nnnnnnnnn nnnnnnnnn		AUGS	#23bits			(appends n to upper bits of next immediate S in same task)
----		11111nn nnn CCCC nnnnnnnnn nnnnnnnnn		AUGD	#23bits			(appends n to upper bits of next immediate D in same task)
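A sketch of how the 32-bit words in the listing pack, assuming the fields are listed MSB first (opcode in bits 31..25, then Z, C, I, the 4-bit condition, 9-bit D and 9-bit S) and that AUGS's 23 bits sit above the next instruction's 9-bit S to form a 32-bit immediate. The helper names are hypothetical, and the condition value in the example is only a placeholder.

```python
FIELDS = (("opcode", 25, 7), ("z", 24, 1), ("c", 23, 1), ("i", 22, 1),
          ("cond", 18, 4), ("d", 9, 9), ("s", 0, 9))   # (name, lsb, width)


def encode_instruction(opcode7, z, c, i, cond4, d9, s9) -> int:
    """Pack fields in listing order: opcode(7) Z C I CCCC(4) D(9) S(9)."""
    word = 0
    for (name, lsb, width), value in zip(FIELDS, (opcode7, z, c, i, cond4, d9, s9)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << lsb
    return word


def decode_instruction(word: int) -> dict:
    """Split a 32-bit instruction word back into its named fields."""
    return {name: (word >> lsb) & ((1 << width) - 1) for name, lsb, width in FIELDS}


def augmented_immediate(augs_n23: int, s9: int) -> int:
    """AUGS supplies the upper 23 bits; the next 9-bit S field supplies the rest."""
    return ((augs_n23 & 0x7FFFFF) << 9) | (s9 & 0x1FF)


word = encode_instruction(0b0001000, 1, 0, 1, 0b1111, 5, 3)   # ADD, placeholder cond
print(hex(word), decode_instruction(word))
print(hex(augmented_immediate(0x7FFFFF, 0x1FF)))
```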




Instructions to be added:


ZCWS	*	1011001 ZCI CCCC DDDDDDDDD SSSSSSSSS		MSGIN	D,S/#		msg	(waits up to 32 clocks to receive %010_data32 message on pin S/#, C=timeout)

--LS	*	1100000 1LI CCCC DDDDDDDDD SSSSSSSSS		WAITPAE	D/#,S/#			(waits for INA)
--LS	*	1100000 0LI CCCC DDDDDDDDD SSSSSSSSS		WAITPAN	D/#,S/#			(waits for INA)
--LS	*	1100001 1LI CCCC DDDDDDDDD SSSSSSSSS		WAITPBE	D/#,S/#			(waits for INB)
--LS	*	1100001 0LI CCCC DDDDDDDDD SSSSSSSSS		WAITPBN	D/#,S/#			(waits for INB)

--LS	*	1100100 0LI CCCC DDDDDDDDD SSSSSSSSS		MSGOUTA	D/#,S/#			(send message to pin(s) on DIRA)
--LS	*	1100100 1LI CCCC DDDDDDDDD SSSSSSSSS		MSGOUTB	D/#,S/#			(send message to pin(s) on DIRB)

--LS	*	1100110 0LI CCCC DDDDDDDDD SSSSSSSSS		PICKZC	D/#,S/#		adr	(always writes Z/C) - maybe change to {base5+offset4, base5) (1-based offset)


----


----	*	1101100 00n CCCC nnnnnnnnn nnnnnnnnn	*	JMP	#abs		adr	(jump to 19-bit absolute address)
----	*	1101100 01n CCCC nnnnnnnnn nnnnnnnnn	*	JMP	@rel		adr	(jump to 19-bit relative address)
----	*	1101100 10n CCCC nnnnnnnnn nnnnnnnnn	*	CALL	#abs		adr	(call to 19-bit absolute address, using 4-level stack)
----	*	1101100 11n CCCC nnnnnnnnn nnnnnnnnn	*	CALL	@rel		adr	(call to 19-bit relative address, using 4-level stack)
----	*	1101101 00n CCCC nnnnnnnnn nnnnnnnnn	*	CALLA	#abs		adr	(call to 19-bit absolute address using PTRA)
----	*	1101101 01n CCCC nnnnnnnnn nnnnnnnnn	*	CALLA	@rel		adr	(call to 19-bit relative address using PTRA)
----	*	1101101 10n CCCC nnnnnnnnn nnnnnnnnn	*	CALLB	#abs		adr	(call to 19-bit absolute address using PTRB)
----	*	1101101 11n CCCC nnnnnnnnn nnnnnnnnn	*	CALLB	@rel		adr	(call to 19-bit relative address using PTRB)

ZCW-	*	1101111 ZCx CCCC DDDDDDDDD xxxxx0000		GETCNT	D		cog	(get CNT into D)
ZCW-	*	1101111 ZCx CCCC DDDDDDDDD xxxxx0001		GETRND	D		cog	(get RND into D)
ZCW-	*	1101111 ZCx CCCC DDDDDDDDD xxxxx0010		POP	D		cog	(pop 4-level stack into D)	(D[20:19] into Z/C via WZ/WC for POP..CALLB D)
ZCR-	*	1101111 ZCx CCCC DDDDDDDDD xxxxx0011		CALL	D		adr	(call to D[18:0] using 4-level stack)
ZCR-	*	1101111 ZCx CCCC DDDDDDDDD xxxxx0100	*	CALLA	D		adr	(call to D[18:0] using PTRA stack)
ZCR-	*	1101111 ZCx CCCC DDDDDDDDD xxxxx0101	*	CALLB	D		adr	(call to D[18:0] using PTRB stack)
--L-	*	1101111 00L CCCC DDDDDDDDD xxxxx0110		PUSH	D/#			(push D/# into 4-level stack)
--L-	*	1101111 00L CCCC DDDDDDDDD xxxxx0111		SETVID	D/#			(set video mode)
--L-	*	1101111 00L CCCC DDDDDDDDD xxxxx1000		WAIT	D/#			(wait for some number of clocks, 0 same as 1)
--L-	*	1101111 00L CCCC DDDDDDDDD xxxxx1001		WAITPX	D/#			(wait for any edge on pin D/#)
--L-	*	1101111 00L CCCC DDDDDDDDD xxxxx1010		WAITPR	D/#			(wait for pos edge on pin D/#)
--L-	*	1101111 00L CCCC DDDDDDDDD xxxxx1011		WAITPF	D/#			(wait for neg edge on pin D/#)
----	*	1101111 00x CCCC xxxxxxxxx xxxxx1100		SETQ	D/#
----	*	1101111 ZCx CCCC xxxxxxxxx xxxxx1101		RET			cog	(return using 4-level stack)
----	*	1101111 ZCx CCCC xxxxxxxxx xxxxx1110	*	RETA			cog	(return using PTRA stack)
----	*	1101111 ZCx CCCC xxxxxxxxx xxxxx1111	*	RETB			cog	(return using PTRB stack)

----

---- wr	*	111000r rrn CCCC nnnnnnnnn nnnnnnnnn		LOCADDR	reg,#abs	adr	(write 19-bit absolute address to $1F2..$1F9, includes PTRA/PTRB)
---- wr	*	111001r rrn CCCC nnnnnnnnn nnnnnnnnn		LOCADDR	reg,@rel	adr	(write 19-bit relative address to $1F2..$1F9, includes PTRA/PTRB)
---- wr	*	111010r rrn CCCC nnnnnnnnn nnnnnnnnn	*	LINK	reg,#abs	adr	(jump to 19-bit absolute address, write {Z,C,P[18:0]} to $1F2..$1F9)
---- wr	*	111011r rrn CCCC nnnnnnnnn nnnnnnnnn	*	LINK	reg,@rel	adr	(jump to 19-bit relative address, write {Z,C,P[18:0]} to $1F2..$1F9)
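The {Z,C,P[18:0]} value that LINK writes (and that POP's D[20:19] refers to) fits in 21 bits. The sketch below shows that packing, maps the 3-bit `rrr` field of the new LOCADDR/LINK forms onto $1F2..$1F9, and models the 4-level stack. The listing does not say what happens on stack overflow, so discarding the oldest entry is an assumption, and all names are hypothetical.

```python
PC_MASK = (1 << 19) - 1   # P[18:0]


def link_word(z: int, c: int, pc: int) -> int:
    """Pack {Z, C, P[18:0]}: Z in bit 20, C in bit 19, PC in bits 18..0."""
    return ((z & 1) << 20) | ((c & 1) << 19) | (pc & PC_MASK)


def unlink_word(word: int) -> tuple:
    """Split a {Z, C, P[18:0]} word back into (z, c, pc)."""
    return (word >> 20) & 1, (word >> 19) & 1, word & PC_MASK


def link_target_register(rrr: int) -> int:
    """Map the 3-bit rrr field onto $1F2..$1F9 (PTRA/PTRB are $1F8/$1F9)."""
    if not 0 <= rrr < 8:
        raise ValueError("rrr is a 3-bit field")
    return 0x1F2 + rrr


class FourLevelStack:
    """Hypothetical 4-level hardware stack; drops the oldest entry on overflow."""

    DEPTH = 4

    def __init__(self) -> None:
        self._entries = []

    def push(self, value: int) -> None:
        if len(self._entries) == self.DEPTH:
            self._entries.pop(0)   # assumed overflow behaviour
        self._entries.append(value)

    def pop(self) -> int:
        return self._entries.pop()   # IndexError when empty


stack = FourLevelStack()
stack.push(link_word(1, 0, 0x12345))
print(unlink_word(stack.pop()), hex(link_target_register(7)))
```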

cgracey · 2014-06-23 22:18

Cluso99 wrote: »

Does OnSemi have any fuse bit cells? I am presuming they don't have flash cells, or that these would add process steps?

They do. I will ask them about their construction. I think ours are fine, but they might have some other technique that is more reliable and compact.

ozpropdev · 2014-06-23 22:20

Nice work Chip!

Looking forward to flexing the new chip's muscles...

Yanomani · 2014-06-23 22:50

Hi Chip

Great work!
Keeping the whole 512K of RAM free is a bonus that rewards your efforts.

As for the fuse construction technique, when selecting between yours and OnSemi's, besides reliability and compactness, IMHO an important aspect to take into account is the ability to "keep them well buried" in the mix of transistors and wires, so they cannot be easily pinpointed by the geometric irregularities (visible by electron beam or optically) that their presence and programmed state introduce into the layout.

As for the current design step, does the RAM speed still outperform the maximum expected COG clock rate by about 2:1?

Yanomani

potatohead · 2014-06-23 23:13

Well done Chip!

With the ROM being copied to RAM now, we can design things that patch into the default code. Remember to add some clever hooks.

Tubular · 2014-06-23 23:14

Lots of nice instructions still in there, even for nibbles. ALTDS too.

"LINK" appears in both "current instructions" and "instructions to be added", with different opcodes. Perhaps these are different variants of LINK.

cgracey · 2014-06-24 00:50

Yanomani wrote: »

Hi Chip

Great work!
Keeping the whole 512K of RAM free is a bonus that rewards your efforts.

As for the fuse construction technique, when selecting between yours and OnSemi's, besides reliability and compactness, IMHO an important aspect to take into account is the ability to "keep them well buried" in the mix of transistors and wires, so they cannot be easily pinpointed by the geometric irregularities (visible by electron beam or optically) that their presence and programmed state introduce into the layout.

As for the current design step, does the RAM speed still outperform the maximum expected COG clock rate by about 2:1?

Yanomani

The chip is designed so that the logic speed does not exceed the RAM speed, allowing the logic to be sufficiently complicated, while not going (much) under the speed limit imposed by the RAMs. So, the RAM speed is the target for everything else. RAMs are slow, compared to logic, so they set the speed limit.

cgracey · 2014-06-24 00:52

Tubular wrote: »

Lots of nice instructions still in there, even for nibbles. ALTDS too.

"LINK" appears in both "current instructions" and "instructions to be added", with different opcodes. Perhaps these are different variants of LINK.

The LINK instruction that is not implemented yet will be able to provide a 19-bit constant, but it is limited in where the return address can be stored.
