The Block transfer opcode will stall the COG, as it needs 16 SysClks to fill the fifo ...
I guess, in that case, it's not really a FIFO -- maybe a RIFO (random-in first-out) or a BIFO (block-in first-out). It would be nice, I suppose, if reading from the "*IFO" could begin as soon as the first (lowest-hub-address) slot is filled, since further reads cannot keep up with subsequent filling. OTOH, that could affect determinism.
I've rephrased that a little better as I'm not sure if the Block read even needs the FIFO - the FIFO is really there for video-DMA style transfers, up to fSys/N. It fills when it can, and empties at a fixed rate. (& vice-versa on read to hub)
Block[16] transfers always need 16 fSys cycles but do not have to use the FIFO. They do need the address/data lines to the HUB, but in theory a Block Read could work time-shared with a FIFO running video at rates below fSys/1.
Hmm, I figured the FIFO was intended for Cog instructions to use. We'll just have to wait and see I guess.
You mean for HUB exec style code fetch, or general data flows ?
HUB exec is still mentioned by Chip, but no details yet.
A fifo could be useful for LMM style code, but the block[16] read would be more deterministic, and that would free FIFOs for data flows.
It may be possible to do a hw-assisted form of LMM that behaves like HUB exec, with a speed between LMM and COG code.
Straight-line code could run at 66% of the speed of COG code, and a size-definable block would allow software to tune/optimise the block sizes (in multiples of 16).
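For anyone wondering where the 66% comes from, here is a back-of-envelope sketch in Python. It is not from Chip's design; it assumes a block read delivers one long per clock (16 clocks for Block[16], as stated above), that each instruction takes 2 clocks to execute, and that loading and executing do not overlap. The function name and parameters are made up for illustration.

```python
def straight_line_efficiency(block_longs=16, clocks_per_instr=2, clocks_per_long=1):
    """Fraction of native COG speed for straight-line code fetched in blocks.

    Assumptions (not confirmed by the thread): the cog stalls for the whole
    block load, then runs every loaded instruction once at full speed.
    """
    load_clocks = block_longs * clocks_per_long      # cog stalled while the block arrives
    exec_clocks = block_longs * clocks_per_instr     # time the same code takes from COG RAM
    return exec_clocks / (load_clocks + exec_clocks)


if __name__ == "__main__":
    for size in (16, 32, 64):
        print(size, round(straight_line_efficiency(size), 3))
```

Under these assumptions the ratio is 2/3 regardless of block size, so tuning the block size would mostly matter for branchy code, where part of a loaded block goes unused.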
Chip, a question: Presumably the FIFOs are single-ported. I'm going to guess that any Cog instruction that accesses its FIFO while the hub is accessing it will stall the Cog, right?
I ask, trying not to be too greedy, because a Cog working on the FIFO contents concurrently would seem the most streamlined ... Double buffering anyone? 2x8 FIFO per Cog maybe? /me ducks.
To reliably accommodate an automatic peripheral in the cog that is using the FIFO, the FIFO gets priority over RDxxxx/WRxxxx instructions.
The FIFO is 19 levels deep, maximally. This is to accommodate the worst case of a long being read or written on every clock. Most of the time, in practice, the FIFO will only be a few levels deep.
To reliably accommodate an automatic peripheral in the cog that is using the FIFO, the FIFO gets priority over RDxxxx/WRxxxx instructions.
What can be an automatic peripheral? What sort of flexibility is there?
The FIFO is 19 levels deep, maximally. This is to accommodate the worst case of a long being read or written on every clock. Most of the time, in practice, the FIFO will only be a few levels deep.
So the plan for the FIFO is mainly around having it pace its stream at the rate needed for the I/O. So, then, the Cog can still RDxxx/WRxxx in between the FIFO's Hub accesses. Am I on to it now? That almost sounds a bit too luxurious for the Prop.
That's right. RDxxxx/WRxxxx must wait for cycles when the FIFO is not needing to issue reads or writes.
There will be simple state machines to read pins (byte/word/long groups) and write them to the hub RAM via the FIFO per NCO setting. Data can also be read from the hub RAM via the FIFO and written to pins or DACs per NCO setting. Video is a case of the latter.
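To make the priority rule concrete, here is a toy Python model of it. It is only a sketch: it assumes the FIFO wants the hub port on a fixed cadence of once every `fifo_period` clocks (a stand-in for the NCO pacing) and ignores hub slot rotation and FIFO depth entirely. All names here are hypothetical.

```python
def cog_access_cycle(request_cycle, fifo_period, fifo_phase=0, horizon=1000):
    """Return the first clock >= request_cycle on which a RDxxxx/WRxxxx could proceed.

    Toy assumption: the FIFO claims the hub port whenever
    (cycle - fifo_phase) is a multiple of fifo_period, and it always wins.
    Returns None if no free clock is found within `horizon` clocks.
    """
    for cycle in range(request_cycle, request_cycle + horizon):
        fifo_needs_port = (cycle - fifo_phase) % fifo_period == 0
        if not fifo_needs_port:
            return cycle          # FIFO is idle this clock, so the cog gets in
    return None                   # FIFO used every clock in the window


if __name__ == "__main__":
    print(cog_access_cycle(8, fifo_period=4))   # request collides with a FIFO access
    print(cog_access_cycle(9, fifo_period=4))   # request lands on a free clock
    print(cog_access_cycle(0, fifo_period=1, horizon=50))  # FIFO streaming every clock
```

In this model a FIFO running at one access per clock starves the cog's RDxxxx/WRxxxx completely, which is the worst case the 19-level depth is said to cover.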
So if I understand correctly: the NCO controls the pin read/write rate, and the FIFO buffers, for both reading and writing pins (in groups of 8/16/32 pins for byte/word/long hub reads/writes), also for writes, it can send them to DAC's?
I have to ask...
1) how fast are the ADC's in the pins? The fifo engine would be more symmetric if the ADC's could also write to the hub - but I suspect this is not needed, as I don't think the ADC's are fast enough to require dma
2) what is the maximum NCO frequency? 200MHz? (i.e. SysClk?)
Too bad mixing clock domains is not easy in the Altera software. It would be nice to allow an external clock input for use instead of an NCO frequency. 165MHz oscillator comes to mind.
That's right. RDxxxx/WRxxxx must wait for cycles when the FIFO is not needing to issue reads or writes.
There will be simple state machines to read pins (byte/word/long groups) and write them to the hub RAM via the FIFO per NCO setting. Data can also be read from the hub RAM via the FIFO and written to pins or DACs per NCO setting. Video is a case of the latter.
Do you have write-before-read, if FIFOs point to the same destination?
This would be rare, - only in Pin streaming -> Hub, not for Hub -> video/pins.
It would also be tricky, as any element in the FIFO might be due to replace what you are about to read in SW.
Checking all FIFO elements will require too much logic.
Most apps would first start the FIFO/DMA and then know whether they were ahead of or behind the write pointer.
(Usually SW would be slower, and behind the pointer, but at low NCO speeds you may need to slow the SW.)
There, and for burst cases, it could be useful to have some means to read/track the FIFO burst progress.
Could a read of the registers used to configure this give that? This might already be planned?
Too bad mixing clock domains is not easy in the Altera software. It would be nice to allow an external clock input for use instead of an NCO frequency. 165MHz oscillator comes to mind.
Why not clock at 165MHz in that case ?
External clock can be done, but it will always be sampled by SysCLK, as on most uC.
I'd expect smart pin counters to have External Edge option, & the sampling limits INC rates to < SysCLK/2 (100MHz)
So if I understand correctly: the NCO controls the pin read/write rate, and the FIFO buffers, for both reading and writing pins (in groups of 8/16/32 pins for byte/word/long hub reads/writes), also for writes, it can send them to DAC's?
Yep, that's my understanding now.
1) how fast are the ADC's in the pins? The fifo engine would be more symmetric if the ADC's could also write to the hub - but I suspect this is not needed, as I don't think the ADC's are fast enough to require dma
The counters are still in limbo I think so that part is still up in the air.
SERDES would be another candidate, but again not done yet.
I've got the cog working! The next step is to write the new boot loader. I'll need to update the assembler to do this. So far, I've been coding short test programs by hand and typing them into the memory files used by Quartus to compile the FPGA image.
Since we can't substitute ROM bit cells for RAM cells in the main memory (since we are using OnSemi's RAM), I had to instantiate a separate 4K*8 ROM (0.067 square mm) that is read via the CLKSET instruction. Each cog has a 5-unique-instruction program (that only cog0 uses) to load the ROM into the base of hub RAM at start-up. Doing it this way keeps the main memory simple, but it added a layer of complexity to the development. This has been slow-going, but once I get PNut.exe working with the downloader, things are going to really accelerate. That's when I'll add hub exec.
Thanks for your patience. I'm feeling confident about the direction of things. I think the chance of failure on the next chip will be very low.
... I had to instantiate a separate 4K*8 ROM (0.067 square mm) that is read via the CLKSET instruction. Each cog has a 5-unique-instruction program (that only cog0 uses) to load the ROM into the base of hub RAM at start-up.
Interesting. So this is a serial-like ROM, only accessible at Power-on ? (so the RAM is 100% RAM)
How does that compare with the previous ROM size ?
I noticed TI have released a ROM version of their Piccolo, with Motor control code/libraries in the ROM.
You can always access it by using WC in the CLKSET instruction, so that D returns the next byte. Being only 8 bits wide, it doesn't cost a lot of gates and wires.
The previous ROM was 4KB, too. I might make this ROM 16Kx8, since it would still only take 0.25 sq mm, but could be changed later to accommodate USB boot code, etc. I'd just need to make the 5-unique-instruction boot program read 16KB, instead of 4KB. That just means changing two 0's into 1's.
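One way to see the "two 0's into 1's" remark: if the boot program holds the byte count as count-minus-one (or as an address mask), then 4KB is twelve 1 bits and 16KB is fourteen. That encoding is a guess on my part, not something stated here; the Python below just checks the arithmetic under that assumption.

```python
def length_field(size_bytes):
    """Hypothetical encoding: the loop constant is (byte count - 1)."""
    return size_bytes - 1


old_field = length_field(4 * 1024)     # 4KB ROM
new_field = length_field(16 * 1024)    # 16KB ROM

# Bits that differ between the two constants, and how many of them there are.
changed = old_field ^ new_field
bits_flipped = bin(changed).count("1")

if __name__ == "__main__":
    print(format(old_field, "016b"))
    print(format(new_field, "016b"))
    print(bits_flipped)
```

Both differing bits are 0 in the 4KB constant and 1 in the 16KB one, which matches the description, though the real boot code may encode the length differently.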
Sounds great! I like the idea of the full 512K being available as hub RAM.
Congratulations Chip, I know you must have been working hard on this and it certainly is a milestone in the P16X64A development (that is the name for "Next Chip"?).
Yes, it is annoying having to implement the ROM in this manner but no matter how good the tools are or having over 114,000 logic elements and memory to match to play with, engineers always have something to complain about
Congratulations on getting the COG working! Is it complete enough that you can post an instruction set?
With the full 512KB hub RAM, we are not constrained by the ROM. It can be replaced from flash with whatever the user requires. While what you did was neat putting ROM into the RAM, OnSemi's cells make much more sense now. It's a shame about the wasted resources, but the result will be way better.
IMHO, the only ROM use that makes sense to me, is to be able to boot to get running. If we can get USB working then that makes sense as one of the boot alternatives. The monitor code also makes sense to get a minimum user system running too. And of course we have the security code too.
Do OnSemi have any fuse bit cells? I am presuming they don't have flash cells, or that it increases the process steps ???
Great work!
Keeping the whole 512k Ram free is a bonus that rewards your efforts.
As for the fuse construction technique, when selecting between yours and OnSemi's, besides reliability and compactness considerations, IMHO an important aspect to take into account is the ability to "keep them well buried" in the mix of transistors and wires, so they cannot be easily pinpointed by geometrically (electron beam / optically) differentiable irregularities introduced in the layout by their presence and programmed state.
As for the current design step, does the RAM speed still outperform the maximum expected COG clock rate by about 2:1?
Yanomani
The chip is designed so that the logic speed does not exceed the RAM speed, allowing the logic to be sufficiently complicated, while not going (much) under the speed limit imposed by the RAMs. So, the RAM speed is the target for everything else. RAMs are slow, compared to logic, so they set the speed limit.
Comments
-Phil
I've rephrased that a little better as I'm not sure if the Block read even needs the FIFO - the FIFO is really there for video-DMA style transfers, up to fSys/N. It fills when it can, and empties at a fixed rate. (& vice-versa on read to hub)
Block[16] transfers always need 16 fSys cycles but do not have to use the FIFO. They do need the address/data lines to the HUB, but in theory a Block Read could work time-shared with a FIFO running video at rates below fSys/1.
Hmm, I figured the FIFO was intended for Cog instructions to use. We'll just have to wait and see I guess.
-Phil
HUB exec is still mentioned by Chip, but no details yet.
A fifo could be useful for LMM style code, but the block[16] read would be more deterministic, and that would free FIFOs for data flows.
It may be possible to do a hw-assisted form of LMM that behaves like HUB exec, with a speed between LMM and COG code.
Straight-line code could run at 66% of the speed of COG code, and a size-definable block would allow software to tune/optimise the block sizes (in multiples of 16).
To reliably accommodate an automatic peripheral in the cog that is using the FIFO, the FIFO gets priority over RDxxxx/WRxxxx instructions.
The FIFO is 19 levels deep, maximally. This is to accommodate the worst case of a long being read or written on every clock. Most of the time, in practice, the FIFO will only be a few levels deep.
Have You write before read ---- If Fifo's point to same destination?
What can be an automatic peripheral? What sort of flexibility is there?
So the plan for the FIFO is mainly around having it pace its stream at the rate needed for the I/O. So, then, the Cog can still RDxxx/WRxxx in between the FIFO's Hub accesses. Am I on to it now? That almost sounds a bit too luxurious for the Prop.
That's right. RDxxxx/WRxxxx must wait for cycles when the FIFO is not needing to issue reads or writes.
There will be simple state machines to read pins (byte/word/long groups) and write them to the hub RAM via the FIFO per NCO setting. Data can also be read from the hub RAM via the FIFO and written to pins or DACs per NCO setting. Video is a case of the latter.
No. Do you think it would be that helpful to have?
I have to ask...
1) how fast are the ADC's in the pins? The fifo engine would be more symmetric if the ADC's could also write to the hub - but I suspect this is not needed, as I don't think the ADC's are fast enough to require dma
2) what is the maximum NCO frequency? 200MHz? (i.e. SysClk?)
Too bad mixing clock domains is not easy in the Altera software. It would be nice to allow an external clock input for use instead of an NCO frequency. 165MHz oscillator comes to mind.
This would be rare, - only in Pin streaming -> Hub, not for Hub -> video/pins.
It would also be tricky, as any element in the FIFO might be due to replace what you are about to read in SW.
Checking all FIFO elements will require too much logic.
Most apps would first start the FIFO/DMA and then know whether they were ahead of or behind the write pointer.
(Usually SW would be slower, and behind the pointer, but at low NCO speeds you may need to slow the SW.)
There, and for burst cases, it could be useful to have some means to read/track the FIFO burst progress.
Could a read of the registers used to configure this give that? This might already be planned?
Yes, SysClk/N, N >= 1,
Why not clock at 165MHz in that case ?
External clock can be done, but it will always be sampled by SysCLK, as on most uC.
I'd expect smart pin counters to have External Edge option, & the sampling limits INC rates to < SysCLK/2 (100MHz)
As I don't know how the FIFOs you made work, I can't answer that.
But I would always want to calculate on the latest values, not old ones.
Yep, that's my understanding now.
The counters are still in limbo I think so that part is still up in the air.
SERDES would be another candidate, but again not done yet.
I've got the cog working! The next step is to write the new boot loader. I'll need to update the assembler to do this. So far, I've been coding short test programs by hand and typing them into the memory files used by Quartus to compile the FPGA image.
Since we can't substitute ROM bit cells for RAM cells in the main memory (since we are using OnSemi's RAM), I had to instantiate a separate 4K*8 ROM (0.067 square mm) that is read via the CLKSET instruction. Each cog has a 5-unique-instruction program (that only cog0 uses) to load the ROM into the base of hub RAM at start-up. Doing it this way keeps the main memory simple, but it added a layer of complexity to the development. This has been slow-going, but once I get PNut.exe working with the downloader, things are going to really accelerate. That's when I'll add hub exec.
Thanks for your patience. I'm feeling confident about the direction of things. I think the chance of failure on the next chip will be very low.
Interesting. So this is a serial-like ROM, only accessible at Power-on ? (so the RAM is 100% RAM)
How does that compare with the previous ROM size ?
I noticed TI have released a ROM version of their Piccolo, with Motor control code/libraries in the ROM.
You can always access it by using WC in the CLKSET instruction, so that D returns the next byte. Being only 8 bits wide, it doesn't cost a lot of gates and wires.
The previous ROM was 4KB, too. I might make this ROM 16Kx8, since it would still only take 0.25 sq mm, but could be changed later to accommodate USB boot code, etc. I'd just need to make the 5-unique-instruction boot program read 16KB, instead of 4KB. That just means changing two 0's into 1's.
Congratulations Chip, I know you must have been working hard on this and it certainly is a milestone in the P16X64A development (that is the name for "Next Chip"?).
Yes, it is annoying having to implement the ROM in this manner but no matter how good the tools are or having over 114,000 logic elements and memory to match to play with, engineers always have something to complain about
With the full 512KB hub ram, we are not constrained with the ROM. This can be replaced from flash with whatever the user requires. While what you did was neat putting ROM into the RAM, OnSemi's cells make much more sense now. It's a shame of the wasted resources but the result will be way better.
IMHO, the only ROM use that makes sense to me, is to be able to boot to get running. If we can get USB working then that makes sense as one of the boot alternatives. The monitor code also makes sense to get a minimum user system running too. And of course we have the security code too.
Do OnSemi have any fuse bit cells? I am presuming they don't have flash cells, or that it increases the process steps ???
Here's an updated instruction list:
5ns period ------------____________------------____________------------____________------------____________------------____________------------____________- | | | | | | | |-------+ | rdRAM Ic |-------+ | rdRAM Id |-------+ | rdRAM Ie | | | | | | | | | | | |---+ +----> rdRAM Db |------------> latch Db |---+ +----> rdRAM Dc |------------> latch Dc |---+ +----> rdRAM Dd |------------> latch Dd | |---+ +----> rdRAM Sb |------------> latch Sb |---+ +----> rdRAM Sc |------------> latch Sc |---+ +----> rdRAM Sd |------------> latch Sd | |---+ +----> latch Ib |------------> latch Ib |---+ +----> latch Ic |------------> latch Ic |---+ +----> latch Id |------------> latch Id | | | | | | | | | | | | +------------------ALU-----------> wrRAM Ra | +------------------ALU-----------> wrRAM Rb | +------------------ALU-----------> wrRAM Rc | | | | | | | | | | <wait a> | | <wait b> | | <wait c> | | | | | | | | -- addressable registers -- -- addr read write name -- ---------------------------------------------------- -- -- 000-1F7 RAM RAM -- -- 1F8 PTRA RAM+PTRA PTRA -- 1F9 PTRB RAM+PTRB PTRB -- 1FA INA RAM INA -- 1FB INB RAM INB -- 1FC RAM RAM+OUTA OUTA -- 1FD RAM RAM+OUTB OUTB -- 1FE RAM RAM+DIRA DIRA -- 1FF RAM RAM+DIRB DIRB Current Instructions: ZCDS (for D column: W=write, M=modify, R=read, L=read/immediate) ---------------------------------------------------------------------------------------------------------------------- ZCMS 0000000 ZCI CCCC DDDDDDDDD SSSSSSSSS ROR D,S/# rot ZCMS 0000001 ZCI CCCC DDDDDDDDD SSSSSSSSS ROL D,S/# rot ZCMS 0000010 ZCI CCCC DDDDDDDDD SSSSSSSSS SHR D,S/# rot ZCMS 0000011 ZCI CCCC DDDDDDDDD SSSSSSSSS SHL D,S/# rot ZCMS 0000100 ZCI CCCC DDDDDDDDD SSSSSSSSS RCR D,S/# rot ZCMS 0000101 ZCI CCCC DDDDDDDDD SSSSSSSSS RCL D,S/# rot ZCMS 0000110 ZCI CCCC DDDDDDDDD SSSSSSSSS SAR D,S/# rot ZCMS 0000111 ZCI CCCC DDDDDDDDD SSSSSSSSS SAL D,S/# rot ZCMS 0001000 ZCI CCCC DDDDDDDDD SSSSSSSSS ADD D,S/# add ZCMS 0001001 ZCI CCCC DDDDDDDDD SSSSSSSSS ADDX D,S/# add ZCMS 
0001010 ZCI CCCC DDDDDDDDD SSSSSSSSS ADDS D,S/# add ZCMS 0001011 ZCI CCCC DDDDDDDDD SSSSSSSSS ADDSX D,S/# add ZCMS 0001100 ZCI CCCC DDDDDDDDD SSSSSSSSS SUB D,S/# add ZCMS 0001101 ZCI CCCC DDDDDDDDD SSSSSSSSS SUBX D,S/# add ZCMS 0001110 ZCI CCCC DDDDDDDDD SSSSSSSSS SUBS D,S/# add ZCMS 0001111 ZCI CCCC DDDDDDDDD SSSSSSSSS SUBSX D,S/# add ZCRS 0010000 ZCI CCCC DDDDDDDDD SSSSSSSSS CMP D,S/# add ZCRS 0010001 ZCI CCCC DDDDDDDDD SSSSSSSSS CMPX D,S/# add ZCRS 0010010 ZCI CCCC DDDDDDDDD SSSSSSSSS CMPS D,S/# add ZCRS 0010011 ZCI CCCC DDDDDDDDD SSSSSSSSS CMPSX D,S/# add ZCRS 0010100 ZCI CCCC DDDDDDDDD SSSSSSSSS CMPR D,S/# add ZCRS 0010101 ZCI CCCC DDDDDDDDD SSSSSSSSS CMPM D,S/# add ZCMS 0010110 ZCI CCCC DDDDDDDDD SSSSSSSSS SUBR D,S/# add ZCMS 0010111 ZCI CCCC DDDDDDDDD SSSSSSSSS CMPSUB D,S/# add ZCMS 0011000 ZCI CCCC DDDDDDDDD SSSSSSSSS MIN D,S/# add ZCMS 0011001 ZCI CCCC DDDDDDDDD SSSSSSSSS MAX D,S/# add ZCMS 0011010 ZCI CCCC DDDDDDDDD SSSSSSSSS MINS D,S/# add ZCMS 0011011 ZCI CCCC DDDDDDDDD SSSSSSSSS MAXS D,S/# add ZCMS 0011100 ZCI CCCC DDDDDDDDD SSSSSSSSS SUMC D,S/# add ZCMS 0011101 ZCI CCCC DDDDDDDDD SSSSSSSSS SUMNC D,S/# add ZCMS 0011110 ZCI CCCC DDDDDDDDD SSSSSSSSS SUMZ D,S/# add ZCMS 0011111 ZCI CCCC DDDDDDDDD SSSSSSSSS SUMNZ D,S/# add ZCMS 0100000 ZCI CCCC DDDDDDDDD SSSSSSSSS ISOB D,S/# log ZCMS 0100001 ZCI CCCC DDDDDDDDD SSSSSSSSS NOTB D,S/# log ZCMS 0100010 ZCI CCCC DDDDDDDDD SSSSSSSSS CLRB D,S/# log ZCMS 0100011 ZCI CCCC DDDDDDDDD SSSSSSSSS SETB D,S/# log ZCMS 0100100 ZCI CCCC DDDDDDDDD SSSSSSSSS SETBC D,S/# log ZCMS 0100101 ZCI CCCC DDDDDDDDD SSSSSSSSS SETBNC D,S/# log ZCMS 0100110 ZCI CCCC DDDDDDDDD SSSSSSSSS SETBZ D,S/# log ZCMS 0100111 ZCI CCCC DDDDDDDDD SSSSSSSSS SETBNZ D,S/# log ZCMS 0101000 ZCI CCCC DDDDDDDDD SSSSSSSSS ANDN D,S/# log ZCMS 0101001 ZCI CCCC DDDDDDDDD SSSSSSSSS AND D,S/# log ZCMS 0101010 ZCI CCCC DDDDDDDDD SSSSSSSSS OR D,S/# log ZCMS 0101011 ZCI CCCC DDDDDDDDD SSSSSSSSS XOR D,S/# log ZCMS 0101100 ZCI CCCC DDDDDDDDD SSSSSSSSS MUXC D,S/# log ZCMS 
0101101 ZCI CCCC DDDDDDDDD SSSSSSSSS MUXNC D,S/# log ZCMS 0101110 ZCI CCCC DDDDDDDDD SSSSSSSSS MUXZ D,S/# log ZCMS 0101111 ZCI CCCC DDDDDDDDD SSSSSSSSS MUXNZ D,S/# log ZCWS 0110000 ZCI CCCC DDDDDDDDD SSSSSSSSS MOV D,S/# inc ZCWS 0110001 ZCI CCCC DDDDDDDDD SSSSSSSSS NOT D,S/# inc ZCWS 0110010 ZCI CCCC DDDDDDDDD SSSSSSSSS ABS D,S/# inc ZCWS 0110011 ZCI CCCC DDDDDDDDD SSSSSSSSS NEG D,S/# inc ZCWS 0110100 ZCI CCCC DDDDDDDDD SSSSSSSSS NEGC D,S/# inc ZCWS 0110101 ZCI CCCC DDDDDDDDD SSSSSSSSS NEGNC D,S/# inc ZCWS 0110110 ZCI CCCC DDDDDDDDD SSSSSSSSS NEGZ D,S/# inc ZCWS 0110111 ZCI CCCC DDDDDDDDD SSSSSSSSS NEGNZ D,S/# inc ZCMS 0111000 ZCI CCCC DDDDDDDDD SSSSSSSSS ALTDS D,S/# inc (modify D/S fields in D, redirect D/S fields in next instruction) ZCWS 0111001 ZCI CCCC DDDDDDDDD SSSSSSSSS DECOD D,S/# inc ZCWS 0111010 ZCI CCCC DDDDDDDDD SSSSSSSSS TOPONE D,S/# inc ZCWS 0111011 ZCI CCCC DDDDDDDDD SSSSSSSSS BOTONE D,S/# inc ZCMS 0111100 ZCI CCCC DDDDDDDDD SSSSSSSSS INCMOD D,S/# inc ZCMS 0111101 ZCI CCCC DDDDDDDDD SSSSSSSSS DECMOD D,S/# inc ZCMS 0111110 00I CCCC DDDDDDDDD SSSSSSSSS DJZ D,S/@ inc/adr ZCMS 0111110 01I CCCC DDDDDDDDD SSSSSSSSS DJNZ D,S/@ inc/adr ZCMS 0111110 10I CCCC DDDDDDDDD SSSSSSSSS DJS D,S/@ inc/adr ZCMS 0111110 11I CCCC DDDDDDDDD SSSSSSSSS DJNS D,S/@ inc/adr ZCRS 0111111 00I CCCC DDDDDDDDD SSSSSSSSS JZ D,S/@ adr ZCRS 0111111 01I CCCC DDDDDDDDD SSSSSSSSS JNZ D,S/@ adr ZCRS 0111111 10I CCCC DDDDDDDDD SSSSSSSSS JS D,S/@ adr ZCRS 0111111 11I CCCC DDDDDDDDD SSSSSSSSS JNS D,S/@ adr --MS 100000n nnI CCCC DDDDDDDDD SSSSSSSSS SETNIBn D,S/# mux --WS 100001n nnI CCCC DDDDDDDDD SSSSSSSSS GETNIBn D,S/# mux --MS 100010n nnI CCCC DDDDDDDDD SSSSSSSSS ROLNIBn D,S/# mux --MS 1000110 nnI CCCC DDDDDDDDD SSSSSSSSS SETBYTn D,S/# mux --WS 1000111 nnI CCCC DDDDDDDDD SSSSSSSSS GETBYTn D,S/# mux --MS 1001000 nnI CCCC DDDDDDDDD SSSSSSSSS ROLBYTn D,S/# mux --MS 1001001 0nI CCCC DDDDDDDDD SSSSSSSSS SETWRDn D,S/# mux --WS 1001001 1nI CCCC DDDDDDDDD SSSSSSSSS GETWRDn D,S/# mux --MS 1001010 
0nI CCCC DDDDDDDDD SSSSSSSSS ROLWRDn D,S/# mux --WS 1001010 10I CCCC DDDDDDDDD SSSSSSSSS SETBYTS D,S/# mux --MS 1001010 11I CCCC DDDDDDDDD SSSSSSSSS MOVBYTS D,S/# mux --WS 1001011 00I CCCC DDDDDDDDD SSSSSSSSS SPLITB D,S/# mux --WS 1001011 01I CCCC DDDDDDDDD SSSSSSSSS MERGEB D,S/# mux --WS 1001011 10I CCCC DDDDDDDDD SSSSSSSSS SPLITW D,S/# mux --WS 1001011 11I CCCC DDDDDDDDD SSSSSSSSS MERGEW D,S/# mux --MS 1001100 00I CCCC DDDDDDDDD SSSSSSSSS SETS D,S/# mux --WS 1001100 01I CCCC DDDDDDDDD SSSSSSSSS GETS D,S/# mux --MS 1001100 10I CCCC DDDDDDDDD SSSSSSSSS SETD D,S/# mux --WS 1001100 11I CCCC DDDDDDDDD SSSSSSSSS GETD D,S/# mux --MS 1001101 00I CCCC DDDDDDDDD SSSSSSSSS SETDS D,S/# mux --MS 1001101 01I CCCC DDDDDDDDD SSSSSSSSS SETCOND D,S/# mux --MS 1001101 10I CCCC DDDDDDDDD SSSSSSSSS SETI D,S/# mux --WS 1001101 11I CCCC DDDDDDDDD SSSSSSSSS REV D,S/# mux ZCMS 1001110 ZCI CCCC DDDDDDDDD SSSSSSSSS MUL D,S/# mul (16 x 16 unsigned multiply) ZCMS 1001111 ZCI CCCC DDDDDDDDD SSSSSSSSS MULS D,S/# mul (16 x 16 signed multiply) ZCRS 1010000 ZCI CCCC DDDDDDDDD SSSSSSSSS TESTN D,S/# log (ANDN without write) ZCRS 1010001 ZCI CCCC DDDDDDDDD SSSSSSSSS TEST D,S/# log (AND without write) ZCRS 1010010 ZCI CCCC DDDDDDDDD SSSSSSSSS ANYB D,S/# log (OR without write) ZCRS 1010011 ZCI CCCC DDDDDDDDD SSSSSSSSS TESTB D,S/# log (ISOB without write) ZCMS 1010100 ZCI CCCC DDDDDDDDD SSSSSSSSS WAITCNT D,S/# add ZCMS 1010101 ZCI CCCC DDDDDDDDD SSSSSSSSS LINK D,S/@ adr (jump to S/@, write {Z,C,P[18:0]} to D, 'LINK INA,S' = 'JMP S') --LS 1100000 L0I CCCC DDDDDDDDD SSSSSSSSS JP D/#,S/@ adr (jump if pin IN high, pins registered at beginning of ALU cycle) --LS 1100000 L1I CCCC DDDDDDDDD SSSSSSSSS JNP D/#,S/@ adr (jump if pin IN low, pins registered at beginning of ALU cycle) --LS 1100001 L0I CCCC DDDDDDDDD SSSSSSSSS REP D/#,S/# (begin repeat block of size D/# with S/# iterations) --LS 1100010 L0I CCCC DDDDDDDDD SSSSSSSSS WRFAST D/#,S/PTRx hub (waits for mem) --LS 1100010 L1I CCCC DDDDDDDDD SSSSSSSSS 
WRBYTE D/#,S/PTRx hub (waits for mem)
--LS 1100011 L0I CCCC DDDDDDDDD SSSSSSSSS WRWORD D/#,S/PTRx hub (waits for mem)
--LS 1100011 L1I CCCC DDDDDDDDD SSSSSSSSS WRLONG D/#,S/PTRx hub (waits for mem)
--LS 1100100 LxI CCCC DDDDDDDDD SSSSSSSSS RDFAST D/#,S/PTRx hub (waits for mem)
ZCWS 1100101 ZCI CCCC DDDDDDDDD SSSSSSSSS RDBYTE D,S/PTRx mem (waits for mem)
ZCWS 1100110 ZCI CCCC DDDDDDDDD SSSSSSSSS RDWORD D,S/PTRx mem (waits for mem)
ZCWS 1100111 ZCI CCCC DDDDDDDDD SSSSSSSSS RDLONG D,S/PTRx mem (waits for mem)
--LS 1101000 L0I CCCC DDDDDDDDD SSSSSSSSS QSINCOS D/#,S/# 2 ..2 (1000) wait sys
--LS 1101000 L1I CCCC DDDDDDDDD SSSSSSSSS QARCTAN D/#,S/# 2 ..2 (1001) wait sys
--LS 1101001 L0I CCCC DDDDDDDDD SSSSSSSSS QROTATE D/#,S/# 3 ..2 (1010) wait sys
--LS 1101001 L1I CCCC DDDDDDDDD SSSSSSSSS QMUL D/#,S/# 2 ..2 (1011) wait sys
--LS 1101010 L0I CCCC DDDDDDDDD SSSSSSSSS QDIV D/#,S/# 3 ..2 (1100) wait sys
--LS 1101010 L1I CCCC DDDDDDDDD SSSSSSSSS QSQRT D/#,S/# 2 ..1 (1101) wait sys
-CLS wr if !L 1101011 LCI CCCC DDDDDDDDD SSSSSSSSS COGINIT D/#,S/# 2 1? (0010) wait sys + 2 if WC and reg
ZCL- wr if C 1101111 0CL CCCC DDDDDDDDD 000000000 CLKSET D/# 1 0 (0000) wait sys + 2 if WC
ZCL- 1101111 Z1L CCCC DDDDDDDDD 000000001 COGID D/# WC 1 0/C (0001) wait sys + 2
ZCL- wr 1101111 Z00 CCCC DDDDDDDDD 000000001 COGID D 0 1 (0001) wait sys + 2
ZCL- 1101111 00L CCCC DDDDDDDDD 000000011 COGSTOP D/# 1 0 (0011) wait sys
ZCL- wr 1101111 ZC0 CCCC DDDDDDDDD 000000100 LOCKNEW D 0 1/C (0100) wait sys + 2
ZCL- 1101111 00L CCCC DDDDDDDDD 000000101 LOCKRET D/# 1 0 (0101) wait sys
ZCL- 1101111 0CL CCCC DDDDDDDDD 000000110 LOCKCLR D/# 1 0/C (0110) wait sys + 2 if WC
ZCL- 1101111 0CL CCCC DDDDDDDDD 000000111 LOCKSET D/# 1 0/C (0111) wait sys + 2 if WC
ZCL- 1101111 00L CCCC DDDDDDDDD 000001110 QLOG D/# 1 ..1 (1110) wait sys
ZCL- 1101111 00L CCCC DDDDDDDDD 000001111 QEXP D/# 1 ..1 (1111) wait sys
ZCL- 1101111 00L CCCC DDDDDDDDD 000000000 SETQ D/#
ZCL- wr 1101111 ZC0 CCCC DDDDDDDDD 000010001 RFBYTE D
ZCL- wr 1101111 ZC0 CCCC DDDDDDDDD 000010010 RFWORD D
ZCL- wr 1101111 ZC0 CCCC DDDDDDDDD 000010011 RFLONG D
ZCL- 1101111 00L CCCC DDDDDDDDD 000010100 WFBYTE D/#
ZCL- 1101111 00L CCCC DDDDDDDDD 000011000 WFWORD D/#
ZCL- 1101111 00L CCCC DDDDDDDDD 000011100 WFLONG D/#
---- 11110nn nnn CCCC nnnnnnnnn nnnnnnnnn AUGS #23bits (appends n to upper bits of next immediate S in same task)
---- 11111nn nnn CCCC nnnnnnnnn nnnnnnnnn AUGD #23bits (appends n to upper bits of next immediate D in same task)

Instructions to be added:

ZCWS * 1011001 ZCI CCCC DDDDDDDDD SSSSSSSSS MSGIN D,S/# msg (waits up to 32 clocks to receive %010_data32 message on pin S/#, C=timeout)
--LS * 1100000 1LI CCCC DDDDDDDDD SSSSSSSSS WAITPAE D/#,S/# (waits for INA)
--LS * 1100000 0LI CCCC DDDDDDDDD SSSSSSSSS WAITPAN D/#,S/# (waits for INA)
--LS * 1100001 1LI CCCC DDDDDDDDD SSSSSSSSS WAITPBE D/#,S/# (waits for INB)
--LS * 1100001 0LI CCCC DDDDDDDDD SSSSSSSSS WAITPBN D/#,S/# (waits for INB)
--LS * 1100100 0LI CCCC DDDDDDDDD SSSSSSSSS MSGOUTA D/#,S/# (send message to pin(s) on DIRA)
--LS * 1100100 1LI CCCC DDDDDDDDD SSSSSSSSS MSGOUTB D/#,S/# (send message to pin(s) on DIRB)
--LS * 1100110 0LI CCCC DDDDDDDDD SSSSSSSSS PICKZC D/#,S/# adr (always writes Z/C) - maybe change to {base5+offset4, base5) (1-based offset)
----
---- * 1101100 00n CCCC nnnnnnnnn nnnnnnnnn * JMP #abs adr (jump to 19-bit absolute address)
---- * 1101100 01n CCCC nnnnnnnnn nnnnnnnnn * JMP @rel adr (jump to 19-bit relative address)
---- * 1101100 10n CCCC nnnnnnnnn nnnnnnnnn * CALL #abs adr (call to 19-bit absolute address, using 4-level stack)
---- * 1101100 11n CCCC nnnnnnnnn nnnnnnnnn * CALL @rel adr (call to 19-bit relative address, using 4-level stack)
---- * 1101101 00n CCCC nnnnnnnnn nnnnnnnnn * CALLA #abs adr (call to 19-bit absolute address using PTRA)
---- * 1101101 01n CCCC nnnnnnnnn nnnnnnnnn * CALLA @rel adr (call to 19-bit relative address using PTRA)
---- * 1101101 00n CCCC nnnnnnnnn nnnnnnnnn * CALLB #abs adr (call to 19-bit absolute address using PTRB)
---- * 1101101 01n CCCC nnnnnnnnn nnnnnnnnn * CALLB @rel adr (call to 19-bit relative address using PTRB)
ZCW- * 1101111 ZCx CCCC DDDDDDDDD xxxxx0000 GETCNT D cog (get CNT into D)
ZCW- * 1101111 ZCx CCCC DDDDDDDDD xxxxx0001 GETRND D cog (get RND into D)
ZCW- * 1101111 ZCx CCCC DDDDDDDDD xxxxx0010 POP D cog (pop 4-level stack into D) (D[20:19] into Z/C via WZ/WC for POP..CALLB D)
ZCR- * 1101111 ZCx CCCC DDDDDDDDD xxxxx0011 CALL D adr (call to D[18:0] using 4-level stack)
ZCR- * 1101111 ZCx CCCC DDDDDDDDD xxxxx0100 * CALLA D adr (call to D[18:0] using PTRA stack)
ZCR- * 1101111 ZCx CCCC DDDDDDDDD xxxxx0101 * CALLB D adr (call to D[18:0] using PTRB stack)
--L- * 1101111 00L CCCC DDDDDDDDD xxxxx0110 PUSH D/# (push D/# into 4-level stack)
--L- * 1101111 00L CCCC DDDDDDDDD xxxxx0111 SETVID D/# (set video mode)
--L- * 1101111 00L CCCC DDDDDDDDD xxxxx1000 WAIT D/# (wait for some number of clocks, 0 same as 1)
--L- * 1101111 00L CCCC DDDDDDDDD xxxxx1001 WAITPX D/# (wait for any edge on pin D/#)
--L- * 1101111 00L CCCC DDDDDDDDD xxxxx1010 WAITPR D/# (wait for pos edge on pin D/#)
--L- * 1101111 00L CCCC DDDDDDDDD xxxxx1011 WAITPF D/# (wait for neg edge on pin D/#)
---- * 1101111 00x CCCC xxxxxxxxx xxxxx1100 SETQ D/#
---- * 1101111 ZCx CCCC xxxxxxxxx xxxxx1101 RET cog (return using 4-level stack)
---- * 1101111 ZCx CCCC xxxxxxxxx xxxxx1110 * RETA cog (return using PTRA stack)
---- * 1101111 ZCx CCCC xxxxxxxxx xxxxx1111 * RETB cog (return using PTRB stack)
----
---- wr * 111000r rrn CCCC nnnnnnnnn nnnnnnnnn LOCADDR reg,#abs adr (write 19-bit absolute address to $1F2..$1F9, includes PTRA/PTRB)
---- wr * 111001r rrn CCCC nnnnnnnnn nnnnnnnnn LOCADDR reg,@rel adr (write 19-bit relative address to $1F2..$1F9, includes PTRA/PTRB)
---- wr * 111010r rrn CCCC nnnnnnnnn nnnnnnnnn * LINK reg,#abs adr (jump to 19-bit absolute address, write {Z,C,P[18:0]} to $1F2..$1F9)
---- wr * 111011r rrn CCCC nnnnnnnnn nnnnnnnnn * LINK reg,@rel adr (jump to 19-bit relative address, write {Z,C,P[18:0]} to $1F2..$1F9)
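For anyone wondering how AUGS/AUGD get a full 32-bit immediate out of a 9-bit field, here is a rough Python sketch. It assumes that "appends n to upper bits" means the 23 bits from AUGS become bits 31..9 and the next instruction's 9-bit immediate S supplies bits 8..0. That is only my reading of the listing, and the function names are made up for illustration.

```python
def augmented_immediate(aug23: int, imm9: int) -> int:
    """Combine a 23-bit AUGS/AUGD payload with a 9-bit immediate.

    Assumption (not confirmed by the listing): the 23 bits land in
    bits 31..9 and the 9-bit immediate fills bits 8..0.
    """
    if not 0 <= aug23 < (1 << 23):
        raise ValueError("aug23 must fit in 23 bits")
    if not 0 <= imm9 < (1 << 9):
        raise ValueError("imm9 must fit in 9 bits")
    return (aug23 << 9) | imm9


def split_immediate(value32: int) -> tuple[int, int]:
    """Split a 32-bit constant into (23-bit AUG payload, 9-bit immediate)."""
    if not 0 <= value32 < (1 << 32):
        raise ValueError("value32 must fit in 32 bits")
    return value32 >> 9, value32 & 0x1FF


if __name__ == "__main__":
    aug, imm = split_immediate(0x12345678)
    print(hex(aug), hex(imm), hex(augmented_immediate(aug, imm)))
```

Under that assumption, 23 + 9 = 32 bits, so any 32-bit constant can be split across the AUGS word and the following instruction.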
They do. I will ask them about their construction. I think ours are fine, but they might have some other technique that is more reliable and compact.
Looking forward to flexing the new chip's muscles...
Great work!
Keeping the whole 512K of RAM free is a bonus that rewards your efforts.
As for the fuse construction technique: when choosing between yours and OnSemi's, besides reliability and compactness, IMHO an important aspect to take into account is the ability to "keep them well buried" in the mix of transistors and wires, so that they cannot easily be pinpointed (by electron beam or optically) through geometrically distinguishable irregularities that their presence and programmed state introduce into the layout.
As for the current design step, does the RAM speed still outperform the maximum expected COG clock rate by about 2:1?
Yanomani
With the ROM being copied to RAM now, we can design things that patch into the default code. Remember to add some clever hooks.
"LINK" appears in both "current instructions" and "instructions to be added", with different opcodes. Perhaps these are different variants of LINK.
The chip is designed so that the logic speed does not exceed the RAM speed, allowing the logic to be sufficiently complicated, while not going (much) under the speed limit imposed by the RAMs. So, the RAM speed is the target for everything else. RAMs are slow, compared to logic, so they set the speed limit.
The LINK instruction that is not implemented yet will be able to provide a 19-bit constant, but it is limited in where the return address can be stored.
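To make the return-address format concrete, here is a small Python sketch of the {Z,C,P[18:0]} value that the listing says LINK writes to $1F2..$1F9. It assumes the braces are a Verilog-style concatenation, so Z sits in bit 20, C in bit 19 and the 19-bit address in bits 18..0, which matches the "D[20:19] into Z/C" note on POP. The function names are hypothetical.

```python
ADDR_MASK = (1 << 19) - 1  # 19-bit address field, bits 18..0


def pack_link(z: bool, c: bool, pc: int) -> int:
    """Pack flags and a 19-bit address as {Z, C, P[18:0]}.

    Assumed layout: Z in bit 20, C in bit 19, address in bits 18..0.
    """
    if not 0 <= pc <= ADDR_MASK:
        raise ValueError("pc must fit in 19 bits")
    return (int(z) << 20) | (int(c) << 19) | pc


def unpack_link(value: int) -> tuple[bool, bool, int]:
    """Recover (Z, C, address) from a packed link value."""
    return bool((value >> 20) & 1), bool((value >> 19) & 1), value & ADDR_MASK


if __name__ == "__main__":
    packed = pack_link(True, False, 0x12345)
    print(hex(packed), unpack_link(packed))
```

With this layout the whole saved state fits in 21 bits, so a single register among $1F2..$1F9 can hold both flags and the return address.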