Pin Transfer (related to synchronous or clocked communication)
Ramon
Posts: 484
(I think this is somewhat related to SERDES or SPI too).
1) I am reading Prop2_Docs again and I think that there should be some more granularity for automatic transfers.
Currently there are two options for transfer size: 16 pins or 32 pins (for both AUX and WIDE). I wonder if it would be possible to create a MASK to select only the meaningful pins that we want to write. (I say write only, because for reads there is a software solution: an AND.) When we want to use fewer pins, this avoids wasting pins (and maybe wasting current, or introducing a lot of digital noise). Typical examples would be SPI (1 pin), Quad SPI (4 pins), or generic 8-bit interfaces for FPGA/CPU/USB transceivers, etc. (For synchronous/clocked protocols, would it be possible to define a pin that automatically toggles to generate the clock? Is this clock source the same as the CPU clock, or can it be different?)
2) (Related to the last questions.) How can the automatic pin transfer READs be synchronized with an external clock instead of the COG/HUB clock? For example: a 1-pin synchronous serial protocol with a slow 10 MHz clock, while our P2 is running at 100 MHz.
3) Another question I want to ask is: what is SPB? Is it another name for PTRA/PTRB, or a different pointer? (There is no explanation for this in the doc.)
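To make the MASK idea in point 1 concrete, here is a minimal software model of a masked pin write. It is only a sketch of the requested behaviour, not anything the P2 is known to implement; the function name, the 16-pin width and the pin assignment (Quad SPI data on pins 4..7) are assumptions for illustration.

```python
def masked_write(port: int, data: int, mask: int, width: int = 32) -> int:
    """Return the new port value: only bits set in `mask` are taken from `data`.

    Bits outside the mask keep their previous value, so unused pins
    are never disturbed by the transfer.
    """
    full = (1 << width) - 1
    mask &= full
    return ((port & ~mask) | (data & mask)) & full


# Hypothetical example: a 16-pin transfer where only pins 4..7 carry data.
QSPI_MASK = 0x00F0
old_port = 0xA5A5
new_port = masked_write(old_port, 0x0030, QSPI_MASK, width=16)
print(hex(new_port))
```

Only the nibble selected by the mask changes; the other twelve pins keep whatever they were driving before.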
Comments
SPB is now called PTRY.
The serial transceiver is wonderful, but works asynchronously. I don't know if some kind of serial synchronous communication can be implemented too.
I'd also love to see:
8 bit XFER
clkfreq/n clock for XFER (or CTRA/B)
external clock input for XFER (supports QV camera modules)
4 bit XFER at clkfreq/2 ... would provide direct support for QSPI flash
To summarize:
Ideally XFER should be configurable for:
32/16/8/4 bit transfers
clkfreq/(clkfreq/n)/CTRA|B/ext clkin
XFER already has clkout (for SDRAM)
optional "strobe" pins for /RD and /WR (when XFER accesses pins)
pins/serdes to/from AUX/COG/WIDE
Adding 2 bit transfers (differential) would also handle USB data transfers, and other differential SERDES
I'll expand that a little
Allow variable length transfers 1..31 bits (once you have 4 choices, may as well give full control)
9 bits is a very common size & JTAG wants full clock-count control.
( P2 will make a killer JTAG hub.
If there is room for a 2 JTAG SPI sends, that would shrink code more
- needs Std SPI, plus a simple variant of Dual-out(TDI,TMS), 1(or2) in (TDO,(Spare)). Might come as part of QuadSPI options ? )
Support for QuadSPI & I2S(Audio) will be important.
Some micros have a UART Baud formula of fSys/(2*N), i.e. they are more granular than the ancient classic Baud of fSys/16/N
Better Baud granularity is always a good thing. It allows more Xtal/PLL choices, and lower system clocks.
(users can always make N a multiple of 8 if they want to)
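The granularity point can be checked with a little arithmetic. The sketch below compares the two divider formulas quoted above, fSys/(16*N) and fSys/(2*N), for one example clock. The 20 MHz system clock and the 115200 target are arbitrary values chosen for illustration, and the function name is made up.

```python
def best_divider(f_sys: int, target_baud: int, prescale: int):
    """Pick the integer N that gets f_sys / (prescale * N) closest to target_baud.

    Returns (N, actual_baud, error_percent).
    """
    n = max(1, round(f_sys / (prescale * target_baud)))
    actual = f_sys / (prescale * n)
    error_pct = 100.0 * (actual - target_baud) / target_baud
    return n, actual, error_pct


F_SYS = 20_000_000   # assumed system clock, Hz
TARGET = 115_200     # assumed target baud rate

n16, baud16, err16 = best_divider(F_SYS, TARGET, 16)  # classic fSys/16/N
n2, baud2, err2 = best_divider(F_SYS, TARGET, 2)      # finer fSys/(2*N)

print(f"/16: N={n16}, error={err16:+.2f}%")
print(f"/2 : N={n2}, error={err2:+.2f}%")
```

With these example numbers the /2 formula lands several times closer to the target, which is what allows a wider choice of crystals and lower system clocks.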
I would like to see the counters able to be driven from pins or output to pins (I think we can already output them). I would also like it to be possible to chain the counters. The counters could be used to do the clocking of SERDES. By feeding a clock pin to both SERDES and the counters, we could use the counters to count the incoming clocks.
For extremely flexible SERDES, we need to be able to just have it free-running (clocked of course), and be able to read at any point in time and also be able to read how many clocks have occurred in that time.
SERDES time is fast approaching, so it's time to get those thinking caps on
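As a rough model of the free-running, externally clocked SERDES described above, the sketch below samples a (clock, data) pin pair once per system clock, shifts in one bit on every rising edge of the external clock, and keeps a count of the clocks seen. It assumes a 10:1 ratio between system and external clock (the 100 MHz / 10 MHz case raised earlier in the thread); all names are hypothetical.

```python
def capture(samples):
    """Shift in data on each rising edge of an externally supplied clock.

    `samples` is a sequence of (clk, data) pin levels, one per system clock.
    Returns (shift_register, clock_count); both can be read at any time
    in a real free-running design, here they are read at the end.
    """
    shift = 0
    count = 0
    prev_clk = 0
    for clk, data in samples:
        if clk and not prev_clk:                  # rising edge detected
            shift = ((shift << 1) | (data & 1)) & 0xFFFFFFFF
            count += 1
        prev_clk = clk
    return shift, count


def make_samples(bits, oversample=10):
    """Build a test stream: each bit is held for `oversample` system clocks,
    with the external clock low for the first half and high for the rest."""
    half = oversample // 2
    out = []
    for b in bits:
        out += [(0, b)] * half + [(1, b)] * (oversample - half)
    return out


value, clocks = capture(make_samples([1, 0, 1, 1, 0]))
print(bin(value), clocks)
```

The clock count is what lets software know how many bits in the shift register are valid, which matters for the odd-length transfers discussed in this thread.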
jmg: I have never used jtag, so my understanding is limited. Do you have a short reference that sums up its capabilities?
I used this
http://www.fpga4fun.com/JTAG2.html
JTAG is possible with a standard SPI port (assumes full mode and endian and length controls) and Software for the State-change handling.
In the state-move modes, there are two output pins clocked as pairs, TDI and TMS.
Once a new state is reached, TMS tends to stay static while data is moved.
So a SERDES mode that was very similar to 2 bit SPI (shifting 16b+16b from a 32 b load ?) would speed up those State-modes (and shrink the SW engines)
The difference from 2 bit SPI, is the TDO pin is read in, in duplex, during the 2 bit transfers, whilst most Dual/Quad SPI is run half duplex on the pins.
The SPI 1 bit engine is inherently duplex, so that ability is likely sitting there in silicon.
Ideally JTAG needs the means to run in either of
a) 32b x 1 mode (3 pins) TMS static, or
b) 16b x 2 mode (4 pins) TDI+TMS data shifts at Baud speed, and DUT.TDO is read in on one pin (16x1)
JTAG does need full control over the number of clock pulses.
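The 16b x 2 mode can be modelled as follows. This is a sketch of the proposal only: the bit layout (TDI in bits 0..15, TMS in bits 16..31, LSB first) and the function name are assumptions, and the clock count is a parameter because JTAG needs an exact number of pulses.

```python
def shift_x2(word: int, n_clocks: int = 16):
    """Split one 32-bit load into per-clock (TDI, TMS) output pairs.

    Assumed layout: bits 0..15 hold the TDI stream, bits 16..31 hold the
    TMS stream, both sent LSB first. `n_clocks` sets how many TCK pulses
    are generated (1..16).
    """
    if not 1 <= n_clocks <= 16:
        raise ValueError("n_clocks must be in 1..16")
    tdi = word & 0xFFFF
    tms = (word >> 16) & 0xFFFF
    return [((tdi >> i) & 1, (tms >> i) & 1) for i in range(n_clocks)]


# Standard TAP walk from Test-Logic-Reset to Shift-DR: TMS = 0, 1, 0, 0
# (Run-Test/Idle, Select-DR-Scan, Capture-DR, Shift-DR), TDI held low.
tms_bits = 0b0010            # LSB first: 0, 1, 0, 0
pairs = shift_x2(tms_bits << 16, n_clocks=4)
print(pairs)
```

Once Shift-DR is reached, TMS stays static and the 32b x 1 mode can take over for the data itself.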
I think you have misunderstood.
I am not talking about adding a JTAG slave to the P2, as the original design does not allow for that.
This is about using the SerDes as a JTAG master, or more likely, a multi-master JTAG HUB in an ATE system.
Given SerDes will likely do x1 SPI and x4 SPI, the middle case of a x2 variant for JTAG, is more an in-between case, than anything radically new.
What are the benefits to the P2 if it can be a JTAG master or multi-master JTAG HUB in an ATE system? What kinds of things can be done with this feature? What are the costs? What is lost if the feature isn't provided?
Will the P2 need TAP controller logic?
JTAG has a protocol associated with it. Does that need to be built into the P2?
The experts are saying it needs JTAG support, ok, why? What is the upside, what is the downside? Costs, risks, benefits?
If I'm the only one that doesn't know all this, please let me know. I'm really just trying to learn and understand.
1) I think XFER is the way to go for 32/16/8/4 bit clocked transfers between pins and AUX/COG ram.
The clock should be one of: clkfreq, clkfreq/n, CTRA/B, ext_clk_pin (input or output)
I think this should be separate from SERDES
I don't see any need for arbitrary width bits, the next size that fits can be chosen from 32/16/8/4
This would also allow for fast prop2-to-prop2 parallel transfers, and would also give us 100 MHz QSPI flash
2) After chewing on it, it seems to me that the logic for SERDES would be significantly simpler if it only supported single bit clocked transfers, as 4 bit transfers don't really fit the serial world, but sure do fit the XFER world...
3) I am still debating whether USB fits in SERDES or should be a separate block.
AND
with tasks, I can see one task in a cog servicing SERDES, and another controlling XFER transfers.
Adding CTRA/B as potential clock sources gives us back P1 style parallel VGA
a) 32/16/8/4 bit is ok. But if we want SPI to fit here, then we need to add 1 bit too: (32/16/8/4/1) + clock.
b) Arbitrary bit widths are needed. Some examples: 8b/10b, I2S (24 bits), and a lot of SPI ADCs with 10/12/14/18 bits.
c) About tasks and cogs: without tasks, I think a single cog can do RX or TX but not both. The pointers in AUX are not as powerful as PTRA or INDA: they lack the TOP and BOTTOM limits. I think it is best to use AUX (instead of WIDE) because of its higher buffer capacity (256 bytes instead of 8 LONGs). But at the same time, with no way to create a circular buffer, we cannot split AUX into two buffers.
Example:
RX buffer -> AUX[PTR1] from 0 .. 127 bytes
TX buffer -> AUX[PTR2] from 128 .. 255 bytes
(PS: Sorry, I don't know if I have explained this last point clearly enough for everyone to understand what I mean.)
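One way to picture the missing feature: if each AUX pointer had BOTTOM and TOP limits, a post-increment would wrap inside its own region and the two buffers could share AUX. The sketch below is a software model of that idea only; the class, its method and the 256-entry AUX size (taken from the post above) are illustrative assumptions, not existing P2 behaviour.

```python
class RingPointer:
    """Pointer with inclusive bottom/top limits that wraps on post-increment."""

    def __init__(self, bottom: int, top: int):
        self.bottom = bottom
        self.top = top
        self.ptr = bottom

    def post_inc(self) -> int:
        """Return the current address, then advance, wrapping at `top`."""
        cur = self.ptr
        self.ptr = self.bottom if cur == self.top else cur + 1
        return cur


aux = bytearray(256)            # model of AUX as described in the post
rx = RingPointer(0, 127)        # RX buffer: AUX[0..127]
tx = RingPointer(128, 255)      # TX buffer: AUX[128..255]

# Receive 130 bytes: the last two wrap around to the start of the RX region
# instead of spilling into the TX region.
for b in range(130):
    aux[rx.post_inc()] = b

print(rx.ptr, aux[0], aux[1], aux[128])
```

Without the limits, the 129th received byte would land at AUX[128] and overwrite pending TX data.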
XFER = 32/16/8/4 bit parallel input/output, internally or externally clocked
SERDES = serial input/output of N-bit words, clocked in/out serially (where N is say 1..32, even though less than 5 bits does not really make sense)
Regarding SERDES RX/TX - one cog must be able to RX/TX at the same time, some SPI chips require it.
Also for serial, I don't see a need for automatic buffer filling/emptying, as even a task would be able to handle full speed with just one level of buffering.
Two different subsystems.
It is not efficient to make ONE set of circuitry handle serial shifting as well as 4/8/16 bit to 32 bit multiplexing/demultiplexing.
Ah, so XFR is what you call a parallel IO port ?
Parallel i/o would naturally include 12, 18 and 24b modes for LCD displays, surely ?
Parallel LCD drive is an important market area, and one potential use here would be to replace a SSD1963 with a P2.
Does the pin-budget allow that ?
Then there is the FTDI Eve Device - a QuadSPI slave -> LCD Display stream might fit in a P2.
I think the RAM is similar, but the Eve has Font memory as well, but that could be an external SPI flash (again QuadSPI) ?
Depends on what you are doing - that less than 5 bits may be tacked on the end of a stream, to give a precise number of total clocks. The full 1..32 makes sense.
'efficient' can mean many things.
It is certainly more flexible having Two different subsystems, as you can then use both at once. That's a big plus.
However, most of the complexity is in the pin mapping and selection, each shift width option is only one more line in a Verilog MUX statement.
Two subsystems will need to duplicate all Control & Baud & Buffer registers.
Also note that moving the 4 bit over into XFR from SPI is not ideal, because QuadSPI needs both modes.
Most quad systems start as 1 bit and then flip into Quad mode when that makes sense, so a system really does need a 4/1 choice at the flip of a bit.
That same SPI may be SS muxed onto multiple parts, one of which is Quad. You do want full speed 1 bit on the others.
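The 1-bit-then-4-bit pattern can be illustrated by counting clocks per phase. The transaction below (an 8-bit command on one lane, then a 24-bit address and 256 data bytes on four lanes) is a made-up example, not the command set of any particular flash chip; the function name is also hypothetical.

```python
def clocks_needed(phases):
    """Total clock pulses for a transfer made of (bit_count, lanes) phases.

    Each phase may use a different lane count, which is why the width
    has to be switchable mid-transaction.
    """
    total = 0
    for bits, lanes in phases:
        if lanes not in (1, 2, 4):
            raise ValueError("lanes must be 1, 2 or 4")
        total += -(-bits // lanes)      # ceiling division
    return total


# Hypothetical read: command x1, address x4, 256 data bytes x4.
read_phases = [(8, 1), (24, 4), (256 * 8, 4)]

quad_clocks = clocks_needed(read_phases)
single_clocks = clocks_needed([(bits, 1) for bits, _ in read_phases])
print(quad_clocks, single_clocks)
```

The command phase costs the same either way; the saving comes entirely from the phases that run after the flip to four lanes.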
There are two different JTAG topics, some were asking for a JTAG slave, with JTAG pins. (as most FPGAs & CPLDs do)
That has advantages for Testing, and volume production board testing. That is not in present P2 silicon.
JTAG master is different, that is where the P2 can use any 4 pins, and the SerDes HW in SPI mode, with SW to create a JTAG system.
It's another application area and market, and the P2 is well suited to this sort of use.
If it has a JTAG Slave then yes, it needs this in silicon, however JTAG master can do this in SW, or it can do it better with a 2-Bit SPI port, with one boolean option bit.
The best way to explain this, is to have the proposed SPI & SpiQuad and include SPI Dual on that.
That SPI dual mode, has a single option bit that does
a) SPI Dual, in Half-Duplex, ie CLK, IO0, IO1, SS << this mode is on many Flash chips, could be useful on pin-limited designs.
b) SPI Dual in Full-Duplex connected to a remote device as TCK, [TDI, TMS], and [TDO,Spare]
[TDI, TMS], are Data OUT mapped to IO0, IO1
[TDO,Spare] are Data replies, ( can naturally be on IO2, IO3 of SPI_Quad)
Spare Data in path comes for free, and could be user-looped to TDI stream, as part of Self-testing designs.
With b) you now have HW able to do high speed, compact JTAG master, by choosing SPI_x1 and SPI_x2_FD
Mode b) could use the same pin-mapping rules as SPI Quad,
Cost over a SPI_Dual/Quad is one boolean config bit, and a Duplex choice mux.
Quite similar to the small HW cost of adding I2S Audio, which needs a Dual clock method, Fast CLK and frame/L-R clock.
I agree, USB and SERDES need to be different blocks. However, they (and SERIAL) could all share the baud generator logic.
SERDES also requires NRZ/NRZI mode too.
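For readers unfamiliar with NRZI, here is a minimal encode/decode sketch using the USB convention (a 0 bit toggles the line, a 1 bit leaves it unchanged). It deliberately leaves out bit stuffing and clock recovery, and the function names and the starting line level of 1 are assumptions.

```python
def nrzi_encode(bits, level=1):
    """NRZI, USB convention: 0 = toggle the line, 1 = keep the line level."""
    out = []
    for b in bits:
        if b == 0:
            level ^= 1
        out.append(level)
    return out


def nrzi_decode(levels, prev=1):
    """Inverse of nrzi_encode: no change = 1, change = 0."""
    out = []
    for lv in levels:
        out.append(1 if lv == prev else 0)
        prev = lv
    return out


# Seven zeros followed by a one gives an alternating line pattern
# that ends with two equal levels.
data = [0, 0, 0, 0, 0, 0, 0, 1]
line = nrzi_encode(data)
print(line, nrzi_decode(line) == data)
```

In plain NRZ mode the line levels would simply equal the data bits, so supporting both modes amounts to an optional toggle stage in front of the shifter.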
I think sharing Baud is not quite practical, because the USB Receive Baud block includes DPLL edge resync, which is not needed on other Baud gens, and the USB can get away with a smaller, just 8 bit, Baud counter, which helps lower the cost of having this separate.
The Receive Baud free-runs on no-edges, and could likewise free run on transmit to manage edge timing on sends.
No, I am intentionally linking XFR and SERDES together, with SERDES (or LVDS) being an optional PHYsical feature.
XFR: "automatically move data between pins and WIDEs/AUX, in the background, while instructions execute normally." (Prop2_docs.txt)
I agree. The question is: how can a cog do simultaneous RX/TX without using TASKs? It needs two buffers, and currently AUX is a single block. (And WIDE is maybe too small to be used as synchronous TX/RX buffer.)
The key concept is "automatically move data between pins ... in the background, while instructions execute normally".
I don't want a task doing transfers. I don't want even a single CPU cycle wasted to transfer data.
This is the most awesome and simple DMA we could ever imagine ! But I have difficulty trying to get the whole HW/SW picture of this.
What I am thinking now is how this DMA will be handled, we need to NOTIFY the cogs in some way: Maybe an instruction called WAITXFR?
WAIT_XFR_RX
WAIT_XFR_TX
WAIT_XFR_RX_buffer_FULL
WAIT_XFR_TX_buffer_FULL
But if one cog is in charge of RX/TX communication, how do other cogs get notified?