P2 Serial / Shift Register discussion

jmg · 2013-10-06 16:57

Cluso99 wrote: »

Regarding Q-SPI, read/write of data effectively goes thru a barrel shifter (think that's the correct term). D0-3 have to go to separate shift registers for D0, etc. Same applies to read. Others may refer to this as muxing. We only need to be able to explain what is required and Chip will understand.

I would not really call them separate shift registers, as you want the Read/Write to be normal 32-bit, with no shuffled nibbles or anything. One shifter (buffer) is read/written, just like std SPI.

The HW morphs the existing shifter, to a Quad-mode one - not hard to do in HDL.
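A minimal sketch of that single-shifter idea, assuming MSB-first order with D3 carrying the top bit of each nibble (the usual Quad SPI convention; the actual P2 ordering is not stated in this thread). The function names are made up for illustration:

```python
def quad_shift_out(word, bits=32):
    """Return the nibbles (D3..D0) a single shifter drives, one per clock, MSB first."""
    mask = (1 << bits) - 1
    word &= mask
    nibbles = []
    for _ in range(bits // 4):
        nibbles.append((word >> (bits - 4)) & 0xF)  # top nibble goes to pins D3..D0
        word = (word << 4) & mask                   # shift by 4 instead of by 1
    return nibbles


def quad_shift_in(nibbles, bits=32):
    """Rebuild the word by shifting 4 bits in per clock (no nibble shuffling)."""
    mask = (1 << bits) - 1
    word = 0
    for nib in nibbles:
        word = ((word << 4) | (nib & 0xF)) & mask
    return word
```

Software sees the same 32-bit value it would use with plain SPI; only the shift step changes from 1 to 4.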

Cluso99 · 2013-10-06 17:10

jmg wrote: »

This becomes semantics: one man's divider can be another man's counter.

Most modern BRGs are reloadable dividers/counters, so they are not binary, but /N.
Because RX needs to do a half-bit offset, there is usually an 'Even N' implied, so the formula is Fb = FSys/(2*N)

Next, practical Async sample-width/jitter considerations usually have N > 2, but a CLK-OUT Async mode could support FSys/2,
up to the pin-limit.

I prefer a baud formula to words. It makes it very clear what is supported.

So what are you saying here?

Do you want /n^2 (160MHz clock --> 80/40/20/10/5/2.5/1.25Mbps etc)? This will not get 115200 baud etc.
Or /n (160MHz clock divided by n binary)? This is what we do in sw currently in P1.
Agreed we need to specify actual values of 2*baud for the 1/2 bit to sample in the middle of the bit.

Cluso99 · 2013-10-06 17:12

jmg wrote: »

Software is doing the bulk - see

Length=32, TxMode(drive pins) Send Address. [4 clocks]
Length=8, TxMode(drive pins) Send Mode [1 clock]
Length=24, RxMode(Float pins) Discard RX data [3 clocks]
Length=32 (DLP size), RxMode(Float pins) Discard RX data [4 clocks]
Length=UserChoice, RxMode(Float pins) Valid RX data [Length/8 Clocks]

The HW is minimal - a couple of lines of HDL to morph the shifter to Quad compatible (it reads/writes bit-correct, no shuffles needed),
and even DDR is not complex if you have a higher SysClk, which P2 does.
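The clock counts in that sequence follow from 8 bits moving per clock (4 data pins, both clock edges in DDR). A small sketch that tallies them; the phase labels and helper names are illustrative only and do not come from a P2 or Spansion document:

```python
BITS_PER_CLOCK = 4 * 2  # 4 data pins x 2 edges per clock (DDR Quad), as assumed above

# (label, length in bits, pin mode) taken from the sequence listed in the post
PHASES = [
    ("address", 32, "tx"),
    ("mode", 8, "tx"),
    ("dummy", 24, "rx-discard"),
    ("DLP", 32, "rx-discard"),
]


def clocks_for(length_bits):
    """Clocks needed to move length_bits at 8 bits per clock."""
    return length_bits // BITS_PER_CLOCK


def overhead_clocks():
    """Clocks spent before the first valid RX data arrives."""
    return sum(clocks_for(length) for _, length, _ in PHASES)
```

With these numbers the overhead before valid data is 12 clocks, and a 32-bit read adds 4 more.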

To me the pin-mapping is the more complex issue, but I think Chip had that covered.

In DDR, the standard shifter clocks from the faster internal CLK, and that toggles a pin for the slower DDR clk (master mode).
Some Tpd care would be needed, to phase the CLK pin and Data (this could be Data from an other-edge FF, at SysClk speeds)

The Spansion part specs 66MHz DDR Quad, so depending on the Prop 2 pins I'd guess 50~66MHz nibble rate.

The P2 already has a SDRAM driver in hw/sw. IMHO we don't need hw support for another DDR DRAM.

jmg · 2013-10-06 17:21

Cluso99 wrote: »

...This is one of the problems with FS USB. Without a shift register, FS is proving to be quite difficult at 80MHz but I think I will eventually get there.
I am basing this on the USB FS that I am working on.
Currently all I am trying to do is monitor the FS data. I am not getting enough time to get it going yet.

Could you use a 2nd COG to emulate a shifter, in the short term ?

If you want to phase-snap in SW, and then HW clock sample, the start-action of a shifter is going to be quite important.

Normally, in SPI, you would not disturb the Baud divider on Write, as you want gapless and aligned CLK edges.

Where a HW shifter is used for SW locked data sampling, you will want to be able to move that CLK edge.

That's more a Async property, so might be best coded as an Async sub-mode. (not in classic SPI )

Generic Async is
Start-Edge from Pin
Start bit and Stop bit. First Data is 2nd CLK edge.

Special Async would be
Start Edge from Software Write - ie your SW, not a pin, fires the State engine.
Start/Stop bits are removed, and First data bit is first clk edge (half a bit time from SW-Start)
Length and Baud are standard.
This probably needs an Enable rather than trigger scheme, so it locks on start, then continues until you Stop, or re-start it
Re-start would be a SW-resync action. Shifter delivers sampled data in 32 bit words.

There is no CLK out needed
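To make the difference between the two modes concrete, here is a rough sketch of where the data samples would land, counted in SysClks from the start event. It assumes the Fb = FSys/(2*N) divider discussed earlier, so one bit time is 2*N clocks and a half bit is N clocks; `sample_points` is a hypothetical helper, not a proposed register interface:

```python
def sample_points(n, data_bits, skip_start_bit):
    """SysClk offsets (from the start event) at which data bits are sampled.

    skip_start_bit=True  -> generic async: the first clock edge lands mid start
                            bit, so the first data bit is the 2nd clock edge.
    skip_start_bit=False -> special async: software fires the start, and the
                            first data bit is sampled half a bit time later.
    """
    bit_time = 2 * n
    first = n + (bit_time if skip_start_bit else 0)
    return [first + k * bit_time for k in range(data_bits)]
```

For N = 347 the special mode samples at 347, 1041, 1735, ... clocks after the software start, while the generic mode starts one bit time later.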

jmg · 2013-10-06 17:23

Cluso99 wrote: »

The P2 already has a SDRAM driver in hw/sw. IMHO we don't need hw support for another DDR DRAM.

Grab the data on the S25FL128SA.
This is not another DDR DRAM device, it is DDR 8 pin Serial Flash

jmg · 2013-10-06 17:26

Cluso99 wrote: »

So what are you saying here?

Do you want /n^2 (160MHz clock --> 80/40/20/10/5/2.5/1.25Mbps etc)? This will not get 115200 baud etc.
Or /n (160MHz clock divided by n binary)? This is what we do in sw currently in P1.
Agreed we need to specify actual values of 2*baud for the 1/2 bit to sample in the middle of the bit.

?
My formula is clear, it does not use ^2, it uses
Fb = FSys/(2*N)
I can paste that into a calculator.
N = 347 for 80MHz and nearest 115200 value. (0.06% error)
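The same calculation as a short sketch, for anyone who wants to try other clocks. The function name is arbitrary; the formula is the Fb = FSys/(2*N) one above:

```python
def nearest_divider(fsys_hz, baud):
    """Pick the integer N that brings FSys/(2*N) closest to the wanted baud rate."""
    n = max(1, round(fsys_hz / (2 * baud)))
    actual = fsys_hz / (2 * n)
    error = 1 - baud / actual  # same error definition as used in this thread
    return n, actual, error


n, actual, error = nearest_divider(80_000_000, 115_200)
print(n, round(actual, 3), f"{error:.3%}")
```

For 80MHz and 115200 this picks N = 347, giving about 115273.775 baud.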

Cluso99 · 2013-10-06 18:17

Here is an updated drawing for one shift register. It corrects a number of errors and adds the start/stop bit muxes and the address register.

By allowing a "1" to be shifted in at the end, it allows the shift register to run continuously.
The 4 address bits can be bypassed via the mux.
The start and stop bits can be bypassed via the mux.
Chip has implemented something like the above (excluding the start/stop bypassing and "1" input).
Of course, I have made the shifter usable for both tx and rx, whereas Chip has a dedicated tx and rx set.

Cluso99 · 2013-10-06 18:21

jmg wrote: »

?
My formula is clear, it does not use ^2, it uses
Fb = FSys/(2*N)
I can paste that into a calculator.
N = 347 for 80MHz and nearest 115200 value. (0.06% error)

Great. We are both agreed on this then.

That is also the basis of Chip's work too. It provides the maximum flexibility and does not require special xtals like we had to use many years ago.

pjv · 2013-10-06 21:54

Cluso,

A swallow counter is a dual modulo (such as 31/32) fractional N divider frequently used in the digital feedback loop of RF PLLs. It allows (nearly?) any frequency to be generated from a crystal. To be clear, I'm not looking for a separate PLL for the baud generator clock, but just a dual modulo counter process run off the standard clock.

Cheers,

Peter (pjv)

jmg · 2013-10-06 22:27

pjv wrote: »

A swallow counter is a dual modulo (such as 31/32) fractional N divider frequently used in the digital feedback loop of RF PLLs. It allows (nearly?) any frequency to be generated from a crystal. To be clear, I'm not looking for a separate PLL for the baud generator clock, but just a dual modulo counter process run off the standard clock.

Do you need Dual modulo, given you start from 80~160MHz ?

The above formula gives this:

80M/(348*2) = 114942.528
80M/(347*2) = 115273.775

1-115200/(80M/(347*2)) = 0.064%
1-115200/(80M/(348*2)) = -0.224%

and most UART-USB modules have virtual Baud values of 24MHz/N (or 96MHz/N for FT232H )
That means if you can hit 24MHz or 96MHz, you can match any baud rate a USB-Bridge can generate.

Whilst you could add a rate-Multiplier LSB modulator, to your Dual Modulo Counter
( probably a MOD 10 would give one more digit, and be useful on a single char of 10 bits) the question is do you need that extra precision ? - it costs 4 more bits of register space.

Remember the PLL itself also now has finer control on P2

Formula is then
N=347;K=2; 80M/((2*(N+1)*K+2*N*(10-K))/10) = 115207.373

Average Error
1-ans/115200 = -64.004ppm
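A sketch of that dual-modulo average, under the assumption stated in the formula: out of every 10 bit times, K use divider N+1 and the remaining 10-K use N. The helper names are invented for the example:

```python
def fractional_baud(fsys_hz, n, k, frame_bits=10):
    """Average baud rate when k of every frame_bits bit times use N+1 instead of N."""
    avg_divisor = (2 * (n + 1) * k + 2 * n * (frame_bits - k)) / frame_bits
    return fsys_hz / avg_divisor


def ppm_error(actual, target):
    """Error in ppm, using the same sign convention as the post (1 - actual/target)."""
    return (1 - actual / target) * 1e6


def best_k(fsys_hz, n, target, frame_bits=10):
    """Search the K that lands closest to the target baud rate."""
    return min(range(frame_bits),
               key=lambda k: abs(fractional_baud(fsys_hz, n, k, frame_bits) - target))
```

For 80MHz, N = 347 and a 115200 target, the search settles on K = 2, which matches the value used above.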

ctwardell · 2013-10-07 07:25

Cluso99 wrote: »

Here is an updated drawing for one shift register. It corrects a number of errors and adds the start/stop bit muxes and the address register.

By allowing a "1" to be shifted in at the end, it allows the shift register to run continuously.
The 4 address bits can be bypassed via the mux.
The start and stop bits can be bypassed via the mux.
Chip has implemented something like the above (excluding the start/stop bypassing and "1" input).
Of course, I have made the shifter usable for both tx and rx, whereas Chip has a dedicated tx and rx set.

Do you have a plan for transmit and receive buffers?

It seems like they would be very helpful in allowing loose coupling to software.

We would need to set how many bits to send/receive before latching/loading.

It looks like Chip's current async has input buffering.

Chris Wardell

Cluso99 · 2013-10-07 11:01

ctwardell wrote: »

Do you have a plan for transmit and receive buffers?

It seems like they would be very helpful in allowing loose coupling to software.

We would need to set how many bits to send/receive before latching/loading.

It looks like Chip's current async has input buffering.

Chris Wardell

I am in two minds about this...
* allow 2 of the 4 shift registers to act in pairs to buffer/latch
* not buffer at all, but allow continual shifting and latch a bit counter on read/write

I have been thinking this thru overnight and currently think that at least the tx does not require buffering. By having the stop bits fill the unused space (and shifting them in as a bit is shifted out) the tx can run continuously. As long as the sw can detect that the byte(s)/block(s) have been sent, that's all we need to know to load a new value.
For instance, we could write 11_1cccccccc0_1bbbbbbbb0_1aaaaaaaa0 to the tx shifter (being 3 chars 8,N,1 including our own start/stops), read the global CNT register and add 3*10 bit times (which means we know when the 3 chars have been sent by waitcnt/passcnt), and go off to do other things.
To just send 1 char, write 11_1111111111_1111111111_1aaaaaaaa0, read CNT and add 1*10 bit times.
This method makes it easy to do 8,E,2 or any other async variation.

But to do async with auto start/stop, we could also just write 8 bits and the hw would effectively put _1111111111_1111111111_1aaaaaaaa0 into the shift reg.
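A software model of that packing, as a sketch only. It assumes the shifter sends bit 0 first and shifts a "1" in at the top, and that data bits go out LSB first as in a normal UART; `pack_tx_word`, `shift_out` and `done_count` are hypothetical names:

```python
MASK32 = 0xFFFFFFFF


def frame_8n1(byte):
    """10-bit frame: start bit (0) at bit 0, 8 data bits, stop bit (1) at bit 9."""
    return (1 << 9) | ((byte & 0xFF) << 1)


def pack_tx_word(data):
    """Pack 1 to 3 chars into a 32-bit word; unused space is filled with stop bits (1s)."""
    if not 1 <= len(data) <= 3:
        raise ValueError("a 32-bit shifter holds 1 to 3 frames of 10 bits")
    word = MASK32
    for i, byte in enumerate(data):
        shift = 10 * i
        word &= ~(0x3FF << shift) & MASK32   # clear this frame's slot
        word |= frame_8n1(byte) << shift
    return word


def shift_out(word, nbits):
    """Shift nbits out LSB first while a 1 is shifted in at the top."""
    bits = []
    for _ in range(nbits):
        bits.append(word & 1)
        word = (word >> 1) | 0x80000000
    return bits, word


def done_count(cnt, n_chars, clocks_per_bit):
    """CNT value at which n_chars frames have been sent (32-bit wraparound)."""
    return (cnt + n_chars * 10 * clocks_per_bit) & MASK32
```

Because 1s are shifted in behind the data, the line simply idles high once the last stop bit has gone out, so nothing has to stop the shifter.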

Using the same hw, it is easy to do sync too.

I am not so sure Chip has buffers/latches in his serial implementation.

jmg · 2013-10-07 12:14

ctwardell wrote: »

Do you have a plan for transmit and receive buffers?

It seems like they would be very helpful in allowing loose coupling to software.

We would need to set how many bits to send/receive before latching/loading.

It looks like Chip's current async has input buffering.

Chris Wardell

Any serial port needs at least some buffering (some call it a holding register), but it does not need to be a deep FIFO buffer.

Usually there is also a flag showing when such a register can be written/read.

E.g. for Tx, such a Ready flag is cleared by SW on Write, and set on transfer to the shifter.
A byte waits in the holding register, until the shifter can accept it, and that transfer should be gapless.

In Async RX, the flag is set on mid-stop, and often also captures Stop bit level (for Break detection), and clears on Read.

Software has a full char time, to read/write the holding registers.
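The Tx side of that handshake can be modelled in a few lines. This is only a behavioural sketch of a one-deep holding register with a Ready flag, not a description of any existing P2 logic; the class and method names are invented:

```python
class TxHoldingRegister:
    """One-deep Tx buffer: Ready clears on a SW write, sets on transfer to the shifter."""

    def __init__(self):
        self.ready = True
        self._value = None

    def write(self, byte):
        """Software side: only legal while Ready is set."""
        if not self.ready:
            raise RuntimeError("holding register still full")
        self._value = byte & 0xFF
        self.ready = False

    def transfer_to_shifter(self):
        """Hardware side: called when the shifter can accept the next char."""
        if self.ready:
            return None  # nothing waiting, the line idles
        value, self._value = self._value, None
        self.ready = True
        return value
```

Software then has one full char time between the transfer and the next write before a gap appears on the line.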

pjv · 2013-10-07 13:30

Cluso;

I believe buffering is required. Perhaps not for conventional UART comms, but I need to be able to stream data continuously without bit-stuffing. Otherwise the receive end will lose sync.... a disaster, or at least a bad scene for one of my industrial applications. Hiccups in the data stream due to software timing requirements should just not be tolerated. Double buffering gets around that, at least to the extent that the software feeding the data then has some flexibility in delivering it.

Cheers,

Peter (pjv)

ctwardell · 2013-10-07 14:02

We need some input from Chip/Ken as far as how much time there is to get something added to P2.

I'd love to see a super flexible SR that can do every protocol one could imagine, but I don't think we have the time to work that out.

If we could get a 'basic' USART similar to that on a PIC it would be really useful. Async and simple SPI I think are a must, I2C would be nice but I'm not sure if it still needs some type of license if specifically implemented.

Chris Wardell

jmg · 2013-10-07 14:20

average joe wrote: »

My interest for USB is simple devices such as keyboard and mouse. FLASH drives would be a great addition but not entirely necessary.

There are USB in Software solutions out there, on small uC at Low speed USB (1.5MHz ), but they are not quite fully valid.
They need clocks of >12MHz

For many apps, small payloads and 1.5MHz is ok, so perhaps it is better to target a functional 1.5MHz solution, in SW, as the 80MHz P2 threads, should allow quite a few cleanups.

For more serious apps, even 12Mbd is too slow, and there a FT232H can be used.

If the Prop can feed FT232H serially, to spec, at 50MHz that covers the serious transfer users.

Most low cost, USB-Uart devices struggle to deliver even 1MBd sustained data flows anyway, so a good 1.5MHz solution, can be useful.

jmg · 2013-10-08 22:38

Just checking the new Intel Quark.
Data is still sparse, but they do mention
* I2C to 400kHz
* UART to 2.764800Mbd
* SPI to 25MHz (Quark as master)
* SD card interface mentions 50MHz
* 2 USB Ports mentioned, unclear if they are FS or HS - surely HS ?
* Ethernet 10/100

So a serial block that can hit/exceed these speeds will make the P2 appealing as a Quark peripheral.

77.414400MHz ( 11.0592M * 7 ) can hit 2.7648Mbd exactly with a 2N divisor, and 19.3536MHz SPI with /4 as slave mode.

Or, 80MHz can get to 1.3824MBd, with 0.22% deviation, and give 20MHz SPI @ /4
100MHz is needed for 25MHz SPI slave @ /4
150MHz would do 25MHz SPI Slave with /6, and 37.5MHz SD with /4

The 50MHz SD speed, is likely to need final P2 silicon.
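Those clock choices are easy to check with two tiny helpers (arbitrary names, plain division, nothing P2-specific assumed):

```python
def divided(fsys_hz, divisor):
    """Rate obtained from a plain integer divide of the system clock."""
    return fsys_hz / divisor


def deviation(actual, target):
    """Relative deviation of the achieved rate from the target."""
    return actual / target - 1


# 77.4144MHz / 28 (2N with N = 14) hits the Quark UART rate exactly
print(divided(77_414_400, 28))
# 80MHz / 58 (2N with N = 29) against 1.3824MBd
print(f"{deviation(divided(80_000_000, 58), 1_382_400):.2%}")
```

The second line prints a deviation of roughly -0.22%, i.e. 80MHz lands slightly below 1.3824MBd.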

Cluso99 · 2013-10-10 15:53

I am still working on sw to read 12MHz. Without serial shift input this is a tall ask, at least with 80MHz clocking. I can sample at 7/6/7 (6.667 clocks) but there is no time to do anything else, including unstuffing.
So, IMHO we need a shifter for both in and out that isn't encumbered by start/stop. A free-running shifter will at least help as a minimal inclusion.
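The 7/6/7 pattern falls out of 80MHz / 12MHz = 20/3 clocks per bit. A sketch that generates the sample spacing by rounding the ideal sample times to whole clocks (exact fractions are used to avoid float drift; the function name is made up):

```python
from fractions import Fraction


def sample_intervals(fsys_hz, bit_rate_hz, n_bits):
    """Whole-clock gaps between successive samples for a non-integer clocks-per-bit ratio."""
    period = Fraction(fsys_hz, bit_rate_hz)              # 20/3 clocks for 80MHz / 12MHz
    times = [round(k * period) for k in range(n_bits + 1)]
    return [later - earlier for earlier, later in zip(times, times[1:])]


print(sample_intervals(80_000_000, 12_000_000, 6))
```

Every three bits take exactly 20 clocks, which is why no instruction slots are left over for unstuffing in a pure software loop.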

jmg · 2013-10-10 15:57

Cluso99 wrote: »

So, IMHO we need a shifter for both in and out that isn't encumbered by start/stop. A free-running shifter will at least help as a minimal inclusion.

Does 1.5MHz work ok?
I think Chip was including SPI and even QSPI, but you could proceed in the meantime with an external shifter of 4 or 8 bits, clocked from a counter?

addit:
Thinking about this some more, I think an HC595 could do it, with SHCP from one ctr @ 12MHz, phased in SW, and STCP driven from another counter @ 1.5MHz, also phased in SW.
There is some creep between these two counters, but USB should never run over 1ms, which is 12000 SHCP and the numeric creep is well under that.

More of an issue will be Osc matching, with no re-sync-on edge.

USB spec only commits to 2,500ppm, and a note I found indicated that, for two PCs tested, they measured 184 & 179ppm faster than the nominal 1.000ms rate.

One part in 12000 is 83ppm, but just maybe the Prop2 is nimble enough to capture and compensate that ?

A 80MHz p2 can capture 1ms to 12.5ppm, and the new maths opcodes may make this possible :
round(12000*2^32/79985) = 644365913 where that 79985 is the CalCapture

If you can work to that capture precision, you will clock-align average drift to ~0.15clk in 1ms

This is certainly using the Prop resource
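The calibration step above can be written out as a sketch. It assumes a 32-bit phase accumulator (NCO-style counter) that should overflow 12000 times during one captured host frame; `nco_increment` and `capture_resolution_ppm` are illustrative names:

```python
def nco_increment(bits_per_frame, captured_clocks, width=32):
    """Per-clock phase increment so the accumulator overflows bits_per_frame
    times in captured_clocks SysClks (integer round-to-nearest)."""
    numerator = bits_per_frame * (1 << width)
    return (2 * numerator + captured_clocks) // (2 * captured_clocks)


def capture_resolution_ppm(clocks_per_frame):
    """One count of capture error, expressed in ppm of the frame time."""
    return 1e6 / clocks_per_frame


print(nco_increment(12000, 79985))       # 79985 is the example CalCapture value
print(capture_resolution_ppm(80_000))    # 80MHz SysClk, 1ms frame
```

With the example capture of 79985 clocks this gives the same 644365913 as the hand calculation, and one count in 80000 is the 12.5ppm capture resolution mentioned.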

Cluso99 · 2013-10-10 17:40

jmg wrote: »

Does 1.5MHz work ok?
I think Chip was including SPI and even QSPI, but you could proceed in the meantime with an external shifter of 4 or 8 bits, clocked from a counter?

addit:
Thinking about this some more, I think an HC595 could do it, with SHCP from one ctr @ 12MHz, phased in SW, and STCP driven from another counter @ 1.5MHz, also phased in SW.
There is some creep between these two counters, but USB should never run over 1ms, which is 12000 SHCP and the numeric creep is well under that.

More of an issue will be Osc matching, with no re-sync-on edge.

USB spec only commits to 2,500ppm, and a note I found indicated that, for two PCs tested, they measured 184 & 179ppm faster than the nominal 1.000ms rate.

One part in 12000 is 83ppm, but just maybe the Prop2 is nimble enough to capture and compensate that ?

A 80MHz p2 can capture 1ms to 12.5ppm, and the new maths opcodes may make this possible :
round(12000*2^32/79985) = 644365913 where that 79985 is the CalCapture

If you can work to that capture precision, you will clock-align average drift to ~0.15clk in 1ms

This is certainly using the Prop resource

Your posts are so negative! Why don't you want a generalised shift register? Using external devices to achieve this is just not going to cut it.

Why do you then want a UART when you can achieve that with a generalised shift register?

I don't mind if the shifter also has UART features, but not at the expense of a general shifter. The UART can simply be achieved in sw with a generalised shifter.
As to the resyncing issue, this is possible in sw anyway. USB is well within the P2's limits to read 32 bits without syncing on other than the first bit. And we could use oversampling too.
We can do SPI with a generalised shifter providing we can use external clocking.

Quad SPI looks a lot harder, but not impossible - it will just take some more time to flesh out the hw assist options.

jmg · 2013-10-10 18:07

Cluso99 wrote: »

Your posts are so negative! Why don't you want a generalised shift register? Using external devices to achieve this is just not going to cut it.

?? You need to read more carefully. Negative I am not.

I am all for a generalized shift register, but I take generalized to mean what it says, and I also apply the HW does Bits and SW does Bytes rules.

That means my generalized shift register, is like most users, and does UART and SPI and SPI Slave and I2S and does all those options in a flexible and easy to understand manner with no kludges.

The main issue I see with Quad SPI is pin mapping.
Actually making a shifter do QuadSPI, as an option, is a few lines of HDL and not much logic.

KC_Rob · 2013-10-11 16:25

This seemed an important topic. Has anything been settled? A consensus reached?

potatohead · 2013-10-11 20:50

Chip took the discussion and is thinking on it. I suspect we will learn what he did after he gets through sorting out the opcode space.

Cluso99 · 2013-10-12 03:30

AFAIK Chip is doing other more important things while we continue to discuss this.
Currently I am deep in USB FS to see what mods to a generic shift register would help.

MJB · 2013-10-12 04:49

I think I haven't seen the idea of using a QUADLOAD to load the 4 longs shown in the first post ...

seeing the picture just naturally triggered this question ...

Cluso99 · 2013-10-12 05:33

mjb: this pic has nothing to do with the quad r/w instructions, just in case you got the wrong idea.
But if you mean chaining all 4 shift registers, then yes, with my generalised shifters, this should be easy. Obviously this depends on what Chip implements. I am just throwing out some concepts to stir the interest in what might be possible with generalised shifters. I think there are enormous possibilities in addition to sync rx/tx and async rx/tx with/without start/stop handling.

I am away atm for a wedding. As soon as I get home I will add another mux/xor feedback + a config bit, to feedback from the first rx bit and first tx bit.

MJB · 2013-10-12 11:29

Cluso99 wrote: »

mjb: this pic has nothing to do with the quad r/w instructions, just in case you got the wrong idea.
... snip ...

I got you right ...;-)

but again - what about loading your 4 longs with one quadRD (like) instruction?
would give plenty of time for other work in between ...

Cluso99 · 2013-10-12 16:33

mjb: I can see far better instructions to add in the P2 than this one.

Cluso99 · 2013-10-13 02:20

This generic shift register block diag adds an optional (configurable) feedback mux/xor required for NRZI as used in USB.

I have assumed that we can optionally invert the data input, data output, and the input and output clocks - these are not shown.
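For reference, this is what the NRZI feedback computes, modelled in software. In USB's NRZI a 0 bit toggles the line level and a 1 bit leaves it unchanged; the sketch assumes the line starts at level 1 (treating J as 1) and ignores bit stuffing. The function names are illustrative:

```python
def nrzi_encode(bits, level=1):
    """USB-style NRZI: a 0 toggles the line level, a 1 leaves it unchanged."""
    out = []
    for bit in bits:
        if bit == 0:
            level ^= 1
        out.append(level)
    return out


def nrzi_decode(levels, prev=1):
    """Inverse: a bit is 1 when the level equals the previous level (XNOR)."""
    out = []
    for level in levels:
        out.append(1 if level == prev else 0)
        prev = level
    return out


sync = [0, 0, 0, 0, 0, 0, 0, 1]   # USB sync field as data bits, in the order sent
print(nrzi_encode(sync))          # line levels, i.e. K J K J K J K K with J = 1
```

The decode side is the XOR of the current and previous input bit, inverted, which is the single extra gate plus flip-flop that the feedback mux adds to the shifter.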

pjv · 2013-10-13 08:58

Cluso,

Although you do not show it, I trust your concept includes in/out buffering with auto-load on transmitting. Without those features the value of the shifter is severely restricted.

Cheers,

Peter (pjv)
