Big update for DE2-115 and DE0-Nano users w/add-on boards

Sapieha · 2013-10-01 16:08

Hi Chip.

How many FIFO registers does each SER port have?

Bill Henning · 2013-10-01 16:20

I have a dumb question...

Why not simply have a 2:1 mux, and

- use the (optionally divided) P2 SysCLK

*OR*

- use the external pin defined as the clock input?

I don't think inputs are registered, so unless I am missing something, this should work...

jmg wrote: »

Usually external clocks are handled essentially as a clock enable: you sample to find the defined edge, and then enable the shift using the master clock, on that edge.
The Shifters are always clocked by P2 SysCLK - the remote slave clock simply says 'wait for me' .
In master mode, the Clock Enable can feed from the ClkDivider carry, so Shifters are always clocked by P2 SysCLK

That imposes a lower slave clock rate, which is common.

jmg · 2013-10-01 16:28

Bill Henning wrote: »

I have a dumb question...

Why not simply have a 2:1 mux, and

- use the (optionally divided) P2 SysCLK

*OR*

- use the external pin defined as the clock input?

I don't think inputs are registered, so unless I am missing something, this should work...

You are missing something, the next step in the chain.

Simple clock muxing 'works' only to a point : the shifter might be ok, but what does the Prop core do, when it wants to read that shifter ? Worst outcome is a Core read right when the external clock moves everything.

That is why it is better to design with one clock, and use synchronized clock enables. All actions are then setup/hold (Ts/Th) safe.

Prop 2 sysclk speeds are quite high, so even a SPI slave will have decent speed.
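As a rough illustration of the clock-enable idea described above, the following Python model samples an external clock once per sysclk tick, passes it through two synchronizer stages, and produces a one-tick enable pulse when a rising edge is seen. The function name and the three-flop arrangement are assumptions made for this sketch, not a description of the actual P2 logic.

```python
def clock_enables(eclk_samples):
    """Return one enable flag per sysclk tick (1 = shift on this tick).

    eclk_samples holds the external clock level seen at each sysclk tick.
    s1 and s2 model a two-stage synchronizer; s3 remembers the previous
    synchronized level so that a rising edge can be detected.
    """
    s1 = s2 = s3 = 0
    enables = []
    for level in eclk_samples:
        # All three flip-flops update together on the sysclk edge.
        s3, s2, s1 = s2, s1, level
        # Rising edge: synchronized level is high now and was low before.
        enables.append(1 if (s2 == 1 and s3 == 0) else 0)
    return enables


# External clock at sysclk/4: two ticks low, two ticks high, repeated.
eclk = [0, 0, 1, 1] * 4
print(clock_enables(eclk))
```

In this model every flip-flop is clocked by sysclk only, and each external clock phase has to last at least one full sysclk tick to be seen, which is why the scheme imposes a lower slave clock rate.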

Cluso99 · 2013-10-01 16:54

cgracey wrote: »

You know, because we have nine bits in S immediate and only 8 needed for random access, the top bit could trigger SPA/SPB w/offset and pre-/post- add/sub, just like RDXXXX/WRXXXX instructions. I'd need to rob an INST D,S instruction code. Maybe ENC could become a D-only instruction and in its place I could put RDSTACK/WRSTACK instructions which key rd/wr from the WR bit, just like the hub-memory instructions do. This new instruction would supplant PUSHA/PUSHAR/PUSHB/PUSHBR/POPA/POPAR/POPB/POPBR and provide way more flexibility. SPA and SPB would continue to exist (just like PTRA and PTRB), but become a lot more flexible. Great ideas, Bill!

Absolutely fantastic!!! Makes the fifo totally usable as fast random variables too.

Differential I/O is actually a configurable pin function, and doesn't need two output signals. All even/odd pin pairs can use their neighbor's OUT signal from the core as their own OUT signal, and can specify inversion, too. A pin pair can be set to input as differential signal on either or both IN signals.

Also fantastic!! Not sure how you have the differential input working. USB sends both pins the same polarity for certain flag functions.

jmg · 2013-10-01 17:31

Cluso99 wrote: »

USB sends both pins the same polarity for certain flag functions.

There are more fish-hooks in USB - it also needs bit stuffing on Tx/Rx, and whilst that is not a lot of logic, it also needs to synchronize in order to make that decision.
At moderate speeds that usually meant a PLL, but it might be possible at higher speeds (say >120MHz) to get away with a reloadable 4-5 bit divider ?

What frequency does the PLL run at in P2 ?
For the FPGA emulators, what clocks are available above 80MHz ? ( 240MHz, 320MHz ? )

cgracey · 2013-10-01 17:39

jmg wrote: »

There are more fish-hooks in USB - it also needs bit stuffing on Tx/Rx, and whilst that is not a lot of logic, it also needs to synchronize in order to make that decision.
At moderate speeds that usually meant a PLL, but it might be possible at higher speeds (say >120MHz) to get away with a reloadable 4-5 bit divider ?

What frequency does the PLL run at in P2 ?
For the FPGA emulators, what clocks are available above 80MHz ? ( 240MHz, 320MHz ? )

We can get whatever we need on the FPGA, as far as frequencies go, but whatever goes on the chip will have to be, like you said, using clock enables. A divider is one way to go, but an overflow accumulator tracks better, over multiple cycles. And they don't need much circuitry, either.
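The difference between a plain divider and an overflow accumulator can be sketched in Python. The model below adds a fixed step to an accumulator on every sysclk tick and emits a clock enable whenever it overflows. The 16-bit width, the function name, and the 80MHz / 12Mbps example figures are assumptions chosen for illustration.

```python
def accumulator_enables(sysclk_hz, bit_rate_hz, ticks, width=16):
    """Return one enable flag per sysclk tick from an overflow accumulator."""
    modulus = 1 << width
    # Step chosen so that the accumulator overflows bit_rate_hz times per second.
    step = round(bit_rate_hz * modulus / sysclk_hz)
    acc = 0
    enables = []
    for _ in range(ticks):
        acc += step
        if acc >= modulus:
            acc -= modulus      # keep the remainder, so no error builds up
            enables.append(1)
        else:
            enables.append(0)
    return enables


# 80MHz sysclk, 12Mbps target: 6.67 ticks per bit on average.
enables = accumulator_enables(80_000_000, 12_000_000, 80_000)
print(sum(enables))   # number of bit enables in 1 ms of sysclk ticks
```

Because the remainder is kept after each overflow, the gaps between enables alternate between 6 and 7 ticks and the average rate stays close to the target. A fixed integer divider would have to pick either /6 or /7 and drift steadily away from it.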

Hey, the info about catching the start bit, then testing it again at 1/2 bit period is a great idea. I will change our shifters to do that. In two state bits, I was only using 3 states, so that empty state will now get used, as this adds one state to the machine.

cgracey · 2013-10-01 17:46

Sapieha wrote: »

Hi Chip.

How many FIFO registers does each SER port have?

There is one buffer register each way.

Cluso99 · 2013-10-01 17:48

Here is my take on a simple serial interface...

In this case, we need to be able to read/write to both the receive serialiser and transmit serialiser so we can do any software checking required. We can insert the start bit in our bitstream and stop bits too by writing the appropriate bit pattern into the transmit register.
Therefore, we do not want to block read/writes to the registers.

Since we can invert pins at the pin interface, we do not need to invert the pins in the serialiser, so we can treat the register as normal serial where a start =0, stop=1. Therefore, we can shift in a stop=1 into the transmitter register as a bit is shifted out. Normally we shift LSB first so that is what I have done. Note it is therefore possible to do any length bitstream up to 32 bits including the start bit. By timing the shifting, the stop bit(s) would automatically be appended.

What could be helpful is a way to stall the read/write until the middle of the sampling, perhaps by setting the WC or WZ bit.

I don't see any need for the hardware to add the extra functions for detecting framing errors. These can be done by software in the background, or increasing the clock and oversampling each bit.

With the above circuit I think that we can do all 6/7/8/9/10 bit UART functions, and also do basic/primitive (but far better than P1) I2C and SPI, and aid USB and 1-wire etc. So, it is a simple generalised serial in and serial out shifter circuit.

David Betz · 2013-10-01 17:48

cgracey wrote: »

- Both the DE2-115 and DE0-Nano now operate at 80MHz. After loading a large app (F11/F12), the cogs are running at 80MHz. After downloading a loader (F10), they are going 20MHz. See SDRAM_Graphics6 to see CLKSET ($FF) switch the clock up to 80MHz.

So you're saying that at reset the P2 runs at 20MHz but your second-stage loader changes the clock to 80MHz? I'm having some trouble getting the PropGCC loader working with the new FPGA configuration. Since I don't have any code in my second-stage loader to change the clock I'm just trying to run at 20MHz all the time. Shouldn't that work?

Bill Henning · 2013-10-01 17:55

Thanks. That makes sense.

I was assuming that the shifter would be copied to a latch when all the expected bits were received, so another shift could start.

Perhaps that latching could sync to the p2 clock? If so, we could still have a non-synced external input clock, and possibly do p2-p2 comms, and spi slave, at up to clkfreq.

However if that won't work, we could still sync the clocks, and still get clkfreq/2 SPI input and p2-p2 comm...

jmg wrote: »

You are missing something, the next step in the chain.

Simple clock muxing 'works' only to a point : the shifter might be ok, but what does the Prop core do, when it wants to read that shifter ? Worst outcome is a Core read right when the external clock moves everything.

That is why it is better to design with one clock, and use synchronized clock enables. All actions are now Ts.Th safe.

Prop 2 sysclk speeds are quite high, so even a SPI slave will have decent speed.

Cluso99 · 2013-10-01 17:55

jmg: USB bit stuffing and unstuffing can be achieved by software with the above circuitry. It may not be ideal, but I would rather a general purpose serialiser than a special purpose one which cannot be used for other things. Once the input is offloaded to the serialiser circuitry, the software (which is at least 8x faster than the P1) can handle this. P1 can do USB FS but required a number of cogs because 1 cog is just handling the serialising input full time. LS is much easier in a single cog plus a helper cog to do the upper level stuff. IIRC USB LS is 1.5Mbps and FS is 12Mbps so that is 8x faster.
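For reference, the USB bit-stuffing rule itself is small: a 0 is inserted after every run of six consecutive 1 bits (before NRZI encoding), and the receiver drops that bit again. The Python sketch below shows the rule on plain bit lists. The function names are made up for this example, and it says nothing about how fast a cog could actually do this.

```python
def bit_stuff(bits):
    """Insert a 0 after every six consecutive 1 bits."""
    out = []
    ones = 0
    for b in bits:
        out.append(b)
        if b == 1:
            ones += 1
            if ones == 6:
                out.append(0)   # stuffed bit forces a transition on the wire
                ones = 0
        else:
            ones = 0
    return out


def bit_unstuff(bits):
    """Remove stuffed bits; raise ValueError if a stuffed bit is not 0."""
    out = []
    ones = 0
    expect_stuffed = False
    for b in bits:
        if expect_stuffed:
            if b != 0:
                raise ValueError("bit-stuff error: seven 1 bits in a row")
            expect_stuffed = False
            ones = 0
            continue            # drop the stuffed bit
        out.append(b)
        if b == 1:
            ones += 1
            if ones == 6:
                expect_stuffed = True
        else:
            ones = 0
    return out


print(bit_stuff([1] * 8))   # → [1, 1, 1, 1, 1, 1, 0, 1, 1]
```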

jmg · 2013-10-01 18:06

You can run two separate clock domains, and a local slave-clocked state engine, and add dual port RAM as a FIFO interface, but that tends to morph well past simple.
Clock enable keeps one clock domain simple and single and is easy to test.

Bill Henning wrote: »

Perhaps that latching could sync to the p2 clock? If so, we could still have a non-synced external input clock, and possibly do p2-p2 comms, and spi slave, at up to clkfreq.

I think the port pins are not going to be able to toggle at clkfreq anyway, so there will be some IO imposed limit on all this, which will be well under the coreclk speeds.

jmg · 2013-10-01 18:13

The 'CLK muxes' you draw, have to be symbolic only, - the shifters and buffers all have to use sysclk, to avoid mixing clock domains.

Cluso99 wrote: »

With the above circuit I think that we can do all 6/7/8/9/10 bit UART functions, and also do basic/primitive (but far better than P1) I2C and SPI, and aid USB and 1-wire etc. So, it is a simple generalised serial in and serial out shifter circuit.

but it lacks any start-bit RX handling, which is less than what Chip has now ?
I would keep the classic Async state engine, but add SPI options as well. (which become like your symbolic drawing)
Most small uC manage that.

jmg · 2013-10-01 18:18

cgracey wrote: »

Hey, the info about catching the start bit, then testing it again at 1/2 bit period is a great idea. I will change our shifters to do that. In two state bits, I was only using 3 states, so that empty state will now get used, as this adds one state to the machine.

Good. I've just had a field report from a friend, who found they needed to add that start-bit-check, to get rid of noise impulse errors.

What baud granularity does this have on RX ? Do you have a baud formula ?

cgracey · 2013-10-01 18:22

That's right. My second stage loader changes the clock to 80MHz. Yours should run at 20MHz, though. Be sure to reassemble your code with the new PNUT so that you catch any instruction changes in the assembler.

cgracey · 2013-10-01 18:26

jmg wrote: »

Good. I've just had a field report from a friend, who found they needed to add that start-bit-check, to get rid of noise impulse errors.

What baud granularity does this have on RX ? Do you have a baud formula ?

I'm not sure what you are asking. We just handle baud as a number of clocks per bit.

David Betz · 2013-10-01 18:31

cgracey wrote: »

That's right. My second stage loader changes the clock to 80MHz. Yours should run at 20MHz, though. Be sure to reassemble your code with the new PNUT so that you catch any instruction changes in the assembler.

My code has to be built with the propgcc toolchain. Are there any other opcodes that you changed other than the ones mentioned in your original message in this thread?

jmg · 2013-10-01 18:31

Cluso99 wrote: »

jmg: USB bit stuffing and unstuffing can be achieved by software with the above circuitry.

Yes and no. You can SW bit stuff, but the cost is usually some violation of the USB standard, and you use a lot of very costly (logic gates) COG resource to replace quite small counters+gates.

For some hint of the issues with USB, here is one link

http://opencores.org/project,usbhostslave

Cluso99 wrote: »

It may not be ideal, but I would rather a general purpose serialiser than a special purpose one which cannot be used for other things.

Certainly the serialiser needs to be general, and minimal, but it should also avoid passing upstream to software, that which needs very small amounts of logic to implement.

Hardware should do the bit-level footwork, and the software should be byte/word level.

The right balance of hardware support, means you can continually stream with no clock jitter/jumps.
I've seen quite a few serial designs over the years miss that boat; it takes attention to detail to get right.

Cluso99 · 2013-10-01 18:33

jmg: I specifically don't want start bit detection in the hardware for what I am proposing. You cannot do USB if start bit detection is there because USB is effectively synchronous comms. So, either start bit detection is done in software or we need the ability to disable it (and the stop bit insertion at the end). What I am after is a more general purpose serialiser mode. We will need to be able to enable/disable both the receiver and transmitter to be able to control this properly.

Yes, the mux is a bit more complicated, but Chip understands this better than I. I added this in to allow for I2C, SPI and other such modes.

David Betz · 2013-10-01 18:43

With the changes to settask should this code still work to start the serial receive task?

                        jmptask #rx_task,#%0010         'enable serial receiver task
                        settask #0x44 '%%1010

jmg · 2013-10-01 18:48

cgracey wrote: »

I'm not sure what you are asking. We just handle baud as a number of clocks per bit.

The classic Async runs a /16 counter on RX, that is reset on the Start-bit, and then subsequent samples are taken from some mid-point of that counter. This is found in most small uC.
Their baud formula is Fb = Xtal/16/N, with N = 1..

Slightly smarter serial designs, (sometimes called fractional baud) have a wider reloadable counter feeding a /2 stage, as you need to do half bit sampling.
These designs have a baud formula Fb = Xtal/2/N, and N can have a minimum value.

I think you describe the second kind ?

I generally rewrite the baud formula using a virtual baud clock, for Fb = (VirtualBaudClk/N)
On FTDI FS parts that VirtualBaudClk is 24MHz and 96MHz on high speed parts.

If you follow the above, a 160MHz P2 would have an 80MHz VirtualBaudClk, and an FPGA P2 would have VirtualBaudClk = 40MHz.

This gives baud granularity at a baud target of 115200 of
40M/347 = 115273.775
40M/348 = 114942.528
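The granularity figures above follow directly from Fb = VirtualBaudClk/N. A short Python check, with a helper name invented for this example, finds the two divisors that bracket a target baud rate and the resulting error:

```python
def nearest_divisors(virtual_baud_clk_hz, target_baud):
    """Return (N, actual baud, error in percent) for the two bracketing divisors."""
    n_low = virtual_baud_clk_hz // target_baud   # largest N with baud >= target
    results = []
    for n in (n_low, n_low + 1):
        actual = virtual_baud_clk_hz / n
        error_pct = (actual - target_baud) / target_baud * 100
        results.append((n, actual, error_pct))
    return results


for n, actual, err in nearest_divisors(40_000_000, 115_200):
    print(f"N={n}: {actual:.3f} baud ({err:+.3f}%)")
```

With a 40MHz virtual baud clock, both neighbours of 115200 are within about a quarter of a percent, which is well inside what an async link normally tolerates.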

Bill Henning · 2013-10-01 18:48

Argh.

I hope we can at least get clkfreq/2 (though I wish for clkfreq)

jmg wrote: »

You can run two separate clock domains, and a local slave-clocked state engine, and add dual port RAM as a FIFO interface, but that tends to morph well past simple.
Clock enable keeps one clock domain simple and single and is easy to test.

I think the port pins are not going to be able to toggle at clkfreq anyway, so there will be some IO imposed limit on all this, which will be well under the coreclk speeds.

David Betz · 2013-10-01 18:54

David Betz wrote: »
With the changes to settask should this code still work to start the serial receive task?
                        jmptask #rx_task,#%0010         'enable serial receiver task
                        settask #0x44 '%%1010

Hmmm... It looks like jmptask also changed opcodes. How many other opcodes changed?

jmg · 2013-10-01 18:57

Cluso99 wrote: »

jmg: I specifically don't want start bit detection in the hardware for what I am proposing. You cannot do USB if start bit detection is there because USB is effectively synchronous comms. So, either start bit detection is done in software or we need the ability to disable it (and the stop bit insertion at the end).

Enable / disable of Async Start/Stop is quite simple, lots of tiny uC do this now. You can have both.

Handling start bit in SW, is much more complex, and wastes precious COG memory, as well as costing buffer tolerance.
Chip already has that working.

David Betz · 2013-10-01 19:32

David Betz wrote: »

Hmmm... It looks like jmptask also changed opcodes. How many other opcodes changed?

Okay, the propgcc loader works if I update the encodings of both settask and jmptask. Later I'll go through the entire opcode table to see if any others have changed and to add the new ones. Anyway, propgcc for P2 is back in business.

Cluso99 · 2013-10-01 20:19

jmg wrote: »

Enable / disable of Async Start/Stop is quite simple, lots of tiny uC do this now. You can have both.

Handling start bit in SW, is much more complex, and wastes precious COG memory, as well as costing buffer tolerance.
Chip already has that working.

Start/stop might be working, but it is of no use for lots of other purposes if it cannot be disabled !!! The async that Chip has added is a nice touch, but that has always been possible by software, even in the P1. It is the "other" protocols that need hardware assistance so that we can do them. And they require a more basic serialiser (or else a lot more complex options which seems too much to ask for).

If we cannot have an enable/disable for start/stop, then IMHO I would prefer not having it there at all. We need to be able to let the serialiser run continuously (and be able to enable/disable it by software) to do any form of synchronous comms. I2C, SPI, USB, Ethernet, and other protocols are all forms of synchronous comms.

jmg wrote: »

Handling start bit in SW, is much more complex, and wastes precious COG memory, as well as costing buffer tolerance.
Chip already has that working.

For 8,N,1 load the following 10 (32) bits into the serialiser, where X = the 8 bit char to TX..

          ROL       X,#1              'add start bit=0 (upper bits of X must be 0, so bit 0 becomes 0)
          OR        X,StopBits        'add stop bits=1 (bits 9..31)
'now put into serialiser and start it, save CNT and add 10 bit times to it. This is now ready for a passcnt/waitcnt instruction after housekeeping etc.

StopBits  LONG        $FFFF_FE00
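The framing above can be checked with a small Python model of the same two operations. It assumes a 32-bit register shifted out LSB first, as described earlier in the thread; the function names are made up for this sketch.

```python
STOP_BITS = 0xFFFF_FE00   # same constant as StopBits above: bits 9..31 set


def frame_8n1(char):
    """Build the 32-bit transmit word: start bit, 8 data bits, stop bits."""
    word = (char & 0xFF) << 1      # like ROL X,#1 with upper bits clear: bit 0 = start = 0
    word |= STOP_BITS              # like OR X,StopBits: everything above the data is 1
    return word & 0xFFFF_FFFF


def serialise_lsb_first(word, count=10):
    """Return the first `count` bits in the order they leave the shifter."""
    return [(word >> i) & 1 for i in range(count)]


word = frame_8n1(0x41)             # 'A'
print(hex(word))                   # → 0xfffffe82
print(serialise_lsb_first(word))   # → [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
```

Since every bit above the data is already 1, letting the shifter run for more than 10 bit times simply sends extra stop (idle) bits.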

cgracey · 2013-10-01 20:25

David Betz wrote: »
With the changes to settask should this code still work to start the serial receive task?
                        jmptask #rx_task,#%0010         'enable serial receiver task
                        settask #0x44 '%%1010

That looks good. Just make sure you are using the latest opcodes for those instructions.

David Betz · 2013-10-01 20:36

cgracey wrote: »

That looks good. Just make sure you are using the latest opcodes for those instructions.

Have a lot of opcodes changed? Should I be looking at each one or is there a smaller list you can give me to update? I've already done settask and jmptask. Are there any others that aren't mentioned in the first message of this thread?

Bill Henning · 2013-10-01 20:51

Adding start/stop bits to xmit is easy.

Detecting a start bit on receive... not so easy, especially at high bit rates.

I don't think it should be too tough to have start/stop bit generation and detection be an option for the hardware.

Cluso99 wrote: »
Start/stop might be working, but it is of no use for lots of other purposes if it cannot be disabled !!! The async that Chip has added is a nice touch, but that has always been possible by software, even in the P1. It is the "other" protocols that need hardware assistance so that we can do them. And they require a more basic serialiser (or else a lot more complex options which seems too much to ask for).

If we cannot have an enable/disable for start/stop, then IMHO not having it there is a preference. We need to be able to let the serialiser run continuously (and be able to enable/disable it by software) to do any form of synchronous comms. I2C, SPI, USB, Ethernet, and other protocols are all forms of synchronous comms.

For 8,N,1 load the following 10 (32) bits into the serialiser, where X = the 8 bit char to TX..
          ROL       X,#1             'add start bit=0
          OR        X,StopBits        'add stop bits=1 
'now put into serialiser and start it, save CNT and add 10 bit times to it. This is now ready for a passcnt/waitcnt instruction after housekeeping etc.

StopBits  LONG        $FFFF_FE00

Bill Henning · 2013-10-01 21:04

jmg,

I gave this some more thought, and I think the following could work (as long as pins can toggle at clkfreq; Chip can tell us whether that is possible):

- assume the external clock is used to shift N bits - let's say the shift is on the falling edge of ECLK

- once N bits are shifted, the rising edge of ECLK latches the shift register into the "Async Latch" and sets the internal "BITSAVAIL" flip flop

- on the next P2 clock edge (rising or falling), the "Async Latch" is latched into the "P2 Receive Latch", and "BITSAVAIL" is cleared

So all it takes to bridge the ECLK and P2CLK domains is an extra latch, and it will work transparently to both P2CLK and ECLK.

This should even work where ECLK > P2CLK as long as enough bit periods elapse between the N bit words that the cog can read out the latches in a loop.

This could be viewed as a semi-fifo, but all it really does is synchronize two different clock domains using a flipflop and a pair of latches.

1x clkfreq send:

Worst case, if output pins are registered, a logic path is needed to bypass registering for serial output capable pins.

Theoretically, it may even be possible to send at 2x or 4x clkfreq if the process can handle it, as the sync send could use a faster clock than clkfreq, and receive could clock as fast as the shift register could run in the process used... if the process supports it, 320Mbps @ 160MHz or faster...

It should be possible to test the above in the FPGAs.

jmg wrote: »

You can run two separate clock domains, and a local slave-clocked state engine, and add dual port RAM as a FIFO interface, but that tends to morph well past simple.
Clock enable keeps one clock domain simple and single and is easy to test.

I think the port pins are not going to be able to toggle at clkfreq anyway, so there will be some IO imposed limit on all this, which will be well under the coreclk speeds.
