Big update for DE2-115 and DE0-Nano users w/add-on boards

Sapieha · 2013-10-01 10:06

Hi Chip.

2 good pages on SPI

http://www.byteparadigm.com/applications/introduction-to-i2c-and-spi-protocols/

http://www.best-microcontroller-projects.com/spi-interface.html

cgracey wrote: »

Well, I'm always needing UARTs. The SPI stuff can always be bit-banged without precise timing concerns, unlike asynchronous serial. So, async demands a hardware solution. The concept of 1-wire async serial, where you have a rest state, a notifying start state, data bits, and a return to a rest state, is something pure and simple - universally useful and irreducible. All both parties need to know is rate and number of bits.

I'm not opposed to expanding the serial circuits to do a lot more, but I'm not sure what, specifically, they ought to be doing. I know SPI is desired, but can someone lay out a case for it, and describe, essentially, what is wanted? I just need to get the gumption and then it will happen.

cgracey · 2013-10-01 10:07

Bill Henning wrote: »

I just got to the stack ram section in the new zip, and (for Prop 3) the following two additions would be great:

RDSTACK cog_dst, stack_src/#stack_addr

WRSTACK cog_src, stack_dst/#stack_addr

Currently it takes two instructions to use a stack location as a temp var (unless you push/pull). These would add fast, random-access spare registers, so even if they had to take two cycles, they would save valuable cog RAM.

You know, because we have nine bits in S immediate and only 8 needed for random access, the top bit could trigger SPA/SPB w/offset and pre-/post- add/sub, just like RDXXXX/WRXXXX instructions. I'd need to rob an INST D,S instruction code. Maybe ENC could become a D-only instruction, and in its place I could put RDSTACK/WRSTACK instructions which key rd/wr from the WR bit, just like the hub-memory instructions do. This new instruction would supplant PUSHA/PUSHAR/PUSHB/PUSHBR/POPA/POPAR/POPB/POPBR and provide way more flexibility. SPA and SPB would continue to exist (just like PTRA and PTRB), but become a lot more flexible. Great ideas, Bill!

Seairth · 2013-10-01 10:08

A few questions about the serial receive:

What happens when there's a framing error?
Are the 4-bit IDs transparent to the SEROUTx/SERINx? In other words, does SERINx (or maybe the receiver itself) strip it? In which case, what happens if the wrong 4-bit value is received?

cgracey · 2013-10-01 10:11

Bill Henning wrote: »

For USB, differential input and output support would be GREAT

Differential I/O is actually a configurable pin function, and doesn't need two output signals. All even/odd pin pairs can use their neighbor's OUT signal from the core as their own OUT signal, and can specify inversion, too. A pin pair can be set to input as differential signal on either or both IN signals.

Bill Henning · 2013-10-01 10:13

Thank you!

By the way, you made it way better by treating the pointers (with index) like PTRA/B... this also allows local variables:

wrstack r0,spa[1] ... save local variable 1
rdstack r0,spa[2] ... read local variable 2

wrstack r0,--spb ... push r0
rdstack r0,spb++ ... pop r0

FYI, this will help GCC a LOT for programs that don't need a huge stack
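Bill's four example lines are ordinary addressing modes: base plus index, pre-decrement and post-increment. The Python model below is a sketch under assumptions: a 256-long stack RAM (following Chip's note that only 8 bits are needed for random access), pointers that wrap at 8 bits, and a hypothetical `StackRam` class. It says nothing about the final instruction encoding.

```python
class StackRam:
    """Hypothetical model: 256-long stack RAM addressed via SPA/SPB."""

    SIZE = 256  # assumed: 8-bit addresses ("only 8 needed for random access")

    def __init__(self):
        self.mem = [0] * self.SIZE
        self.ptr = {"spa": 0, "spb": 0}

    def _addr(self, name, index=0, pre=0, post=0):
        # Optional pre-adjust of the pointer; pointer + index is the effective
        # address; then optional post-adjust. Everything wraps at 8 bits.
        self.ptr[name] = (self.ptr[name] + pre) % self.SIZE
        addr = (self.ptr[name] + index) % self.SIZE
        self.ptr[name] = (self.ptr[name] + post) % self.SIZE
        return addr

    def wrstack(self, value, name, index=0, pre=0, post=0):
        self.mem[self._addr(name, index, pre, post)] = value & 0xFFFFFFFF

    def rdstack(self, name, index=0, pre=0, post=0):
        return self.mem[self._addr(name, index, pre, post)]


s = StackRam()
s.ptr["spa"] = 0x40                # frame base for local variables
s.wrstack(123, "spa", index=1)     # wrstack r0,spa[1]   save local 1
s.wrstack(7, "spb", pre=-1)        # wrstack r0,--spb    push
top = s.rdstack("spb", post=1)     # rdstack r0,spb++    pop
```

Because the push pre-decrements and the pop post-increments, the stack grows downward and SPB always points at the top item.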

cgracey wrote: »

You know, because we have nine bits in S immediate and only 8 needed for random access, the top bit could trigger SPA/SPB w/offset and pre-/post- add/sub, just like RDXXXX/WRXXXX instructions. I'd need to rob an INST D,S instruction code. Maybe ENC could become a D-only instruction, and in its place I could put RDSTACK/WRSTACK instructions which key rd/wr from the WR bit, just like the hub-memory instructions do. This new instruction would supplant PUSHA/PUSHAR/PUSHB/PUSHBR/POPA/POPAR/POPB/POPBR and provide way more flexibility. SPA and SPB would continue to exist (just like PTRA and PTRB), but become a lot more flexible. Great ideas, Bill!

Bill Henning · 2013-10-01 10:15

Great!

Now we just need ser_clk_in / ser_clk_out capability, with an "enable_start_stop_clock", and we have SPI nailed, and good hardware support for USB, Ethernet, etc.

cgracey wrote: »

Differential I/O is actually a configurable pin function, and doesn't need two output signals. All even/odd pin pairs can use their neighbor's OUT signal from the core as their own OUT signal, and can specify inversion, too. A pin pair can be set to input as differential signal on either or both IN signals.

cgracey · 2013-10-01 10:17

Seairth wrote: »

A few questions about the serial receive:
What happens when there's a framing error?

Are the 4-bit IDs transparent to the SEROUTx/SERINx? In other words, does SERINx (or maybe the receiver itself) strip it? In which case, what happens if the wrong 4-bit value is received?

I didn't cover for framing errors, because I figured it's easier to do the data-integrity checks at higher levels where the bandwidth requirement is lowered, if they are needed.

Those extra 4 bits must be enabled on both ends. The transmitter tacks them on after the data bits and the receiver strips them off, and only passes the received data if the filter matched. This way, you could have a 'master' prop spewing out data to other chips and only messages intended for chip X are received by chip X. It just lowers the busyness level on the receivers' side.
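Chip's description of the 4-bit ID filter can be sketched as a bit-level Python model. The function names are hypothetical, and the bit order of the ID field (LSB first, like the data) is an assumption; the thread does not specify it.

```python
def tx_frame(data, data_bits=8, node_id=None):
    """Bits for one async frame on an idle-high line.

    Start bit (0), data bits LSB first, then the optional 4-bit ID
    (assumed LSB first), then the stop bit (1).
    """
    bits = [0]
    bits += [(data >> i) & 1 for i in range(data_bits)]
    if node_id is not None:
        bits += [(node_id >> i) & 1 for i in range(4)]
    bits.append(1)
    return bits


def rx_filter(bits, data_bits, my_id):
    """Strip start/stop and the trailing ID; pass data only on an ID match."""
    payload = bits[1:-1]
    data = sum(b << i for i, b in enumerate(payload[:data_bits]))
    rx_id = sum(b << i for i, b in enumerate(payload[data_bits:data_bits + 4]))
    return data if rx_id == my_id else None
```

A receiver whose ID does not match simply never sees the byte, which is what keeps the non-addressed chips idle.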

Seairth · 2013-10-01 10:31

cgracey wrote: »

I'm not opposed to expanding the serial circuits to do a lot more, but I'm not sure what, specifically, they ought to be doing. I know SPI is desired, but can someone lay out a case for it, and describe, essentially, what is wanted? I just need to get the gumption and then it will happen.

I think the case for it is that SPI interfaces are more prevalent than UARTs, at least at the TTL level. Of course, SPI is a tricky beast to generally support.

Allow daisy chaining?
Allow variable-length frames?
Tie master clock to slave select enable?
Allow half-duplex ("three-wire") mode?
Allow quad-SPI mode?
How to handle underflow/overflow?
etc.

Seairth · 2013-10-01 10:39

cgracey wrote: »

I didn't cover for framing errors, because I figured it's easier to do the data-integrity checks at higher levels where the bandwidth requirement is lowered, if they are needed.

Would it be possible to set the Z flag if there was an error? This would at least give an indication for the scenario where the TX has ID enabled but the RX does not.

cgracey · 2013-10-01 10:40

Seairth wrote: »

I think the case for it is that SPI interfaces are more prevalent than UARTs, at least at the TTL level. Of course, SPI is a tricky beast to generally support.
Allow daisy chaining?
Allow variable-length frames?
Tie master clock to slave select enable?
Allow half-duplex ("three-wire") mode?
Allow quad-SPI mode?
How to handle underflow/overflow?
etc.

Yes, that is over the top in complexity. We need to identify the root functionality that takes care of 90% of the cases.

tonyp12 · 2013-10-01 10:52

>Well, I'm always needing UARTs
If ICs stopped using them, we would be better off.
SPI and I2C are so much better than the 1970s UART.
But as it's still around, I guess if it's easy to implement in the P2, why not.

jazzed · 2013-10-01 11:01

Well if nothing else, being able to grab or send a single byte or long at a time with a counter output acting as the clock is a 500% improvement over pure bit-banged serial.

Would be best if the start bit was optional for a receive to begin (Propeller as a SPI master), but even that can be improvised by some method (most likely successful by wasting a COG).

Bill Henning · 2013-10-01 11:05

Seairth,

Forgive me, but I think you are overcomplicating the issue. I will address your points one by one:

1) Daisy chaining - can be handled in software, see MCP23S17, allows 8 of those chips on one chain, no extra hardware needed. The address is embedded in the first byte.

2) Variable length frames - I'd say over 90% of the frames are 8 bit, some are 16 bit... but most cases could be handled with multiple eight bit frames

3) Tie master clock to slave select enable - SPI devices ignore the clock when not selected, and clock may be stretched when selected, so I think we can ignore this

4) Allow half-duplex mode - we add an external resistor between MOSI and MISO, problem solved

5) Allow quad-SPI mode - that is not a SER/DES issue, we need not address it (too much additional complexity right now, but nice for Prop3)

6) Underflow/overflow - ignore extra bits, generate dummy bits. Not perfect, but will handle most SPI devices. Most uC SPI ports have the same issue.
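Points 1 and 2 can be illustrated with the MCP23S17 Bill mentions: every transaction is a run of ordinary 8-bit frames, and the device address rides in the first one (control byte `0100 A2 A1 A0 R/W`, with R/W = 1 for a read; the datasheet's IOCON.HAEN bit must be set for the address pins to count). The helpers below are hypothetical Python, not a driver.

```python
def mcp23s17_opcode(hw_addr, read):
    """SPI control byte 0100 A2 A1 A0 R/W (R/W = 1 for a read)."""
    if not 0 <= hw_addr <= 7:
        raise ValueError("hardware address is 3 bits (0..7)")
    return 0x40 | (hw_addr << 1) | (1 if read else 0)


def to_frames(value, width_bits):
    """Split a wider word into 8-bit frames, most significant byte first."""
    nbytes = (width_bits + 7) // 8
    return [(value >> (8 * (nbytes - 1 - i))) & 0xFF for i in range(nbytes)]


def mcp23s17_write(hw_addr, reg, data_bytes):
    """Frames for one register write: control byte, register, data byte(s)."""
    return [mcp23s17_opcode(hw_addr, read=False), reg, *data_bytes]


# Device 3 on the shared bus: IODIRA (0x00 in the default BANK=0 map) = 0xFF.
frames = mcp23s17_write(3, 0x00, [0xFF])
```

Nothing here needs more than an 8-bit shift per frame, which is the substance of points 1 and 2.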

A simple change to the engine, as follows, would handle 90%-95% of cases:

1) bit to disable start/stop bits (or at least disable clocks during their time slot)

2) generate external bit clock from internal ser clock, with option to invert (for devices that sample on rising or falling edge of clock)

3) allow external clock for receive, settable for rising or falling edge - this should also allow at least clkfreq/2 prop to prop comms without shared crystal, possibly even clkfreq

A more complicated addition would catch most of your other points:

4) setbits - set number of bits to xmit/receive, so instead of 8/12/32/36 it is 1..32 (+4 optionally) ... I am not sure this is worth the silicon
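Options 1, 2 and 4 together describe a plain synchronous shifter. A minimal Python sketch of the transmit side, assuming the UART's LSB-first order is kept and that each bit takes two half-periods (data set up with the clock idle, then the clock toggles); the function names are hypothetical, and option 3 (external receive clock) is not modeled here.

```python
def spi_tx_waveform(value, nbits=8, invert_clock=False):
    """Shift `value` out LSB first with no start/stop bits.

    Returns (clk, data) pairs, two per bit: data is set up while the clock
    sits at its idle level, then the clock toggles so the device can sample.
    With invert_clock the idle level is high and the sampling edge falls.
    """
    if not 1 <= nbits <= 32:
        raise ValueError("nbits must be 1..32")
    idle = 1 if invert_clock else 0
    samples = []
    for i in range(nbits):
        bit = (value >> i) & 1
        samples.append((idle, bit))       # data setup, clock idle
        samples.append((1 - idle, bit))   # active edge: device samples here
    return samples


def sample_on_edges(samples, rising=True):
    """Receiver side: capture data on each rising (or falling) clock edge."""
    want = (0, 1) if rising else (1, 0)
    return [d for (c0, _), (c1, d) in zip(samples, samples[1:]) if (c0, c1) == want]


def bits_to_int(bits):
    """Reassemble LSB-first bits into an integer."""
    return sum(b << i for i, b in enumerate(bits))
```

The clock-invert option only changes which edge the far end must use, so a device that samples on falling edges is served by the same shifter.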

Seairth wrote: »

I think the case for it is that SPI interfaces are more prevalent than UARTs, at least at the TTL level. Of course, SPI is a tricky beast to generally support.

Note any pin could be used as a slave select for tx, and waiting on a pin allows starting to receive ~5 clocks after /CS is asserted by a master when the prop2 is the slave.
Allow daisy chaining?
Allow variable-length frames?
Tie master clock to slave select enable?
Allow half-duplex ("three-wire") mode?
Allow quad-SPI mode?
How to handle underflow/overflow?
etc.

Seairth · 2013-10-01 11:10

cgracey wrote: »

Yes, that is over the top in complexity. We need to identify the root functionality that takes care of 90% of the cases.

Agreed. Here is my list:

Support 8-bit, 16-bit, or 32-bit mode.
Support for PHA/POL configuration
Support for both master and slave mode.
In both master and slave mode, you first call SEROUTx to latch a value to be clocked out, then call SERINx to read the simultaneous value that gets clocked in and latched.
When master, only clock during SEROUTx.
When slave, SEROUTx sets Z if underrun (i.e. master clocks a frame before SEROUTx is called).
When slave, SERINx sets Z if overrun (i.e. master clocks two or more frames before SERINx is called).

(I'm assuming that the current UART already uses a shift register and latch register)
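Seairth's flag rules can be pinned down with a small Python model of the slave side. Everything here is hypothetical: the class and method names stand in for SEROUTx/SERINx, and the all-ones dummy frame sent on underrun is an assumption. Master mode ("only clock during SEROUTx") is not modeled.

```python
class SpiSlaveModel:
    """Hypothetical model of the proposed slave-mode latches and Z flags."""

    def __init__(self, nbits=8):
        self.nbits = nbits
        self.mask = (1 << nbits) - 1
        self.tx_latch = None      # value queued by SEROUTx
        self.rx_latch = None      # last complete frame received
        self.unread = 0           # frames received since the last SERINx
        self.underrun = False     # master clocked a frame with nothing queued

    def serout(self, value):
        """Queue the next outgoing frame; return Z (an underrun happened)."""
        z = self.underrun
        self.underrun = False
        self.tx_latch = value & self.mask
        return z

    def master_clocks_frame(self, mosi_value):
        """One full frame from the master; returns what went out on MISO."""
        if self.tx_latch is None:
            self.underrun = True
            miso = self.mask          # assumed: idle-high dummy data
        else:
            miso = self.tx_latch
            self.tx_latch = None
        self.rx_latch = mosi_value & self.mask
        self.unread += 1
        return miso

    def serin(self):
        """Return (data, Z), where Z flags an overrun (2+ unread frames)."""
        z = self.unread >= 2
        self.unread = 0
        return self.rx_latch, z
```

Reporting the underrun at the next SEROUTx, rather than when it happens, matches the rule that the flag is set by SEROUTx.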

Ariba · 2013-10-01 11:12

cgracey wrote: »

Well, I'm always needing UARTs. The SPI stuff can always be bit-banged without precise timing concerns, unlike asynchronous serial. So, async demands a hardware solution. The concept of 1-wire async serial, where you have a rest state, a notifying start state, data bits, and a return to a rest state, is something pure and simple - universally useful and irreducible. All both parties need to know is rate and number of bits.

I'm not opposed to expanding the serial circuits to do a lot more, but I'm not sure what, specifically, they ought to be doing. I know SPI is desired, but can someone lay out a case for it, and describe, essentially, what is wanted? I just need to get the gumption and then it will happen.

The main problem on Prop 1 was fast SPI input. We can use the video generator or the counter trick to do fast SPI output (up to 20 MHz @ 80 MHz sysclock), but with the help of a counter, input was limited to 10 MHz.
This is even worse with an external clock. If the Prop acts as an SPI slave, the max bitrate was something like 1 MHz. Every little 50-cent PIC beats that with its hardware SPI peripheral.

So what we need is a shift register that gets clocked by an external clock input or an internal clock. After a set number of bits, the value gets latched and is readable by the CPU. Something like that:

This is for a fixed 8 bits; perhaps it could be configurable from 4..32 bits, but 8 bits is what most microcontrollers have.
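A minimal Python sketch of the shift-and-latch behaviour Ariba describes: each active clock edge (internal or external) shifts one bit in, and every `nbits` edges the shift register is copied to a CPU-visible latch with a ready flag. The MSB-first order, the 4..32 range check and the class name are assumptions for illustration.

```python
class ShiftLatch:
    """Sketch of a clocked shift register that latches every `nbits` bits."""

    def __init__(self, nbits=8):
        if not 4 <= nbits <= 32:
            raise ValueError("nbits assumed to be 4..32")
        self.nbits = nbits
        self.shift = 0
        self.count = 0
        self.latch = 0
        self.ready = False

    def clock(self, data_in):
        """One active clock edge: shift in one bit, MSB first."""
        self.shift = ((self.shift << 1) | (data_in & 1)) & ((1 << self.nbits) - 1)
        self.count += 1
        if self.count == self.nbits:
            self.latch = self.shift    # CPU-visible copy of the full word
            self.ready = True
            self.count = 0

    def read(self):
        """CPU read: return the latched value and clear the ready flag."""
        self.ready = False
        return self.latch
```

Because the latch is separate from the shift register, the next word can start shifting in while the CPU is still reading the previous one.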

I don't think it helps a lot for USB. You would need to do the bit stuffing and unstuffing of the USB stream in hardware; otherwise a serializer/deserializer does not help much. You need to test and count every bit anyway for the stuffing, so you can just as well do that while you receive or send it by bit-banging. With 160 MIPS, USB full speed should not be a problem.
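Ariba's point is that every bit must be inspected anyway. A minimal Python sketch of the USB bit-stuffing rule (the transmitter inserts a 0 after six consecutive 1s, and the receiver removes it again) shows the per-bit counting involved. It ignores NRZI line coding, sync and end-of-packet handling, and the function names are hypothetical.

```python
def bit_stuff(bits):
    """Insert a 0 after every run of six consecutive 1s."""
    out, ones = [], 0
    for b in bits:
        out.append(b)
        if b:
            ones += 1
            if ones == 6:
                out.append(0)    # stuffed bit
                ones = 0
        else:
            ones = 0
    return out


def bit_unstuff(bits):
    """Remove the 0 that follows six consecutive 1s."""
    out, ones, skip = [], 0, False
    for b in bits:
        if skip:
            # This position must hold the stuffed 0; a 1 is a stuffing error.
            if b:
                raise ValueError("bit-stuffing error")
            skip, ones = False, 0
            continue
        out.append(b)
        ones = ones + 1 if b else 0
        if ones == 6:
            skip = True
    return out
```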

Andy

jazzed · 2013-10-01 11:13

Bill Henning wrote: »

A simple change to the engine, as follows, would handle 90%-95% of cases:

1) bit to disable start/stop bits (or at least disable clocks during their time slot)

2) generate external bit clock from internal ser clock, with option to invert (for devices that sample on rising or falling edge of clock)

3) allow external clock for receive, settable for rising or falling edge - this should also allow at least clkfreq/2 prop to prop comms without shared crystal, possibly even clkfreq

+1

Not so sure of the value of 3 though. It will be difficult to make the uart a generic SPI slave device interface without a receive queue buffer.

Bill Henning · 2013-10-01 11:18

Thanks...

3 is needed for SPI receive faster than ~ clkfreq/6

SPI normally sends bytes, and using movf and such an SPI engine, a P2 should have no problem receiving large packets even at clkfreq (given an external clock input for the shift register clock), as the cog would have 8 cycles per byte to handle it. Heck, synced to the hub, it may be possible to fill the hub with incoming SPI data at clkfreq.

One other nice thing to have would be selecting LSB/MSB first for send/receive; however, we can reverse the bit order in the cog in two cycles (reverse, shift), and sending/receiving eight bits would take at least eight cycles, leaving room to do the reversal in software.
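Bill's two-step reversal, sketched in Python. It assumes a reverse operation that flips all 32 bits of a register; the helper names are hypothetical, and the exact semantics of the real P2 instruction are not given in this thread. Reversing all 32 bits moves the low n bits to the top, so a right shift by 32 - n brings them back down in reversed order and discards whatever was above them.

```python
def rev32(x):
    """Reverse the bit order of a 32-bit value."""
    return int(format(x & 0xFFFFFFFF, "032b")[::-1], 2)


def reverse_low_bits(x, n=8):
    """Reverse the low n bits: full 32-bit reverse, then shift right 32 - n."""
    if not 1 <= n <= 32:
        raise ValueError("n must be 1..32")
    return rev32(x) >> (32 - n)
```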

jazzed wrote: »

+1

Not so sure of the value of 3 though. It will be difficult to make the uart a generic SPI slave device interface without a receive dma buffer.

cgracey · 2013-10-01 11:19

Seairth wrote: »

Would it be possible to set the Z flag if there was an error? This would at least give an indication for the scenario where the TX has ID enabled but the RX does not.

Right now, things are so simple that there is no sense of there being an error, or not. I'd have to start oversampling the input to determine if an error had occurred.

Seairth · 2013-10-01 11:20

Bill Henning wrote: »

4) setbits - set number of bits to xmit/receive, so instead of 8/12/32/36 it is 1..32 (+4 optionally) ... I am not sure this is worth the silicon

Heh. I definitely wasn't suggesting that he support all of those features. Was just pointing out how complex SPI can be.

As for the optional 4-bit ID value, it wouldn't generally be needed with SPI unless there were multiple slaves listening at the same time. That would also mean that the slaves would either not have their MISO line connected or would have to avoid driving those lines. More complication that we don't generally need. So I'd just leave the 4-bit ID option out altogether.

Incidentally, the 4-bit ID value is only 3 bits away from being able to support 7-bit I2C slave addresses.
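For comparison, the first byte of an I2C transfer carries a 7-bit address followed by an R/W bit (1 = read). The Python sketch below only builds that byte and performs a receiver-side compare analogous to the 4-bit ID filter. It ignores I2C start/stop conditions, ACK bits, and the fact that I2C shifts MSB first while the UART shifts LSB first; the function names are hypothetical.

```python
def i2c_address_byte(addr7, read):
    """First byte of an I2C transfer: 7-bit address, then R/W (1 = read)."""
    if not 0 <= addr7 <= 0x7F:
        raise ValueError("7-bit address expected")
    return (addr7 << 1) | (1 if read else 0)


def address_matches(first_byte, my_addr7):
    """Receiver-side filter, analogous to the UART's 4-bit ID compare."""
    return (first_byte >> 1) == my_addr7
```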

pedward · 2013-10-01 11:26

cgracey wrote: »

Right now, things are so simple that there is no sense of there being an error, or not. I'd have to start oversampling the input to determine if an error had occurred.

Will the state machine hang if the input data is corrupt, or is there some watchdog that prevents a hang in the state machine?

Sapieha · 2013-10-01 11:43

Hi Chip.

Instead of hardware error checking, I think it is simpler to add one PARITY code/decode instruction that can handle error checking on most SER inputs/outputs.
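What such a parity helper would compute is easy to state in software. A minimal Python sketch, assuming the parity bit is appended above the data bits (bit 8 for a byte) and that both even and odd parity are wanted; the function names are hypothetical and imply no particular instruction encoding.

```python
def parity(value, nbits=8):
    """1 if the number of 1s in the low nbits is odd, else 0."""
    return bin(value & ((1 << nbits) - 1)).count("1") & 1


def add_parity(value, nbits=8, odd=False):
    """Append a parity bit above the data bits (a 9-bit word for nbits=8)."""
    p = parity(value, nbits) ^ (1 if odd else 0)
    return (value & ((1 << nbits) - 1)) | (p << nbits)


def check_parity(word, nbits=8, odd=False):
    """True if the received data bits and parity bit agree."""
    return parity(word, nbits + 1) == (1 if odd else 0)
```

A single parity bit catches any odd number of flipped bits, but not two flips in the same word, so it is a cheap first-line check rather than a full integrity guarantee.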

cgracey wrote: »

Right now, things are so simple that there is no sense of there being an error, or not. I'd have to start oversampling the input to determine if an error had occurred.

Seairth · 2013-10-01 11:43

If there are two UARTs (that supported SPI) per COG, what would it take to allow interleaved access as in the following scenario:

* Two identical SPI flash chips
* Both chips connected to the same MOSI and CLK (from SERA, for example)
* Each chip connected to separate MISO pins (SERA and SERB).

If you were to set up SERB as a slave, I think the wiring would be:

SERA CLK => CHIP1 CLK, CHIP2 CLK, SERB CLK
SERA MOSI => CHIP1 MISO, CHIP2 MISO
SERA MISO <= CHIP1 MOSI
SERB MISO (disconnected)
SERB MOSI <= CHIP2 MOSI
SERB CS <= always active, or connected to CHIP2 CS (or CHIP1 and CHIP2 CS if controlled by a single CS)

(note the unusual MOSI-to-MOSI connection between SERB and CHIP2. In essence, SERA is driving the data flow from CHIP2 to SERB.)

Yeah, I know we don't even have SPI yet, but it's still fun to ponder.

cgracey · 2013-10-01 12:07

pedward wrote: »

Will the state machine hang if the input data is corrupt, or is there some watchdog that prevents a hang in the state machine?

This is all it does:

1) wait for STOP state
2) wait for START state
3) delay 1.5 bit periods
4) sample data bit 0
5) wait 1 bit period
6) sample data bit 1
7) loop 6 times to (5) to get data bits 2..7
8) done, pass received data via SERINA/SERINB, loop to (1).

Note that at (8), RX still reads bit 7, which may be low, and will be followed by the STOP state from the transmitter. The receiver loops to (1) where it retriggers on the next STOP-to-START transition.

You see, there is no error possibility from the receiver's perspective. As long as it sees a negative edge, you've got new data. Transmit is even simpler - totally deterministic.
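Chip's eight steps translate almost line for line into a sampled-line model. In this Python sketch the line is a list of 0/1 levels at `spb` samples per bit; `make_line` and `chip_rx` are hypothetical helpers. Note that a one-sample low glitch on an idle line is accepted as a start bit, which is the case jmg raises further down the thread.

```python
def make_line(values, spb, data_bits=8, idle_bits=2):
    """Idle-high line: start bit, LSB-first data, one stop bit per value."""
    line = [1] * (idle_bits * spb)
    for value in values:
        bits = [0] + [(value >> k) & 1 for k in range(data_bits)] + [1]
        for b in bits:
            line += [b] * spb
    return line + [1] * (idle_bits * spb)


def chip_rx(samples, spb, data_bits=8):
    """Receiver following Chip's steps 1-8; no glitch or framing checks."""
    out, i, n = [], 0, len(samples)
    while True:
        while i < n and samples[i] == 0:   # 1) wait for STOP (high) state
            i += 1
        while i < n and samples[i] == 1:   # 2) wait for START (low) state
            i += 1
        if i >= n:
            return out
        i += spb + spb // 2                # 3) delay 1.5 bit periods
        data = 0
        for bit in range(data_bits):       # 7) repeat for the remaining bits
            if bit:
                i += spb                   # 5) wait 1 bit period
            if i >= n:
                return out
            data |= samples[i] << bit      # 4)/6) sample the data bit
        out.append(data)                   # 8) pass data on, loop to 1)
```

After step 8 the index still sits on the last data bit, so step 1 naturally waits out a low bit 7 before hunting for the next start edge.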

evanh · 2013-10-01 12:41

I'm liking it. A UART at SPI speeds is not to be sniffed at. Obviously having SPI in hardware is desirable for compatibility with other chips, including I2S, but, for comms, not having to deal with a clock is very desirable.

User Name · 2013-10-01 13:24

I agree. My first reaction to this news was very positive! I don't like Master/Slave protocols even though I'm compelled to implement them all the time.

cgracey · 2013-10-01 14:07

User Name wrote: »

I agree. My first reaction to this news was very positive! I don't like Master/Slave protocols even though I'm compelled to implement them all the time.

There is something irreducibly beautiful about asynchronous serial (HIGH-->LOW-->D0-->D1-->D2-->Dn-->HIGH....). It just can't be any simpler. And it doesn't demand anything back. Those with receivers, let them receive. It affords elegance because it doesn't entangle the transmitter and receiver in a fitful game of Twister, and so it pushes failure handling up higher where it can be dealt with more gracefully at the macro level.

I understand the desire for fast, synchronous shifting, though, and I'm trying to figure out how to best do it without getting too down in the trenches with specific protocol requirements.

jmg · 2013-10-01 14:53

cgracey wrote: »

This is all it does:

1) wait for STOP state
2) wait for START state
3) delay 1.5 bit periods
4) sample data bit 0
5) wait 1 bit period
6) sample data bit 1
7) loop 6 times to (5) to get data bits 2..7
8) done, pass received data via SERINA/SERINB, loop to (1).

Note that at (8), RX still reads bit 7, which may be low, and will be followed by the STOP state from the transmitter. The receiver loops to (1) where it retriggers on the next STOP-to-START transition.

You see, there is no error possibility from the receiver's perspective. As long as it sees a negative edge, you've got new data. Transmit is even simpler - totally deterministic.

What baud rates does this support?

You also really need to catch false start bits (noise spikes); this is usually done as:

1) wait for STOP state (check stop bit=1 @ 0.5, proceed immediately to 2)
2) wait for START edge
2b) delay 0.5 bit periods
2c) Check Start still = 0, if not GOTO 2
3) delay 1.0 bit periods
4) sample data bit 0
5) wait 1 bit period
6) sample data bit 1

It is also important to start looking for the START edge halfway through the STOP time, to tolerate the usual baud skew.
You are probably doing that, but I do not see it mentioned.

Some async protocols use a Break, which can easily be sent in software, but the RX side then needs to signal a framing error when the STOP bit is not 1.
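jmg's revised sequence, plus the stop-bit check he asks for, in the same sampled-line style. This Python sketch uses hypothetical helpers (`make_line`, `jmg_rx`); returning a per-byte framing-error flag is one possible way to surface the error to software, not a proposed register layout.

```python
def make_line(values, spb, data_bits=8, idle_bits=2):
    """Idle-high line: start bit, LSB-first data, one stop bit per value."""
    line = [1] * (idle_bits * spb)
    for value in values:
        bits = [0] + [(value >> k) & 1 for k in range(data_bits)] + [1]
        for b in bits:
            line += [b] * spb
    return line + [1] * (idle_bits * spb)


def jmg_rx(samples, spb, data_bits=8):
    """Receiver with start-bit revalidation and a stop-bit check.

    Returns (data, framing_error) pairs; a held-low line (Break) shows up
    as data 0 with framing_error True.
    """
    out, i, n = [], 0, len(samples)
    half = spb // 2
    while True:
        while i < n and samples[i] == 0:   # 1) wait for the line to be high
            i += 1
        while i < n and samples[i] == 1:   # 2) wait for a START edge
            i += 1
        if i + half >= n:
            return out
        if samples[i + half] != 0:         # 2b/2c) still low half a bit later?
            i += half                      # no: a noise spike, hunt again
            continue
        i += half + spb                    # 3) middle of data bit 0
        data = 0
        for bit in range(data_bits):       # 4)-6) sample each data bit
            if i >= n:
                return out
            data |= samples[i] << bit
            i += spb
        if i >= n:
            return out
        # i is now mid-STOP: a low level here is a framing error. Because the
        # next START hunt begins here, half a bit early, modest baud skew
        # between the two ends is tolerated.
        out.append((data, samples[i] == 0))
```

After a framing error, step 1 waits for the line to return high, so a long Break produces one flagged byte rather than a stream of them.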

Since you need a 5-bit loadable counter to do 8 or 32 bits, if there is register room it is a very good idea to make that fully user-adjustable, i.e. allow the user to define any length.

Also, 9-bit UART modes are common, so your 4-bit address frame is really nice, but it needs a 1-bit option.
This should also have software access, as well as the hardware compare you do now.

Sometimes that 9th bit is not used for an address but for command/data tags, and in those cases software access to those extra 'address/tag' bits is required.
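The usual multidrop use of the 9th bit (set on address frames, clear on data frames; receivers ignore data until they see their own address) can be sketched in Python. It is one common convention, not necessarily what the P2 would do. `make_9bit` and `NineBitReceiver` are hypothetical, and returning the tag with each word stands in for the software access jmg asks for.

```python
def make_9bit(value, is_address):
    """Build a 9-bit word: 8 data bits plus an address/tag flag in bit 8."""
    return (value & 0xFF) | ((1 if is_address else 0) << 8)


class NineBitReceiver:
    """Multidrop filter on the 9th bit, with the tag returned to software."""

    def __init__(self, my_addr):
        self.my_addr = my_addr
        self.selected = False          # becomes True after our address frame

    def receive(self, word):
        """Return (data, tag) if this node should see the word, else None."""
        data, tag = word & 0xFF, (word >> 8) & 1
        if tag:                        # address frame: select or deselect
            self.selected = data == self.my_addr
        return (data, tag) if self.selected else None
```

When the 9th bit is a command/data tag instead, the same tuple lets software branch on the tag without any hardware compare.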

jazzed · 2013-10-01 14:59

cgracey wrote: »

I understand the desire for fast, synchronous shifting, though, and I'm trying to figure out how to best do it without getting too down in the trenches with specific protocol requirements.

Thanks Chip.

Both transmit and receive clocking are desirable for a generic solution. If you omit the external receive clocking though, I would understand especially considering the problems that could create. Don't go chasing rabbits too far.

Just my 2 cents. Others may want more.

jmg · 2013-10-01 15:25

cgracey wrote: »

I'm not opposed to expanding the serial circuits to do a lot more, but I'm not sure what, specifically, they ought to be doing. I know SPI is desired, but can someone lay out a case for it, and describe, essentially, what is wanted? I just need to get the gumption and then it will happen.

In the simplest form, SPI involves removing stuff (well, ok, to do both SPI and UART means adding option bits and a Mux or three)
- you disable Start and Stop bit sensing, and add a CLK out

Typical SPI option bits include control of
* Clock Phase
* Clock Polarity
* LSB First or MSB First Data Transfer
* Master or Slave
* Clock enable/disable (some SPI apps need just MISO or MOSI)
* Shift lengths (8 is common, but many 32-bit uCs do 8..32-bit SPI)
With 32 bits already in the UART, 32-bit SPI is natural.
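Clock phase, clock polarity, bit order and shift length interact, so a small edge-by-edge model helps. In this Python sketch a master waveform is built for any CPOL/CPHA pair, and a receiver samples only on the edge that mode dictates (rising when CPOL equals CPHA, falling otherwise). Only MOSI is modeled; MISO behaves the same way in the other direction. The helper names are hypothetical, and the half-period sampled model ignores setup/hold timing.

```python
def sample_edge_is_rising(cpol, cpha):
    """Modes 0 and 3 sample on rising edges; modes 1 and 2 on falling."""
    return cpol == cpha


def bit_order(nbits, msb_first):
    return list(range(nbits - 1, -1, -1)) if msb_first else list(range(nbits))


def master_waveform(value, nbits=8, cpol=0, cpha=0, msb_first=True):
    """(clk, mosi) per half-period, starting and ending at the idle level."""
    idle, active = cpol, 1 - cpol
    wave = [(idle, 0)]
    for k in bit_order(nbits, msb_first):
        bit = (value >> k) & 1
        if cpha == 0:
            # data is set up while the clock idles; the leading edge samples it
            wave += [(idle, bit), (active, bit)]
        else:
            # the leading edge launches the data; the trailing edge samples it
            wave += [(active, bit), (idle, bit)]
    wave.append((idle, 0))
    return wave


def slave_receive(wave, nbits=8, cpol=0, cpha=0, msb_first=True):
    """Sample MOSI only on the edges selected by CPOL/CPHA."""
    want = (0, 1) if sample_edge_is_rising(cpol, cpha) else (1, 0)
    bits = [d for (c0, _), (c1, d) in zip(wave, wave[1:]) if (c0, c1) == want]
    value = 0
    for k, b in zip(bit_order(nbits, msb_first), bits):
        value |= b << k
    return value
```

In hardware terms, CPOL is just an output inversion on the clock pin, while CPHA decides whether the shifter launches or captures on the first edge.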

Slave mode commonly has lower CLK speeds, as it needs to sync the remote clock to the chip clock.

Quad SPI really just involves taps and a mux, to shift from one bit per edge to 4 bits per edge.
The biggest hit from Quad SPI is likely to be in pin management: how to select where those 4 bits go?
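The "taps and a mux" idea amounts to taking 4 bits per clock instead of 1. A minimal Python sketch, assuming the common convention that each byte goes out high nibble first with IO3 carrying the most significant bit of the nibble; `to_lanes` and `from_lanes` are hypothetical helpers, and pin assignment is left out entirely.

```python
def to_lanes(data, lanes=4):
    """Split bytes into per-clock lane groups, high bits first.

    Each returned int holds `lanes` bits (IO[lanes-1]..IO0) for one clock.
    With lanes=1 this degenerates to ordinary MSB-first SPI.
    """
    if 8 % lanes:
        raise ValueError("lanes must divide 8")
    mask = (1 << lanes) - 1
    out = []
    for byte in data:
        for shift in range(8 - lanes, -1, -lanes):
            out.append((byte >> shift) & mask)
    return out


def from_lanes(groups, lanes=4):
    """Inverse of to_lanes: reassemble bytes from per-clock lane groups."""
    per_byte = 8 // lanes
    out = []
    for i in range(0, len(groups), per_byte):
        byte = 0
        for g in groups[i:i + per_byte]:
            byte = (byte << lanes) | g
        out.append(byte)
    return bytes(out)
```

A byte takes 2 clocks instead of 8, which is where the 4x throughput of Quad SPI comes from.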

Better SPI ports can continually send, with no gaps in the CLK. Not all small uC manage this.

jmg · 2013-10-01 15:33

jazzed wrote: »

If you omit the external receive clocking though, I would understand especially considering the problems that could create.

External clocks are usually handled as a clock enable: you sample to find the defined edge, and then enable the shift, using the master clock, on that edge.
The shifters are always clocked by P2 SysCLK; the remote slave clock simply says 'wait for me'.
In master mode, the clock enable can be fed from the ClkDivider carry, so the shifters are always clocked by P2 SysCLK.

That imposes a lower slave clock rate, which is common.
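jmg's point that the slave clock is only a "wait for me" enable, and why that caps the slave clock rate, can be shown with a Python model on a fine time grid. The remote master (mode 0, MSB first) changes state on the grid; SysCLK samples it every `sys_period` grid units, and a 0-to-1 change between samples enables one shift. Helper names are hypothetical, and the real synchronizer flops are left out, so the limit shown here is optimistic.

```python
def remote_master(value, half, nbits=8):
    """Remote mode-0 master on a fine grid: `half` units per SCK half-period."""
    sck, mosi = [0] * half, [0] * half        # idle before the first bit
    for k in range(nbits - 1, -1, -1):
        bit = (value >> k) & 1
        sck += [0] * half + [1] * half
        mosi += [bit] * (2 * half)
    sck += [0] * half
    mosi += [0] * half
    return sck, mosi


def sysclk_receiver(sck, mosi, sys_period, nbits=8):
    """Sample both lines every `sys_period` grid units (one SysCLK tick).

    A 0->1 change between consecutive samples acts as a shift enable; the
    shifter itself only ever runs on SysCLK.
    """
    shift = count = 0
    words = []
    prev = 0
    for t in range(0, len(sck), sys_period):
        clk = sck[t]
        if prev == 0 and clk == 1:            # edge found: enable one shift
            shift = ((shift << 1) | mosi[t]) & ((1 << nbits) - 1)
            count += 1
            if count == nbits:
                words.append(shift)
                count = 0
        prev = clk
    return words
```

Once an SCK half-period is shorter than one SysCLK tick, some high or low phases fall between samples, edges are missed and the word is lost; synchronizer latency and setup margins push the practical limit lower still.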
