I don't think it should be too tough to have start/stop bit generation and detection be an option for the hardware.
Agreed. Most of the work is in finding register space, and mapping the control and flag bits.
A CLKOUT option bit should also be easy, even for 'async', as that (with the addition of fully variable length control) would then allow the P2 to use the FTDI High Speed Serial, which can manage 50MHz clock speeds.
Streaming at 50Mbd over a standard VCP into a P2 has real appeal, and it is likely to be about the pin limit.
jmg,
I gave this some more thought, and I think the scheme below would work (as long as pins can toggle at clkfreq; Chip can tell us whether that is possible).
Most uC cannot toggle pins above 100MHz, and many limit their Serial specs to somewhat under that, to allow for Ts/Th delay effects through the relatively large and slow Pin buffers.
Assume an external clock (ECLK) is used to shift in N bits; let's say the shift happens on the falling edge of ECLK.
Once N bits are shifted, the next rising edge of ECLK latches the shift register into the "Async Latch" and sets an internal "BITSAVAIL" flip-flop.
On the next P2 clock edge (rising or falling), the "Async Latch" is copied into the "P2 Receive Latch" and "BITSAVAIL" is cleared.
...
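The receive path described above can be modelled behaviourally as follows. This is a sketch under the assumptions in the post (shift on falling ECLK, latch on the rising edge after N bits, hand-off on the next P2 clock edge); the class and method names are hypothetical. A logic model like this says nothing about metastability, which is exactly the clock-domain risk raised in the reply.

```python
class AsyncShiftReceiver:
    """Behavioural model of the proposed two-stage receive path.

    Hypothetical names throughout. Models logic only, not metastability.
    """

    def __init__(self, nbits: int = 8) -> None:
        self.nbits = nbits
        self.shift_reg = 0        # ECLK domain: shift register
        self.count = 0            # ECLK domain: bits shifted since the last latch
        self.async_latch = 0      # ECLK domain: "Async Latch"
        self.bitsavail = False    # set in the ECLK domain, cleared in the P2 domain
        self.rx_latch = 0         # P2 domain: "P2 Receive Latch"
        self.rx_valid = False     # P2 domain: a word has been handed over

    def eclk_falling(self, data_bit: int) -> None:
        """Shift one data bit in (MSB first) on the falling edge of ECLK."""
        mask = (1 << self.nbits) - 1
        self.shift_reg = ((self.shift_reg << 1) | (data_bit & 1)) & mask
        self.count += 1

    def eclk_rising(self) -> None:
        """After N bits, copy the shift register into the Async Latch."""
        if self.count == self.nbits:
            self.async_latch = self.shift_reg
            self.bitsavail = True
            self.count = 0

    def p2_clock_edge(self) -> None:
        """Next P2 clock edge: hand the word to the P2 side and clear BITSAVAIL."""
        if self.bitsavail:
            self.rx_latch = self.async_latch
            self.rx_valid = True
            self.bitsavail = False


if __name__ == "__main__":
    rx = AsyncShiftReceiver(8)
    for bit in (1, 0, 1, 0, 0, 1, 0, 1):      # 0xA5, MSB first
        rx.eclk_falling(bit)
        rx.eclk_rising()
    print(rx.bitsavail, rx.rx_valid)           # prints: True False
    rx.p2_clock_edge()
    print(hex(rx.rx_latch), rx.bitsavail)      # prints: 0xa5 False
```

Note that if a second word completes before the P2 side picks up the first, the Async Latch is simply overwritten in this model; real hardware would need to decide whether to flag that.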
Sure, but you have now duplicated state machines for master and slave modes, as well as buffer registers.
Testing is a lot more complex, and you have to hope you sliced those clock domains in the correct places.
All of this effort is to try to reach /1 shift clocks? I do not recall seeing any uC that offers /1 shifter clocks; /2 is common in master mode, and even that only for sub-50MHz devices. I think some allow /2 in slave mode, if the two chips have synced clocks.
Even a 'still coming' part like the 240MHz Analog Devices M4 shows a max of 50MHz spec'd for any pin, and a max of (fSCLK/4) MHz is mentioned for external clock modes (that is still 60MHz at the pins).
I found an NXP device where early data mentioned 80Mbd, but 2013 data now says 52Mbd on their SPI ports. The real world always conspires to lower practical clock speeds.
IMHO, for SPI (and other high speed serial) a 32-bit wide version of the 74HC195 (or 74HC194) chip extended with an output latch, or a 32-bit version of the 74HC299, would do wonders for speed. Nicer still would be some muxes for clock source, which edge to clock on, where the pins connect, etc. If the shift register included a counter to (optionally) latch the register every 8/16/32 clocks and set a flag, it should deal with asynchronous bit clocks too?
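A rough behavioural sketch of that suggestion: a 32-bit shift register with a selectable clock edge and bit order, plus a counter that copies the newest 8, 16 or 32 bits into an output latch and sets a flag. It is only in the spirit of a widened 74HC-style shifter, not a model of any of those parts, and all names are hypothetical.

```python
class LatchingShifter:
    """32-bit shift register with selectable clock edge, bit order, and a
    counter that latches every 8, 16 or 32 clocks and sets a flag.
    Hypothetical model, not a description of any existing part."""

    def __init__(self, latch_every: int = 8, msb_first: bool = True,
                 rising_edge: bool = True) -> None:
        if latch_every not in (8, 16, 32):
            raise ValueError("latch_every must be 8, 16 or 32")
        self.latch_every = latch_every
        self.msb_first = msb_first
        self.rising_edge = rising_edge
        self.shift = 0
        self.count = 0
        self.latch = 0
        self.flag = False
        self._last_clk = 0

    def sample(self, clk: int, data: int) -> None:
        """Feed the current clock and data pin levels (0 or 1)."""
        if self.rising_edge:
            edge = clk == 1 and self._last_clk == 0
        else:
            edge = clk == 0 and self._last_clk == 1
        self._last_clk = clk
        if not edge:
            return
        if self.msb_first:
            self.shift = ((self.shift << 1) | data) & 0xFFFFFFFF
        else:
            self.shift = (self.shift >> 1) | (data << 31)
        self.count += 1
        if self.count == self.latch_every:
            n = self.latch_every
            if self.msb_first:
                self.latch = self.shift & ((1 << n) - 1)   # newest n bits sit at the bottom
            else:
                self.latch = self.shift >> (32 - n)        # newest n bits sit at the top
            self.flag = True
            self.count = 0

    def read(self) -> int:
        """Read the latched word and clear the flag."""
        self.flag = False
        return self.latch


def clock_in(shifter: LatchingShifter, bits) -> None:
    """Drive one clock pulse per bit, with data set up before the active edge."""
    idle = 0 if shifter.rising_edge else 1
    for b in bits:
        shifter.sample(idle, b)
        shifter.sample(1 - idle, b)
```

Because the flag is set by the register's own bit clock, software only has to poll (or wait on) the flag once per word, which is what makes an asynchronous external bit clock manageable.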
I have just been "playing" with some of your new instructions.
I just modified my VGA driver in Invaders to utilize the new non-polled WAITVID instruction as well as the new ESWAP8 Endian reversal.
This made a huge impact on my driver's size and performance. Its size was reduced from 57 longs to 41 longs!
Using ESWAP8 not only reduced the driver size but dramatically improved performance too.
I can "nearly" get away running it at 1/16 time slot but 2/16 works great. Sure the 20MHz clock increase helped too!
I think this shows that the real silicon @ 160MHz could easily do similar tasks in just 1/16 time slot.
I like your new SETTASK format too.
Very nice enhancements!
Cheers
Brian
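To put the "1/16 versus 2/16 time slot" figures in perspective, here is a small sketch that computes each task's share of a cog from a 16-entry slot table. It assumes, as the post implies, that the scheduler rotates through 16 slots, and further assumes one clock per slot; the list-of-task-numbers format is illustrative only and is not the actual SETTASK encoding.

```python
from collections import Counter
from fractions import Fraction


def task_shares(slots, clkfreq_hz: int):
    """Return {task: (fraction of slots, clocks per second)} for a 16-slot table.

    Assumes one clock per slot, which is an illustration, not a spec.
    """
    if len(slots) != 16:
        raise ValueError("expected exactly 16 time slots")
    counts = Counter(slots)
    return {task: (Fraction(n, 16), clkfreq_hz * n // 16)
            for task, n in sorted(counts.items())}


if __name__ == "__main__":
    # Hypothetical layout: task 1 (video) gets 2 of 16 slots, spread evenly.
    table = [0] * 7 + [1] + [0] * 7 + [1]
    for task, (share, rate) in task_shares(table, 160_000_000).items():
        print(f"task {task}: {share} of the cog, {rate:,} clocks/s")
```

Spreading a task's slots evenly (rather than back to back) keeps its worst-case latency down, which matters for something as timing-sensitive as a VGA driver.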
Glad it's all working. I ran the 80MHz version on my DE0-Nano and watched it on and off for hours while I worked. It's good that it has an 'attract' mode.
That SETTASK idea was Tubular's, I believe. It's a really nice solution to variable-length task loops.
You could use SERA/SERB to get your serial going with less overhead, too.
Do SERA/SERB have any status register, or do they set the Zero/Carry flag so code can step over them if no character is waiting?
We can make it do almost anything, so yes.
I've been working on the RDSTACK/WRSTACK instructions today and I think I'll have it done tomorrow. Hopefully, I'll glean some clear direction from all this synchronous serial discussion by then and we'll have a nice and tidy solution.
I've been working on the RDTASK/WRTASK instructions today and I think I'll have it done tomorrow. Hopefully, I'll glean some clear direction from all this synchronous serial discussion by then and we'll have a nice and tidy solution.
I have been thinking: would it be possible to use the Z flag as "character ready to read" and the C flag as "overflow" (the incoming character in the buffer has been overwritten by the next one, i.e. a character was lost from the stream)?
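A behavioural sketch of that flag proposal: a one-character receive buffer where "Z" reports that a character is ready and "C" reports an overrun. The class and method names are hypothetical, and this models the idea being proposed, not how SERA/SERB actually behave.

```python
from typing import Optional, Tuple


class RxBuffer:
    """One-character receive buffer reporting the two proposed flags:
    Z = a character is ready, C = overrun (an unread character was overwritten).
    Hypothetical model of the proposal, not actual SERA/SERB behaviour."""

    def __init__(self) -> None:
        self._char: Optional[int] = None
        self._overrun = False

    def on_receive(self, ch: int) -> None:
        """Hardware side: a complete character has arrived."""
        if self._char is not None:
            self._overrun = True          # the previous, unread character is lost
        self._char = ch & 0xFF

    def read(self) -> Tuple[Optional[int], bool, bool]:
        """Software side: return (char, z_ready, c_overrun) and clear both conditions."""
        ch, z_ready, c_overrun = self._char, self._char is not None, self._overrun
        self._char = None
        self._overrun = False
        return ch, z_ready, c_overrun
```

With both flags returned from a single read, a polling loop can branch on Z to skip when idle and on C to detect lost data, without a separate status-register access.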
What if P2 only supported SPI master mode? I suspect that master mode will be used the majority of the time. And we would still have the UART for chip-to-chip communication. This might simplify the implementation:
- Set phase, polarity, bit order, MISO, MOSI, CLK, divider
- Allow only 8-bit frames (larger can be handled in software)
- SEROUTx outputs without start bit, ID, or stop bit
- SERINx reads buffered input (captured on the appropriate clock edge during SEROUTx)
- CS is handled in software (i.e. SPI hardware has no knowledge of CS)
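A minimal bit-level sketch of that master-only proposal, assuming the usual SPI mode definitions (CPOL sets the idle clock level; CPHA selects sampling on the first or second clock edge) and 8-bit frames with no start/stop bits. The `slave_tx` argument stands in for whatever the external device drives onto MISO; all function names are hypothetical. Because MISO is only sampled while MOSI is being shifted out, the received byte exists only as a by-product of a transmit, which is the receiver/transmitter coupling noted in the reply.

```python
from typing import List, Tuple


def _to_bits(value: int, msb_first: bool) -> List[int]:
    """Split an 8-bit value into wire order."""
    order = range(7, -1, -1) if msb_first else range(8)
    return [(value >> i) & 1 for i in order]


def _from_bits(bits: List[int], msb_first: bool) -> int:
    """Reassemble 8 bits received in wire order."""
    order = range(7, -1, -1) if msb_first else range(8)
    value = 0
    for pos, bit in zip(order, bits):
        value |= bit << pos
    return value


def spi_exchange(master_tx: int, slave_tx: int, cpol: int = 0, cpha: int = 0,
                 msb_first: bool = True) -> Tuple[int, int, str]:
    """Clock one 8-bit frame (no start/stop bits, CS left to software).

    Returns (byte the master captured from MISO, byte the slave captured
    from MOSI, which SCLK edge was used for sampling).
    """
    mosi = _to_bits(master_tx, msb_first)
    miso = _to_bits(slave_tx, msb_first)
    master_rx: List[int] = []
    slave_rx: List[int] = []
    sclk = cpol                      # CPOL sets the idle clock level
    sample_edge = ""
    for i in range(8):
        for edge in (0, 1):          # 0 = leading edge, 1 = trailing edge
            sclk ^= 1
            if edge == cpha:         # CPHA=0 samples on leading, CPHA=1 on trailing
                master_rx.append(miso[i])   # what SERINx would capture
                slave_rx.append(mosi[i])
                sample_edge = "rising" if sclk == 1 else "falling"
            # The non-sampling edge is where each side sets up its next bit.
    return (_from_bits(master_rx, msb_first),
            _from_bits(slave_rx, msb_first), sample_edge)
```

For example, `spi_exchange(0xA5, 0x3C)` (mode 0) returns `(0x3C, 0xA5, "rising")`. To read from a device, the master still has to clock out a dummy byte (commonly 0x00 or 0xFF).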
The biggest difference I see here is that the receiver is now tied to the transmitter in a way that the UART isn't.
Master only does sound like a decent compromise. After all, the main reason for wanting SPI is to connect small pin count peripheral chips. I believe raw USB or Ethernet has been shot down on the basis of needing complex framing in the hardware, right?
I totally agree that it is not worth a lot of pain to reach clkfreq/1 on serial; it looks like my thought experiments underestimated the effort required to implement and test it.
/2 would still be nice if the level of effort required was reasonable, however your examples below suggest it too may require too much effort.
Most uC cannot toggle pins above 100MHz, and many limit their Serial specs to somewhat under that, to allow for Ts/Th delay effects through the relatively large and slow Pin buffers.
Sure, but you have now duplicated state machines for master and slave modes, as well as buffer registers.
Testing is a lot more complex, and you have to hope you sliced those clock domains in the correct places.
All of this effort is to try to reach /1 shift clocks? I do not recall seeing any uC that offers /1 shifter clocks; /2 is common in master mode, and even that only for sub-50MHz devices. I think some allow /2 in slave mode, if the two chips have synced clocks.
Even a 'still coming' part like the 240MHz Analog Devices M4 shows a max of 50MHz spec'd for any pin, and a max of (fSCLK/4) MHz is mentioned for external clock modes (that is still 60MHz at the pins).
I found an NXP device where early data mentioned 80Mbd, but 2013 data now says 52Mbd on their SPI ports. The real world always conspires to lower practical clock speeds.
FYI, there are more opcodes that have changed than are listed in Chip's message at the head of this thread. I wrote a program to compare Chip's opcode table with the one in PropGCC and found numerous other differences.
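The comparison program itself wasn't posted, so this is only a sketch of the idea, assuming each opcode table has already been parsed into a mnemonic-to-encoding dictionary. The function name and the table entries below are hypothetical placeholders, not real P2 opcodes.

```python
def diff_opcode_tables(a: dict, b: dict):
    """Compare two mnemonic -> encoding tables.

    Returns (only_in_a, only_in_b, changed), where changed is a list of
    (mnemonic, encoding_in_a, encoding_in_b).
    """
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    changed = [(m, a[m], b[m]) for m in sorted(set(a) & set(b)) if a[m] != b[m]]
    return only_a, only_b, changed


if __name__ == "__main__":
    # Hypothetical placeholder entries, not real P2 opcodes.
    chip_table = {"OPA": 0b000001, "OPB": 0b000010, "OPC": 0b000011}
    gcc_table = {"OPA": 0b000001, "OPB": 0b000110, "OPD": 0b000100}
    only_chip, only_gcc, changed = diff_opcode_tables(chip_table, gcc_table)
    print("only in Chip's table:", only_chip)
    print("only in PropGCC:", only_gcc)
    for mnemonic, old, new in changed:
        print(f"{mnemonic}: {old:06b} vs {new:06b}")
```

Reporting the three categories separately makes renamed instructions (which show up as one "only in" entry on each side) easy to spot by eye.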
What if P2 only supported SPI master mode? I suspect that master mode will be used the majority of the time.
I have to strongly disagree with that assumption.
The Prop, and the Prop II especially, can make excellent peripheral chips for larger systems. Say an ARM board running Linux; these are very common systems nowadays.
As such SPI slave is very important as it is far easier to operate the host end as a master. Nobody is going to want to start writing Linux device drivers to use the SPI interface as a slave device.
Master only does sound like a decent compromise. After all, the main reason for wanting SPI is to connect small pin count peripheral chips.
Bill,
Fast SPI master is far more important than fast slave,
Again, I have to say: slave is important. Suppose you want to connect a Prop to an ARM machine or some bigger system: the Prop is the peripheral chip. Fast SPI slave would be a great way to sell the Prop as a peripheral device.
If it's just not practical to do then so be it. I just think SLAVE can be more important than MASTER.
It's not ideal, but SPI slave can still be done entirely in software. At the end of the day, I'd rather have fast hardware SPI master support than no hardware SPI support at all. Also, many chips (including ARM, Microchip, AVR) support TTL UARTs that should be able to work with the existing P2 UART functionality.
You are right, I had my prop-centric hat on. Fast slave is also very important, as fast as is consistent with not requiring too great an effort.
Just for yucks, let's figure out the fastest possible externally clocked software SPI slave code - I'll start here:
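getspibyte                              ' receive one byte MSB first, sampling MOSI on each rising SPI clock
        waitpeq clkpin, clkpin          ' wait for SPI clock high
        test    mosimask, pinsa wc      ' C = MOSI pin level (a slave samples MOSI, not MISO)
        waitpne clkpin, clkpin          ' wait for SPI clock low again
        rcl     spibyte, #1             ' rotate the sampled bit into bit 0
        ' 7 more copies of the 4 lines above; with the loop unrolled we avoid loop overhead
        and     spibyte, #$FF           ' keep only the 8 received bits
getspibyte_ret
        ret

clkpin          long    1 << 0          ' hypothetical: SPI clock on pin 0
mosimask        long    1 << 1          ' hypothetical: MOSI on pin 1
spibyte         long    0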
Prop2_Docs.txt does not give the minimum number of cycles for waitpeq.
1-cycle wait*: 4 clocks per bit in ideal circumstances (which are not likely), and invalid results if spiclk is too fast.
2-cycle wait*: 6 clocks per bit
General formula: W*2+2 clocks minimum for one SPI clock, where W is the minimum wait* time
External clock, software receive:
26.7Mbps @ 160MHz - assuming W=2 is feasible
20.0Mbps @ 160MHz - assuming W=3 is feasible
16.0Mbps @ 160MHz - assuming W=4 is feasible
13.3Mbps @ 160MHz - assuming W=5
External clock, hardware receive:
80.0Mbps @ 160MHz, if clkfreq/2 is possible
53.3Mbps @ 160MHz, at clkfreq/3
40.0Mbps @ 160MHz, at clkfreq/4
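The rate figures above follow directly from the W*2+2 formula and the hardware divider; this short sketch just reproduces them. It assumes, as the post does, that the two non-wait instructions in the unrolled loop take one clock each; the function names are hypothetical.

```python
def software_slave_mbps(clkfreq_mhz: float, w: int) -> float:
    """Per-bit cost of the unrolled loop: two wait* instructions of W clocks
    each plus two instructions assumed to take one clock each (W*2+2)."""
    return clkfreq_mhz / (2 * w + 2)


def hardware_slave_mbps(clkfreq_mhz: float, divider: int) -> float:
    """Hardware shifter sampling at clkfreq/divider."""
    return clkfreq_mhz / divider


if __name__ == "__main__":
    for w in (2, 3, 4, 5):
        print(f"software, W={w}: {software_slave_mbps(160, w):.1f} Mbps")
    for d in (2, 3, 4):
        print(f"hardware, clkfreq/{d}: {hardware_slave_mbps(160, d):.1f} Mbps")
```

Even the optimistic software case (W=2) is a third of the hardware clkfreq/2 figure, and it ties up the whole cog while doing it.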
Conclusions:
- hardware externally clocked SPI slave is much faster than software based externally clocked SPI slave
- hardware SPI slave can run as a task in a cog
- software SPI slave needs a whole cog
Ok heater, you got me - much nicer to be a fast hardware peripheral to an external processor (than a slow one)
- hardware SPI slave can run as a task in a cog
- software SPI slave needs a whole cog
I think this is a key observation. This means that these simple peripherals like UART, I2C, SPI, etc. don't have to consume an entire COG. They can be combined with some higher level logic that runs in another thread. I think this makes the Propeller far more useful. You can use all 8 COGs to partition your application without giving up any to handle the tasks that are supported by custom hardware on other microcontrollers.
No doubt true. But...
There are many systems out there that can act as MASTER to any slave peripherals. Prop II as a SLAVE peripheral would be perfect. It's not always possible to create that software on the MASTER end.
Two examples:
1) Any Linux-based system. That's a lot. Linux has no SLAVE-side driver. I am certainly not up to creating one. It would be great to be able to attach a Prop II as a peripheral to a BeagleBone, Raspberry Pi and many others.
2) The Espruino has SPI MASTER support for attaching peripheral chips. Again, does anyone want to create that SLAVE-side driver?
Philosophically the Prop is the slave in these situations so it makes sense to have SPI SLAVE support.
Anyway as I said, if HOST only is easier and SLAVE is impractical, so be it.
We have to stop messing with the PII design at some point:)
We need to keep that in mind.
I agree that SPI slave opens up some really nice use cases, but at some point we are going to ask for one thing too many and end up getting less than we could have had...
I can see Chip saying he has Slave SPI implemented and then someone will chime in with a request for yet another protocol to be supported and...
Comments
Marty
Thanks.
That possibility would make the SER functions behave much like a standard serial port.
Call it and test for a character: if there is none, step ahead; if there is one, wait for it and return.
Chip, that was next on my list to play with...
Cheers
Brian
This might be better with two possible modes; call them Skip and Wait.
Skip jumps over, knowing it will get back again within one character time. Good for small loops.
Wait knows (expects) that another character is due, and does nothing else until it arrives. Mostly for use during data streams?
The HW needs one character of buffering, to give practical handshake times and to work with packed data streams.
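A sketch of the two modes on top of a single character of buffering, as suggested. `skip()` returns at once when nothing is waiting; `wait()` spins until a character arrives. The class and method names are hypothetical, and the `advance` callback only exists to let the simulation make time pass on the line.

```python
from collections import deque
from typing import Callable, Optional


class SerialRx:
    """Receive side with one character of buffering (hypothetical model)."""

    def __init__(self) -> None:
        # maxlen=1: appending to a full deque drops the older, unread character.
        self._buf: deque = deque(maxlen=1)

    def deliver(self, ch: int) -> None:
        """Called when the shifter completes a character."""
        self._buf.append(ch & 0xFF)

    def skip(self) -> Optional[int]:
        """'Skip' mode: return a waiting character, or None at once so a small
        loop can carry on and come back within one character time."""
        return self._buf.popleft() if self._buf else None

    def wait(self, advance: Callable[[], None]) -> int:
        """'Wait' mode: do nothing else until a character arrives.
        `advance` stands in for time passing on the line in this simulation."""
        while not self._buf:
            advance()
        return self._buf.popleft()
```

The one-character buffer is what gives the software a full character time to come back in Skip mode; without it, the polling loop would have to catch each character within the stop bit.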
Here is an update to the Instruction Set for the new info from Chip. P2_Instructions_20131002a.spin
Note: replaced with the "a" version.
The biggest difference I see here is that the receiver is now tied to the transmitter in a way that the UART isn't.
Have you made those changes to the Debugger?
Master only does sound like a decent compromise. After all, the main reason for wanting SPI is to connect small pin count peripheral chips. I believe raw USB or Ethernet has been shot down on the basis of needing complex framing in the hardware, right?
C.W.
50Mbaud is very attractive for communicating with USB hosts with an FTDI - thanks for pointing it out, I had not run across that before.
Real world limits can be a pain
Fast SPI master is far more important than fast slave, so if it is easier to only support fast master, it really is a good compromise.
C.W.