Prop II question : Serial Chip-To-Chip Communication
jmg
Posts: 15,183
The new Prop II spec says
["Chip-To-Chip Communication
Each cog now also features high-speed serial transfer and receive hardware for chip-to-chip communication. The hardware requires three I/O pins (SO, SI, CLK). Opcodes are SNDSER RCVSER SETSER "]
but this seems to exclude both standard SPI and QuadSPI support?
That's a large blind spot: standard SPI receives as it sends, and QuadSPI is the emerging standard for large flash memory.
So near, and yet so far...
SPI does not appear anywhere in the docs. Does Parallax imagine Prop IIs will ONLY talk to other Prop IIs?
Comments
This stuff is a specific protocol for chip-to-chip communication between Prop IIs (or any other chip that can speak the protocol and keep up).
It uses a built-in state machine in each cog to quickly move data between cogs in different chips. The instruction is only for users who are building meshes of Propeller chips.
I believe Chip designed that instruction to move data at some insane speed between Prop chip cores. As far as I know it does not even send SPI-like data. It is there for users who really want to build a multi-processor Prop system, and it should make that possible.
---
SPI will be bit-banged as before. It would be nice to have hardware SPI support, but given the speed of each core and the limits of the SPI bus, there will be more than enough time to bit-bang data out onto the SPI bus. Since there are no interrupts, it's not like the core will be doing anything else.
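For anyone who has not written one, this is roughly what bit-banging a full-duplex SPI byte looks like. It is only a sketch in Python against a made-up pin model, not Propeller code: `LoopbackPins` and `spi_transfer_byte` are hypothetical names, and SPI mode 0 with MSB-first ordering is an assumption.

```python
class LoopbackPins:
    """Hypothetical pin model for illustration: MISO mirrors the MOSI level."""

    def __init__(self):
        self.mosi = 0
        self.clk = 0
        self.clock_edges = 0  # counts every CLK level change

    def set_mosi(self, level):
        self.mosi = level

    def set_clk(self, level):
        if level != self.clk:
            self.clock_edges += 1
        self.clk = level

    def read_miso(self):
        return self.mosi


def spi_transfer_byte(pins, tx_byte):
    """Shift one byte out MSB first (assumed SPI mode 0) and return the byte shifted in."""
    rx_byte = 0
    for bit in range(7, -1, -1):
        pins.set_mosi((tx_byte >> bit) & 1)          # present data while CLK is low
        pins.set_clk(1)                              # rising edge: data is sampled
        rx_byte = (rx_byte << 1) | pins.read_miso()  # receive while sending
        pins.set_clk(0)                              # falling edge: ready for next bit
    return rx_byte
```

Every bit costs a handful of pin operations, which is the per-bit work the core has to do itself when there is no hardware shifter.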
---
In general, when there are no interrupts in the whole system, having a fast processor and making it do everything manually gives you more control, since, again, the processor has nothing else to do.
I guess you can get into the details of using the JMPRET instruction, but that's a different matter.
QuadSPI really is an emerging standard, it seems in the rush for the fancy stuff, the basics got overlooked here ...
Where are the details of this 'special protocol'?
I can see a pin-mapping opcode, but no mention of speed control.
If this goes across isolation barriers, some choice of speed will be needed.
That's incorrect - I have plenty of more important things for the core to do than twiddle pins.
Using your logic, no fast serial silicon at all would be included.
There IS fast serial silicon, which is a clear admission that speed and low overhead really does matter.
Why limit that, to just talking to another Prop II ??
I am into dynamic loading of the cogs. While you can do this now, the code must be in hub to load from. While it has been done, I am trying to make it more general to dynamically load from SD card. Of course, the concept will also work for I2C eeprom and flash too.
So, what we have is a huge set of peripherals that are soft-configured. On the Prop 1 you could have 16 two-wire (TX & RX) serial ports, all running at different speeds (= 32 I/O pins). And the same chip on a different board can have 16 I2C buses, or 8+ SPI buses, etc. Do you see where this is leading? We don't have interrupts to disrupt our code. And our software peripherals are much smarter than a simple old UART, yet are actually simpler to use because we don't have lots of registers to control them. Within the software peripheral we can also do some processing on the data as it comes in and/or goes out, so we can handle some protocols here too. There are lots of objects for lots of different peripherals. The only downside is working out which are good and which are not, and which have good comments and which do not. But this is getting better all the time.
While on UARTs, how many character buffers do you want? If you take over the whole prop, you could have an input buffer of almost 32KB! Just takes thinking in a different way.
How many instructions does it take to flash a led on other processors (and I don't mean using C or similar where they load many KB worth of data just to start with)?
Hope I got this right!
And here is one to output a message to the serial port. One of the cogs is used to perform the software uart (in the object FullDuplexSerial)
Only now in Prop II, I DO have the hardware, and yes, I hate to waste it.
I want to use the hardware that is already paid for and sitting there, designed for fast serial IO, for fast Serial IO.
I would not waste too much time discussing the specs at this point! On the one hand the chip is not finalized yet, but on the other hand each change to the basic architecture or the bigger building blocks would delay the release date by some more years!
jmg, I understand your point: when there is a built-in serial interface, why not support a standard instead of doing your own thing? Well, I don't know why, but I believe there must be a reason and I trust Chip in this matter.
And in the end Chip is the one who designs the Prop II. Listening to each and every comment coming from outside simply adds delay. Parallax has to pay for the development and thus they have the right to design the silicon their way! You, on the other hand, have the freedom to buy the Prop II or use another uC.
As all the others have already stated, the goal is to add as little specialized hardware as possible. I like the C2C idea even if it is not standard. Having one Prop II dedicated to all the user-interface stuff and another dedicated to the system-critical stuff, with both able to communicate with little delay and few software resources, makes sense.
There is a full-duplex driver for 4 serial interfaces for the Prop I. Having 10 times more MIPS per COG theoretically allows 40 serial interfaces in one COG. I bet there will be drivers that let one COG drive SPI, I2C and serial interfaces. So, 7 COGs left for doing your more important stuff.
You have almost exactly described a SPI port there. That is all it does, too.
For the want of a handful of flip-flops & a few SFR bits, this serial shifter COULD still be a lot more flexible.
It could actually support a standard interface, as well as the 'in-house' one.
QuadSPI parts can clock to 100MHz/104MHz, and they are cheap, and widely available.
Programmable Logic can do QuadSPI using DDR IO.
> It could actually support a standard interface, as well as the 'in-house' one.
Now that makes sense... You don't want to tie pins to a task (i.e. an ADC pin or an SPI pin, as other manufacturers do), but having some per-cog hardware functions that can drive fast SPI would be very useful and very powerful.
James
I am inclined to agree. There was much discussion about this a long while ago and myself and others were proposing serdes hardware assistance in such a way that different serial input and output could be configured. Perhaps not optimal for serious high speed chip to chip comms but general purpose.
I would like to be proven wrong though.
QuadSPI is simply an electrical variant, so there is no license.
Quite a few uC already offer this, and I believe the new Infineon XMC45xx has this down to quite low cost parts.
You may be thinking of some Consumer Storage cards, where the form factor and some of the format/control, is licensed by some companies.
I don't really see the problem here.
Try
http://www.winbond-usa.com/hq/enu/ProductAndSales/ProductLines/FlashMemory/SerialFlash/
They have had 104MHz devices for a while. Faster ones WILL come, in the next few years.
This is not just for SD cards, it is also for fast table access, external code fetch, and communicating to other devices that are NOT Prop IIs using an Industry Standard interface.
If I were doing a new chip design now, I would even include optional DDR QuadSPI, as a half-speed CLK is easier to transport and isolate than a full-speed one.
The point is, the Prop core can do many other things besides flip pins, and a 100MHz QuadSPI would deliver/accept 1 Word every 16 machine cycles (A DDR one would do it in 8 cycles) - and those cycles are far better spent doing data manipulation, than pin flips.
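To make the arithmetic behind those cycle counts explicit, here is a rough sketch in Python. The 200 MHz system clock is an assumption (it is simply the value that reproduces the 16 and 8 cycle figures, not something confirmed by the spec), as is treating one machine cycle as one system clock. The function name is only illustrative.

```python
ASSUMED_SYS_CLK_HZ = 200_000_000  # assumption: not taken from the Prop II spec
QUADSPI_CLK_HZ = 100_000_000      # the 100 MHz QuadSPI clock mentioned above


def cycles_per_word(sys_clk_hz, spi_clk_hz, data_lines=4, ddr=False, word_bits=32):
    """System clock cycles that elapse while one word arrives over the serial bus."""
    bits_per_spi_clock = data_lines * (2 if ddr else 1)  # DDR moves data on both edges
    spi_clocks_per_word = word_bits / bits_per_spi_clock
    return spi_clocks_per_word * sys_clk_hz / spi_clk_hz


sdr = cycles_per_word(ASSUMED_SYS_CLK_HZ, QUADSPI_CLK_HZ)            # 16.0
ddr = cycles_per_word(ASSUMED_SYS_CLK_HZ, QUADSPI_CLK_HZ, ddr=True)  # 8.0
```

With a different system clock the counts scale in proportion, so the point stands either way: a word turns up every handful of cycles, and a core that is busy flipping pins has little time left to do anything with it.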
A relatively minor silicon change, would mean a BIG increase in usable processing + data bandwidth.
FLASH is also one thing a Prop lacks, so being smart about how you talk to external flash, is pretty obviously important.
No doubt we will be able to use this for other things once the details are known.
One thing that starts to stand out in the Prop II threads is the increased value of faster Cogs. Throwing a whole Cog at a single minded task of one I/O function/engine is looking more and more wasteful as more and more transistors go into the Propeller design. Or, the more emphasis on MIPS per Cog the more precious each Cog becomes and the more pressure there is to have them doing more than waiting on a pin change to occur.
Needless to say this all leads to more demands for specialised hardware to perform common I/O functions.
Is this the only path? Can there be a smaller core design that can multiply out to 16, 32, 64 cores without losing a lot of individual bandwidth and the niceties of a register rich 32 bit architecture?
In effect, 16 Cogs with the Hub access of 8 Cogs.
Sure, but that breaks the software compatibility.
An easier 'family variant', could be one with 4 cogs dropped out, and MORE RAM dropped in, even PSRAM.
NOT much design effort there, we just need a number on how much RAM a COG maps to.
Or one that does not allocate full Maths to ALL COGs.
Another number here, for Die cost of Maths in All Cogs ?
The way I see it, if one had 64 smaller cores, whatever bandwidth they have to the outside world is going to be wasted because now you have 8 times more contention for the HUB RAM. There is not much space in a COG, so data has to go to HUB, and there is your bottleneck.
This was discussed a long while ago when Chip was musing over having 16 COGs or more RAM. With any shared-memory architecture, and especially with the round-robin HUB access of the Prop, adding more cores has diminishing returns.
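A back-of-envelope model of that contention, sketched in Python. It assumes the hub serves exactly one cog per slot in a fixed rotation and that the total slot rate stays the same no matter how many cogs there are. Both are simplifications for illustration, not figures from any Prop II document, and the function names are made up.

```python
def hub_share(n_cogs, slots_per_second=1.0):
    """Hub accesses per second that each cog gets under strict round-robin."""
    return slots_per_second / n_cogs


def worst_case_wait_slots(n_cogs):
    """Slots a cog waits if it just missed its own window."""
    return n_cogs - 1


ratio = hub_share(8) / hub_share(64)  # per-cog bandwidth drops by this factor
total = 64 * hub_share(64)            # aggregate hub throughput does not grow
```

Under those assumptions, going from 8 to 64 cogs cuts each cog's hub bandwidth by a factor of 8 while the total stays put, which is the diminishing return described above.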
A very simple topology to take would be like JTAG and establish a token ring configuration.
Let's say you have 4 devices, all slaved off an external oscillator with equal trace lengths to the clock bus. The output of one chip is tied to the input of its neighbor, and so on until all chips are connected. If you want to communicate from the first chip to the last chip, the data has to pass through 2 chips to get there.
The alternative is to tie the tx and rx together into a single wire, but you need some sort of arbitration scheme then, like CSMA/CD.
It boils down to if you want to do 200Mbps comms, you have some compromises you have to accept to get that level of speed.
The alternative is to use a coded message passing scheme. If it were me, I'd use a token ring topology and the JTAG chaining.
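As a quick check on the ring idea, here is a small Python sketch of how many chips a message has to be relayed through in a one-way ring. It assumes the chips are numbered 0 to n-1 and that data only flows in the direction of increasing index. The function name is hypothetical.

```python
def chips_passed_through(src, dst, n_chips):
    """Number of intermediate chips between src and dst in a one-way ring."""
    hops = (dst - src) % n_chips  # links crossed going around the ring
    return max(hops - 1, 0)       # intermediates exclude the destination itself


first_to_last = chips_passed_through(0, 3, 4)  # 2, matching the 4-device example
last_to_first = chips_passed_through(3, 0, 4)  # 0, they are direct neighbors
```

Latency depends on where the two chips sit in the ring, so the worst case grows with the number of chips in the chain.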
Provided the P2 takes off, developing new P2 variations is not so expensive in time and money now - Chip has learnt a lot and there are better tools
(rather than pollute this thread, see my musings thread where heater put forward some ideas. I am going to put some up there now too)
That is SPI chaining as per default recommended wiring. I guess that figures since JTAG uses a SPI port.
Of course, this slaughters determinism for the low priority thread. Not sure if that's so desirable.
EDIT: I guess this *is* more like an interrupt than a thread. O_o
That sure brings on the anticipation.
Err, a flaw, maybe a fatal one ... any Hub accesses from the interrupt code are indeterminate ... or not, Hub accesses after a WAIT on the Prop1 are indeterminate anyway, right?