Parallel Prop2s
fixmax
Posts: 91
in Propeller 2
Something I'm curious about: with the Prop2 locked at 8 cores running 120 MHz, are there any plans for linking multiple Prop2 chips together to get more than 8 cores? I know several folks have built 2-, 4-, or more-chip boards for the Prop1.
Is there a mechanism built in, or planned, that will provide a (relatively speaking) seamless method of using more than 8 cores in a design? Something along the lines of the XMOS approach, where the link mechanism lets you use more cores than a single chip has?
Even if it's just an application note or something, I would be interested in hearing about this. I had been interested in seeing this done using Forth (I can't remember whose version), and with the Prop2 having Tachyon in some form, maybe this will be easy to do? Has anyone thought about this?
What would really be cool is a compiler with a define so you could have multiple Prop targets in one project file; it would load each Prop's portion of the code, with a mechanism to pass data between them and a synchronizing method (a clock or something).
Comments
I suspect multiple P2s will need to bypass each one's PLL and drive from a common 160 MHz SysCLK in order to have predictable smart pin phase from unit to unit.
That's not trivial, and it raises a lot of EMC issues...
Lower speed links might be able to send CLK:DATA and use a slave clock setup, and so allow the PLL to be used.
True, but that's primarily a concern if you are trying to achieve maximum transfer rates. In many cases, simple two-wire async serial will be more than adequate. If I recall, you had run some numbers at one point for the maximum reasonable baud rate between two P2s running at 160 MHz (or maybe it was 120 MHz, at the time). Or maybe @cgracey came up with those numbers. Either way, those numbers (4+ MBps?) seemed quite fast enough for a lot of use cases. And if the thing you're trying to do with the P2 is massively parallel processing, I suspect you've picked the wrong architecture in the first place.
Standard UARTs are quite comfortable at SysCLK/8, which means 20 MBd at 160 MHz. You might push SysCLK/6 or SysCLK/4, but you are also likely to bump into SW timing at those speeds.
IIRC the P2 does support 32-bit async, so that drops the 'char' rate the SW has to support.
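To make the numbers above concrete, here's a quick back-of-the-envelope sketch of what SysCLK/8 works out to with 8-bit versus 32-bit async framing. The 1-start/1-stop framing is the usual async convention; treat the figures as rough estimates, not measured throughput.

```python
# Rough throughput estimate for async serial at SysCLK/8, as discussed above.
SYSCLK = 160_000_000          # Hz
baud = SYSCLK // 8            # "comfortable" UART rate -> 20 MBd

# 8-bit frames: 1 start + 8 data + 1 stop = 10 bits per char
char_rate_8 = baud // 10      # chars per second the SW must service

# 32-bit async frames: 1 start + 32 data + 1 stop = 34 bits per word,
# so the per-word interrupt/polling rate drops by roughly 3.4x
word_rate_32 = baud // 34
payload_bytes_per_s = word_rate_32 * 4

print(baud, char_rate_8, word_rate_32, payload_bytes_per_s)
```

So a 32-bit framed link at 20 MBd carries a bit over 2 MB/s of payload while only asking the software to handle ~588k events per second instead of 2 million.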
I2C has speed specs at 1 MHz, 3.4 MHz, and 5 MHz.
A DDR I2C variant might work well on the P1: use standard I2C for the address, and once the ACK is seen, either keep standard I2C or change gears to DDR for the data packets?
BUT...
you have to admit, the idea of just plugging a bunch of P2 boards together with in-built LVDS seems pretty attractive.
I have 2M baud 9-bit async half-duplex comms for networking multiple P1s, which can use RS485 or just a single I/O line if they are relatively close together. The network simulates full-duplex, so I can even chat with or download code to an individual P1, address a group, etc.
It works nicely on a P1 and should do the same on a P2.
The basic idea is to share some HUB RAM buffer by sending it continuously from one Prop to the next in a ring of Propellers.
Every Propeller can read the whole buffer in its own HUB RAM and modify its content. If I remember correctly, @Beau used one cog for sending and one for receiving the HUB buffer continuously.
This allows, with careful planning of the buffer structure, the common mailbox system used with PASM on one Prop to be extended across multiple Props.
I guess that using the smart pins, the continuous sending and receiving can be done in one cog instead of the two @Beau used.
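The ring idea above can be sketched in a few lines. This is just a simulation of the data flow (not Prop code): each node keeps a local copy of the shared buffer, forwards it around the ring, and treats its own mailbox slot as authoritative. The buffer size, slot assignment, and `Node` class are all hypothetical, purely for illustration.

```python
# Minimal simulation of the shared-buffer ring: each Prop holds a local copy
# of the HUB buffer, forwards it to the next node, and owns one slot in it.
BUF_WORDS = 8

class Node:
    def __init__(self, slot):
        self.slot = slot               # the one slot this node may write
        self.buf = [0] * BUF_WORDS     # local copy of the shared HUB buffer

    def on_receive(self, incoming):
        # Merge the incoming buffer, keeping our own slot authoritative.
        own = self.buf[self.slot]
        self.buf = list(incoming)
        self.buf[self.slot] = own
        return self.buf                # what gets sent on around the ring

def circulate(nodes, laps=2):
    frame = list(nodes[0].buf)
    for _ in range(laps):
        for n in nodes:
            frame = n.on_receive(frame)
    return frame

nodes = [Node(i) for i in range(3)]
for i, n in enumerate(nodes):
    n.buf[n.slot] = 100 + i            # each Prop posts to its mailbox slot
print(circulate(nodes))                # after the laps, every slot has propagated
```

After enough laps for the frame to make it around, every node's local buffer holds every other node's slot, which is what lets the one-Prop mailbox convention extend across the ring.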
Certainly I will try that whenever I can get hold of multiple P2 boards.
Mike
A compact wait-test-rotate-djnz loop works out to roughly a 4-4.44 MHz data rate.
You could send a frame of [Address.R/W] as standard I2C, and then send [VarAdr16][Data32]..[Data32] for a compact but highly flexible P1-to-P1 interface.
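For what it's worth, the proposed frame layout is easy to picture as a byte-packing sketch. This is just an illustration of the [Address.R/W][VarAdr16][Data32].. structure; the function name, big-endian byte order, and field widths beyond what the post states are assumptions.

```python
# Sketch of the proposed frame: an I2C-style address byte, a 16-bit variable
# address, then a run of 32-bit data words. Byte order is assumed big-endian.
import struct

def pack_frame(addr7, read, var_adr, words):
    hdr = bytes([(addr7 << 1) | (1 if read else 0)])      # [Address.R/W]
    body = struct.pack(">H", var_adr)                     # [VarAdr16]
    body += b"".join(struct.pack(">I", w) for w in words) # [Data32]..[Data32]
    return hdr + body

frame = pack_frame(0x50, False, 0x0100, [0xDEADBEEF, 0x12345678])
print(frame.hex())  # a00100deadbeef12345678
```

Three bytes of overhead per frame, and the 16-bit variable address means either side can target any location in a 64 KB window, which is plenty for HUB-sized mailboxes.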
It required one cog to be dedicated to producing a clock signal; each processor used two pins, one for the clock and one for data, with a pull-down resistor on the data line.
I never took the time to optimize the code in assembly — the Spin version worked fine for my application — but the basics of the scheme were:
A sync pulse of some length (I don't recall the count... 4-8 clock cycles) was sent at the beginning of a packet transfer, followed by 384 clock "ticks". When the other processors needed to "listen" or "talk", they would wait for this low sync pulse; then, as soon as the clock went high again, that marked the start of the packet for processor A, with a bit transferred at the start of each clock tick. After 128 ticks it was processor B's turn to transfer data, and after 256 ticks, processor C's turn.
Each processor had to have a slightly different version of the program so it knew where its block of data started, but with only three processors it wasn't a big deal to change the code for each. As its "turn" to talk came around, it could pull the data line high or leave it low for each data bit. The other processors would "listen" for whatever data from the others was relevant to their needs.
This wasn't really elegant code... I wouldn't use it to connect more than two or three processors, but it worked fine to give me a controller board with lots of I/O and 23 available cogs working together as a team.
Depending on the application, the buffer could be smaller or larger, or more data lines could be added to increase transfer speed if needed.