What is needed to integrate a dual/quad P2 system?
samuell
Posts: 554
Hi,
I'm studying the hypothesis of creating a system integrating two, or even four, P2 chips. My idea is to seamlessly emulate a 16 or 32-core MCU. Power delivery is not an issue (it is easy for me to design DC-DC converter modules that are able to supply 5A or more). Also, I'm expecting to design the clock distribution system so that the chips can be clocked simultaneously. Probably I'll have to disable the PLL on the chips, and clock them directly (300MHz or more, if possible). I'll implement active heat sinking, of course.
As for the power delivery, I'll supply 12V to the system, which will be down-converted to 3.3V and 1.8V for the chips. The DC-DC converter for 1.8V will have to be beefy, for sure. The 3.3V will probably be supplied by a very low noise DC-DC converter, as local LDOs are not really needed (except for any pins that are to be analog in nature).
My question is, what pins have to be connected to integrate both (or four) chips into a single 16-core (or 32-core) emulated MCU?
Kind regards, Samuel Lourenço
Comments
How many free I/O pins do you want?
Do you want all cogs to have access to all hub memories?
Will inter P2 communications be cog to cog or simply P2 to P2?
An interesting idea and I'm sure there are a lot more things to decide.
I don't need to have many free I/O pins. Probably, one of the P2 chips will have to dedicate some pins to USB communication. I don't know if the other should have access to an external EEPROM, or a dedicated EEPROM for each one, for that matter. Anyways, the more pins they share, the better.
Definitely. That would be more seamless.
Cog to cog, definitely, if possible.
This is just an expensive experiment, and it is still in the thought phase (not even in the design phase). But, I'm glad you are interested. All options are open, for now.
Kind regards, Samuel Lourenço
That would allow you to precisely phase shift P2's.
Kind regards, Samuel Lourenço
There is a 10-pin version on the latest P2D2s that has 3 clock outputs; even that could drive 4 P2s.
Plus it has a 4-bit integrated SD interface.
I also think that an Si5351A is more than enough to deliver the clock to the P2s
As long as we don't need fractional derived clocks, clock skew between the P2s should not be a problem. And transmitting a 150MHz clock is hardly an issue, I think.
I have yet to figure out which pins to use. Certainly, one of the P2s will act as a master, and will be interfaced via USB, manage the clock generator, and access the flash memory. The other could be programmed via the first. The bus between them would use dedicated pins. I wonder if it is doable.
Kind regards, Samuel Lourenço
Oh, I am pretty sure it's doable using either a shared bus between all the P2's or direct connections between the individual chips. Took a quick look at routing a shared 16 pin bus for 4 P2's last night and it does not appear to be too hard. Will do the same for direct P2-P2 connections tonight if time permits.
Currently I have something running on two P2 rev A chips, but I have had no time to convert it to P2 rev B yet.
There is a long, winding thread about it, but it is no longer up to date.
I call it Ringbuffer. It is not the fastest solution, but the basic idea goes like this.
You dedicate a certain amount of HUB RAM as shared. It does not have to be at the same location in each P2, but it has to be the same size. This buffer is sent around and is protected by a P2 software lock.
Each P2 gives up one COG and 2 to 64 pins (64 makes no sense, but could work, theoretically).
Half of the pins are inputs, the other half outputs. All P2s need to be daisy-chained, with the last connected back to the first, so all P2s form a ring.
One master COG needs to start the communication, and then the buffer gets sent from P2 to P2. When a new buffer is received, the lock is released so other COGs can take possession of the buffer (locking it), do their respective reads or writes, and unlock the buffer. The communication COG then locks it again and sends it to the next P2.
So there is ONE buffer circulating round robin from P2 to P2.
But basically one has now a shared HUB protected by a lock across multiple P2s.
The usage could be like the normal mailboxes used on the P1, except that a COG that wants to read or write needs to acquire a lock before doing so. Otherwise it is completely self-contained in the communication COG, and the other COGs can be completely unaware of it, except for the need to use a lock when using the shared buffer.
theoretically, since the streamer modes changed.
Mike
Kind regards, Samuel Lourenço
So my question is this: What do you intend to do with a quad array of P2s, and how is this more effective than currently available solutions?
The usual way to talk to COGs on the P1 is a mailbox interface in HUB RAM. So I share the mailboxes of multiple P2s in a buffer that gets sent around.
Say I have one P2 running @rogloh's video driver, using a lot of HUB RAM and sharing a mailbox buffer with the rest of the P2s, and another P2 talking to the video driver as if it were on its own HUB. Maybe the video driver is the wrong example here; it would be more the graphics engine attached to the video driver that I would need to talk to via the ring buffer.
The basic idea is to have more COGs, more RAM, and more pins in an expandable way, but relatively transparent to the software. Each P2 has access to the buffer, one at a time, and it gets sent around by the streamer, so the code is pretty much the same regardless of the data bus width.
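The bus width mostly changes how long one trip around the ring takes. A back-of-the-envelope estimate in Python, purely as a sketch: it assumes one bit per output pin per transfer clock, half of the link pins being outputs, and it ignores the time the buffer sits unlocked on each chip. The function names and the example numbers (1 KB buffer, 16 pins, 100 MHz) are made up for illustration, not measured values.

```python
def hop_time_s(buffer_bytes: int, link_pins: int, bit_rate_hz: float) -> float:
    """Time to stream the buffer over one chip-to-chip link.

    Assumption: half of the link pins are outputs, and each output pin
    carries one bit per transfer clock.
    """
    out_pins = link_pins // 2
    if out_pins < 1:
        raise ValueError("need at least 2 link pins (1 in, 1 out)")
    bits = buffer_bytes * 8
    clocks = -(-bits // out_pins)  # ceiling division
    return clocks / bit_rate_hz


def lap_time_s(buffer_bytes: int, link_pins: int, bit_rate_hz: float,
               chips: int) -> float:
    """Lower bound for one full trip around a ring of `chips` P2s."""
    return chips * hop_time_s(buffer_bytes, link_pins, bit_rate_hz)


if __name__ == "__main__":
    for pins in (2, 8, 16, 32):
        lap = lap_time_s(1024, pins, 100e6, chips=4)
        print(f"{pins:2d} pins: {lap * 1e6:8.2f} us per lap")
```

Under these assumptions, doubling the pin count roughly halves the lap time, while the software on each side stays the same.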
So I have one P2 with keyboard, mouse, and video (even two?), connected to some other P2s that I can program, and they can share the keyboard, mouse, and screens by accessing their own HUB RAM.
It would even work transparently with a TAQOZ P2. Once loaded, the Ringbuffer COG is independent, and TAQOZ could read and write its local copy of the buffer in HUB, provided it uses the same lock the PASM Ringbuffer uses, to avoid collisions and keep all buffers synchronized.
Is there a need for it? Not sure, but why not?
Mike
And, why not?
Kind regards, Samuel Lourenço
Kind regards, Samuel Lourenço
Can you assemble them without separating them? I might start saving my pennies if you can.