Is it possible to use two P2 chips, in order to emulate a single MCU with 16 cogs?

samuell Posts: 465
edited 2019-10-27 - 14:18:11 in Propeller 2
Hi,

This question may be rhetorical in nature, although I'm looking for a practical application. Is it feasible, or even possible, to use two P2 chips together to form a 16 cog MCU? Any loss of usable pins is acceptable, by the way, as long as the link between chips is fast enough so as not to represent a bottleneck.

And what about a configuration consisting of four chips, totaling 32 cogs?

Kind regards, Samuel Lourenço

Comments

  • evanh Posts: 8,115
    edited 2019-10-27 - 14:35:51
    A fast link is possible using a streamer in each chip. Maybe even up to sysclock data rate. But that still would not be a patch on what hubram would be capable of in a 16-cog prop2. A 16-cog prop2 literally has double the internal bandwidth of an 8-cog prop2.

    PS: There is twice the number of hubram slices to go with twice the cogs. Each has its own data bus and address bus with a hulking cross-point switch, named the "egg-beater", muxing them all together.
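
To put rough numbers on the comparison above, here is a back-of-envelope sketch. It assumes every cog can move one 32-bit long per clock through its hub slice at peak, and it uses a hypothetical 200 MHz sysclock and a hypothetical 8-pin link clocked at sysclock rate. None of these figures come from the thread, and the function names are illustrative only.

```python
# Back-of-envelope comparison: aggregate hub bandwidth vs. a chip-to-chip link.
# All figures below are assumptions for illustration, not measured values.

LONG_BITS = 32  # a P2 long is 32 bits wide


def hub_bandwidth(n_cogs, sysclk_hz):
    """Peak aggregate hub bandwidth in bits/s.

    Assumes each cog transfers one long per clock through its current slice.
    """
    return n_cogs * LONG_BITS * sysclk_hz


def link_bandwidth(link_bits, link_clk_hz):
    """Raw bandwidth in bits/s of a parallel link `link_bits` wide."""
    return link_bits * link_clk_hz


if __name__ == "__main__":
    sysclk = 200_000_000  # hypothetical sysclock, Hz
    link_pins = 8         # hypothetical link width, pins

    hub8 = hub_bandwidth(8, sysclk)
    hub16 = hub_bandwidth(16, sysclk)
    link = link_bandwidth(link_pins, sysclk)

    print(f"8-cog hub : {hub8 / 1e9:.1f} Gbit/s")
    print(f"16-cog hub: {hub16 / 1e9:.1f} Gbit/s")
    print(f"link      : {link / 1e9:.1f} Gbit/s")
    print(f"16-cog hub / link ratio: {hub16 / link:.0f}x")
```

Under these assumptions the 16-cog figure is exactly twice the 8-cog figure, and a link of this width stays well below either, which is the gap described above. A wider link narrows the ratio but costs more pins.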
    I love the whooshing sound of deadlines as they fly by.
  • evanh Posts: 8,115
    edited 2019-10-27 - 15:11:50
    The egg-beater sequences the switching in an orderly manner so that each hubram slice gets presented to each cog once every N clocks. N = number of cogs and slices.

    It's like two concentric rings made of N segments, where one ring rotates relative to the other. On any given clock, each outer ring segment (cog) mates with one inner ring segment (RAM slice), all of them at once. The next clock steps the ring by one segment, to the neighbouring slices, and so on ...
    Addressing of the first slice is ordered as:   0, N, N*2, N*3, ...
    Addressing of the second slice is ordered as:  1, 1+N, 1+N*2, 1+N*3, ...
    Addressing of the third slice is ordered as:   2, 2+N, 2+N*2, 2+N*3, ...
    Addressing of the fourth slice is ordered as:  3, 3+N, 3+N*2, 3+N*3, ...
    ...
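
The rotation and the interleaved addressing described above can be modelled in a few lines. This is only a sketch: addresses are counted in longs, the function names are made up for illustration, and the direction of rotation (cog c sees slice (c + clock) mod N) is an assumption rather than something confirmed by the hardware documentation.

```python
# Toy model of the "egg-beater" hub rotation.
# Assumptions: addresses are in longs; rotation direction is chosen arbitrarily.

def slice_of(addr, n):
    """Slice that holds long address `addr` when there are `n` slices."""
    return addr % n


def slice_addresses(k, n, count):
    """First `count` long addresses stored in slice `k`: k, k+n, k+2n, ..."""
    return [k + i * n for i in range(count)]


def slice_seen_by(cog, clock, n):
    """Slice connected to `cog` on clock `clock` (assumed rotation direction)."""
    return (cog + clock) % n


def clocks_until(cog, addr, clock, n):
    """Clocks `cog` must wait, starting at `clock`, until it faces addr's slice."""
    return (slice_of(addr, n) - slice_seen_by(cog, clock, n)) % n


if __name__ == "__main__":
    n = 8  # number of cogs and slices
    for k in range(4):
        print(f"slice {k}: {slice_addresses(k, n, 4)}")
    for clock in range(3):
        pairing = [slice_seen_by(cog, clock, n) for cog in range(n)]
        print(f"clock {clock}: cog -> slice {pairing}")
```

In this model every clock pairs each cog with a different slice, and a cog reading consecutive long addresses finds the next one in the neighbouring slice on the next clock, so the worst-case wait for a random address is N - 1 clocks.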
    
  • Paralleling chips will always be less powerful than inter-cog communication on a single chip. But on the P1 I implemented a transparent communication protocol between cogs, using serial links to connect multiple chips. As the P2 is so much faster, connecting multiple chips will make it possible to solve problems that can be distributed, which most problems can. Now that the P2 is available, I will move to Peter's P2D2 and solve such problems myself.
    no reason to reason if you feel feelings: in love with the propeller

    How-2-TACHYON
  • jmg Posts: 14,027
    samuell wrote: »
    This question may be rhetorical in nature, although I'm looking for a practical application. Is it feasible, or even possible, to use two P2 chips together to form a 16 cog MCU? Any loss of usable pins is acceptable, by the way, as long as the link between chips is fast enough so as not to represent a bottleneck.
    Sure, you can connect as many as you like really.
    The last 'bottleneck' bit is vague. Any link connection forms something of a choke point, but the P2 streamer is giving very high burst speeds on video.

    Async links will be very forgiving, and will run at speeds well above FS-USB (12 Mbit/s).

    Running multiple P2s from a common CMOS clock source might lock the PLLs well enough to use sync links.

    I've not seen any numbers go past yet on jitter spread across two or more PLLs.


  • Cog2Cog shows how to communicate between cogs on a single chip via hub memory. This method can be extended to communicate between two separate chips. I will soon be able to focus a little on the P2 and Tachyon, and check how communication can take place using the Tachyon inter-cog communication mechanism.
  • evanh wrote: »
    A fast link is possible using a streamer in each chip. Maybe even up to sysclock data rate. But that still would not be a patch on what hubram would be capable of in a 16-cog prop2. A 16-cog prop2 literally has double the internal bandwidth of an 8-cog prop2.

    PS: There is twice the number of hubram slices to go with twice the cogs. Each has its own data bus and address bus with a hulking cross-point switch, named the "egg-beater", muxing them all together.
    ErNa wrote: »
    Paralleling chips always will be less powerful than inter-cog communication on a chip. But with the P1 I realized a transparent communication protocol between cogs, using serial communication to connect multiple chips. As the P2 is so much faster, having multiple chips connected will allow to solve problems that can be distributed, what most of the problems are. With the P2 available I will go to Peters P2D2 and solve such problems myself.
    That is the bottleneck I was "afraid" of. But it may be fast enough for parallel processing. Maybe I'm in over my head, but I'm thinking of using two chips in a configuration such that they can be programmed from the same header. Chip B will probably have to be programmed via chip A, which is programmed via the USB port. That implies a bootloader, I imagine.
    jmg wrote: »
    samuell wrote: »
    This question may be rhetorical in nature, although I'm looking for a practical application. Is it feasible, or even possible, to use two P2 chips together to form a 16 cog MCU? Any loss of usable pins is acceptable, by the way, as long as the link between chips is fast enough so as not to represent a bottleneck.
    Sure, you can connect as many as you like really.
    The last 'bottleneck' bit is vague, any link connection forms something of a choke point, but the P2 streamer is giving very high burst speeds on Video.

    Async links will be very forgiving, and run to speeds well above FS-USB.

    Running multiple P2's from a common CMOS clock source might lock well enough in the PLL, to use Sync links.

    I've not seen any numbers go past yet, on jitter spreads across 2 or more PLL's.
    That looks promising! For the clock distribution, I'll probably use some low jitter solution, such as the CDCLVC1102. I've used this chip before, with success.

    Kind regards, Samuel Lourenço
  • And suddenly I've thought about an octa-chip design. Let's call it P2B4, in homage to Peter's P2D2. For the power supply section, I'm imagining 5V chips with big inductors on the side.

    Kind regards, Samuel Lourenço
  • Go through this thread. The code there works on the first silicon; the streamer commands have since changed, and I do not have a rev B yet.

    http://forums.parallax.com/discussion/170216/ringbuffer-was-streamer-questions-how-to-sync/p1

    Mike
    I am just another Code Monkey.
    A determined coder can write COBOL programs in any language. -- Author unknown.
    Press any key to continue, any other key to quit

    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this post are to be interpreted as described in RFC 2119.