Ringbuffer (was Streamer Questions - how to sync)

msrobots · 2019-07-04 21:04

Well - my problem was more that when using an interrupt on the falling edge of my control/data line I had to prepare the streamer (wrfast...) and keep it prepared while waiting for the interrupt.

Sadly the main program in the COG can do no HUB writes without destroying the wrfast setup needed in the interrupt.

Anyways, this is not really an issue, I was just trying to find a way to allow access to the rest of the COG memory not needed by the ringnet driver.

But as is I can now daisy chain P2's, losing just one COG per P2 and two pins per bit of bus width, so a minimum of two pins and one COG per P2-node (or 4-16-32-64 pins depending on bus width).

All P2s then have a shared HUB memory buffer, accessible with locks and pushed around the ring, completely transparent to the P2-nodes. Get a lock, read/write, release the lock.

The shared buffer needs to have the same size on all P2-nodes but can sit anywhere in the HUB on any node.

While the ringnet driver COG is running you have some sort of common HUB buffer between the P2-nodes. When the buffer is locked, transmission stalls until it is unlocked, so there are no conflicts.

That allows the use of Mailboxes across P2's. I am thinking of porting that back to the P1. It can't be as fast as on the P2 using streamers, but the basic concept should work there also.
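A minimal sketch of the "get a lock, read/write, release lock" mailbox pattern described above, written in Python purely for illustration. The names `SharedBuffer`, `post` and `fetch`, and the two-long mailbox layout, are assumptions and not part of the ringnet driver; `threading.Lock` only stands in for a P2 hardware lock.

```python
import threading

class SharedBuffer:
    """Hypothetical stand-in for one node's local copy of the shared HUB buffer."""

    def __init__(self, size_longs):
        self.longs = [0] * size_longs
        self.lock = threading.Lock()  # stands in for a P2 hardware lock

# Assumed mailbox layout: long 0 = command, long 1 = parameter.
CMD, PARAM = 0, 1

def post(buf, command, parameter):
    """Get the lock, write the mailbox, release the lock."""
    with buf.lock:
        buf.longs[PARAM] = parameter & 0xFFFFFFFF
        buf.longs[CMD] = command & 0xFFFFFFFF

def fetch(buf):
    """Get the lock, read and clear the mailbox, release the lock."""
    with buf.lock:
        command, parameter = buf.longs[CMD], buf.longs[PARAM]
        buf.longs[CMD] = 0  # 0 means "mailbox empty" in this sketch
        return command, parameter

buf = SharedBuffer(size_longs=16)
post(buf, command=1, parameter=1234)
print(fetch(buf))
```

In this model the application code never touches the ring itself. It only sees a local buffer and a lock, which is the transparency the post describes.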

Fun Times,

Mike

cgracey · 2019-07-04 22:51

Sounds neat, Mike. We should make a 4x4 P2 array board. We would need a lot more than a 2.5 amp USB charger, though.

msrobots · 2019-07-04 23:44

cgracey wrote: »

Sounds neat, Mike. We should make a 4x4 P2 array board. We would need a lot more than a 2.5 amp USB charger, though.

Yeah, since I am more a software guy I hope @Cluso99 will do a 4 prop2 board.

But even with normal single boards and some (twisted? with resistors?) servo cables for the ring it should run at sys clock/3,/4 or /5. Ground, two data lines. 3 wire connection, 2 times (it's a ring) so 4 pins and one COG total per P2. Minimum would be one data line each, so 2 pins per P2.

On one P2 it runs at sys clock/1, so I hope to get /5/4/3 and maybe /2 running over wires and still compensate for clock phase issues between the boards.

You really need to start selling them, I can't really test this with just one P2-eval.

But in theory you lose one COG per P2 but have a common HUB buffer for using the proven Mailbox concept, albeit with the need to use a lock for read/write.

The nice thing is that the code is more or less the same for 1-2-4-8-16-32 data lines. The streamer is cool with that.

32 bits in and out might be a challenge, just having 64 pins, but might not be needed anyways.

Enjoy!

Mike

Cluso99 · 2019-07-04 23:51

You will be able to stack my boards, so that any pins you want joined can be stacked with the headers, otherwise use right angled headers to bring them out

msrobots · 2019-07-05 00:03

Cluso99 wrote: »

You will be able to stack my boards, so that any pins you want joined can be stacked with the headers, otherwise use right angled headers to bring them out

Cool. I was just trying to tease you, since you are the only one who did a 4-P1 board, and I like that you usually make your boards small.

Sadly I still do not have any P1 running CP/M. Any news there of when I can order one Ramblade, or what is needed to do so?

It would be nice to have a running P1 version of CP/M, since I will try to tackle emulation of other processors next.

Enjoy!

Mike

Cluso99 · 2019-07-05 00:51

OT - I will look to see what I am missing to build a RamBlade.

Just checked - have a board half-built (ex Prop & RAM) and have both chips so I'll PM you

msrobots · 2019-07-05 02:40

Cluso99 wrote: »

OT - I will look to see what I am missing to build a RamBlade.

Just checked - have a board half-built (ex Prop & RAM) and have both chips so I'll PM you

cool

rjo__ · 2019-07-05 03:12

I like the idea of stacking with user-defined connections:0)

On the streamer issue. P2<->P2 all depends upon what you are trying to do.
My favorite application is "same data," parallel compute. Compute time will greatly exceed transmission time... even at sysclock/10. So, very simple, bit banging with an 8 or 16 bit bus is all I need.

Chip:
I am all in for a P2x4x4. 16 bit bus. It will give growing brains a lot to think about... and it could be a wonderful way to move a lot of chips in a short time.

cgracey · 2019-07-05 03:26

rjo__ wrote: »

I like the idea of stacking with user-defined connections:0)

On the streamer issue. P2<->P2 all depends upon what you are trying to do.
My favorite application is "same data," parallel compute. Compute time will greatly exceed transmission time... even at sysclock/10. So, very simple, bit banging with an 8 or 16 bit bus is all I need.

Chip:
I am all in for a P2x4x4. 16 bit bus. It will give growing brains a lot to think about... and it could be a wonderful way to move a lot of chips in a short time.

I thought about an 8x8, too. That might need 150 amps at 1.8V. We'd need a PC power supply with local switchers.

evanh · 2019-07-05 03:49

I think Mike's ring bus idea is intended for daisy chaining. To make it stackable would need different I/O pins going up vs down. And then the top board would need a loop cable back to the bottom board to complete the ring.

msrobots · 2019-07-05 04:05

rjo__ wrote: »

I like the idea of stacking with user-defined connections:0)

On the streamer issue. P2<->P2 all depends upon what you are trying to do.
My favorite application is "same data," parallel compute. Compute time will greatly exceed transmission time... even at sysclock/10. So, very simple, bit banging with an 8 or 16 bit bus is all I need.

Chip:
I am all in for a P2x4x4. 16 bit bus. It will give growing brains a lot to think about... and it could be a wonderful way to move a lot of chips in a short time.

@rjo__ , different application.

The ringnet is not supposed to be the fastest parallel connection; actually the connection speed might be quite slow when running over long wires/transceivers/whatever.

The idea is that it just shares a common data buffer for mailboxes across COGs. Sure it WILL work for a shared VGA memory buffer - but each P2 in the ring loses that amount of memory.

A 16 bit bus would use 32 pins per P2 and 1 COG, leaving 7 COGs and 32 pins for use. 16 bit width is not really needed. Say 180 MHz, 16 bits, clock/2 (maybe?): this would transfer 16 bits every 2 clocks, or a byte per clock... With four P2s that makes a 'cluster' of 28 COGs and 128 pins, sharing X longs of HUB...
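The arithmetic above can be checked with a short Python sketch. The function name `ring_budget` is hypothetical, and the four-node count is an assumption inferred from the 28 COG / 128 pin totals (4 x 7 COGs, 4 x 32 pins); the 8 COGs and 64 pins per chip follow from the figures in the post.

```python
def ring_budget(num_nodes, bus_bits, sysclk_hz, clock_divider,
                cogs_per_chip=8, pins_per_chip=64):
    """Rough resource and bandwidth budget for a ring of P2 nodes.

    Each node spends one COG on the ring driver and 2 * bus_bits pins
    (bus_bits incoming plus bus_bits outgoing).
    """
    pins_used = 2 * bus_bits
    # One bus-wide transfer every `clock_divider` system clocks.
    bytes_per_second = sysclk_hz / clock_divider * bus_bits / 8
    return {
        "pins_used_per_node": pins_used,
        "free_cogs": num_nodes * (cogs_per_chip - 1),
        "free_pins": num_nodes * (pins_per_chip - pins_used),
        "bytes_per_second": bytes_per_second,
    }

# The example from the post: 16-bit bus, 180 MHz, sysclock/2, four nodes (assumed).
print(ring_budget(num_nodes=4, bus_bits=16, sysclk_hz=180_000_000, clock_divider=2))
```

With `bus_bits=1` the same function gives the minimum case of two pins per node.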

Enjoy!

Mike

rjo__ · 2019-07-05 04:37

Correct. I was talking to readers of your thread who might come away with the feeling..."God that is complicated, maybe I should stick with my Arduino."
P2<->P2 is not a problem. There is a nearly infinite number of solutions. Some are dog simple. Some stretch the limits.

"If you want to get it done... you will."

About the bit width... remember we have different applications. It all depends on the controller. I think it can be done for many applications with part of a Cog.

rjo__ · 2019-07-05 04:46

To be complete, I should say that it also depends upon the number of P2's... as the count goes up, your solution begins to prevail in my setting.

evanh · 2019-07-05 04:47

Your typical parallel databus setup, oriented around a main CPU with a bunch of peripheral processors, could make use of this streamer burst copying mechanism too. But, at the moment, the implementation is very basic in that it is unidirectional and relies quite heavily on a fixed burst length for both ends to know when it has completed.

So there are plenty more features to add for more flexible solutions.

msrobots · 2019-07-05 21:22

Yes, I also think there are lots of uses for the streamer. My ringnet implementation is just one of them. That it is unidirectional is sort of irrelevant since a ring does not have two ends, it has no ends.

Once started it basically circles one buffer around, having it unlocked on the P2 having the current buffer and locked on all other P2s. Sure this is not the fastest possible way, and sure the overall transmit rate will vary with the number of P2s, the size of the buffer and the amount of data changed by each COG.

But the usage is dead simple, a single node does not even need to know about the ringnet, it just waits to get a lock on its local copy of the buffer, changes its local copy of the buffer and releases the lock. So I do not understand this part: "God that is complicated, maybe I should stick with my Arduino."
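A minimal Python sketch of the circulation described above: one buffer travels around the ring, only the current holder may change its local copy, and the driver then forwards the buffer to the next node. The function `circulate` and its parameters are hypothetical, and the copy step only models what the streamer would do in hardware.

```python
def circulate(num_nodes, buffer_size, updates, laps=1):
    """Simulate the buffer travelling around the ring.

    updates maps a node index to a list of (offset, value) changes that
    node wants to make the next time its local copy is unlocked.
    Returns every node's local copy after the given number of laps.
    """
    pending = {node: list(changes) for node, changes in updates.items()}
    copies = [[0] * buffer_size for _ in range(num_nodes)]
    holder = 0
    for _ in range(laps * num_nodes):
        local = copies[holder]
        # Only the holder's copy is unlocked, so only it may be changed.
        for offset, value in pending.pop(holder, []):
            local[offset] = value
        # The driver forwards the buffer; the next node becomes the holder.
        nxt = (holder + 1) % num_nodes
        copies[nxt] = list(local)
        holder = nxt
    return copies

# Three nodes ("more a triangle"): node 0 and node 2 each change one long.
result = circulate(3, 4, {0: [(0, 11)], 2: [(3, 33)]}, laps=2)
print(result)
```

In this model a change made on one node reaches every other node within one further trip around the ring, which is why the overall rate depends on the node count and the buffer size.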

evanh wrote: »

I think Mike's ring bus idea is intended for daisy chaining. To make it stackable would need different I/O pins going up vs down. And then the top board would need a loop cable back to the bottom board to complete the ring.

Exactly, minimum would be two pins per P2-node, one incoming and one outgoing. With a bus width of more than one bit one would need to cross connect the pins to make sure the streamer gets the bits in the right order.

So when using 3 pins in and 3 pins out, with servo cables, one needs to twist the cable going from P2 to P2 and provide a separate ground connection.
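Why the bit order matters can be shown with a small Python model. The function `through_cable` and the way a cable is represented (a list where entry i is the input bit that output bit i lands on) are assumptions made for illustration only; the actual pin layout depends on the boards used.

```python
def through_cable(value, width, wiring):
    """Return the value seen by the receiver for a given cable wiring.

    wiring[i] is the receiver bit position that sender bit i is wired to.
    """
    received = 0
    for out_bit in range(width):
        if (value >> out_bit) & 1:
            received |= 1 << wiring[out_bit]
    return received

sent = 0b110
print(bin(through_cable(sent, 3, [0, 1, 2])))  # bit i wired to bit i
print(bin(through_cable(sent, 3, [2, 1, 0])))  # mirrored wiring reverses the bits
```

If the connectors on two boards mirror the pin order, the cable has to undo that mirroring so that sender bit i still arrives on receiver bit i.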

To prevent damage to pins through wiring mistakes or software config errors one should put resistors (1K?) between the data lines and the data pins.

Enjoy!

Mike

msrobots · 2019-09-13 05:32

Weekend is coming and thanks to @DaveJenson and @frida I have now 3 P2ES.

Not really a ring, more a triangle, and in reality a small tower, because they stack nicely atop each other. @W9GFO did a nice job, and I might need another small housing for Frida's P2.

I just need to move around some 20-40 boxes to find the basic stuff, Iron, resistors, servo wires, maybe the PropScope and then I will find out how fast I can go between two chips with different clock sources.

Somewhere I do have quite long servo wires, to check that too. Moving is a hassle but basically the new shop is ready to get filled, I just need to figure out the ground problem with half of the outlets. One half has ground, the other does not.

And it somehow burns under my fingers to get that ringbuffer stable running over a number of P2's not just simulated on one.

Anyways thanks to Dave and Frida and stay tuned...

Mike

cgracey · 2019-09-13 10:21

Mike, the PLL on the new silicon is going to work much better for apps like this, as its jitter is ~1/10th.

samuell · 2019-09-13 16:06

cgracey wrote: »

Sounds neat, Mike. We should make a 4x4 P2 array board. We would need a lot more than a 2.5 amp USB charger, though.

I'm planning to build a 1.8V VRM capable of delivering at least 20A. Initially, it was just for recreational purposes, but it might be useful for this purpose too. I have many of the components, but two years have passed and I still haven't designed the board.

It will be a proper VRM with two phases. I will have no use for it. I guess you could use it integrated in your design.

Kind regards, Samuel Lourenço
