Cog to Cog comms?

Baggers · 2015-12-28 10:19

Hi Chip, or anyone else on the forum,

What's the best/fastest way to set a flag/variable on one cog to be read by another? IIRC we used to have portD, but (correct me if I'm wrong) that has gone now.

Writing a byte/word/long to HUB RAM may be slow, since another cog stuck in a read loop would have to wait for the HUB to come around again for each read.

Cheers,
Jim.
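
To put a rough number on the hub wait Jim describes, here is a small Python model. It assumes a simple round-robin hub in which cog c may touch hub RAM only on clocks where clock % 16 == c. That is a simplification for illustration, not the actual P2 hub design, and `next_slot` and `flag_latency` are hypothetical names.

```python
# Hypothetical model: NCOGS cogs share the hub round-robin, and cog c may
# access hub RAM only on clocks where clock % NCOGS == c.
# This is an illustrative simplification, not the actual P2 hub design.
NCOGS = 16

def next_slot(t: int, cog: int) -> int:
    """First clock >= t at which `cog` owns the hub."""
    return t + (cog - t) % NCOGS

def flag_latency(t0: int, writer: int, reader: int) -> int:
    """Clocks from the writer issuing its WRLONG at t0 until a reader that
    polls on every one of its hub slots first sees the new value."""
    commit = next_slot(t0, writer)        # write lands in hub RAM
    seen = next_slot(commit + 1, reader)  # reader's first poll after the commit
    return seen - t0

latencies = [flag_latency(t0, w, r)
             for w in range(NCOGS) for r in range(NCOGS) if r != w
             for t0 in range(NCOGS)]
print(f"best {min(latencies)}, worst {max(latencies)} clocks")
```

Under these assumptions the worst case is two nearly full hub rotations (one waiting for the writer's slot, one waiting for the reader's), and a real polling loop that spends instructions between reads can only do worse.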

Rayman · 2015-12-28 14:30

Hmm, that's a good question.
I don't think there is any shared cog memory.
So I guess there's no way to avoid waiting for the HUB to roll around when sharing between cogs...

Rayman · 2015-12-28 14:41

I wonder if "locks" are an exception... Maybe that is a shared register?

kwinn · 2015-12-28 14:53

I wonder if the smart pins could have served that function, i.e. allowing a shared pin to transfer data between two cogs.

pjv · 2015-12-28 16:04

Baggers,

That is what the special registers (16 longs per cog?) I asked for way back are all about: just write to the specified hub RAM, and the associated cog will take an interrupt. That eliminates waiting and polling in a loop.

Cheers,

Peter (pjv)

cgracey · 2015-12-28 16:45

Baggers wrote: »

Hi Chip, or anyone else on the forum,

What's the best/fastest way to set a flag/variable on one cog to be read by another? IIRC we used to have portD, but (correct me if I'm wrong) that has gone now.

Writing a byte/word/long to HUB RAM may be slow, since another cog stuck in a read loop would have to wait for the HUB to come around again for each read.

Cheers,
Jim.

Aside from those special 16 hub longs which pjv mentioned, we only have external pins. Well, we have locks, but they are tied to hub timing, as well.

Jim, can you elaborate on what you are wanting to do?

Baggers · 2015-12-28 18:52

Thanks for the feedback, guys.

Chip, it's for my display driver, letting the render cogs know which scanline it's up to. I don't want it interrupting the cog, just setting a flag to say go!

And one for a procedural cog to say "start this function", assuming it has also set a flag to say it's ready to receive.

Cheers,
Jim.

cgracey · 2015-12-28 19:11

Baggers wrote: »

Thanks for the feedback, guys.

Chip, it's for my display driver, letting the render cogs know which scanline it's up to. I don't want it interrupting the cog, just setting a flag to say go!

And one for a procedural cog to say "start this function", assuming it has also set a flag to say it's ready to receive.

Cheers,
Jim.

Can the other cogs know ahead of time, on their own, from an initial signal, what scanline they are on and what they are supposed to do? I would think to use GETCT, but maybe because the streamer NCO tracks differently, this wouldn't work. Do you need to convey a single-bit signal, or also word data?
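
A minimal sketch of the GETCT idea in Python, assuming a free-running 32-bit counter that wraps, a frame start value captured once and shared by all cogs, and a whole number of clocks per scanline. `scanline_at` and its parameters are hypothetical; on the chip the `now` value would come from GETCT.

```python
# Each render cog derives the current scanline from a shared start count and
# the free-running system counter, instead of being told by the display cog.
# Assumptions: the counter is 32 bits wide and wraps, and every scanline
# takes exactly `clocks_per_line` clocks.
COUNTER_MASK = 0xFFFF_FFFF

def scanline_at(now: int, start: int, clocks_per_line: int,
                lines_per_frame: int) -> tuple[int, int]:
    """Return (scanline, clocks into that line) for counter value `now`,
    given the counter value `start` captured at the top of the frame."""
    elapsed = (now - start) & COUNTER_MASK   # wrap-safe 32-bit difference
    line_index, offset = divmod(elapsed, clocks_per_line)
    return line_index % lines_per_frame, offset

# Counter wrapped between start and now: 0x200 = 512 clocks have elapsed.
print(scanline_at(0x0000_0100, 0xFFFF_FF00, 100, 525))   # (5, 12)
```

The wrap-safe difference is only valid for less than one full counter wrap since `start`, and if the streamer NCO does not give a whole number of system clocks per line (the caveat Chip raises), the estimate drifts, so the cogs would need to resync `start`, for example once per frame.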

jmg · 2015-12-28 19:18

Baggers wrote: »

Chip, it's for my display driver, letting the render cogs know which scanline it's up to. I don't want it interrupting the cog, just setting a flag to say go!

And one for a procedural cog to say "start this function", assuming it has also set a flag to say it's ready to receive.

There are also many general COG sync-up applications where it would be nice not to consume a pin just to WAIT-sync to the same SysCLK.

cgracey wrote: »

Do you need to convey a single-bit signal, or also word data?

In the simplest form a single bit is workable, but of course data is nice too.

A small variation supports OR, allowing multiple slave COGs to signal DONE, with the slowest one deciding when.
i.e. taking the open-collector model, they all pull low and release when done, and the master waits for the rising edge (_/=).
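
jmg's open-collector DONE scheme can be simulated in a few lines. This Python sketch (an illustration, not a hardware description) treats the shared line as a wired-AND with a pull-up: each slave holds it low while busy, and the master waits for the _/= edge, which arrives when the slowest slave releases. `line_level` and `master_wakeup` are hypothetical names.

```python
# Simulated open-collector "DONE" line shared by several slave cogs.

def line_level(busy: list[bool]) -> int:
    """Wired-AND with a pull-up: high (1) only if no slave pulls it low."""
    return 0 if any(busy) else 1

def master_wakeup(finish_clock: list[int]) -> int:
    """Clock at which the master sees the low-to-high (_/=) edge, given the
    clock at which each slave releases the line."""
    if not finish_clock:
        raise ValueError("need at least one slave holding the line")
    prev = line_level([True] * len(finish_clock))  # all slaves start busy
    clock = 0
    while True:
        level = line_level([clock < f for f in finish_clock])
        if prev == 0 and level == 1:               # rising edge detected
            return clock
        prev = level
        clock += 1

print(master_wakeup([12, 40, 25]))   # 40: the slowest slave sets the pace
```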

potatohead · 2015-12-28 19:35

Usually, writing them to watch a signal means having them do the needed processing and quickly recycle for another buffer, or something else.

Sprites, for example, vary widely. A high number of them per line means a lot of work NOW; a low number means less work now, so maybe they can do something else... Or say it's desirable to use a variable number of them. If they are written to just blast through the work on signal, it's possible to pick, say, 3, and then evaluate whether they get it done. Run the sprites and watch. If they fail, and are latched to the signals, they fail nicely, with something not drawn that instant while everything else stays synced.

Seems like the locks might just be the quickest. Bill mentioned using them for that purpose. I've not done it myself.

Rayman · 2015-12-28 22:10

The 16 special hub longs also have to wait for hub access to write, don't they?

Is the interrupt instantaneous after that? Or, does that also wait for hub access to come around?

pjv · 2015-12-29 00:42

Good question Rayman.

I suspect yes to the first, and hopefully no to the second. We will have to wait for Chip to reveal the answers.

Cheers,

Peter (pjv)

cgracey · 2015-12-29 01:18

Rayman wrote: »

The 16 special hub longs also have to wait for hub access to write, don't they?

Is the interrupt instantaneous after that? Or, does that also wait for hub access to come around?

It's coincident with the write in the hub memory, after the writing cog waited for the hub slot. After another cog gets the interrupt or notes the event, he will have to wait for his hub cycle to read it.

As someone pointed out, the SETQ+WRLONG/RDLONG is a really efficient way to swap contexts. The hub waits are not avoidable, though.
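
Chip's description can be checked against a simplified round-robin hub model (hypothetical, not the real P2 hub, where cog c owns clocks with clock % 16 == c): an event raised when the write lands lets the reader stop polling, but it still has to wait for its own hub slot, so the data arrives on the same clock either way. The sketch assumes every poll hits the reader's hub slot and ignores pipeline delays; `polling_vs_event` is an illustrative name.

```python
# Hypothetical round-robin hub model: cog c owns clocks where clock % NCOGS == c.
NCOGS = 16

def next_slot(t: int, cog: int) -> int:
    """First clock >= t at which `cog` owns the hub."""
    return t + (cog - t) % NCOGS

def polling_vs_event(t_wait: int, t_write: int, writer: int, reader: int):
    """Reader starts waiting at t_wait (<= t_write); writer issues its write
    at t_write. Returns (clock the data is read, hub reads used) for a
    polling reader and for one woken by an event when the write lands."""
    assert t_wait <= t_write
    commit = next_slot(t_write, writer)    # event fires here
    seen = next_slot(commit + 1, reader)   # reader's first slot after commit
    # Polling: one hub read per reader slot from its first poll through `seen`.
    polls = (seen - next_slot(t_wait, reader)) // NCOGS + 1
    # Event: sleep until the commit, then a single read at the next slot.
    return (seen, polls), (seen, 1)

poll, event = polling_vs_event(t_wait=0, t_write=100, writer=0, reader=1)
print("polling:", poll, "event:", event)   # polling: (113, 8) event: (113, 1)
```

In this model the event saves hub reads (and the instructions around them), not latency, which matches Chip's point that the hub waits themselves are unavoidable.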

Rayman · 2015-12-29 02:57

I think with more MIPS and more cogs, we can find ways to do everything we did with P1, and more.
It would be nice to have an internal Port C, like maybe P2 Hot had, but I'm happy with what we have.

This P2 is different from the old P1 and P2-hot, and in some cases we have to think in new ways about how best to get things done...

msrobots · 2015-12-29 03:35

Yeah, one long of memory ported 16 times to be accessible from all cogs at the same time sounds mind-destroying. But each cog could use 2 bits as flags.

Enjoy!

Mike

kwinn · 2015-12-29 04:49

cgracey wrote: »

Aside from those special 16 hub longs which pjv mentioned, we only have external pins. Well, we have locks, but they are tied to hub timing, as well.

Jim, can you elaborate on what you are wanting to do?

I wasn't suggesting using the pin itself, so much as wondering if the signals that are already going to the pin could be used or modified to send and receive data to/from another latched bit or register on that smart pin. A simple sharing of the pin circuit where one cog can only write and the other read the latched bit/register.

cgracey · 2015-12-29 04:59

kwinn wrote: »

I wasn't suggesting using the pin itself, so much as wondering if the signals that are already going to the pin could be used or modified to send and receive data to/from another latched bit or register on that smart pin. A simple sharing of the pin circuit where one cog can only write and the other read the latched bit/register.

Well, I guess I could make a mode like that. Good idea. Interesting.

jmg · 2015-12-29 05:29

kwinn wrote: »

I wasn't suggesting using the pin itself, so much as wondering if the signals that are already going to the pin could be used or modified to send and receive data to/from another latched bit or register on that smart pin. A simple sharing of the pin circuit where one cog can only write and the other read the latched bit/register.

Similar to this, it may also be possible to 'bury-for-free' the Pin signals of a Pin used by other (Smart pin) functions.
e.g. if someone has set up a Quadrature counter, a choice of:
a) Std pin IO to allow self-testing
b) Buried Pin IO, now Pins are Quad IN and Pin polling is isolated from the physical pin.

That's not as flexible as direct COG-COG booleans, but it may come almost for free.
Most systems will have a few dedicated IO pins.

msrobots · 2015-12-29 08:14

I usually run out of pins on any project, so wasting a pin for internal signaling is, say, uncool.

But @kwinn's idea is nice, if I understand it right. We could have 64 flags, writable and readable over the smart pin interface without hub timing.

Quite an interesting idea.

Mike

kwinn · 2015-12-29 15:59

msrobots wrote: »

I usually run out of pins on any project, so wasting a pin for internal signaling is, say, uncool.

But @kwinn's idea is nice, if I understand it right. We could have 64 flags, writable and readable over the smart pin interface without hub timing.

Quite an interesting idea.

Mike

You got the gist of it. It was pretty vague to start with, since I have only the vaguest idea of how the smart pins will work. I figured that if Chip liked the idea, he would know how to make the best use of it.

Cluso99 · 2015-12-29 19:06

I am hoping that with 64 pins and 512KB hub that I can spare a few pins when required to signal between cogs.

cgracey · 2015-12-29 19:34

Cluso99 wrote: »

I am hoping that with 64 pins and 512KB hub that I can spare a few pins when required to signal between cogs.

If a single bit is sufficient, we could come up with something pretty simple before calling it done.

Baggers · 2015-12-29 21:20

A single bit is sufficient, though not for my first use. I don't mind it being a single bit or whatever you have in mind.

On an off chance, could a single long be used? Any cog can write to or read from it, and each cog's values are OR'd when it's read, so you can either use 2 bits per cog or all 32 bits for one cog controlling the others.
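
Baggers' OR'd long can be modeled as 16 private write registers whose bitwise OR is what every cog reads. This is a Python sketch of the proposal, not an existing P2 feature; `OrLong`, `place` and `field` are made-up names, and the 2-bits-per-cog layout (bits 2n and 2n+1 for cog n) is one assumed allocation.

```python
NCOGS = 16
MASK = 0xFFFF_FFFF

class OrLong:
    """Hypothetical shared long: each cog owns a private 32-bit write
    register, and a read returns the bitwise OR of all of them."""

    def __init__(self) -> None:
        self.regs = [0] * NCOGS      # power-up: nobody signaling

    def write(self, cog: int, value: int) -> None:
        self.regs[cog] = value & MASK

    def read(self) -> int:
        result = 0
        for r in self.regs:
            result |= r
        return result

def place(cog: int, flags: int) -> int:
    """Position a cog's 2 flag bits at bits 2*cog and 2*cog + 1."""
    return (flags & 0b11) << (2 * cog)

def field(value: int, cog: int) -> int:
    """Extract cog `cog`'s 2-bit field from a read of the shared long."""
    return (value >> (2 * cog)) & 0b11

shared = OrLong()
shared.write(3, place(3, 0b10))   # cog 3 raises its upper flag
shared.write(7, place(7, 0b01))   # cog 7 raises its lower flag
v = shared.read()
print(field(v, 3), field(v, 7), field(v, 0))   # 2 1 0
```

Because reads are OR'd, a cog that writes outside its own field would corrupt the others' flags, so the allocation is purely a software convention. The same structure covers the other use: one controlling cog writes all 32 bits while the rest write 0.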

jmg · 2015-12-29 22:19

cgracey wrote: »

If a single bit is sufficient, we could come up with something pretty simple before calling it done.

A single bit covers the timing-sync-up cases, and avoids having to consume pins for signaling, which just looks inelegant.

Baggers wrote: »

A single bit is sufficient, though not for my first use. I don't mind it being a single bit or whatever you have in mind.

On an off chance, could a single long be used? Any cog can write to or read from it, and each cog's values are OR'd when it's read, so you can either use 2 bits per cog or all 32 bits for one cog controlling the others.

There are only 16 COGs, so maybe a LONG can map/fold the upper 16 bits for write and the lower 16 bits for read.
If the read is an all-COG OR, then you can run a last-slave-ready loop very easily.
In HW that is 16 x 16 FFs: each COG can write to its 16, and a read is the OR across all 16 COGs.
The default is the non-signaling state.

Allocation mapping here is entirely in SW, so this could (e.g.) split into some signaling bits and one byte of data/state info.

Or you map 2 bits per COG and use the same mode, which needs 32 x 16 FFs. More data is possible, but you have doubled the logic needed.

Ideally, the HW is pipelined so that a master Flag-WRITE exits on the same SysCLK as all the slaves' Flag-WAITs.
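
A rough Python model of jmg's folded LONG, assuming the write side lives in bits 31..16 and the read side in bits 15..0 as he suggests. `FoldedFlags`, `BUSY` and `last_slave_done` are hypothetical names; the example shows the last-slave-ready loop using a single shared BUSY bit, plus the flip-flop counts he quotes. Pipelining and clock-exact wakeup are not modeled.

```python
NCOGS = 16

class FoldedFlags:
    """16 cogs x 16 flip-flops. A cog writes its own 16 FFs through the
    upper half of a LONG; a read returns, in the lower half, the OR of
    every cog's FFs. Bit allocation is purely a software choice."""

    def __init__(self) -> None:
        self.ffs = [0] * NCOGS   # default: non-signaling (all zero)

    def write(self, cog: int, long_value: int) -> None:
        self.ffs[cog] = (long_value >> 16) & 0xFFFF   # upper half = write side

    def read(self) -> int:
        result = 0
        for f in self.ffs:
            result |= f
        return result            # lower half = OR across all cogs

BUSY = 1 << 0   # hypothetical allocation: bit 0 means "a slave is still busy"

def last_slave_done(finish_step: dict[int, int]) -> int:
    """Each slave cog (cog -> step at which it finishes) asserts BUSY until
    done; the master polls the OR'd read and returns the first step at
    which BUSY is clear, i.e. when the last slave has finished."""
    flags = FoldedFlags()
    step = 0
    while True:
        for cog, done_at in finish_step.items():
            flags.write(cog, (0 if step >= done_at else BUSY) << 16)
        if not flags.read() & BUSY:
            return step
        step += 1

print(16 * NCOGS, "FFs for 16 bits per cog;", 32 * NCOGS, "for 32")
print(last_slave_done({1: 5, 2: 9, 3: 7}))   # 9
```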

Cluso99 · 2015-12-30 01:17

Chip,
Do you mean a single bit per cog? If so, that is 16 bits total.

Would 32 bits be too much?

What I think would be nice is for each cog to have a 32-bit write register. The outputs of these 32 bits are open collector, such that a "0" bit pulls down the bit line (like the old micros did with interrupts). Each cog can read these 32 output bits without delay (i.e. in parallel, in the same clock if necessary).

If the reverse, such as wired-OR, works better, then that is fine too. It would just be nice to have a single 32-bit read and write. Software can determine the usage, anywhere from 0 to 32 bits for a cog.

I presume the real cost of this is the 32-bit bus between every cog, not the number of FFs. If it has to be fewer bits, then that's fine.
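
Cluso's open-collector version reads the bitwise AND of all cogs' write registers, while the "reverse" wired-OR reads their OR. This Python sketch (hypothetical names, modeling the proposal rather than any real hardware) also checks De Morgan's law, which is why either polarity would do: software can simply invert.

```python
import random

MASK = 0xFFFF_FFFF
NCOGS = 16

def wired_and(regs: list[int]) -> int:
    """Open-collector bus: a bit reads 1 unless at least one cog writes 0
    to it, i.e. the bitwise AND of all write registers."""
    result = MASK
    for r in regs:
        result &= r
    return result

def wired_or(regs: list[int]) -> int:
    """The 'reverse' option: a bit reads 1 if any cog writes 1 to it."""
    result = 0
    for r in regs:
        result |= r
    return result

idle = [MASK] * NCOGS          # every cog released: the bus floats high
print(hex(wired_and(idle)))    # 0xffffffff

regs = idle.copy()
regs[5] = MASK & ~0b1          # cog 5 pulls bit 0 low
print(hex(wired_and(regs)))    # 0xfffffffe

# De Morgan: wired-AND of the registers equals the inverted wired-OR of the
# inverted registers, so the two polarities carry the same information.
rng = random.Random(1)
sample = [rng.getrandbits(32) for _ in range(NCOGS)]
assert wired_and(sample) == ~wired_or([~r & MASK for r in sample]) & MASK
```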

Rayman · 2015-12-30 01:33

Wonder if one could add two extra bits to Port A&B that map into carry and zero flags somehow...

Not as nice as a full port C, but it would let cogs share a couple of bits fast...
