Is LUT sharing between adjacent cogs very important?

cgracey · 2016-05-05 23:14

Is LUT sharing between adjacent cogs very important?

I ask because we are out of room in the -A9 now.

While LUT sharing has some neat aspects, part of me likes the idea that the cogs are all, ideally, very independent, and anything that dictates use of an adjacent cog, as opposed to any cog, kind of undermines that. We ARE stuck, somewhat, already, because of the fast DAC channels to I/O pins.

I'm a little loath to start stripping other things down to support this feature.

Putting in 32-bit latches is less painful, but not much, really. It doesn't take much more to support full LUT sharing.

Tubular · 2016-05-05 23:16

What about just sharing with the next cog, rather than both?

cgracey · 2016-05-05 23:19

Tubular wrote: »

What about just sharing with the next cog, rather than both?

This is a brain bender. I think we are already doing that, aren't we? If cog N can r/w cog N+1's LUT, cog N-1 can r/w cog N's LUT. It's either 'both' or 'neither', right?
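
To make the 'both or neither' point concrete, here is a minimal Python sketch. It assumes 16 cogs (the count mentioned later in the thread) and assumes the cog numbering wraps around; the function names are made up for illustration and are not part of any P2 tool.

```python
NUM_COGS = 16  # assumed cog count; wrap-around at the ends is also an assumption


def luts_reachable(cog: int) -> set[int]:
    """LUTs that `cog` can read/write: its own and the next cog's."""
    return {cog, (cog + 1) % NUM_COGS}


def lut_accessors(lut_owner: int) -> set[int]:
    """Cogs that can read/write the LUT owned by `lut_owner`."""
    return {c for c in range(NUM_COGS) if lut_owner in luts_reachable(c)}


# Giving cog N access to cog N+1's LUT automatically means cog N's own LUT
# is reachable from cog N-1, so every cog ends up sharing in both directions.
print(luts_reachable(4))  # LUTs 4 and 5
print(lut_accessors(5))   # cogs 4 and 5
```

Under these assumptions every LUT has exactly two accessors, which is why "share with the next cog only" is the same thing as sharing with both neighbours.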

Tubular · 2016-05-05 23:23

Err yes you're right!

Would write-only still be useful?

As somebody else mentioned, you need to allow some room for bug fixes, and you don't want it to be taking forever to compile and fail at the end.

cgracey · 2016-05-05 23:25

I've been going through the design looking for anything superfluous or inefficiently-implemented. Everything is about equally tight.

kwinn · 2016-05-05 23:28

While LUT sharing would be nice, it's better to use a 32-bit latch between cogs if that avoids tearing out what is already there.

jmg · 2016-05-05 23:30

cgracey wrote: »

Tubular wrote: »

What about just sharing with the next cog, rather than both?

This is a brain bender. I think we are already doing that, aren't we? If cog N can r/w cog N+1's LUT, cog N-1 can r/w cog N's LUT. It's either 'both' or 'neither', right?

That's sounding like a lot of muxes (and routes) ?

Why not physically place the LUT pool between two owner COGs, so those COGs can share?
This gives a simple even-odd pairing (which I thought was what was done?)

jmg · 2016-05-05 23:31

cgracey wrote: »

I've been going through the design looking for anything superfluous or inefficiently-implemented. Everything is about equally tight.

How much does it shrink, if you reduce the size of the shared pool ?
(LUTs stay the same, just the muxes drop )

Rayman · 2016-05-05 23:33

Part of me is happy we're out of room. Perhaps we've hit a hard limit on new features...

Other part would like some kind of cog-cog signaling...

I guess going through a smartpin would be too slow.

cgracey · 2016-05-05 23:34

jmg wrote: »

cgracey wrote: »

I've been going through the design looking for anything superfluous or inefficiently-implemented. Everything is about equally tight.

How much does it shrink, if you reduce the size of the shared pool ?
(LUTs stay the same, just the muxes drop )

You mean share half the LUTs' address space? It looks like that would just save the logic for one address bit, since this is RAM, and not a bunch of flops.

Rayman · 2016-05-05 23:37

does limiting the cog-cog LUT access to just 4 specific LUT addresses make it smaller?

cgracey · 2016-05-05 23:37

Rayman wrote: »

Part of me is happy we're out of room. Perhaps we've hit a hard limit on new features...

Other part would like some kind of cog-cog signaling...

Yes, we need some signalling. What about each cog outputting an 'attention' strobe and each cog being able to select which one it listens to to drive an event/interrupt? That's really cheap.
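
A rough Python model of that idea, as a sketch only: each cog drives one strobe line, and each cog holds a selector naming the single cog it listens to. The names `strobes` and `listen_to` are hypothetical, and 16 cogs is assumed.

```python
NUM_COGS = 16  # assumed cog count


def attention_events(strobes: list[bool], listen_to: list[int]) -> list[bool]:
    """For each cog, return whether its selected strobe line is high.

    strobes[i]   -- True if cog i is pulsing its 'attention' strobe this clock
    listen_to[i] -- index of the one cog whose strobe cog i has selected
    """
    return [strobes[selected] for selected in listen_to]


strobes = [False] * NUM_COGS
strobes[11] = True            # cog 11 pulses its strobe

listen_to = [0] * NUM_COGS    # by default everyone listens to cog 0
listen_to[5] = 11             # cog 5 has chosen to listen to cog 11

events = attention_events(strobes, listen_to)
print(events[5])  # cog 5 sees an event; the cogs listening to cog 0 do not
```

The hardware cost in this model is one 16-to-1 selection per cog, which fits the "really cheap" description, though the real implementation may differ.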

Heater. · 2016-05-05 23:40

Not only is it not very important, I think it's a brain dead idea.

Excuse me if I have missed a point here but historically all COGs were equal peers. I could start any random COG and have it run any random code. All was good.

When you say "sharing between adjacent cogs" I read that as meaning this is no longer true. I have to run the right code in the right COG to get what I want.

This sounds like a nightmare.

Skip it, burn the chip, ship it, I have money saved up to buy it.

cgracey · 2016-05-05 23:41

Rayman wrote: »

does limiting the cog-cog LUT access to just 4 specific LUT addresses make it smaller?

It only gets rid of 7 out of 9 address bits. That's nothing.
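
The arithmetic behind both answers can be checked with a short Python sketch. The only figure taken from the thread is the 9 address bits; the helper name is made up for illustration.

```python
LUT_ADDRESS_BITS = 9  # from the post above; implies 2**9 = 512 LUT locations


def address_bits_saved(shared_locations: int) -> int:
    """Address bits that drop out if only `shared_locations` are shared."""
    bits_needed = (shared_locations - 1).bit_length()
    return LUT_ADDRESS_BITS - bits_needed


print(address_bits_saved(4))    # 4 shared addresses need 2 bits, so 7 are saved
print(address_bits_saved(256))  # sharing half the space saves just 1 bit
print(address_bits_saved(512))  # sharing everything saves none
```

Either way the 32-bit data paths stay, so trimming address bits barely changes the cost.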

jmg · 2016-05-05 23:42

cgracey wrote: »

Yes, we need some signalling. What about each cog outputting an 'attention' strobe and each cog being able to select which one it listens to to drive an event/interrupt? That's really cheap.

So that is like back to virtual pins ?
They then have a shared, agreed HUB area for Data pools ?
With 16 COGS, you could manage 2 strobes per cog, in a 32b field ?

cgracey · 2016-05-05 23:43

Heater. wrote: »

Not only is it not very important, I think it's a brain dead idea.

Excuse me if I have missed a point here but historically all COGs were equal peers. I could start any random COG and have it run any random code. All was good.

When you say "sharing between adjacent cogs" I read that as meaning this is no longer true. I have to run the right code in the right COG to get what I want.

This sounds like a nightmare.

Skip it, burn the chip, I have money saved up to buy it.

It's a feature that could make people suppose they are SUPPOSED to take advantage of it, simply because it exists, rather than use it for some very special application. I don't like stuff like that, either.

Heater. · 2016-05-05 23:44

Hope nobody missed my "Mr Angry" post above.

cgracey · 2016-05-05 23:46

jmg wrote: »

cgracey wrote: »

Yes, we need some signalling. What about each cog outputting an 'attention' strobe and each cog being able to select which one it listens to to drive an event/interrupt? That's really cheap.

So that is like back to virtual pins ?
They then have a shared, agreed HUB area for Data pools ?
With 16 COGS, you could manage 2 strobes per cog, in a 32b field ?

It's just like a "poke" on Facebook.

cgracey · 2016-05-05 23:49

Heater. wrote: »

Hope nobody missed my "Mr Angry" post above.

I saw it, Herra Vihainen Housut.

jmg · 2016-05-06 00:17

cgracey wrote: »

jmg wrote: »

cgracey wrote: »

Yes, we need some signalling. What about each cog outputting an 'attention' strobe and each cog being able to select which one it listens to to drive an event/interrupt? That's really cheap.

So that is like back to virtual pins ?
They then have a shared, agreed HUB area for Data pools ?
With 16 COGS, you could manage 2 strobes per cog, in a 32b field ?

It's just like a "poke" on Facebook. I hate to talk like that, but that's what it is. "Hey, Cog5! Cog11 poked you! Make of it what you will."

I think that is a very good idea, because with nothing at all between COGs (as now), the only signaling they have is via HUB (or waste a pin), and that means the flag also has HUB delays.

Sometimes the flag is all you need, so in all those cases you get a lot faster.

If you need to send data packets too, fast flags help there, and you can use the fast FIFOs to get close to 'local' speeds ?

Rayman · 2016-05-06 00:35

So it could just be an event?

Maybe with cog per bit mask?

cgracey · 2016-05-06 00:44

Rayman wrote: »

So it could just be an event?

Maybe with cog per bit mask?

Right.

Rayman · 2016-05-06 00:45

Maybe the poke could have a bit per cog mask?

So, you could poke many cogs at same time...

Electrodude · 2016-05-06 00:47

cgracey wrote: »

Rayman wrote: »

Part of me is happy we're out of room. Perhaps we've hit a hard limit on new features...

Other part would like some kind of cog-cog signaling...

Yes, we need some signalling. What about each cog outputting an 'attention' strobe and each cog being able to select which one it listens to to drive an event/interrupt? That's really cheap.

Isn't there already an event on hub lock change?

What if you make it so any cog can instantly clear any hub lock (or any combination of hub locks, with Rayman's bit mask) without needing to wait for its turn, and make it so that every cog can constantly see the current status of every lock? This shouldn't cause any problems for normal lock usage, since the cog that claimed a lock should be the only one that clears it. Or, there could be a separate instruction to instantly force a lock off. Then, cog 0 can set a lock, cog 1 can wait for it to be cleared, and when cog 0 clears it (without waiting for its turn), cog 1 will instantly be notified (without waiting for its turn).

cgracey · 2016-05-06 00:49

Rayman wrote: »

Maybe the poke could have a bit per cog mask?

So, you could poke many cogs at same time...

It could be like that. You output 16 random bits and each high causes a one-clock strobe pulse to the related cog. Each cog receives all 16 bits OR'd from all cogs. This would be more complex than what I was thinking, but allows multiple simultaneous cogs to be alerted. So, each cog can alert up to all cogs, and receives a single alert, which is the OR of all the other cogs' outputs.
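
The masked version can be modelled in a few lines of Python. This is a sketch under assumptions: 16 cogs, and a cog's own mask is included in the OR (the post says "other cogs", so whether a cog can poke itself is left open here). `route_pokes` is a hypothetical name.

```python
NUM_COGS = 16  # assumed cog count


def route_pokes(poke_masks: list[int]) -> list[bool]:
    """Combine every cog's 16-bit poke mask into one alert bit per cog.

    poke_masks[i] is the mask cog i outputs this clock; bit n set means
    'poke cog n'. Each cog's alert is the OR of its bit across all masks.
    """
    combined = 0
    for mask in poke_masks:
        combined |= mask
    return [bool((combined >> cog) & 1) for cog in range(NUM_COGS)]


masks = [0] * NUM_COGS
masks[11] = 1 << 5               # cog 11 pokes cog 5
masks[3] = (1 << 5) | (1 << 0)   # cog 3 pokes cogs 5 and 0 at once

alerts = route_pokes(masks)
print([cog for cog, hit in enumerate(alerts) if hit])  # cogs 0 and 5
```

Note that the receiver only learns that somebody poked it, not who, since the 16 sources collapse into a single alert line per cog.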

jmg · 2016-05-06 01:07

cgracey wrote: »

Rayman wrote: »

Maybe the poke could have a bit per cog mask?

So, you could poke many cogs at same time...

It could be like that. You output 16 random bits and each high causes a one-clock strobe pulse to the related cog. Each cog receives all 16 bits OR'd from all cogs. This would be more complex than what I was thinking, but allows multiple simultaneous cogs to be alerted. So, each cog can alert up to all cogs, and receives a single alert, which is the OR of all the other cogs' outputs.

Or that could be 32b, for 2 flags per COG ability ?
If it was grouped as 16+16, then another choice would be 16 flags + 16 party-line bits, which could be data ?
(Off line COGS have inactive states, and only one COG can use this simple party line at a time)

evanh · 2016-05-06 05:05

I note the LOCK bits are now usable for event signalling. They don't have Hub delays.

I imagine there could be more of them pretty easily.

Roy Eltham · 2016-05-06 05:50

What Heater said times infinity.

potatohead · 2016-05-06 05:51

I'm in favor of the COG signalling. I don't care about the cross LUT feature. It reeks of "it's possible, because dual port, but not needed."

Extending the locks is a good idea too. Bill was clear on this feature for signaling.

JRetSapDoog · 2016-05-06 07:14

cgracey wrote: »

Rayman wrote: »

Maybe the poke could have a bit per cog mask?

So, you could poke many cogs at same time...

This ... allows multiple simultaneous cogs to be alerted. So, each cog can alert up to all cogs, and receives a single alert, which is the OR of all the other cogs' outputs.

Hope it's implemented that way (if feasible), for synchronizing multiple cogs for things.

jmg · 2016-05-06 07:50

JRetSapDoog wrote: »

cgracey wrote: »

This ... allows multiple simultaneous cogs to be alerted. So, each cog can alert up to all cogs, and receives a single alert, which is the OR of all the other cogs' outputs.

Hope it's implemented that way (if feasible), for synchronizing multiple cogs for things.

Yes, I agree, being able to sync many COGs is a natural P2 expectation.
You can sync many Smart Pins now, so users will expect to be able to do the same across N COGs.
