Is LUT sharing between adjacent cogs very important?
cgracey
Posts: 14,155
in Propeller 2
Is LUT sharing between adjacent cogs very important?
I ask because we are out of room in the -A9 now.
While LUT sharing has some neat aspects, part of me likes the idea that the cogs are all, ideally, very independent, and anything that might dictate use of an adjacent cog, as opposed to any cog, kind of undermines that. We ARE stuck, somewhat, already, because of the fast DAC channels to I/O pins.
I'm a little loath to start stripping other things down to support this feature.
Putting in 32-bit latches is less painful, but not much, really. It doesn't take much more to support full LUT sharing.
Comments
This is a brain bender. I think we are already doing that, aren't we? If cog N can r/w cog N+1's LUT, cog N-1 can r/w cog N's LUT. It's either 'both' or 'neither', right?
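To make the 'both or neither' point concrete, here is a minimal sketch of the neighbor relation. The 16-cog count comes from later in this thread; the function names and the wraparound from the last cog back to cog 0 are assumptions for illustration, not a description of the actual silicon.

```python
NUM_COGS = 16  # cog count mentioned elsewhere in this thread


def luts_reachable(cog):
    """LUT owners that `cog` can read/write: its own LUT and cog N+1's.

    Wraparound (cog 15 reaching cog 0) is an assumption of this sketch.
    """
    return {cog, (cog + 1) % NUM_COGS}


def lut_accessors(owner):
    """Cogs that can read/write the LUT owned by `owner`.

    This is just the inverse view of luts_reachable(): the owner itself
    plus its lower neighbor, cog N-1.
    """
    return {owner, (owner - 1) % NUM_COGS}


if __name__ == "__main__":
    for n in (0, 5, 15):
        print(n, sorted(luts_reachable(n)), sorted(lut_accessors(n)))
```

Both functions describe the same wiring from two sides, which is why giving cog N access to cog N+1's LUT automatically exposes cog N's LUT to cog N-1.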
Would write-only still be useful?
As somebody else mentioned, you need to allow some room for bug fixes, and you don't want it to be taking forever to compile and fail at the end.
That's sounding like a lot of muxes (and routes)?
Why not physically place the LUT pool between two owner COGS, so that those COGS can share it?
This gives a simple even-odd pairing (which I thought was what was done?)
How much does it shrink if you reduce the size of the shared pool?
(The LUTs stay the same; only the muxes drop.)
Other part would like some kind of cog-cog signaling...
I guess going through a smartpin would be too slow.
You mean share half the LUTs' address space? It looks like that would just save the logic for one address bit, since this is RAM, and not a bunch of flops.
Yes, we need some signalling. What about each cog outputting an 'attention' strobe and each cog being able to select which one it listens to to drive an event/interrupt? That's really cheap.
Excuse me if I have missed a point here but historically all COGs were equal peers. I could start any random COG and have it run any random code. All was good.
When you say "sharing between adjacent cogs" I read that as meaning this is no longer true. I have to run the right code in the right COG to get what I want.
This sounds like a nightmare.
Skip it, burn the chip, ship it, I have money saved up to buy it.
It only gets rid of 7 out of 9 address bits. That's nothing.
So that is like going back to virtual pins?
They then have a shared, agreed HUB area for data pools?
With 16 COGS, you could manage 2 strobes per cog in a 32-bit field?
It's a feature that could make people suppose they are SUPPOSED to take advantage of it, simply because it exists, rather than use it for some very special application. I don't like stuff like that, either.
It's just like a "poke" on Facebook.
I saw it, Herra Vihainen Housut.
I think that is a very good idea, because with nothing at all between COGS (as now), the only signaling they have is via HUB (or by wasting a pin), and that means the flag also has HUB delays.
Sometimes the flag is all you need, so in all those cases you get a lot faster.
If you need to send data packets too, fast flags help there, and you can use the fast FIFOs to get close to 'local' speeds?
Maybe with a bit mask, one bit per cog?
Right.
So, you could poke many cogs at the same time...
Isn't there already an event on hub lock change?
What if you make it so any cog can instantly clear any hub lock (or any combination of hub locks, with Rayman's bit mask) without needing to wait for its turn, and make it so that every cog can constantly see the current status of every lock? This shouldn't cause any problems for normal lock usage, since the cog that claimed a lock should be the only one that clears it. Or, there could be a separate instruction to instantly force a lock off. Then, cog 0 can set a lock, cog 1 can wait for it to be cleared, and when cog 0 clears it (without waiting for its turn), cog 1 will instantly be notified (without waiting for its turn).
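A rough model of that lock proposal, as a sketch only: the class and method names are hypothetical, the 16-lock count is an assumption, and hub timing is not simulated. It only shows the state that every cog would be watching and the mask-based clear.

```python
class LockBank:
    """Toy model of hub locks whose status is visible to every cog."""

    def __init__(self, num_locks=16):  # lock count is an assumption
        self.num_locks = num_locks
        self.bits = 0  # bit i set means lock i is currently claimed

    def try_set(self, lock):
        """Claim a lock; returns True only if it was free beforehand."""
        bit = 1 << lock
        if self.bits & bit:
            return False
        self.bits |= bit
        return True

    def clear_mask(self, mask):
        """Clear any combination of locks at once (the bit-mask idea)."""
        self.bits &= ~mask

    def is_set(self, lock):
        """Status a waiting cog would poll or turn into an event."""
        return bool(self.bits & (1 << lock))


if __name__ == "__main__":
    locks = LockBank()
    locks.try_set(0)                 # cog 0 claims lock 0
    print(locks.is_set(0))           # cog 1 sees it held
    locks.clear_mask(1 << 0)         # cog 0 releases it
    print(locks.is_set(0))           # cog 1 sees the change
```

In the proposal, the interesting part is that `clear_mask` and `is_set` would not wait for a hub turn; this sketch cannot show that latency, only the logic.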
It could be like that. You output any 16 bits you like, and each high bit causes a one-clock strobe pulse to the related cog. Each cog receives all 16 bits OR'd from all cogs. This would be more complex than what I was thinking, but it allows multiple cogs to be alerted simultaneously. So, each cog can alert up to all cogs, and receives a single alert, which is the OR of all the other cogs' outputs.
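The OR'd strobe scheme can be sketched in a few lines. This is an illustrative model for a single clock, not the hardware: the function name is hypothetical, and whether a cog's own output feeds its own alert input is left open in the description, so this sketch simply ORs every cog's mask.

```python
NUM_COGS = 16


def attention_inputs(masks):
    """Compute which cogs see an attention strobe on this clock.

    masks[i] is the 16-bit value cog i outputs; bit j high means
    "strobe cog j". Every cog's mask is OR'd together, and cog j is
    alerted if bit j of the combined value is high.
    """
    combined = 0
    for mask in masks:
        combined |= mask & 0xFFFF
    return [bool((combined >> j) & 1) for j in range(NUM_COGS)]


if __name__ == "__main__":
    masks = [0] * NUM_COGS
    masks[0] = (1 << 3) | (1 << 5)   # cog 0 pokes cogs 3 and 5
    masks[2] = 1 << 3                # cog 2 also pokes cog 3
    alerted = attention_inputs(masks)
    print([j for j, hit in enumerate(alerted) if hit])
```

Because the inputs are OR'd, a receiving cog learns that it was poked but not by whom; any sender identity would have to be agreed on separately, for example through hub RAM.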
Or that could be 32 bits, to allow 2 flags per COG?
If it was grouped as 16+16, then another choice would be 16 flags + 16 party-line bits, which could be data?
(Offline COGS have inactive states, and only one COG can use this simple party line at a time.)
I imagine there could be more of them pretty easily.
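As a sketch of the 16+16 grouping, the word could be split as below. Which half carries the flags is an assumption, as are the function names; the party line is modeled as a plain OR in which offline cogs contribute zeros.

```python
def pack(flags, data):
    """Build a 32-bit word: low 16 bits = flags, high 16 bits = data.

    The choice of which half holds the flags is an assumption.
    """
    return ((data & 0xFFFF) << 16) | (flags & 0xFFFF)


def unpack(word):
    """Split a 32-bit word back into (flags, data)."""
    return word & 0xFFFF, (word >> 16) & 0xFFFF


def party_line(outputs):
    """OR the 16 data bits driven by every cog.

    Offline or idle cogs drive 0, so if only one cog is talking the
    result is exactly that cog's data.
    """
    value = 0
    for out in outputs:
        value |= out & 0xFFFF
    return value


if __name__ == "__main__":
    word = pack(flags=0x0028, data=0xBEEF)
    print(hex(word), [hex(x) for x in unpack(word)])
    print(hex(party_line([0, 0xBEEF, 0, 0])))
```

If two cogs drive the party line at once, the OR silently merges their data, which is why the one-talker-at-a-time rule matters.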
Extending the locks is a good idea too. Bill was clear on this feature for signaling.
Hope it's implemented that way (if feasible), for synchronizing multiple cogs for things.
You can sync many Smart Pins now, so users will expect to be able to do the same across N COGs.