That's it. Each cog will be able to r/w the next cog's LUT via its second port that the streamer uses. The other cog will have priority over the local cog's streamer.
Sounds cool. Does this place and route OK and meet timing?
The other cog will have priority over the local cog's streamer.
Is that the right way round?
Wouldn't most apps need unbroken streaming? Anything streaming at SysCLK/2 or slower will have spare time slots that the other COG could easily wait for.
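To make the spare-slot argument concrete, here is a minimal Python sketch. It assumes a streamer that reads the LUT's second port once every `period` system clocks, and a remote cog that yields and takes the next idle clock. The names `streamer_busy` and `worst_case_wait` are hypothetical and not part of any P2 tooling.

```python
from typing import Optional


def streamer_busy(cycle: int, period: int) -> bool:
    """True if a streamer reading the LUT once every `period` clocks uses the port on `cycle`."""
    return cycle % period == 0


def worst_case_wait(period: int, horizon: int = 64) -> Optional[int]:
    """Longest wait, in clocks, for a remote cog that must yield to the streamer.

    Tries every request cycle in [0, horizon) and counts clocks until the port is idle.
    Returns None when the streamer uses every clock (period == 1): there is no free slot.
    """
    if period < 1:
        raise ValueError("period must be >= 1")
    if period == 1:
        return None
    worst = 0
    for request in range(horizon):
        wait = 0
        while streamer_busy(request + wait, period):
            wait += 1
        worst = max(worst, wait)
    return worst


for p in (1, 2, 4, 8):
    print(f"SysCLK/{p}: worst-case wait = {worst_case_wait(p)}")
```

Under these assumptions, a yielding cog never waits more than one clock at SysCLK/2 or slower. Only a streamer running at the full SysCLK would leave it starved.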
That's it. Each cog will be able to r/w the next cog's LUT via its second port that the streamer uses.
So is that laid out as half-looks-left, half-looks-right, since each COG has two 'next' COGs and all COGs are equal?
This would allow one COG to tightly couple to 2 others, and maybe one COG could feed 3 streamers?
The streamer can output pixel-type data to DACs/pins that it looks up from the LUT. The streamer can also use the egg-beater hub access, but it's not the egg-beater. Some talk on here has shown confusion between the egg-beater and the streamer. They are separate things.
I figure that a cog accessing a LUT needs priority over the read-only streamer which outputs to DACs and pins. Really, you should never have a conflict, in the same sense that a pin shouldn't be controlled from multiple cogs. You wouldn't program that way. In case there is a conflict between the other cog and the local cog's streamer, the other cog wins.
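The rule above amounts to a one-clock arbiter on the LUT's second port. Here is a minimal sketch of that rule. The `remote_wins` flag is a hypothetical switch that also models the opposite policy (streamer wins) debated in this thread. It does not describe actual P2 hardware.

```python
from typing import Optional


def arbitrate(remote_req: bool, streamer_req: bool, remote_wins: bool = True) -> Optional[str]:
    """Grant the LUT's second port for one clock.

    remote_wins=True:  the other cog beats the local streamer (the rule described above).
    remote_wins=False: the local streamer beats the other cog (the alternative policy).
    Returns "remote", "streamer", or None if nobody requested the port.
    """
    if remote_req and streamer_req:
        return "remote" if remote_wins else "streamer"
    if remote_req:
        return "remote"
    if streamer_req:
        return "streamer"
    return None


print(arbitrate(True, True))                     # conflict, default policy
print(arbitrate(True, True, remote_wins=False))  # conflict, streamer-first policy
```

The two policies only differ when both sides ask in the same clock. If programs are written so that conflicts never happen, the choice never matters.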
You do, in effect, get access to the lower AND upper cog. Your LUT is accessible by the lower cog and the upper cog's LUT is accessible by you. This doesn't take much hardware, at all. Right now, we are using 111,758 ALMs out of 113,560. We've got about 112 ALMs for each cog left. I'm already using the 'area-aggressive' setting in the fitter. This is it!
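One way to read "lower" and "upper" is as a ring where each cog reaches cog N+1 and is reached by cog N-1. Here is a small sketch of that reading. The cog count of 16 and the wrap-around from the last cog back to cog 0 are assumptions, not something stated in the post.

```python
NUM_COGS = 16  # assumption: cog count of this FPGA build


def upper_cog(n: int) -> int:
    """Cog whose LUT cog n can read/write (assumes wrap-around from the last cog to cog 0)."""
    return (n + 1) % NUM_COGS


def lower_cog(n: int) -> int:
    """Cog that can read/write cog n's LUT (same wrap-around assumption)."""
    return (n - 1) % NUM_COGS


for n in (0, 7, 15):
    print(f"cog {n}: accesses LUT of cog {upper_cog(n)}; its own LUT is accessed by cog {lower_cog(n)}")
```

Under this reading, every cog sits in the middle of a chain of three (N-1, N, N+1) rather than in fixed even/odd pairs.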
I figure that a cog accessing a LUT needs priority over the read-only streamer which outputs to DACs and pins. Really, you should never have a conflict, in the same sense that a pin shouldn't be controlled from multiple cogs. You wouldn't program that way. In case there is a conflict between the other cog and the local cog's streamer, the other cog wins.
I see a full-speed streamer as having to work in bursts if it is used with any second COG, and usually some co-operation is needed, but it seems safer if the streamer wins and the COG waits for the next slot, if it has to.
Wouldn't doing the reverse affect pixel playback?
You do, in effect, get access to the lower AND upper cog. Your LUT is accessible by the lower cog and the upper cog's LUT is accessible by you. This doesn't take much hardware, at all.
I'm unclear: is this paired-only access, or can N see N+1 (and also N-1)?
Does 'lower cog' mean N-1, or even COGs only?
If I'm understanding correctly:
There are two additional instructions (e.g. RDAUX/WRAUX)
Cog 1 can WRAUX to Cog 2's LUT, then Cog 2 can RDLUT its own LUT
Cog 2 can WRLUT to its own LUT, then Cog 1 can RDAUX Cog 2's LUT
Cog 1 does not know when Cog 2 has written or read Cog 2's LUT
Cog 2 does not know when Cog 1 has written or read Cog 2's LUT
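The access pattern in the list above can be modelled in a few lines. This toy sketch assumes a 512-long LUT per cog. It uses the list's example names (WRAUX/RDAUX) as hypothetical method names, not real instructions. It shows the gap the list points out: a write through the second port leaves the owning cog with no signal at all.

```python
class Cog:
    """Toy model: each cog owns a 512-long LUT and holds a second port into its upper neighbour's LUT."""

    LUT_SIZE = 512

    def __init__(self, cog_id: int) -> None:
        self.cog_id = cog_id
        self.lut = [0] * self.LUT_SIZE
        self.upper = None  # neighbour Cog whose LUT this cog can reach

    def _check(self, addr: int) -> None:
        if not 0 <= addr < self.LUT_SIZE:
            raise IndexError(f"LUT address {addr} out of range")

    # Local port (RDLUT/WRLUT in the list above)
    def wrlut(self, addr: int, value: int) -> None:
        self._check(addr)
        self.lut[addr] = value & 0xFFFF_FFFF  # keep 32 bits

    def rdlut(self, addr: int) -> int:
        self._check(addr)
        return self.lut[addr]

    # Second port into the neighbour's LUT (hypothetical WRAUX/RDAUX)
    def wraux(self, addr: int, value: int) -> None:
        self.upper.wrlut(addr, value)

    def rdaux(self, addr: int) -> int:
        return self.upper.rdlut(addr)


cog1, cog2 = Cog(1), Cog(2)
cog1.upper = cog2
cog1.wraux(10, 0x1234)
print(hex(cog2.rdlut(10)))  # the data arrived, but nothing told cog 2 that it did
```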
I get the desire to share 512 registers between cogs, but without efficient handshaking (i.e. read/write events), it seems to me that this approach will be no better than using HUB ram and the existing hub read/write events.
I advocate for the simplicity of two one-way 32-bit (33-bit!) registers that are accompanied by events. The events are critical, though. Without the events, I think that any inter-cog conduit will be underutilized.
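Here is a sketch of one such one-way register, reading the "33-bit" remark as 32 data bits plus one flag bit. That reading is an interpretation, not a spec. The flag is what a receiving cog would wait on or take as an interrupt. Two instances, one per direction, give the proposed pair. The class and method names are hypothetical.

```python
from typing import Optional


class Mailbox:
    """One-way 32-bit register plus a 'full' flag that acts as the event."""

    def __init__(self) -> None:
        self.value = 0
        self.full = False  # the 33rd bit: set by the writer, cleared by the reader

    def write(self, value: int) -> bool:
        """Store a value and raise the event; refuse (return False) if the last value is unread."""
        if self.full:
            return False
        self.value = value & 0xFFFF_FFFF
        self.full = True
        return True

    def read(self) -> Optional[int]:
        """Take the value and clear the event, or return None if nothing is pending."""
        if not self.full:
            return None
        self.full = False
        return self.value


a_to_b, b_to_a = Mailbox(), Mailbox()  # two one-way registers, one per direction
a_to_b.write(42)
print(a_to_b.read(), a_to_b.read())
```

Refusing a second write until the first is read is one design choice. Overwriting and flagging an overrun would be another.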
Cog exec is a possibility. That didn't occur to me, either.
That was my next question
Seems this could also allow some seriously tricky 'self modifying code', where 'self' is not quite you, but your evil twin...
Looks like this would allow a 2nd COG as a great numeric and/or crypto co-processor, for precisions outside native support.
... The events are critical, though. Without the events, I think that any inter-cog conduit will be underutilized.
Handshakes in dual-port memory are usually done with an agreed semaphore pair, using RAM?
E.g. write a block, then update a RAM flag that says Block Ready; the other side polls that Ready flag and sets Block Read Done after it has got all the data; repeat...
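The Ready / Read Done scheme above can be sketched with two generators sharing a dict that stands in for the shared RAM. The block size, the flag names, and the round-robin `run` scheduler are illustrative assumptions. Each `while ... yield` loop is exactly the polling that an event would replace.

```python
from typing import Dict, Iterator, List

BLOCK = 4  # longs per block (illustrative)


def producer(ram: Dict, data: List[int]) -> Iterator[None]:
    for start in range(0, len(data), BLOCK):
        ram["buf"] = data[start:start + BLOCK]
        ram["ready"] = True          # "Block Ready"
        while not ram["done"]:       # poll for "Block Read Done"
            yield
        ram["done"] = False


def consumer(ram: Dict, out: List[int], n_blocks: int) -> Iterator[None]:
    for _ in range(n_blocks):
        while not ram["ready"]:      # poll for "Block Ready"
            yield
        ram["ready"] = False
        out.extend(ram["buf"])
        ram["done"] = True           # "Block Read Done"
        yield


def run(data: List[int]) -> List[int]:
    """Interleave both sides one step at a time, like two cogs sharing a RAM."""
    ram = {"buf": [], "ready": False, "done": False}
    out: List[int] = []
    n_blocks = (len(data) + BLOCK - 1) // BLOCK
    tasks = [producer(ram, data), consumer(ram, out, n_blocks)]
    while tasks:
        for task in list(tasks):
            try:
                next(task)
            except StopIteration:
                tasks.remove(task)
    return out


print(run(list(range(10))))
```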
...This doesn't take much hardware, at all. Right now, we are using 111,758 ALMs out of 113,560. We've got about 112 ALMs for each cog left. I'm already using the 'area-aggressive' setting in the fitter. This is it!
Hmm, 1.586% spare space? Fingers crossed for any bug fixes or mode clean-ups...
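The headroom figures follow directly from the two ALM counts in Chip's post. The cog count of 16 is an assumption implied by "about 112 ALMs for each cog".

```python
TOTAL_ALMS = 113_560
USED_ALMS = 111_758
NUM_COGS = 16  # assumption: implied by "about 112 ALMs for each cog"

free = TOTAL_ALMS - USED_ALMS
per_cog = free / NUM_COGS
percent_free = 100 * free / TOTAL_ALMS

print(f"free ALMs: {free}")               # 1802
print(f"per cog:   {per_cog:.1f}")        # 112.6
print(f"free:      {percent_free:.3f}%")  # 1.587%
```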
Handshakes in dual-port memory are usually done with an agreed semaphore pair, using RAM?
E.g. write a block, then update a RAM flag that says Block Ready; the other side polls that Ready flag and sets Block Read Done after it has got all the data; repeat...
I agree. But I certainly hope you're not suggesting that each cog should go into a busy loop to wait for that flag!
The hub always seems like a complicated place. I know it isn't... but that is the way it feels. I know exactly when something is happening in a cog, but when it comes to hub access, I am never really sure about the "when." Having this kind of communication and signaling between cogs would remove these kinds of uncertainties. Strange as it sounds... these bits simplify issues of determinacy in a very elegant way.
I mean events. Whether you use WAITxxx or an interrupt to react is up to you.
"Events are tracked and can be polled, waited for, and used directly as interrupt sources."
OK, there may be room to add an event flag for when 'other-cog access' occurs?
Should that cover the whole memory map (in which case it may trigger early during block moves),
or act only across part of the memory map, or be triggered by one location only?
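The three options can be expressed as one address window. This hypothetical sketch assumes a 512-long LUT, addresses 0..511. A whole-map window fires on the first word of a block move. A single-address window fires only when the writer touches an agreed "done" location last. `AccessEvent` and its fields are illustrative names, not P2 hardware.

```python
from dataclasses import dataclass


@dataclass
class AccessEvent:
    """Hypothetical event that fires when the other cog touches this cog's LUT.

    lo/hi select which addresses arm it:
      whole map      -> lo=0,   hi=511
      part of map    -> e.g. lo=256, hi=511
      single address -> lo=hi=addr
    """
    lo: int = 0
    hi: int = 511
    pending: bool = False

    def on_remote_access(self, addr: int) -> None:
        if self.lo <= addr <= self.hi:
            self.pending = True

    def poll_and_clear(self) -> bool:
        fired, self.pending = self.pending, False
        return fired


whole = AccessEvent()
single = AccessEvent(lo=16, hi=16)
for addr in range(16):               # block move into addresses 0..15
    whole.on_remote_access(addr)
    single.on_remote_access(addr)
print(whole.pending, single.pending)  # True False: the whole-map event fired early
single.on_remote_access(16)           # writer touches the 'done' location last
print(single.pending)                 # True
```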
We need to have an event, for sure. Maybe two events are needed. Maybe four? What should they be? We are not actually limited to 16 events. We could add one bit and go up to 32.
If you consider block writes and moves, handshakes, and FIFO action, the events could take a lead from those?
With block writing, you do not want to trigger the event too early, but it is nice to have it auto-trigger when full.
For handshakes, you need something more like Ready and Ack/Done.
Tight REP loops that do {WAIT/Write} and {WAIT/Read} could work with the right event details, and they would save power.
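The "don't trigger too early, but auto-trigger when full" behaviour could look like a write counter that fires once per completed block and then re-arms. This is a hypothetical sketch: `BlockFillEvent` and `trigger_points` are illustrative names, and `take()` stands in for whatever WAIT or poll the reading cog would use.

```python
from typing import List


class BlockFillEvent:
    """Hypothetical event that fires once `size` remote writes have landed, not before."""

    def __init__(self, size: int) -> None:
        if size < 1:
            raise ValueError("size must be >= 1")
        self.size = size
        self.count = 0
        self.pending = False

    def on_write(self) -> None:
        self.count += 1
        if self.count == self.size:
            self.pending = True  # auto-trigger when the block is full
            self.count = 0       # re-arm for the next block

    def take(self) -> bool:
        """Stand-in for a WAIT/poll: report and clear the event."""
        fired, self.pending = self.pending, False
        return fired


def trigger_points(block_size: int, n_writes: int) -> List[int]:
    """Indices of the writes after which the reader would be woken."""
    ev = BlockFillEvent(block_size)
    hits = []
    for i in range(n_writes):
        ev.on_write()
        if ev.take():
            hits.append(i)
    return hits


print(trigger_points(4, 10))  # [3, 7]: the partial third block does not wake the reader
```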
Speaking of handshakes, does the streamer have simple FIFO-style handshakes, so an external device can tell it to pause?
At the end, once timing closes OK, can you take a look at shorter output pulses to match the streamer clock when streaming above SysClk/2?
Thanks for all your efforts Chip. Really looking forward to it.
I do like the direct LUT connection idea, though...
Could it be bidirectional between cog pairs instead of one way around?
It is bidirectional:
How do you address the other cog's LUT? Can you do LUTEXEC out of it?
Oh, do you think Chip means one-direction / one-way looking only? ->
Not what I was expecting, but maybe he does mean that?
I never thought about making it wait, in case the other cog's streamer was using it. Good idea!
One thing I need to add is a SETQ3, to enable RD/WRLONG-repeat.
I advocate for the simplicity of two one-way 32-bit (33-bit!) registers that are accompanied by events. The events are critical, though. Without the events, I think that any inter-cog conduit will be underutilized.
If it is easy to do, that would be more generally useful.
I think that is a 'maybe' - needs someone to craft a ROM-Ready USB loader small enough, and it can tack on the end.
Any system has to wait for data. I'm unclear what you mean by 'events'.
Do you mean an interrupt is triggered?
I mean events. Whether you use WAITxxx or an interrupt to react is up to you.
"Events are tracked and can be polled, waited for, and used directly as interrupt sources."
DAA = Dead After All.
DAA = Do Anything At-all
They handle it differently for negative numbers and the documentation did not seem to be clear on that.
DAA = Don't Alter Anything ?