There are three instantly accessible long repositories built into each smart pin structure, but we can't do atomic read-modify-write operations on them.
For each Cog that is polling the specific pin, IN is raised to flag that a new value was loaded into any one of the repositories, but I didn't see any discriminator to flag which long was written by the sender Cog.
Since there are 16 Cogs and 3 x 64 = 192 repositories, that can surely lead to a protocol where only one Cog writes, say, into the first long repository of the pin whose number coincides with its own number, or into any other predefined pin-number space (e.g., Cog[0]=>Pins[3:0], ..., Cog[15]=>Pins[63:60]).
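A rough sketch of that pin-number split, in Python only to pin down the arithmetic. The function names are made up for illustration and nothing here is P2 code; it just assumes the even 4-pins-per-Cog split given in the example above.

```python
NUM_COGS = 16
NUM_PINS = 64
PINS_PER_COG = NUM_PINS // NUM_COGS  # 4 pins reserved per Cog in this split


def pins_for_cog(cog):
    """Return (low_pin, high_pin) of the block reserved for `cog` (hypothetical helper)."""
    if not 0 <= cog < NUM_COGS:
        raise ValueError("cog must be in 0..15")
    low = cog * PINS_PER_COG
    return low, low + PINS_PER_COG - 1


def owner_of_pin(pin):
    """Return the only Cog allowed to write to `pin` under this convention."""
    if not 0 <= pin < NUM_PINS:
        raise ValueError("pin must be in 0..63")
    return pin // PINS_PER_COG


if __name__ == "__main__":
    for cog in (0, 1, 15):
        low, high = pins_for_cog(cog)
        print(f"Cog[{cog}] => Pins[{high}:{low}]")
```

With a single writer per pin block, nobody needs read-modify-write on a shared location; every other Cog only reads.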
RQPIN can be used by any Cog to enable a kind of "quiet sniffing" flag/data gathering mode.
The would-be answering Cog, if any, could write its own number-related bit position into, say, the second long, then wait some minimum number of cycles before doing a RQPIN to check whether it was the winner of the bid.
If not, it can kill itself out of frustration at not being as fast as the winner, or at not receiving permission to bite a piece of the cake!
With three repositories in each pin, things like Request/Grant/Grant Acknowledge can be a breeze.
In extremis, locks could be used to ensure that flags aren't lost when two or more Cogs simultaneously try to modify the repository contents, but I hope no one would feel the need to go that far.
There are some nice trigger/helper mechanisms spread among the pins and in the hub, to the point that I can't remember/mention them all at a glance. :cool:
So as not to appear too dense: protocol details could vary as dreamed up/intended by any ingenious programmer.
Henrique
Those three commands all write the same long. So, there's only one, not three longs. To make it work like you thought it worked would take a bunch more muxes.
I'm not sure using the pin-nodes for signaling is very portable, so adding more logic there may not be that important.
E.g., if two elements of a design decide pins XYZ are fine for signaling, then unless the tools can clearly mark conflicts elsewhere, there is a risk that a user makes a late change to swap a pin, and something (seemingly) unrelated breaks...
I can see it would be nice to have some way to know who sourced the WAITATN.
All I can think of that is atomic is 16 bytes, which need to be read and then tested...?
Do you really need to do polling at all on P2? Can't one COG interrupt another? Why not have the server COG simply idle at low power, waiting for an interrupt?
I've not been reading this topic, but I'll point out that POLLing, WAITing and INTerrupts are all managed through "events". So, wherever there is a reference to one method, all three are available. Here's the opening paragraph from the docs:
EVENTS
Cogs monitor and track 16 different background events:
- An interrupt occurred
- CT1 equaled CT (CT is the 32-bit free-running global counter)
- CT2 equaled CT
- CT3 equaled CT
- Selectable event 1 occurred
- Selectable event 2 occurred
- Selectable event 3 occurred
- Selectable event 4 occurred
- A pattern of interest occurred on either INA or INB
- Hub FIFO block-wrap occurred - a new start address and block count were loaded
- Streamer command buffer is empty - ready to accept a new command
- Streamer finished - ran out of commands, now idle
- Streamer NCO rollover occurred
- Streamer read location $1FF of the lookup RAM
- Attention was requested by another cog or other cogs
- GETQX/GETQY executed without any CORDIC results available
Events are tracked and can be polled, waited for, and used as interrupt sources.
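To make the "tracked, then polled" idea concrete, here is a toy Python model of a per-cog event register. It is only a sketch under assumptions: the short names mirror the list above, the bit positions are arbitrary, and `EventTracker` is a made-up class. It assumes a poll reads a flag and clears it in the same step; waiting and interrupts would watch the same flags.

```python
from enum import IntFlag


class Event(IntFlag):
    """The 16 tracked events from the list above (bit positions are arbitrary here)."""
    INT = 1 << 0    # an interrupt occurred
    CT1 = 1 << 1    # CT1 equaled CT
    CT2 = 1 << 2    # CT2 equaled CT
    CT3 = 1 << 3    # CT3 equaled CT
    SE1 = 1 << 4    # selectable event 1
    SE2 = 1 << 5    # selectable event 2
    SE3 = 1 << 6    # selectable event 3
    SE4 = 1 << 7    # selectable event 4
    PAT = 1 << 8    # pattern of interest on INA or INB
    FBW = 1 << 9    # hub FIFO block-wrap
    XMT = 1 << 10   # streamer command buffer empty
    XFI = 1 << 11   # streamer finished
    XRO = 1 << 12   # streamer NCO rollover
    XRL = 1 << 13   # streamer read location $1FF of lookup RAM
    ATN = 1 << 14   # attention requested by another cog
    QMT = 1 << 15   # GETQX/GETQY with no CORDIC result available


class EventTracker:
    """Hypothetical model of one cog's pending-event flags."""

    def __init__(self):
        self.pending = 0

    def post(self, event):
        """Hardware side: latch an event when it happens."""
        self.pending |= int(event)

    def poll(self, event):
        """Software side: report whether the event is pending, then clear it."""
        hit = bool(self.pending & int(event))
        self.pending &= ~int(event)
        return hit


if __name__ == "__main__":
    cog = EventTracker()
    cog.post(Event.ATN)
    print(cog.poll(Event.ATN), cog.poll(Event.ATN))
```

Because the flag is latched until it is polled, a server cog does not have to be looking at the exact moment the event fires.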
Possibly useful inter-Cog events already catered for would be:
- The Cog specific Attention flag
- 1-of-16 Locks
- A Smartpin IN state
- The companion Cog LUT access
Those three commands all write the same long. So, there's only one, not three longs. To make it work like you thought it worked would take a bunch more muxes.
Ooops! My bad!
I was driven to the conclusion that there were three, because three consecutive %MMMM field values are described as long repositories (the plural being my own misreading of the docs!).
I can see it would be nice to have some way to know who sourced the WAITATN.
All I can think of that is atomic is 16 bytes, which need to be read and then tested...?
Ah yes, so many discussions go past that it is not easy to remember all the details...
Without such a caller-ID feature, I think we are left with a FIFO read as the next most time-efficient option? But that is not available in hubexec?
Seairth,
Reading that topic, I note your example of using the WRLUTx instruction dates from when Chip was attempting to make all LUTs writeable from any Cog. Since this never worked out, the only generic block data sharing is via HubRAM. The low-latency option is now limited to the companion Cog only.
I feel this also reduces the need for matching informative signalling, i.e., it's good enough as is.
Chip outlined an easy enough low power method using ATN followed by a 16-clock-16-LONG block read to identify who wants something and there's a spare 31 bits per Cog for extra info too. And it works with HubExec to boot.
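For anyone trying to picture that scheme, here is a small Python model of it. This is a sketch under assumptions, not Chip's actual layout: it assumes each receiving cog has its own block of 16 longs in hub RAM, one long per possible sender, with the top bit as the "I want something" flag and the low 31 bits as the extra info. `Hub`, `request` and `service` are made-up names.

```python
NUM_COGS = 16
FLAG_BIT = 1 << 31          # assumed: top bit marks a pending request
INFO_MASK = FLAG_BIT - 1    # the spare 31 bits per sender


class Hub:
    """Toy hub RAM: one 16-long mailbox block per receiving cog, plus ATN flags."""

    def __init__(self):
        self.mailbox = [[0] * NUM_COGS for _ in range(NUM_COGS)]
        self.atn = [False] * NUM_COGS


def request(hub, sender, receiver, info):
    """Sender writes only its own long in the receiver's block, then raises ATN."""
    hub.mailbox[receiver][sender] = FLAG_BIT | (info & INFO_MASK)
    hub.atn[receiver] = True


def service(hub, receiver):
    """Receiver: after ATN, read all 16 longs in one go and scan for flags."""
    if not hub.atn[receiver]:
        return []
    hub.atn[receiver] = False
    block = list(hub.mailbox[receiver])    # stands in for the 16-long block read
    requests = []
    for sender, value in enumerate(block):
        if value & FLAG_BIT:
            requests.append((sender, value & INFO_MASK))
            hub.mailbox[receiver][sender] = 0   # clearing is a separate, non-atomic step
    return requests


if __name__ == "__main__":
    hub = Hub()
    request(hub, sender=3, receiver=0, info=0x1234)
    request(hub, sender=7, receiver=0, info=5)
    print(service(hub, 0))
```

Each long has exactly one sender writing it, so senders never collide with each other. The weak spot in this model is the clear: a sender that posts again between the block read and the clear would have its new request wiped.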
Yeah, I understand that approach, though it's still not that simple. First, you would have to use 16 bits per long to be able to tell which cog actually signaled you (instead of just signaled in general). Further, aside from reading the 16 longs, you would also have to scan each of those longs to find which cog was actually ATNing you. And then you still need a means to atomically clear that flag in the appropriate long(s) in hub memory. Combined, I don't see this as being easy, low power, or fast. Additionally, it's yet another instance of having to come up with a standard convention that everyone adheres to (It's mailboxes all over again).
Edit: Argh! The whole point of referencing the prior discussion was to point out that the matter is settled for Chip and that we can move on to other discussions. I did not mean to start discussing it all over again. As they say... move along, nothing to see here.
Chip outlined an easy enough low power method using ATN followed by a 16-clock-16-LONG block read to identify who wants something and there's a spare 31 bits per Cog for extra info too.
I think there is also a 16 byte version of this, that can read in 4 clocks, with a spare 7 bits per Cog for extra info too...
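The byte-sized variant packs the same idea into 4 longs. A minimal Python sketch, assuming one byte per sender cog with bit 7 as the request flag and bits 6..0 as the info, and assuming little-endian byte order within each long; the helper names are hypothetical.

```python
import struct

NUM_COGS = 16
FLAG = 0x80   # assumed: bit 7 of each byte marks a pending request
INFO = 0x7F   # the spare 7 bits per cog


def pack_mailbox(entries):
    """Build the 4 longs from a {sender: info} dict (info is truncated to 7 bits)."""
    raw = bytearray(NUM_COGS)
    for sender, info in entries.items():
        raw[sender] = FLAG | (info & INFO)
    return list(struct.unpack("<4I", raw))


def scan_mailbox(longs):
    """Given the 4 longs from one block read, list (sender, info) for flagged bytes."""
    raw = struct.pack("<4I", *longs)
    return [(sender, byte & INFO) for sender, byte in enumerate(raw) if byte & FLAG]


if __name__ == "__main__":
    longs = pack_mailbox({0: 1, 5: 100})
    print([hex(v) for v in longs])
    print(scan_mailbox(longs))
```

The trade-off against the 16-long version is simply fewer longs to read versus 7 bits of payload instead of 31.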
SETQ + RDLONG is, I believe, the whole method. The SETQ prefix turns the RDLONG into a special CISC-like instruction that can fill or, with a WRLONG, copy an entire CogRAM in one hit.
I must test out adding a preceding AUGD to the SETQ ...
Comments
If you'll recall, we had this discussion over a year ago, to no avail.
Any links for that?
The docs I'm reading say "FIFO IN USE" in the "Hubexec Cycles" column for the block-read opcodes I can see?
SETQ+RDLONG could be used to read in 4 longs, or 16 bytes, even from hub exec.