The New 16-Cog, 512KB, 64 analog I/O Propeller Chip

dnalor · 2015-07-21 19:44

The interrupt can "only" insert a link in a running cog, right? How difficult would it be to make it possible to wake a sleeping (waitcnt) cog? This feature would make the "interrupt thing" complete. E.g.: a UART cog would poll for TX data only once per ms (not wasting energy in a full-speed loop). Incoming data would wake it "immediately".

I think that is possible now. An INT can (and usually would) 'wake' an idle COG, and it has choices on RET: it could exit back to the last PC, or it could change the RET adr to +1, or it could modify the WAITCNT value. To keep RET lockout times low, designs can also do an early RETI to non-critical 'INT' code, before a final back-to-user RET. I'm not sure you would want an INT to force an exit of WAITCNT on RETI?
Note: A PinINT can wake on Pin Cell UART CHAR, it does not need to be timer polled.

With this new 'write to cog's special long' interrupt, all I would want is a waitforint (sleep until interrupt) -> sleep until rx-data from outside or tx-data from another cog is in the buffer.

Wow, hubexec, interrupt, breakpoint... and sixteen cores. After seven years of watching and hoping... I can hardly believe this. Now it seems that dreams come true.

potatohead · 2015-07-21 19:51

I think the special hub addresses are clear and simple personally and they may well be more flexible than the locks.

Can what you are thinking be done with the feature as currently defined, Seairth?

If so, it is cheap and well fleshed out... Maybe we roll with it and see how we all get along in FPGA land.

cgracey · 2015-07-21 19:57

I think I see what you are proposing, but I'm not fully grasping it, yet.

I wrote up another explanation, but haven't posted it because I don't know if it would make what I had said prior any more clear. With the current round of changes you have proposed, I don't know whether to push the lock thing any further. I do think it would provide an overall better approach than "special registers". But I also want the P2 image to get finished enough to release for testing and play.
At the very least, I suggest adding a "LOCKGET D" that simply allows the current value of a lock to be determined. That way, locks can be used as a semaphore or as a set of mutexes and/or events. With LOCKGET, a cog can see if a flag is set (if the recipient of the event) or cleared (if the sender of the event). If I recall, the current implementation has 32 lock bits. With that, one can easily implement up to 16 message-passing "channels" using locks and hub memory. Of course, it will require polling, but at least it will be possible.

Ah, Duh! I thought you were talking about ILOCK, not the hub-based LOCK bits. Okay. Now I understand!
There are 16 hub LOCK bits right now, by the way.
So, GETLOCK and maybe interrupt on a particular LOCK's change, or arrival in a certain state. Maybe the edge setup could be altered to select hub LOCK bits, as well as pins? That would keep things tidy. And it would make LOCK changes poll-able and wait-able, too.

potatohead · 2015-07-21 20:06

Oh man. I thought it too, and it was confusing, and it seemed circular to me.

Speaking of images, will we get one prior to smart pins Chip?

cgracey · 2015-07-21 20:08

Oh man. I thought it too, and it was confusing.

Speaking of images, will we get one prior to smart pins?

Of course! Smart pins are not even designed, yet. It will probably take a week to get a basic version going. I think you guys are going to have some good input on what it/they shall be.

kwinn · 2015-07-21 20:09

I assume the read/write interrupts won't be triggered for a given cog if it reads/writes its own trigger location.
Is this correct?
C.W.

Why not? Simpler to let it happen than to add circuitry to prevent it, and someone may come up with a use for it.

potatohead · 2015-07-21 20:09

Cool. Let the COG scrubbing commence!

potatohead · 2015-07-21 20:11

I assume the read/write interrupts won't be triggered for a given cog if it reads/writes its own trigger location.
Is this correct?
C.W.

Why not? Simpler to let it happen than to add circuitry to prevent it, and someone may come up with a use for it.

One use might be testing, or for syscall kinds of things like int13h or BRK style.

cgracey · 2015-07-21 20:11

About extending SETEDGE to incorporate hub LOCK bits:
SETEDGE D/# %EE_L_PPPPPP

%EE: 00 = any edge
%EE: 01 = pos edge
%EE: 10 = neg edge
%EE: 11 = any edge
%L: 0 = PPPPPP is pin number
%L: 1 = xxPPPP is lock number
%PPPPPP: pin number
That will give us a nice, single signal that can be used as an interrupt and captured into a flop for polling.
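
The proposed operand layout can be sketched in Python. This is only an illustration, not the chip's actual encoding: it assumes the operand is the 9-bit value written MSB-first as %EE_L_PPPPPP (EE in bits 8..7, L in bit 6, PPPPPP in bits 5..0), and the helper names `encode_setedge` and `decode_setedge` are hypothetical.

```python
# %EE values as listed in the post (0b11 is also "any edge").
ANY_EDGE, POS_EDGE, NEG_EDGE = 0b00, 0b01, 0b10

def encode_setedge(edge, use_lock, number):
    """Pack %EE_L_PPPPPP. Assumed layout: EE = bits 8..7, L = bit 6, PPPPPP = bits 5..0."""
    if not 0 <= edge <= 0b11:
        raise ValueError("edge must fit in 2 bits")
    limit = 16 if use_lock else 64      # 16 hub locks (xxPPPP) or 64 pins
    if not 0 <= number < limit:
        raise ValueError("pin/lock number out of range")
    return (edge << 7) | (int(use_lock) << 6) | number

def decode_setedge(value):
    """Unpack the operand into (edge, use_lock, number)."""
    edge = (value >> 7) & 0b11
    use_lock = bool((value >> 6) & 1)
    # In lock mode the top two P bits are "don't care" (xxPPPP), so mask them off.
    number = value & (0b001111 if use_lock else 0b111111)
    return edge, use_lock, number

# Watch for a negative edge on hub lock 5.
operand = encode_setedge(NEG_EDGE, True, 5)
print(bin(operand), decode_setedge(operand))
```

With 16 locks, two of the six P bits are unused in lock mode, which is the spare room later posts refer to when asking about more LOCK bits.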

Seairth · 2015-07-21 20:27

About extending SETEDGE to incorporate hub LOCK bits:
SETEDGE D/# %EE_L_PPPPPP

%EE: 00 = any edge
%EE: 01 = pos edge
%EE: 10 = neg edge
%EE: 11 = any edge
%L: 0 = PPPPPP is pin number
%L: 1 = xxPPPP is lock number
%PPPPPP: pin number
That will give us a nice, single signal that can be used as an interrupt and captured into a flop for polling.

Interesting! Out of curiosity, what would it take to extend LOCK to 64 bits?

jmg · 2015-07-21 20:30

Oh, and by having pin-edge events poll-able AND wait-able, we can get rid of WAITPX, WAITPR, and WAITPF, which involved their own 64-to-1 pin mux and were bumping critical-path. Implementation of these ideas may cause a net drop in LE's. Getting those events standing on their own, outside of the interrupt modes, makes a lot of good things possible.

Those WAITPx are 1-cycle granular, and very deterministic. Has the interrupt jitter improved, or has the wait simply moved about a little? WAITPx was also a wide opcode (could manage multiple bits); will the new path you are talking about also be able to manage that?

cgracey · 2015-07-21 20:30

About extending SETEDGE to incorporate hub LOCK bits:
SETEDGE D/# %EE_L_PPPPPP

%EE: 00 = any edge
%EE: 01 = pos edge
%EE: 10 = neg edge
%EE: 11 = any edge
%L: 0 = PPPPPP is pin number
%L: 1 = xxPPPP is lock number
%PPPPPP: pin number
That will give us a nice, single signal that can be used as an interrupt and captured into a flop for polling.

Interesting! Out of curiosity, what would it take to extend LOCK to 64 bits?

Not much in the hub, but it would mean that 64 more bits would have to fan out to each cog. There could be a 128-to-64 mux in the cog's edge detector to select between hub locks and I/O pins. It's not that expensive.
Considering that in order to write to a LOCK bit, you must use LOCKSET/LOCKCLR, would all these bits find use? It seems like a lot of LOCK bits, to me.

cgracey · 2015-07-21 20:34

Oh, and by having pin-edge events poll-able AND wait-able, we can get rid of WAITPX, WAITPR, and WAITPF, which involved their own 64-to-1 pin mux and were bumping critical-path. Implementation of these ideas may cause a net drop in LE's. Getting those events standing on their own, outside of the interrupt modes, makes a lot of good things possible.

Those WAITPx are 1-cycle granular, and very deterministic. Has the interrupt jitter improved, or has the wait simply moved about a little? WAITPx was also a wide opcode (could manage multiple bits); will the new path you are talking about also be able to manage that?

Interrupts always have jitter, of course, but using GETEDGE to wait for a transition would have no jitter. It's true that to wait for an edge on another pin, you would have to change the pin via SETEDGE. Nice thing is, though, you can poll and detect edges that you would have missed while other code was executing.

jmg · 2015-07-21 20:34

This way, you can have one cog doing this:
WRLONG data,other_cogs_special_location 'write data
GETRDL 'wait for other cog to read it
<loop>
While another cog does this:
GETWRL 'wait for other cog to write my cog's special location
RDLONG data,my_cogs_special_location 'read data
<loop>

Can these loops also be REP controlled ?

cgracey · 2015-07-21 20:36

This way, you can have one cog doing this:
WRLONG data,other_cogs_special_location 'write data
GETRDL 'wait for other cog to read it
<loop>
While another cog does this:
GETWRL 'wait for other cog to write my cog's special location
RDLONG data,my_cogs_special_location 'read data
<loop>

Can these loops also be REP controlled ?

Yes.

jmg · 2015-07-21 20:39

Oh, and by having pin-edge events poll-able AND wait-able, we can get rid of WAITPX, WAITPR, and WAITPF, which involved their own 64-to-1 pin mux and were bumping critical-path. Implementation of these ideas may cause a net drop in LE's. Getting those events standing on their own, outside of the interrupt modes, makes a lot of good things possible.

Those WAITPx are 1-cycle granular, and very deterministic. Has the interrupt jitter improved, or has the wait simply moved about a little? WAITPx was also a wide opcode (could manage multiple bits); will the new path you are talking about also be able to manage that?

Interrupts always have jitter, of course, but using GETEDGE to wait for a transition would have no jitter. It's true that to wait for an edge on another pin, you would have to change the pin via SETEDGE. Nice thing is, though, you can poll and detect edges that you would have missed while other code was executing.

So GETEDGE is just WAITPx with a moved pathway, outside the interrupt state engine? Are you saying it is one-pin-at-a-time? That seems to have lost features compared to WAITPx?

jmg · 2015-07-21 20:41

Ah, Duh! I thought you were talking about ILOCK, not the hub-based LOCK bits. Okay. Now I understand!
There are 16 hub LOCK bits right now, by the way.
So, GETLOCK and maybe interrupt on a particular LOCK's change, or arrival in a certain state. Maybe the edge setup could be altered to select hub LOCK bits, as well as pins? That would keep things tidy. And it would make LOCK changes poll-able and wait-able, too.

Sounds like these new LOCK/Semaphores need a terminology change, as no one seems sure what is meant ?

jmg · 2015-07-21 20:43

WAITINT would be a nice addition.

How does WAITINT differ from an interrupt during a (long) WAITCNT?

Seairth · 2015-07-21 21:38

I think the special hub addresses are clear and simple personally and they may well be more flexible than the locks.

Can what you are thinking be done with the feature as currently defined, Seairth?

If so, it is cheap and well fleshed out... Maybe we roll with it and see how we all get along in FPGA land.

So, suppose the simplest use case: you want to pass a 32-bit value from cog A to cog B. As always, you write the value to hub memory. Using the new "special addresses", you can write the value in cog B's address and have cog B detect it with a write interrupt (or polling). Once cog B reads the value, cog A is now safe to write another value to cog B's address. Simple.
Now, suppose you want any cog to be able to pass a value to cog B. At that point, you have a problem with the special addresses. There's no exclusivity. Of course, this is easily solved by using a LOCK as a mutex. Whichever cog acquires the lock first gets to write to cog B's address. Simple.
Now, a different issue. Suppose you want to pass more than 32 bits from cog A to cog B. Obviously, some of that data will be written to non-special hub memory. You can still use the special address to alert cog B that data is available (elsewhere). However, there is no way to know if or when cog B reads that other data. Cog A will only know that cog B has read its special address.
In this case, there are two different approaches to solving the issue. One solution is to have cog B write a value to cog A's address when it is done. Another approach is to use a LOCK bit. Cog A sets the LOCK before writing to cog B's address and cog B acknowledges reading all of the data by clearing the same LOCK bit. A bit more complicated, but still manageable.
Now, suppose you want any cog to pass more than 32 bits to cog B. At this point, you would have to use a LOCK again. In this case, the first cog to acquire the lock would be able to go through the routine just described.
If you want cog B to respond back to cog X, you will want cog X to clear the LOCK, not cog B. This would allow cog B to "return" a value (as in a function) to cog X by writing to a "return" address (not cog X's address, since cog B doesn't know which cog it's responding to). However, there is no way for cog X to know that the return value has been written. You could notify cog X by writing to its address (if cog B knows the address). Or you could use another LOCK. Either way, this is definitely not simple.

Comparatively, if LOCKxxx were more expressive in the ways I've been discussing, then the above scenarios look like this instead:
Pass a 32-bit value from cog A to cog B: Cog A writes the 32-bit data to the hub. Cog A sets LOCK_B. Cog B detects the lock (event) with an interrupt and reads the hub data. Cog B clears LOCK_B to indicate that it has read the data. Cog A detects that the lock has been cleared and can continue to write more data. Just as simple.
Allow any cog to pass a 32-bit value to cog B: In this case, two locks are used; one as a mutex and one as an event. Cog X sets mutex_b (think "critical section" or "transaction"). Cog X writes the 32-bit data to hub memory. Cog X sets event_b. As before, cog B detects the event and reads the data. Cog B then clears event_b. Cog X detects that event_b is cleared, then clears mutex_b. A bit more complicated, but still manageable. And not much different than the prior paragraph.
Allow any cog to pass more than 32 bits to cog B. Well, this is exactly the same as passing a 32-bit value. No more work required.
Allow cog B to return a value to cog X: Again, basically the same as before, except for two things: cog B writes data to a "return" address prior to clearing event_b, and cog X reads the return data before clearing mutex_b. Still, pretty straightforward.

In both cases, you end up needing to use LOCKs for anything except the most trivial message passing. However, once you start doing anything complicated (multiple senders, large data, return values, etc.), I believe that a LOCK-only approach is simpler and more flexible.
Incidentally, this 2-bit LOCK requirement is why I was suggesting a larger number of lock bits. Admittedly, 256 bits is probably excessive. With 64 bits, this would allow up to 32 bit pairs. However, with the addition of LOCKGET et al, I suspect that locks will play a more prominent role than just this particular use case.
But, as you say, let it get wrapped up so we can try this stuff out ourselves.
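
To make the two-lock scheme above (one LOCK as a mutex, one as an event) easier to compare, here is a minimal Python sketch. It is a toy model under stated assumptions, not P2 code: `HubLocks`, `MUTEX_B`, `EVENT_B` and the round-robin `run` scheduler are all hypothetical, `lockset` is assumed to return the previous state as on the P1, and `lockget` stands in for the proposed LOCKGET. Generators play the part of cogs, and each `yield` is one polling step.

```python
class HubLocks:
    """Toy model of the hub LOCK bits."""
    def __init__(self, count=16):
        self.bits = [False] * count

    def lockset(self, n):
        prev, self.bits[n] = self.bits[n], True
        return prev                      # previous state (assumed P1-style behaviour)

    def lockclr(self, n):
        prev, self.bits[n] = self.bits[n], False
        return prev

    def lockget(self, n):                # proposed LOCKGET: read without modifying
        return self.bits[n]

MUTEX_B, EVENT_B = 0, 1                  # hypothetical lock numbers assigned to cog B

def sender(locks, hub, payload):
    while locks.lockset(MUTEX_B):        # acquire mutex_b; retry while already held
        yield
    hub["data"] = payload                # could be any amount of hub data
    locks.lockset(EVENT_B)               # raise event_b for cog B
    while locks.lockget(EVENT_B):        # wait until cog B acknowledges
        yield
    locks.lockclr(MUTEX_B)               # let the next sender in

def receiver(locks, hub, received, expected):
    while len(received) < expected:
        while not locks.lockget(EVENT_B):    # poll (or wait/interrupt) on event_b
            yield
        received.append(hub["data"])
        locks.lockclr(EVENT_B)           # acknowledge: data has been read

def run(tasks):
    """Round-robin scheduler standing in for cogs running in parallel."""
    tasks = list(tasks)
    while tasks:
        for task in list(tasks):
            try:
                next(task)
            except StopIteration:
                tasks.remove(task)

locks, hub, received = HubLocks(), {}, []
run([sender(locks, hub, "a"), sender(locks, hub, "b"),
     receiver(locks, hub, received, expected=2)])
print(received)
```

The sender never needs to know how much data it is passing, which is the point made above about the 32-bit and larger cases looking the same.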

Seairth · 2015-07-21 21:50

WAITINT would be a nice addition.

How does WAITINT differ from an interrupt during a (long) WAITCNT?

Presumably, WAITINT would "release" immediately so that the appropriate LINK can be inserted into the pipeline. Using this, your "main" task could simply be:
loop    WAITINT
        JMP #loop
And it would only ever process events, as fast as they arrive.

Heater. · 2015-07-21 21:57

Seairth,

tl;dr

There is a difference between passing a 32 bit value, or any data, from one COG to another and just sending a signal that says "hey, there is new data available".

It would be cool if such a signal could wake up an otherwise halted COG. Think the low power state of a P1 COG hanging on a WAITxxx.

Seairth · 2015-07-21 22:01

About extending SETEDGE to incorporate hub LOCK bits:
SETEDGE D/# %EE_L_PPPPPP

%EE: 00 = any edge
%EE: 01 = pos edge
%EE: 10 = neg edge
%EE: 11 = any edge
%L: 0 = PPPPPP is pin number
%L: 1 = xxPPPP is lock number
%PPPPPP: pin number
That will give us a nice, single signal that can be used as an interrupt and captured into a flop for polling.

Interesting! Out of curiosity, what would it take to extend LOCK to 64 bits?

Not much in the hub, but it would mean that 64 more bits would have to fan out to each cog. There could be a 128-to-64 mux in the cog's edge detector to select between hub locks and I/O pins. It's not that expensive.
Considering that in order to write to a LOCK bit, you must use LOCKSET/LOCKCLR, would all these bits find use? It seems like a lot of LOCK bits, to me.

Huh. I would have thought that the lock ID would have been on the existing a_bus and lock/clear bits would have been on the existing d_bus. Is this fanout due to the addition of the locks to the EDGE interrupt? That mux sounds expensive.
I don't know if 64 are necessary. I just saw that there were two unused bits in SETEDGE. If you read my prior post (to potatohead), you can see why I think that 16 will be too few. Maybe 32 would be an acceptable compromise. But no more than 64, I think.

Seairth · 2015-07-21 22:07

Seairth,

tl;dr

There is a difference between passing a 32 bit value, or any data, from one COG to another and just sending a signal that says "hey, there is new data available".

It would be cool if such a signal could wake up an otherwise halted COG. Think the low power state of a P1 COG hanging on a WAITxxx.

Agreed. And that is *exactly* what will be possible with GETEDGE configured for use with LOCKs.
As an aside, the thrust of my prior post was in response to potatohead's comment that the special addresses might be more versatile than the extended LOCK capabilities I was pushing. My contention, obviously, is that extending LOCKs will be more versatile. Your comment is just one more example of that versatility.
(I suppose you could take things one step further and have a LOCKTOG, which would simply toggle a lock bit. With that, and other cogs looking for "any" transition, you now have a pulsed event. Well, you have it without that too, but it means that the pulsing cog has to track the state of the LOCK.)

potatohead · 2015-07-21 22:19

Message passing between COGS is one area that got complex and expensive. It's Chip's call on logic cost, timing on critical path, etc...

We do get waitint, which means the low power case is met either way.

I find the locks obtuse, getlockstate would help a lot with that.

Example code would help compare these IMHO.

Cluso99 · 2015-07-21 22:25

It might be better if
WAITINT WC
could be used like other WAITxxx WC
where we use the WC flag for a CNT timeout.
This would allow say a wait for an interrupt in sleep mode with this code...
...setup interrupt mode...
WAITINT WC
IF_C JMP #TIMEOUT
...process interrupt...

This just makes the WAITINT perform like the others. The interrupt timer could still be used with WAITINT without WC.

Cluso99 · 2015-07-21 22:47

Re the Write and Read Hub low addresses...

The Write interrupt to the respective COGs HUB LONG makes sense. It will be triggered by a write long/word/byte to these 4 byte addresses.
However the Read works on a different long address to signal back to the writer that the long/word/byte has been read. Could the Read work differently???
Cog #5 writes to hub #0 which causes an interrupt to Cog#0. At the same time, the cog# that caused a write will be latched into a single internal register. There will be one of these registers per cog's hub long slot.
When Cog#0 reads hub#0, a read interrupt is triggered on the Cog# whose value has been stored in hub#0 internal register. Caveat that if more than one cog writes to the same hub slot with write interrupts, the read interrupt will only be generated to the last cog that wrote to this hub slot.

This would therefore solve the mailbox problem for the simple case of one to one cog communication.
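
The latched-writer idea can be modelled in a few lines of Python. This is a hedged sketch of the proposal as I read it, not a description of any hardware: the class `MailboxHub` and its method names are hypothetical, and it assumes that only a read by the slot's owner fires the read interrupt and that the latch keeps its value after the read.

```python
class MailboxHub:
    """Toy model: one hub long per cog, plus a latch holding the last writer's cog id."""
    def __init__(self, cogs=16):
        self.longs = [0] * cogs
        self.last_writer = [None] * cogs
        self.events = []                     # (kind, target_cog), in firing order

    def wrlong(self, writer, slot, value):
        self.longs[slot] = value
        self.last_writer[slot] = writer      # latch which cog wrote this slot
        self.events.append(("write", slot))  # write interrupt to the slot's owner

    def rdlong(self, reader, slot):
        # Assumption: only the owner reading its own slot signals back.
        if reader == slot and self.last_writer[slot] is not None:
            self.events.append(("read", self.last_writer[slot]))
        return self.longs[slot]

hub = MailboxHub()
hub.wrlong(writer=5, slot=0, value=123)      # cog 5 writes cog 0's long
value = hub.rdlong(reader=0, slot=0)         # cog 0 reads it
print(value, hub.events)
```

If cog 7 also wrote slot 0 before cog 0 read it, only cog 7 would get the read event, which is the caveat noted above.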

Cluso99 · 2015-07-21 22:50

I am not in favour of extending the Lock functions as IMHO they have rarely been used on the P1. In fact, I have never required their use, not even with ZiCog, my prop OS, Catalina C, or anything else. I have however used mailboxes extensively to communicate between drivers.

jmg · 2015-07-21 22:53

How does WAITINT differ from an interrupt during a (long) WAITCNT?

Presumably, WAITINT would "release" immediately so that the appropriate LINK can be inserted into the pipeline. Using this, your "main" task could simply be:
loop    WAITINT
        JMP #loop
And it would only ever process events, as fast as they arrive.

Yes, I can see that but WAITCNT should behave in a similar fashion ?

This does raise a question about the HW behind WAITCNT checking.
IIRC Chip may have mentioned it has changed from P1's EQUAL CNT to a Has-Passed test. (>=)

With interrupts firing, an equal time test is no longer guaranteed to occur in a following WAITCNT.
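
A small Python sketch shows why the equal test becomes risky once an interrupt can delay the WAITCNT. Everything here is illustrative: the function names are hypothetical, the counter is assumed to be a free-running 32-bit value, and the "has-passed" test is written as a wrap-safe signed difference, which is one common way to do it and not necessarily what the hardware does.

```python
MASK32 = 0xFFFFFFFF

def has_passed(cnt, target):
    """True once CNT has reached or passed target (wrap-safe while they are
    less than 2**31 ticks apart). This signed-difference form is an assumption."""
    return ((cnt - target) & MASK32) < 0x80000000

def wait_ticks_equal(cnt, target):
    """Ticks a P1-style equality test waits when WAITCNT starts at cnt."""
    return (target - cnt) & MASK32

def wait_ticks_has_passed(cnt, target):
    """Ticks a has-passed (>=) test waits when WAITCNT starts at cnt."""
    return 0 if has_passed(cnt, target) else (target - cnt) & MASK32

# An interrupt delays the WAITCNT until 10 ticks after the target.
print(wait_ticks_equal(1010, 1000))       # 4294967286: waits for CNT to wrap around
print(wait_ticks_has_passed(1010, 1000))  # 0: releases immediately

# Target just after a counter wrap, WAITCNT starts before it: both agree.
print(wait_ticks_equal(0xFFFFFFF0, 5), wait_ticks_has_passed(0xFFFFFFF0, 5))  # 21 21
```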

cgracey · 2015-07-21 23:04

About extending SETEDGE to incorporate hub LOCK bits:
SETEDGE D/# %EE_L_PPPPPP

%EE: 00 = any edge
%EE: 01 = pos edge
%EE: 10 = neg edge
%EE: 11 = any edge
%L: 0 = PPPPPP is pin number
%L: 1 = xxPPPP is lock number
%PPPPPP: pin number
That will give us a nice, single signal that can be used as an interrupt and captured into a flop for polling.

Interesting! Out of curiosity, what would it take to extend LOCK to 64 bits?

Not much in the hub, but it would mean that 64 more bits would have to fan out to each cog. There could be a 128-to-64 mux in the cog's edge detector to select between hub locks and I/O pins. It's not that expensive.
Considering that in order to write to a LOCK bit, you must use LOCKSET/LOCKCLR, would all these bits find use? It seems like a lot of LOCK bits, to me.

Huh. I would have thought that the lock ID would have been on the existing a_bus and lock/clear bits would have been on the existing d_bus. Is this fanout due to the addition of the locks to the EDGE interrupt? That mux sounds expensive.
I don't know if 64 are necessary. I just saw that there were two unused bits in SETEDGE. If you read my prior post (to potatohead), you can see why I think that 16 will be too few. Maybe 32 would be an acceptable compromise. But no more than 64, I think.

I remember that 16 LOCKs in the hub are a limit, since there are only 16 hub time slots. Each time slot gives a cog access to the LOCKs. To have more LOCKs would present problems with atomic operations, it seems to me.

cgracey · 2015-07-22 00:07

I am not in favour of extending the Lock functions as IMHO they have rarely been used on the P1. In fact, I have never required their use, not even with ZiCog, my prop OS, Catalina C, or anything else. I have however used mailboxes extensively to communicate between drivers.

I've never used them, either.
It's cheap to make the edge detector sensitive to them, though. They could at least be used to simultaneously interrupt or release multiple cogs, even if they were only controlled by one cog.
