The New 16-Cog, 512KB, 64 analog I/O Propeller Chip

Cluso99 · 2015-07-22 00:19

I am not in favour of extending the Lock functions as IMHO they have rarely been used on the P1. In fact, I have never required their use, not even with ZiCog, my prop OS, Catalina C, or anything else. I have however used mailboxes extensively to communicate between drivers.

I've never used them, either.
It's cheap to make the edge detector sensitive to them, though. They could at least be used to simultaneously interrupt or release multiple cogs, even if they were only controlled by one cog.

If it's cheap and simple, then add them for sure.
This might mean the locks would get more use.

Cluso99 · 2015-07-22 00:44

The hub r/w interrupt mechanism makes for a fast simple mailbox mechanism, particularly if the read interrupt is changed as I suggested.

An object can be written without knowing the other cog(s) it is working with by setting a cog register(s) with the hub address(es) when the object is started. This is similar to how we pass PTR.

John Abshier · 2015-07-22 01:01

I have a sick feeling that I am jumping back to Jan-Feb 2014. From the number of posts, interrupts are complicated. Time continues. It looks like I don't need to save any money for Christmas this year either.

John Abshier

Seairth · 2015-07-22 01:40

I am not in favour of extending the Lock functions as IMHO they have rarely been used on the P1. In fact, I have never required their use, not even with ZiCog, my prop OS, Catalina C, or anything else. I have however used mailboxes extensively to communicate between drivers.

I've never used them, either.

I posit that the primary reason they were little used was the lack of a LOCKGET.
Frankly, you could get rid of LOCKNEW and LOCKRET. Even though they provide a one-instruction semaphore, you must use the entire LOCK register as a semaphore. There is no ability to have both semaphores and mutexes at the same time. Besides, it is possible to implement a semaphore with LOCKSET/RDBYTE/WRBYTE/LOCKCLR. Sure, it's slower, but it can be implemented as needed, and without losing mutexes. And if you need more than one semaphore, you can do that too.
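A minimal C sketch of the LOCKSET/RDBYTE/WRBYTE/LOCKCLR semaphore described above. It assumes a C11 `atomic_flag` stands in for one hub LOCK bit (`atomic_flag_test_and_set` returning the previous state plays the role of LOCKSET reporting the old lock state in C) and an ordinary byte stands in for the hub byte holding the count. All names (`hub_sem_t`, `hubsem_try_take`, `hubsem_give`) are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one LOCK bit (mutex) guarding one hub byte (the count). */
typedef struct {
    atomic_flag lock;   /* stands in for a hub LOCK bit */
    uint8_t count;      /* stands in for the hub byte read/written with RDBYTE/WRBYTE */
} hub_sem_t;

/* LOCKSET analogue: sets the lock, returns true if it was already set. */
static bool hub_lockset(atomic_flag *l) {
    return atomic_flag_test_and_set_explicit(l, memory_order_acquire);
}

/* LOCKCLR analogue: releases the lock. */
static void hub_lockclr(atomic_flag *l) {
    atomic_flag_clear_explicit(l, memory_order_release);
}

static void hubsem_init(hub_sem_t *s, uint8_t initial) {
    atomic_flag_clear(&s->lock);
    s->count = initial;
}

/* Take one unit if available; returns false when the count is zero. */
static bool hubsem_try_take(hub_sem_t *s) {
    while (hub_lockset(&s->lock)) { /* spin until this cog owns the mutex */ }
    uint8_t c = s->count;                    /* RDBYTE */
    bool ok = c > 0;
    if (ok) s->count = (uint8_t)(c - 1);     /* WRBYTE */
    hub_lockclr(&s->lock);                   /* LOCKCLR */
    return ok;
}

/* Return one unit to the semaphore. */
static void hubsem_give(hub_sem_t *s) {
    while (hub_lockset(&s->lock)) { }
    s->count = (uint8_t)(s->count + 1);
    hub_lockclr(&s->lock);
}
```

Because each semaphore is just a byte behind a mutex, one LOCK bit can guard several such counts, which is the flexibility being argued for; the cost is the extra hub round trips.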

Seairth · 2015-07-22 02:34

About extending SETEDGE to incorporate hub LOCK bits:
SETEDGE D/# %EE_L_PPPPPP

%EE: 00 = any edge, 01 = pos edge, 10 = neg edge, 11 = any edge
%L: 0 = PPPPPP is pin number, 1 = xxPPPP is lock number
%PPPPPP: pin number
That will give us a nice, single signal that can be used as an interrupt and captured into a flop for polling.
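To make the proposed field layout concrete, here is a small C helper that packs the operand. It assumes the layout exactly as written above: %EE in bits 8:7, %L in bit 6, and %PPPPPP in bits 5:0, with only the low four bits used when selecting a lock. This is a proposal under discussion, not a documented encoding, and `setedge_operand` and the enum names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* %EE as proposed: 00 and 11 both mean "any edge". */
enum edge_sel { EDGE_ANY = 0, EDGE_POS = 1, EDGE_NEG = 2, EDGE_ANY_ALT = 3 };

/* %L: select an I/O pin or a hub lock as the edge source. */
enum edge_src { SRC_PIN = 0, SRC_LOCK = 1 };

/* Pack %EE_L_PPPPPP into a 9-bit SETEDGE operand (hypothetical helper). */
static uint16_t setedge_operand(enum edge_sel ee, enum edge_src l, unsigned n) {
    assert(l == SRC_PIN ? n < 64 : n < 16);   /* 6-bit pin number or 4-bit lock number */
    return (uint16_t)(((unsigned)ee & 3u) << 7 | ((unsigned)l & 1u) << 6 | (n & 0x3Fu));
}
```

For example, a positive edge on pin 5 packs to %01_0_000101 (0x085), and a negative edge on lock 3 packs to %10_1_000011 (0x143).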

Interesting! Out of curiosity, what would it take to extend LOCK to 64 bits?

Not much in the hub, but it would mean that 64 more bits would have to fan out to each cog. There could be a 128-to-64 mux in the cog's edge detector to select between hub locks and I/O pins. It's not that expensive.
Considering that in order to write to a LOCK bit, you must use LOCKSET/LOCKCLR, would all these bits find use? It seems like a lot of LOCK bits, to me.

Huh. I would have thought that the lock ID would have been on the existing a_bus and lock/clear bits would have been on the existing d_bus. Is this fanout due to the addition of the locks to the EDGE interrupt? That mux sounds expensive.
I don't know if 64 are necessary. I just saw that there were two unused bits in SETEDGE. If you read my prior post (to potatohead), you can see why I think that 16 will be too few. Maybe 32 would be an acceptable compromise. But no more than 64, I think.

I remember that 16 LOCKs in the hub are a limit, since there are only 16 hub time slots. Each time slot gives a cog access to the LOCKs. To have more LOCKs would present problems with atomic operations, it seems to me.

Gotcha. I had assumed the LOCK implementation was basically the same as in P1V.
Maybe if you got rid of LOCKNEW, LOCKRET, what is left might make for a much simpler LOCK implementation.
For now, though, I say just move forward with what you have. Once we get an FPGA image, we can see whether the LOCKGET and EDGE functionality really does make LOCKs usable. Then we can figure out if there's a reasonable way to increase the number of bits.

potatohead · 2015-07-22 02:38

Or we find the bits are sufficient as the mailbox address scheme fills any gaps and is familiar ground.

rod1963 · 2015-07-22 02:39

You're not the only one.
I have a sick feeling that I am jumping back to Jan-Feb 2014. From the number of posts, interrupts are complicated. Time continues. It looks like I don't need to save any money for Christmas this year either.

John Abshier

potatohead · 2015-07-22 02:43

Truth is guys, lots remains to be done.

The Smart Pins are not designed, and there remains synthesis, test, layout checks, etc...

This exercise isn't adding a significant amount of time and does not appear to impact this year. The redesign and working with the chip layout guys took a long time.

It may be easy to blame this cycle, but I don't see that as the truth.

And remember a lot of this is novel. Some iteration is going to have to happen.

Chip worked until some input was needed. That is what is going on.

David Betz · 2015-07-22 03:01

Truth is guys, lots remains to be done.

The Smart Pins are not designed, and there remains synthesis, test, layout checks, etc...

This exercise isn't adding a significant amount of time and does not appear to impact this year. The redesign and working with the chip layout guys took a long time.

It may be easy to blame this cycle, but I don't see that as the truth.

And remember a lot of this is novel. Some iteration is going to have to happen.

Chip worked until some input was needed. That is what is going on.

I guess that means that additional requests would not be out of order...

potatohead · 2015-07-22 03:03

I sure am not asking. It is one thing when Chip initiates it. Necessary. Quite another to just toss stuff in.

That is how I plan on playing it.

jmg · 2015-07-22 03:12

I guess that means that additional requests would not be out of order...

Depends on their scope

If it relates to interrupts, or smart-pins, sure...

potatohead · 2015-07-22 03:24

Yeah, exactly. Think feature use cases as opposed to new features.

Cluso99 · 2015-07-22 03:52

Yeah, exactly. Think feature use cases as opposed to new features.

+1

A few things may arise when we get an FPGA release.

I was against interrupts, but what we have from a week's work is extremely useful in certain areas, and answers the critics too.

Now it's time to move on and try this.

Bill Henning · 2015-07-22 04:07

WOW!

I am away for a week and look what happens!

It will take me a couple of days to catch up on all the posts, but so far, I can say a few things:

1) Great work Chip!

2) Interrupts on P2? Who would have thunk it! But a nice compromise, since hardware tasks take a lot more logic.

3) KEEP LOCKS

I could not have synchronized the five cogs and access to external memory on the P1 for my bitmapped high resolution drivers without locks.

With P2, I expect that we will have SDRAM to play with, and managing it properly will require locks.

16 locks is pretty good, however I'd prefer 32.

I'll have more comments as I go back and see where all the updates started.

Cluso99 · 2015-07-22 04:44

Yes Bill, it's been pretty full on the last week. Locks are not going and currently there are 16. It's been discussed whether there should be more, and if they should work differently and with interrupts.

Hopefully the new r/w hub interrupts are sufficient to leave the locks as they are.

Personally I would rather leave them (locks) alone for now and move on. We could discuss them in a separate new thread if necessary while we wait for FPGA code.

Tor · 2015-07-22 07:43

Now, a different issue. Suppose you want to pass more than 32 bits from cog A to cog B. Obviously, some of that data will be written to non-special hub memory. You can still use the special address to alert cog B that data is available (elsewhere). However, there is no way to know if or when cog B reads that other data. Cog A will only know that cog B has read its special address.
In this case, there are two different approaches to solving the issue. One solution is to have cog B write a value to cog A's address when it is done. Another approach is to use a LOCK bit. Cog A sets the LOCK before writing to cog B's address and cog B acknowledges reading all of the data by clearing the same LOCK bit. A bit more complicated, but still manageable.
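A minimal single-threaded C model of the LOCK-acknowledged transfer: cog A sets the LOCK, fills ordinary hub memory, then posts to cog B's special address; cog B copies the data out and clears the LOCK as its acknowledgment. It assumes a C11 `atomic_flag` stands in for the LOCK bit and an atomic word stands in for the special hub address. The payload size, the nonzero-tag convention, and all names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAYLOAD_WORDS 8   /* hypothetical: more than one 32-bit word */

typedef struct {
    atomic_flag busy;              /* the LOCK bit: set by cog A, cleared by cog B */
    uint32_t data[PAYLOAD_WORDS];  /* ordinary (non-special) hub memory */
    _Atomic uint32_t mailbox;      /* cog B's special address; 0 = nothing posted */
} xfer_t;

static void xfer_init(xfer_t *x) {
    atomic_flag_clear(&x->busy);
    atomic_init(&x->mailbox, 0);
}

/* Cog A: returns false if cog B has not yet acknowledged the previous block.
   tag must be nonzero, since 0 means "empty". */
static bool cog_a_send(xfer_t *x, const uint32_t *src, uint32_t tag) {
    if (atomic_flag_test_and_set(&x->busy))  /* LOCKSET: already set means B is still busy */
        return false;
    memcpy(x->data, src, sizeof x->data);
    atomic_store(&x->mailbox, tag);          /* alert cog B via its special address */
    return true;
}

/* Cog B: copies the block out, then acknowledges by clearing the LOCK. */
static bool cog_b_receive(xfer_t *x, uint32_t *dst, uint32_t *tag) {
    uint32_t t = atomic_exchange(&x->mailbox, 0);
    if (t == 0) return false;                /* nothing posted yet */
    memcpy(dst, x->data, sizeof x->data);
    *tag = t;
    atomic_flag_clear(&x->busy);             /* LOCKCLR: all data has been read */
    return true;
}
```

A `false` from `cog_a_send` is cog A's cue that cog B has not finished with the previous block; the failed LOCKSET leaves the lock state unchanged, so A can simply retry.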

A third approach: Use two buffers.
Cog A writes the data to buffer 1 and alerts cog B. Cog B reads the special loc, and cog A knows it. Cog A writes data to buffer 2 and alerts cog B. In the meantime, cog B reads buffer 1. When it's done, cog B reads the special loc again, and cog A knows that B has finished reading buffer 1, since it is obviously ready to read buffer 2. Thus cog A can safely write to buffer 1. And so on.
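A single-threaded C model of the two-buffer scheme. It assumes the hub's notification that cog B has read the special location can be modeled as cog B resetting a `mailbox` field, and it does not model real concurrency or memory ordering; the buffer size and all names are hypothetical. The key invariant is the one stated above: cog B reads the special location only after finishing its current buffer, so that single read also frees the other buffer.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define WORDS 4   /* hypothetical block size, in 32-bit words */

typedef struct {
    uint32_t buf[2][WORDS];  /* two hub buffers */
    int mailbox;             /* cog B's special location: posted buffer index, -1 once read */
    int next;                /* cog A's private state: which buffer to fill next */
} pingpong_t;

static void pp_init(pingpong_t *p) { p->mailbox = -1; p->next = 0; }

/* Cog A: may post only after cog B has read the special location. Because B
   reads it only once it is done with its current buffer, that read also tells
   A the other buffer (the one it is about to refill) is no longer in use. */
static bool cog_a_post(pingpong_t *p, const uint32_t *src) {
    if (p->mailbox != -1) return false;        /* previous alert not picked up yet */
    memcpy(p->buf[p->next], src, sizeof p->buf[0]);
    p->mailbox = p->next;                      /* alert cog B */
    p->next ^= 1;                              /* alternate between the buffers */
    return true;
}

/* Cog B: call when finished with its current buffer.
   Returns the index of the next buffer to read, or -1 if none is posted. */
static int cog_b_pick_up(pingpong_t *p) {
    int i = p->mailbox;
    p->mailbox = -1;                           /* the read that cog A is notified about */
    return i;
}
```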

Getting any cog to pass more than 32 bits to cog B: Must use locks.

BTW, maybe it was mentioned and I didn't see it - but there's also the use case where there is one writer and multiple readers.

Heater said: "It would be cool if such a signal could wake up an otherwise halted COG.
Think the low power state of a P1 COG hanging on a WAITxxx."

Yes. And inter-[cog|thread|etc] processing is tricky if you're missing some crucial mechanisms. Think of a buffer used for communication. It looks simple until you look closer.

Writing to and reading from locations in a ring buffer: use mutex locks to read/write and update readpos/writepos (or, in this case, the 'special loc' mechanism may be sufficient for managing the pos and count values).
Buffer is empty. Reader either wants to hang, or it wants to be able to check the state.
Buffer is full. Writer either wants to hang, or it wants to be able to check the state.
Buffer is no longer empty (because writer wrote to an empty buffer). Reader must be woken up if they are waiting. There could be more than one reader.
Buffer is no longer full (because a reader read something off it). Writer must be woken up if it is waiting. There could be more than one writer.
The one signalling that the buffer has gone either from full to non-full or from empty to non-empty must be able to do so safely and continue, even if none were actually waiting at that point.
Someone (say, a reader) who was signalled (or woken up) when it wasn't actually waiting must not suffer side effects: if it checks the state of the buffer, finds it non-empty, reads it, tries again, finds it empty, and goes to wait, it must not then 'remember' the wake-up condition that was issued earlier.

It is surprisingly difficult to get this right when writing code for multiple threads in C, as soon as there is more than one writer or more than one reader. POSIX added some functions to support this: the pthread_cond_*() function set (pthread_cond_signal, pthread_cond_broadcast, pthread_cond_wait, pthread_cond_timedwait, ...). If you try to do this with only locks, semaphores and mutexes, you soon run into trouble: either you get a race condition or you get stuck in a lock.
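A minimal C sketch of the bounded ring buffer from the list above, using the POSIX calls named there. It assumes a hosted POSIX system rather than a Propeller; `ring_t`, `CAP`, `ring_put`, `ring_get` and `ring_try_get` are hypothetical. The `while` loops around `pthread_cond_wait` cover the last two requirements: a condition variable keeps no memory of earlier signals, so signalling with nobody waiting is harmless, and a waiter woken for data that someone else already took simply re-checks the count and waits again.

```c
#include <pthread.h>
#include <stdbool.h>

#define CAP 4   /* hypothetical capacity */

typedef struct {
    int items[CAP];
    int head, tail, count;
    pthread_mutex_t mu;          /* guards everything above */
    pthread_cond_t not_empty;    /* readers wait here */
    pthread_cond_t not_full;     /* writers wait here */
} ring_t;

static void ring_init(ring_t *r) {
    r->head = r->tail = r->count = 0;
    pthread_mutex_init(&r->mu, NULL);
    pthread_cond_init(&r->not_empty, NULL);
    pthread_cond_init(&r->not_full, NULL);
}

/* Blocks while full; works with any number of writers. */
static void ring_put(ring_t *r, int v) {
    pthread_mutex_lock(&r->mu);
    while (r->count == CAP)                  /* a wakeup is only a hint: re-check */
        pthread_cond_wait(&r->not_full, &r->mu);
    r->items[r->tail] = v;
    r->tail = (r->tail + 1) % CAP;
    r->count++;
    pthread_cond_signal(&r->not_empty);      /* harmless if no reader is waiting */
    pthread_mutex_unlock(&r->mu);
}

/* Caller must hold r->mu and have checked that count > 0. */
static int take_locked(ring_t *r) {
    int v = r->items[r->head];
    r->head = (r->head + 1) % CAP;
    r->count--;
    pthread_cond_signal(&r->not_full);       /* a writer may be waiting for space */
    return v;
}

/* Blocks while empty; works with any number of readers. */
static int ring_get(ring_t *r) {
    pthread_mutex_lock(&r->mu);
    while (r->count == 0)
        pthread_cond_wait(&r->not_empty, &r->mu);
    int v = take_locked(r);
    pthread_mutex_unlock(&r->mu);
    return v;
}

/* Non-blocking variant for a reader that wants to check instead of hang. */
static bool ring_try_get(ring_t *r, int *out) {
    pthread_mutex_lock(&r->mu);
    bool ok = r->count > 0;
    if (ok) *out = take_locked(r);
    pthread_mutex_unlock(&r->mu);
    return ok;
}
```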

Heater. · 2015-07-22 07:58

Tor,
Threads/processes and shared state can of course get tricky. Turns out though that most data sharing is between a single writer and a single reader. That can be done without locks and such atomic operations. See the FIFOs in the good old Full Duplex Serial driver for example. Ages ago Chip asked if it would be OK to drop locks from the PII as he had never seen any code that used them. Very few people stepped forward with examples that really needed locks.
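A minimal C sketch of a single-writer, single-reader FIFO in the spirit of the head/tail buffers in Full Duplex Serial: the producer writes only `head`, the consumer writes only `tail`, and one slot is kept empty so full and empty can be told apart without any lock. The 16-entry size and all names are assumptions for illustration; on a hosted C compiler, the acquire/release atomics are what keep the data write ahead of the index update.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define FIFO_SIZE 16u   /* must be a power of two; hypothetical */

typedef struct {
    uint8_t buf[FIFO_SIZE];
    _Atomic uint32_t head;   /* written only by the producer */
    _Atomic uint32_t tail;   /* written only by the consumer */
} spsc_fifo_t;

static void fifo_init(spsc_fifo_t *f) {
    atomic_init(&f->head, 0);
    atomic_init(&f->tail, 0);
}

/* Producer: returns false when full (capacity is FIFO_SIZE - 1). */
static bool fifo_put(spsc_fifo_t *f, uint8_t b) {
    uint32_t h = atomic_load_explicit(&f->head, memory_order_relaxed);
    uint32_t next = (h + 1) & (FIFO_SIZE - 1);
    if (next == atomic_load_explicit(&f->tail, memory_order_acquire))
        return false;
    f->buf[h] = b;
    atomic_store_explicit(&f->head, next, memory_order_release);  /* publish after the data */
    return true;
}

/* Consumer: returns false when empty. */
static bool fifo_get(spsc_fifo_t *f, uint8_t *b) {
    uint32_t t = atomic_load_explicit(&f->tail, memory_order_relaxed);
    if (t == atomic_load_explicit(&f->head, memory_order_acquire))
        return false;
    *b = f->buf[t];
    atomic_store_explicit(&f->tail, (t + 1) & (FIFO_SIZE - 1), memory_order_release);
    return true;
}
```

Each index has exactly one writer, which is why no lock is needed; adding a second producer or consumer breaks that property.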
One thing I always thought was missing from the PI is the ability to wait on a signal from some other COG. We can enter a very low power state whilst waiting on a pin, counter or video, but if our task has to wait on data from another COG, it has to be awake and poll HUB RAM somehow.

Tor · 2015-07-22 08:05

With 16 cogs I'm sure more people will be interested in using multiple cogs to churn through data provided by some data generating cog.. or a reader processing data from multiple data-generating cogs (parallel sampling), although the first use case is probably easier to come up with applications for.

And yes, just go into wait and be woken up by another cog is super useful. Even without my full-blown ringbuffer considerations. I thought such a wakeup is now part of the P2 plan, but I've been reading through too many posts lately and I'm not sure what's in and what's not.

-Tor

Cluso99 · 2015-07-22 09:14

Tor,
With the new regime it is still just as easy as it was in P1 by polling/comparing the head/tail. No locks were/are necessary. Many of us use this method, including FullDuplexSerial.

Many of us also use Mailboxes. These will work the same on P2.

Now we also have some interrupts that will improve on polling/comparing the head/tail. The waiting cog can use interrupts by having the head/tail use the special hub locations. The waiting cog can perform other duties while waiting for an interrupt triggered by the sending cog updating the head hub location, or it can simply wait using WAITINT while minimising power consumption. Again, no locks are necessary.

Tor · 2015-07-22 09:47

@Cluso99:

The Propeller hub architecture with slotted access to the hub implicitly gives you some atomicity, which is very useful. However, I'm also thinking about how POSIX functions may be implemented for propgcc, e.g. the pthread_cond_*() functions. This is for portability reasons, of course.

-Tor

David Betz · 2015-07-22 11:22

I guess that means that additional requests would not be out of order...

Depends on their scope

If it relates to interrupts, or smart-pins, sure...

I'm not sure why those should be the limits. Interrupts weren't in the instruction set that Chip posted. They were some forum person's suggestion. I don't see why we shouldn't start an entire new brainstorming session to see if we can come up with more good ideas. Now that we have interrupts, why not illegal address traps and a translation lookaside buffer so we can do virtual memory?

Seairth · 2015-07-22 11:44

I guess that means that additional requests would not be out of order...

Depends on their scope

If it relates to interrupts, or smart-pins, sure...

I'm not sure why those should be the limits. Interrupts weren't in the instruction set that Chip posted. They were some forum person's suggestion. I don't see why we shouldn't start an entire new brainstorming session to see if we can come up with more good ideas. Now that we have interrupts, why not illegal address traps and a translation lookaside buffer so we can do virtual memory?

I can't tell if you're being facetious or not. But in case you aren't, I'll point out that these are not interrupts, in the traditional sense. Since the mechanism works by simply inserting a LINK instruction into the instruction stream, I don't see how traps could be implemented.

David Betz · 2015-07-22 11:56

I was being facetious. I don't see this process ever converging. It's not that I think any of the new features are bad. It's just that I'd like to see the design frozen at some point and open only to bug fixes rather than feature enhancements. I don't see us ever getting a silicon P2 otherwise. For that matter, I suspect this diversion to implement interrupts has also delayed the appearance of even an FPGA image. Let's leave any cool new ideas for P3.

Seairth · 2015-07-22 12:00

... I don't see how traps could be implemented.

To qualify that, I mean "using the current interrupt architecture."

Actually, I can think of a way that involves disabling writes to C/Z/register, flushing the pipeline immediately, and not updating PC (or maybe setting it to PC-1). Then the LINK will store the PC of the trapped instruction. Of course, in order for these traps to be useful, additional information would need to be captured, and that might get complicated.
(edit: whew! he was being facetious! ignore my ramblings before we end up down another rabbit hole...)

David Betz · 2015-07-22 12:01

... I don't see how traps could be implemented.

To qualify that, I mean "using the current interrupt architecture."

Actually, I can think of a way that involves disabling writes to C/Z/register, flushing the pipeline immediately, and not updating PC (or maybe setting it to PC-1). Then the LINK will store the PC of the trapped instruction. Of course, in order for these traps to be useful, additional information would need to be captured, and that might get complicated.

As mentioned in my last post, I was not serious.
Chip: Please do not consider adding illegal address traps or a TLB.

Tubular · 2015-07-22 12:02

I'm not sure why those should be the limits. Interrupts weren't in the instruction set that Chip posted. They were some forum person's suggestion. I don't see why we shouldn't start an entire new brainstorming session to see if we can come up with more good ideas. Now that we have interrupts, why not illegal address traps and a translation lookaside buffer so we can do virtual memory?

I think that "forum person" was Chip, it's his "flight of fancy": http://forums.parallax.com/discussion/comment/1336751/#Comment_1336751

We're being very restrained and holding our killer feature requests til P3. Form a queue

David Betz · 2015-07-22 13:04

I'm not sure why those should be the limits. Interrupts weren't in the instruction set that Chip posted. They were some forum person's suggestion. I don't see why we shouldn't start an entire new brainstorming session to see if we can come up with more good ideas. Now that we have interrupts, why not illegal address traps and a translation lookaside buffer so we can do virtual memory?

I think that "forum person" was Chip, it's his "flight of fancy": http://forums.parallax.com/discussion/comment/1336751/#Comment_1336751

We're being very restrained and holding our killer feature requests til P3. Form a queue

I could be wrong but I think there was a discussion of timer interrupts in this or one of the other P2 threads before Chip posted his instruction set. I was assuming that his "flight of fancy" was in response to that. Anyway, I agree that we should be restrained in suggesting new features. If P2 ships and is successful, we may not have as long to wait for a P3 as we had to wait for P2.

Dave Hein · 2015-07-22 13:05

Interrupts were discussed waaaay before Chip's flight of fancy. Chip was not the forum person that initiated the discussion on interrupts. I would have bet a substantial amount of money that Chip would never consider adding interrupts to the P2, but it appears that I would have lost that bet. It seems that he finally realized that there is some benefit to having interrupts. I'm glad to see the addition of interrupts.

Seairth · 2015-07-22 14:21

3) KEEP LOCKS

I could not have synchronized the five cogs and access to external memory on the P1 for my bitmapped high resolution drivers without locks.

With P2, I expect that we will have SDRAM to play with, and managing it properly will require locks.

16 locks is pretty good, however I'd prefer 32.

Out of curiosity, do you use LOCKNEW/LOCKRET, or just LOCKSET/LOCKCLR? Put another way, would you still be able to do what you want with:
LOCKSET (existing instruction, no change)
LOCKCLR (existing instruction, no change)
LOCKGET (new, writes current lock state of a single lock to C)
LOCKTOG (new, toggles a lock and optionally writes old value to C)
With the interrupt mechanism, you would also have:
SETEDGE (to enable an interrupt on a lock transition)
GETEDGE (to wait/block on a lock transition)
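A C model of the 16 lock bits under the proposed instruction set, returning from each operation the value the instruction would write to C. The LOCKGET and LOCKTOG semantics are taken from the descriptions in the list; LOCKCLR is assumed to report the prior state in C as it does on P1; the lock ID is assumed to be the low four bits of the operand. Function names are hypothetical, and the model captures results only, not the atomicity the hub slot provides.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_LOCKS 16u

/* Hypothetical model of the hub lock register. */
typedef struct { uint16_t bits; } lockreg_t;

static uint16_t lock_mask(unsigned id) { return (uint16_t)(1u << (id & (NUM_LOCKS - 1))); }

/* LOCKGET (proposed): C = current state, no change. */
static bool op_lockget(const lockreg_t *r, unsigned id) { return (r->bits & lock_mask(id)) != 0; }

/* LOCKSET: set the lock, C = previous state. */
static bool op_lockset(lockreg_t *r, unsigned id) {
    bool old = op_lockget(r, id);
    r->bits |= lock_mask(id);
    return old;
}

/* LOCKCLR: clear the lock, C = previous state (as on P1). */
static bool op_lockclr(lockreg_t *r, unsigned id) {
    bool old = op_lockget(r, id);
    r->bits &= (uint16_t)~lock_mask(id);
    return old;
}

/* LOCKTOG (proposed): toggle the lock, C = previous state. */
static bool op_locktog(lockreg_t *r, unsigned id) {
    bool old = op_lockget(r, id);
    r->bits ^= lock_mask(id);
    return old;
}
```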

Dave Hein · 2015-07-22 14:37

Please keep LOCKNEW and LOCKRET.
