Propeller II update - BLOG - Page 186 — Parallax Forums

Propeller II update - BLOG


Comments

  • ctwardell Posts: 1,716
    edited 2014-03-01 05:39
    David Betz wrote: »
    That would be fine if these "increments" didn't also change what is already there. They don't at a source level but they do at the binary encoding level. All I was asking is that a new instruction list be provided with each of these rearrangements. I guess it probably isn't worth the time it would take Chip to do it though since I may be the only consumer of these incremental lists. Most people will only need the instruction lists that correspond to actual FPGA releases.

    I wonder if Chip would be willing to give you access to the source for PNut?

    C.W.
  • ozpropdev Posts: 2,792
    edited 2014-03-01 05:43
    David Betz wrote: »
    That would be fine if these "increments" didn't also change what is already there. They don't at a source level but they do at the binary encoding level. All I was asking is that a new instruction list be provided with each of these rearrangements. I guess it probably isn't worth the time it would take Chip to do it though since I may be the only consumer of these incremental lists. Most people will only need the instruction lists that correspond to actual FPGA releases.
    I realize that Pnut updates don't help your situation.
    Whenever there have been encoding changes, Chip has documented them.
    Cluso has produced very nicely formatted listings of these changes very quickly afterwards. :)
  • David Betz Posts: 14,516
    edited 2014-03-01 06:10
    ozpropdev wrote: »
    I realize that Pnut updates don't help your situation.
    Whenever there have been encoding changes, Chip has documented them.
    Cluso has produced very nicely formatted listings of these changes very quickly afterwards. :)
    I guess that's why I'm waiting for a near-final release for FPGA that includes final instruction encodings.
  • cgracey Posts: 14,133
    edited 2014-03-01 06:23
    David Betz wrote: »
    I guess that's why I'm waiting for a near-final release for FPGA that includes final instruction encodings.


    In your case, it's probably best to wait. We're getting close, though.
  • cgracey Posts: 14,133
    edited 2014-03-01 06:41
    I've been mulling over the details of this context switcher all day and night.

    As I was getting close to actually starting the implementation, I realized that there is going to be a lot of mux logic to grab 231 bits of state data from any of the four hardware tasks. But, then, do we really need to grab ANY task's data? I see no performance benefit in running more than one threaded preemptive task, because threads must run for many cycles before they're switched, in order to get good economy. I don't see any value in having multiple concurrent switchable threads. If this is really the case, we only need to save and restore one task's setup, let's say task 3's. Task 3 can be the preemptive thread task and take all the CPU you want to give it. You can have the other three tasks (0,1,2) doing their own things, with one of them moonlighting as the scheduler for task 3. By having just one task function in this preemptive way, the requisite logic is reduced by ~75%. If someone sees any advantage to running more than one concurrent preemptive threaded task, please say so.
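
    A minimal illustrative sketch of the scheduling model described above (not Chip's actual implementation; the names thread_ctx_t, save_task3_context, load_task3_context and wait_timeslice are hypothetical, and the real mechanism would be PASM instructions rather than C):

    #include <stdint.h>

    #define NUM_THREADS 4
    #define CTX_LONGS   8            /* stand-in for the ~200+ bits of task state */

    typedef struct {
        uint32_t state[CTX_LONGS];   /* PC, flags, pointers, etc. */
    } thread_ctx_t;

    static thread_ctx_t threads[NUM_THREADS];
    static int current = 0;

    /* Hypothetical hooks onto the hardware save/restore of task 3's state. */
    extern void save_task3_context(thread_ctx_t *ctx);
    extern void load_task3_context(const thread_ctx_t *ctx);
    extern void wait_timeslice(void);

    /* Scheduler loop run by one of tasks 0..2; task 3 runs the threads. */
    void scheduler(void)
    {
        for (;;) {
            wait_timeslice();                       /* let task 3 run for a while  */
            save_task3_context(&threads[current]);  /* snapshot the running thread */
            current = (current + 1) % NUM_THREADS;  /* simple round-robin          */
            load_task3_context(&threads[current]);  /* resume the next thread      */
        }
    }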
  • Bill Henning Posts: 6,445
    edited 2014-03-01 07:06
    Chip,

    I don't see ANY need for more than one task in a cog being capable of pre-emptive multithreading.

    It is also sufficient for debugging (to be able to debug one task per cog) until p3 :)

    Reducing the required logic by 75% is good!
    cgracey wrote: »
    I've been mulling over the details of this context switcher all day and night.

    As I was getting close to actually starting the implementation, I realized that there is going to be a lot of mux logic to grab 231 bits of state data from any of the four hardware tasks. But, then, do we really need to grab ANY task's data? I see no performance benefit in running more than one threaded preemptive task, because threads must run for many cycles before they're switched, in order to get good economy. I don't see any value in having multiple concurrent switchable threads. If this is really the case, we only need to save and restore one task's setup, let's say task 3's. Task 3 can be the preemptive thread task and take all the CPU you want to give it. You can have the other three tasks (0,1,2) doing their own things, with one of them moonlighting as the scheduler for task 3. By having just one task function in this preemptive way, the requisite logic is reduced by ~75%. If someone sees any advantage to running more than one concurrent preemptive threaded task, please say so.
  • potatohead Posts: 10,261
    edited 2014-03-01 07:28
    @user_name
    You deal with big systems and system integration, and have a huge amount of experience of that sort. My perspective is that of a classic embedded control engineer: Build a box, draft a board, write a few thousand lines of code, and then turn it all over to manufacturing for replication.

    Yes! And my perspective on embedded has gotten much clearer over the last few years. One statement, "write a few thousand lines of code", could just as easily be "write a few hundred, or a thousand, lines of code." The reuse attribute of the Propeller is one of the very strong differentiators. And that's one boundary / edge where I will express concern when we work up against it. Another is the robust nature of things; no interrupts drives a lot of that. Another edge...

    Embedded has been stretched over time with systems on board, and on chip, depending on where somebody is. Embedded can still mean anything from a 6502 (no joke, it's being done), to a custom-silicon system on chip with a bunch of IP globbed together, to an entire PC shrunk down, running whatever PCs can run, etc...

    Frankly, the requirements map for many of these is on par with what I saw on bigger systems. One thing the P2 does here is eliminate the need for an OS on a nice chunk of those potential use cases. A whole component GONE! That's a big deal, and arguably something worth exploring like we are with tasks. The whole tasking question, and in particular whether there should be preemptive tasks, is a very important discussion, and expressing some caution, as in "what is worth what?", is appropriate. I'm not saying don't do it, and I'm not saying do it either. I'm asking "should we?", and that one takes some time and thought and multiple perspectives to sort out at this point in the process.

    Anyway, good discussion!

    I've also got a ton of experience aligning all sorts of technology to needs / requirements / process, and I am good at seeing disruptive things. P2 stands a chance at being disruptive, given we can keep barriers to adoption low. Disruptive things happen on price (a fraction of the established norm), they can happen on process (leaner than the established norm), and they can happen in terms of creating new expectations for end users of all kinds (think iPod redefining what a music player is, Apple in general). P2 is best aligned with the latter two cases. Gotta keep an eye on those so they are maximized.

    As we go through the changes, I'm adopting them when I have time, and my set of experiences is a good indicator of relative complexity, so I share that. Not negative, just a data point. If I can jump in and go, it's pretty easy! If I struggle, it may still be easy, but it's worth looking at. Just an artifact of how I got here, and again a data point to share, that's all.

    That was the "complexity" thread a while back. But, I'm also thinking of how we position this thing, where the real sweet spots are, who might they be disruptive for, how that might impact product development, etc... macro. And my occasional call to revisit macro is in the spirit of "what is worth what?" so that we are adding, not trading to avoid the "was a great sculpture, until too many brush strokes made it an ugly lump of clay" kind of thing mentioned above.
  • potatohead Posts: 10,261
    edited 2014-03-01 07:36
    @Chip

    I concur with Bill.

    Re: Testing

    Yes, we've not tested enough, nor have we really started to work on software. Early on we had some software activity, code was out there, people were figuring out stuff. The changes have been significant enough to stunt that.

    Frankly, I would settle this last task bit, call tasks closed, and move to SERDES / USB, whatever that is going to be, and nail it cold so that we get the spiffy features mentioned by several here as important / relevant / competitive, then test the Smile out of it. Good debug routines can be built, code templates to get people started, demos of various things, gcc, libraries, etc... all gotta get cooking, and those will vet the features.

    Remember, a simple COGNEW had a bug. No big deal. Now that we see it, Chip can fix it. We need to find these, or know they don't exist, much more than we need additional features / instructions, IMHO. And that's due to time, and because robustness is one differentiator we are really going to want to have in the can, so to speak. New stuff that is powerful and that just works is worth a ton, and it does a lot to render price much less of an issue than it would be otherwise. We need all of that to be successful longer term, IMHO.

    From there, we will have a much better understanding of what we've got and what it's worth, and will find the bugs you all know we need to find. Adding something then will be harder, but if we do it, it will be a real add, because we will know what is worth what to a degree that makes sense. This is not true right now, and what seems reasonable may not be, simply due to the large number of considerations.
  • Publison Posts: 12,366
    edited 2014-03-01 07:42
    C.W.

    Did your avatar come down with the flu? It looks a little under the weather. :)
  • User Name Posts: 1,451
    edited 2014-03-01 07:58
    cgracey wrote: »
    I don't see any value in having multiple concurrent switchable threads.

    I don't either. My 'use case' needs just one. BTW, that use case is not a mere niche application for an embedded controller - it is something that embedded controllers do all the time. The magic of the latest proposal (preemptive multitasking) is that the function could be accomplished in one cog rather than three.

    Because of the great flexibility of the Propeller concept, Propellers get targeted for applications well out of the normal scope of an embedded controller - to the point that those unusual use cases, willy-nilly, become the norm and thus advise further development. I don't get upset about it, but I do scratch my head about it from time to time.
  • potatohead Posts: 10,261
    edited 2014-03-01 08:02
    Me too, and I strongly suspect that not needing an OS to address those cases, along with the powerful reuse / object / driver model the chip has, is compelling enough to make people want to try it. The other reason I know of is that people have had enough time to reach great skill, and pushing the edge just happens as a natural artifact of that.
  • cgracey Posts: 14,133
    edited 2014-03-01 08:18
    Thanks for all of your input and feedback, Guys.

    I'm going to sleep now.

    Finally, this morning, things congealed on how to best implement the preemptive switcher. I've got it down to 191 bits that the hardware has to save on its own, while the rest can be done via software. This makes the switch code a little longer than before, but it doesn't burden the silicon nearly as much.

    I know you've all got strong opinions about this tangent, but I'm really excited about this preemptive switching, because, like Username said, it allows you to run several large-model programs on one cog - programs that were written as if they would run alone in a cog. This frees up other cogs for real-time tasks. This is like a big release valve for code pressure. It's a way to make dynamic what was, before, static. It's like COGNEW for whole applications.
  • David Betz Posts: 14,516
    edited 2014-03-01 08:29
    cgracey wrote: »
    In your case, it's probably best to wait. We're getting close, though.
    Okay, I'll wait. To be clear, I don't care if new instructions are added. I'd mostly like to know that the encodings of existing instructions won't change dramatically.
  • David Betz Posts: 14,516
    edited 2014-03-01 08:34
    cgracey wrote: »
    I'm really excited about this preemptive switching, because, like Username said, it allows you to run several large-model programs on one cog - programs that were written as if they would run alone in a cog.
    Yes, this does sound like a win. Much better than running multiple hubexec programs on different hardware tasks, which would cause thrashing of the icache. Good observation!
  • cgracey Posts: 14,133
    edited 2014-03-01 08:36
    David Betz wrote: »
    Okay, I'll wait. To be clear, I don't care if new instructions are added. I'd mostly like to know that the encodings of existing instructions won't change dramatically.


    David, because of the way your compiled C code works, with R0..R15, etc., every program compiled could be optionally run in a preemptive switcher with minimal to no load on hub memory for state storage. Programs handled as threads could be picked up and run on any cog if their state storage was in hub memory and there was some kind of protocol for cogs to recognize and execute them. I could see all C code running in that paradigm, with special I/O functions coming from libraries written in PASM.
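
    A hypothetical sketch of what such a hub-resident thread record and a "any cog may claim and run it" protocol could look like (not Chip's actual design; hub_thread_t, try_claim, release and run_one_timeslice are invented names for illustration):

    #include <stdint.h>
    #include <stdbool.h>

    typedef struct {
        volatile uint32_t owner;  /* 0 = free, else ID of the cog running it   */
        uint32_t pc;              /* saved program counter                     */
        uint32_t flags;           /* saved C/Z flags and task state            */
        uint32_t regs[16];        /* R0..R15 register image used by compiled C */
    } hub_thread_t;

    /* Hypothetical atomic claim/release primitives (hub locks on real hardware). */
    extern bool try_claim(volatile uint32_t *owner, uint32_t cog_id);
    extern void release(volatile uint32_t *owner);

    /* Hypothetical hook: run the claimed thread in this cog's preemptive task
       for one timeslice, updating the saved state in place. */
    extern void run_one_timeslice(hub_thread_t *t);

    /* Any cog scans the hub table and runs whichever thread it can claim. */
    void run_any_free_thread(hub_thread_t *table, int n, uint32_t my_cog_id)
    {
        for (int i = 0; i < n; i++) {
            if (try_claim(&table[i].owner, my_cog_id)) {
                run_one_timeslice(&table[i]);
                release(&table[i].owner);
            }
        }
    }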
  • David Betz Posts: 14,516
    edited 2014-03-01 08:45
    cgracey wrote: »
    David, because of the way your compiled C code works, with R0..R15, etc., every program compiled could be optionally run in a preemptive switcher with minimal to no load on hub memory for state storage. Programs handled as threads could be picked up and run on any cog if their state storage was in hub memory and there was some kind of protocol for cogs to recognize and execute them. I could see all C code running in that paradigm, with special I/O functions coming from libraries written in PASM.
    Sounds excellent! Can't wait to try this.
  • Ahle2 Posts: 1,178
    edited 2014-03-01 10:05
    WOW!!
    I have been following this avalanche that I started with great interest (and fear of what it might turn into). And I must say that this seems to be turning out the way I had envisioned it, on a very abstract level of course.
    My vision was to do as much as possible in SW and only add the minimum HW needed to get it working.
    Chip's latest realization of the benefits of preemptive multitasking for GCC and HubExec applications warms my heart. I no longer feel like "the bad guy" that brought this up. :)
    Buuut, there is still division amongst forum members regarding this, and that makes me a little bit sad. :(

    /Johannes
  • mindrobots Posts: 6,506
    edited 2014-03-01 10:14
    Don't worry, we'll all be friends again when we get new FPGA images to play with! New toys always make the kids play nice! :0)
  • potatohead Posts: 10,261
    edited 2014-03-01 10:43
    I wouldn't worry about any division. People are all just wanting to see success. Personally, I'm pleased with where it ended up as well.
  • mindrobots Posts: 6,506
    edited 2014-03-01 10:47
    Trust the sausage maker!! :lol:
  • User Name Posts: 1,451
    edited 2014-03-01 10:50
    I feel no enmity whatsoever towards anyone except Leon (JUST KIDDING!!!) and sincerely hope I haven't caused any. It goes without saying that I totally support and applaud potatohead's vigilance in fighting to retain the marvelous level of code reusability that is one of the outstanding hallmarks of the Propeller.
  • cgracey Posts: 14,133
    edited 2014-03-01 14:29
    I hope pjv comes back. People really roasted him over his recent posting, where he showed thread data without the scheduler. It did look confusing at first, but I think his point was that a scheduler could handle lots of those thread sets. Pjv, please just take it in stride.
  • jmg Posts: 15,159
    edited 2014-03-01 14:30
    It is also sufficient for debugging (to be able to debug one task per cog) until p3 :)

    Reducing the required logic by 75% is good!

    I would qualify and expand that a little.
    Full vision debug on one task is ok, and if a user needs to choose that at compile time, even that is tolerable.

    Debug-time choice would be smoother, and may be implicit, but I'm not sure how Chip decides which task gets the Swap resource.

    Debug should still be able to see/trace the other tasks, and even single-step them, but I think Read/Write of their PC+Flags would be enough to do that, and I think Chip mentioned an opcode to R/W PC+Flags.

    If that other-task debug stepped over the hard areas (as I think Chip mentioned before), that is also ok.

    Debug does not need a full swap ability on all 4 tasks, especially if giving that up saves 75% of the swap logic.
  • ctwardell Posts: 1,716
    edited 2014-03-01 14:39
    Publison wrote: »
    C.W.

    Did your avatar come down with the flu? It looks a little under the weather. :)

    He's fine, just a few minor burns from being tarred and feathered.

    C.W.
  • jmg Posts: 15,159
    edited 2014-03-01 14:57
    cgracey wrote: »
    To really orient debugging properly, it needs to be done from another cog. That cog has to have the ability to view into the target cog, step it, etc. Shy of that, we have a rather impure circumstance where the target cog must do debugging on itself, not allowing itself to be wholly what it would have been without the internal accommodations for the debug stuff. If I had time, I would certainly pursue this, but I do feel that is a bit much to jump on right now. So, what we have is adequate for grease monkeys like ourselves, but it's not shrink-wrapped like customers may expect it to be.

    Across-COG is the least user-invasive, but as you say, it has a very high logic cost.

    I think with Task Mapping + Swap you can get tolerably low invasion, and one simple means to push that invasion overhead down even further would be to support a lower Task Map ratio than 1:16.

    With 1:16 you need 6.25% of the time to spin the background task; if a (skip?) Yield counter of a few bits were allowed per task, then that 6.25% could drop quickly.

    4 bits of skip would be 0.39% of overhead, and 8 bits of skip would be 0.0244%.
    Upper levels of the Task flip logic are unchanged; the stub counter just makes the slot skip most of the time. A skipped slot could choose another task, or it could just carry over the last task.
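
    A quick arithmetic check of the percentages above, assuming a 1:16 task map in which the background task owns one slot in sixteen, and a skip counter that lets that slot actually execute only once per 2^bits occurrences:

    #include <stdio.h>

    int main(void)
    {
        double slot = 1.0 / 16.0;                           /* 1:16 task map */
        printf("no skip   : %.4f %%\n", 100 * slot);        /* 6.2500 %      */
        printf("4-bit skip: %.4f %%\n", 100 * slot / 16);   /* 0.3906 %      */
        printf("8-bit skip: %.4f %%\n", 100 * slot / 256);  /* 0.0244 %      */
        return 0;
    }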
  • potatohead Posts: 10,261
    edited 2014-03-01 15:14
    What kinds of things would be debugged using only that fraction of time? In other words, what makes that time necessary as opposed to nice to have?
  • rogloh Posts: 5,495
    edited 2014-03-01 16:02
    Ahle2 wrote: »
    WOW!!
    I have been following this avalanche that I started with great interest (and fear of what it might turn into). And I must say that this seems to be turning out the way I had envisioned it, on a very abstract level of course.
    My vision was to do as much as possible in SW and only add the minimum HW needed to get it working.
    Chip's latest realization of the benefits of preemptive multitasking for GCC and HubExec applications warms my heart. I no longer feel like "the bad guy" that brought this up. :)
    Buuut, there is still division amongst forum members regarding this, and that makes me a little bit sad. :(

    /Johannes

    I would have to agree 100% with you, Johannes, as I've also been pushing a bit for this feature lately. No one wants to burden Chip with any extra hardware work that potentially delays the release, and we want to do as much in software as we can. All we really needed was the ability to start/stop a task arbitrarily at any time and read/write its state, which was not completely possible in software before. That capability alone opens up so much now for P2 in terms of debug and multithreading. Even though it works (slowly), I don't want to use LEDs and other rudimentary techniques for debugging PASM anymore, and I love the idea of being able to rapidly port third-party C code that requires an underlying tasking model. Imposing co-operative multithreading only, or hard-limiting to 32 simultaneous hardware threads max, could either require restructuring that code completely to suit the P2 way, or analysing every part of it in depth and adding yields everywhere, and then repeating that process each time the third-party code changes. I've done that type of thing before in other systems when third-party drivers change etc., and it sucks. You finally get everything right and then have to update. Start again.

    I would add there is probably some perceived fear that going against the "consensus" (whatever that is) may begin to upset some people such as other forum members with higher post counts than you may have. Well fear is the enemy and I say speak up for what you believe in! If a few people don't like what you have to say, so be it. Ideas can be judged on technical merits alone if you make good arguments. I've been around for too long to just remain silent or just let others bully you into submission. It's not worth it. Producing great ideas shows to me that you are smarter than them!
  • jmg Posts: 15,159
    edited 2014-03-01 16:13
    potatohead wrote: »
    What kinds of things would be debugged using only that fraction of time? In other words, what makes that time necessary as opposed to nice to have?

    I think that depends on your overhead and version-control approach.
    Some designs will try to keep the CPU clock as low as possible to save power, and there a 6.25% overhead may be too much.
    Other designs may not like jitter as high as 1:16.
    Someone strict on version control may decide to always run with the Debug % in place, just so they do not finish faster than any test cases did, as that might give its own problems.

    I would guess that debugging a SW USB design with 6.25% overhead would be trickier than debugging one with 0.39% or even 0.0244% overhead.

    I see it this way: a snooping Debug will always steal 1 * Tcyc slots, but doing that less frequently could allow the traced SW to be disturbed less.

    (The alternative of a balanced move in Fsys may not be supported by the PLL & has other impacts.

    e.g. take the USB example, and say a design uses x8 for 96 MHz nominal. If the PLL allowed 96*16/15 = 102.4 MHz you could nudge up Sysclk for the same average speed, but the issues are
    a) Other tasks/COGS have now had their clock changed
    b) I don't think the present P2 PLL can actually do this )

    If instead you can throttle back the Debug so it wants 1 cycle per (roughly) 8 USB bytes (0.195%), you should be able to 'hide' the Debug access jitter within the bit-sample Waits, i.e. you can real-time trace the USB code at close enough to full speed to give real results.
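
    A quick check of the two figures above, assuming full-speed USB at 12 Mbit/s and a 96 MHz sysclk (8 clocks per USB bit):

    #include <stdio.h>

    int main(void)
    {
        double fsys   = 96e6;
        double nudged = fsys * 16.0 / 15.0;              /* reclaims 1 slot in 16  */
        double clocks_per_bit = fsys / 12e6;             /* 8 clocks per USB bit   */
        double clocks_8_bytes = clocks_per_bit * 8 * 8;  /* 512 clocks             */
        double debug_overhead = 100.0 / clocks_8_bytes;  /* one stolen clock, in % */

        printf("nudged sysclk : %.1f MHz\n", nudged / 1e6);   /* 102.4 MHz */
        printf("debug overhead: %.3f %%\n", debug_overhead);  /* 0.195 %   */
        return 0;
    }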
  • potatohead Posts: 10,261
    edited 2014-03-01 16:29
    I would add there is probably some perceived fear that going against the "consensus" (whatever that is) may begin to upset some people such as other forum members with higher post counts than you may have. Well fear is the enemy and I say speak up for what you believe in!

    Strongly agreed, BTW.

    There is a difference between fear and concern. Concern, along with some more thinking, found a good solution at a reasonable logic / complexity cost. Well worth having, and the resulting feature is a win!
  • Sapieha Posts: 2,964
    edited 2014-03-01 16:31
    Hi rogloh.

    Never hold back anything You want to say.
    If others don't like it --> that's their problem ---
    BUT never use words that directly POINT at any people --->
    If You need to answer to that
    > always use "I think so"

    rogloh wrote: »
    I would add there is probably some perceived fear that going against the "consensus" (whatever that is) may begin to upset some people such as other forum members with higher post counts than you may have. Well fear is the enemy and I say speak up for what you believe in! If a few people don't like what you have to say, so be it. Ideas can be judged on technical merits alone if you make good arguments. I've been around for too long to just remain silent or just let others bully you into submission. It's not worth it. Producing great ideas shows to me that you are smarter than them!