Multi-threading does not go against "the philosophy."
The round-robin, hardware-scheduled multi-threading is a natural counterpart to the round-robin HUB access of the COGs. "Wheels within wheels," you might say.
Pre-emptive multi-threading, otherwise known as "interrupt handling", is totally at odds with the philosophy. Blasphemous, I would say. It's more like jamming a crowbar into your spokes!
I never thought I would see the day when a Propeller would support interrupts and all the complexity and baggage that comes with them. That day has come.
But, Heater, that's just a use case, and not one that you or I would employ. I like the idea of being able to write a big program with richer context than a mere task, and without YIELDs strewn throughout, that I can run in time slices alongside other such programs. That seems great to me.
Are you worried that because the possibility of something like interrupts now exists, it will cause everything to go straight into the crapper?
There's a mechanism which can read/write a task's critical state information from/to the WIDEs.
By starving a task of time slots, it settles into a quiet mode where its state becomes static and it can be completely altered. When given time slots again, the task resumes execution.
This is realized by two instructions:
RDTASK D/# 'read state of task D/# into WIDEs
WRTASK D/# 'write state of task D/# from WIDEs
These instructions can be coordinated with WRWIDE/RDWIDE to quickly save and restore task state data to/from hub memory.
EDIT: Perhaps we can just have a single instruction SWPTASK D/# which swaps the task's data with WIDEs. That might be more efficient.
Yeah, I guess it would save you one instruction in the switching sequence, but you'd probably still need the two hub cycles to save/restore the task info to/from hub, if that is where the data is going.
Actually, could such a SWPTASK instruction be run by the user task itself while the task is still running? If so, not only could a scheduler task wake up and swap you out, but in some other type of application a task could also choose to swap itself out, with new state info already set up in a WIDE. That might allow all sorts of weird and wonderful code and ideas to flourish, for example letting us use it as a breakpoint instruction that swaps directly to some debugger context. I like it.
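To make that concrete, here's a rough sketch of the middle of a switch using the proposed SWPTASK, built only from the instruction names floated in this thread (SWPTASK doesn't exist yet, and the task number, labels, and context layout are placeholders, not anything Chip has specified):

            SETPTRA  ctx_base        'hub block holding saved thread contexts (layout assumed)
            RDWIDE   PTRA[1]         'pull the incoming thread's saved state into the WIDEs
            SWPTASK  #1              'install it in task 1; task 1's outgoing state drops into the WIDEs
            WRWIDE   PTRA[0]         'park the outgoing state in hub for a later resume

With separate RDTASK/WRTASK you would need one more instruction to pull the old state into the WIDEs before loading the new one, which is the one-instruction saving mentioned above. The breakpoint idea would be the same few lines run by a task on itself, with PTRA aimed at a pre-built debugger context (subject to the pipeline caveat Chip raises a few posts later).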
Pre-emptive multi-threading, otherwise known as "interrupt handling", is totally at odds with the philosophy. Blasphemous, I would say. It's more like jamming a crowbar into your spokes!
I never thought I would see the day when a Propeller would support interrupts and all the complexity and baggage that comes with them. That day has come.
I won't quote all of Ecclesiastes but...'There is a time for everything...'
Being overly rigid in any particular philosophy tends to be a bad thing.
We have to keep in mind that, so far, the typical Propeller-based product is relatively low production. If supporting preemptive threading simplifies the move to the P2 from other processors, that may result in more design wins.
Options are good, who wants to eat Spam all the time, now we can have Spam, Spam, Spam, Spam, Spam, Spam, Baked Beans, Spam, Spam, Spam and Spam...
I never thought I would see the day when a Propeller would support interrupts and all the complexity and baggage that comes with them. That day has come.
I don't understand this statement. We don't have interrupts here. What we have is a way for PASM to get/set task information. By combining that with a scheduler task that is time-sliced with some user task, we enable systems to stop a task, swap it out for another one, and resume. That is all we are doing here. There are no interrupts you will have to worry about, and you can still keep coding exactly the same way as before if you want. No one is stopping that model. But if you want to extend the P2 application space to allow multithreading, where you can readily port existing third-party C code (not written with the Prop in mind) that depends on some RTOS-like functionality, you will now have that possibility too. Win-win for all. The P2 opens up to be able to do even more things.
Since the scheduler is software based you could implement 'soft interrupts'...
I'm all over the board on this: evolutionary architecture or revolutionary architecture?
Are we seeing a revolutionary change in the P2, where it becomes a viable hybrid architecture? If you stick to the P1 model, you have a truly enhanced and feature-packed follow-on to the P1 that will make all the P1 folks feel comfortable and rewarded for their investment of time and talent. If you start dipping your toes into all the new features, or start exploring them, you have a HUBEXEC model that is familiar to a lot of other MCU users (except there are eight of them). If you get really brave and put yourself into both worlds, you could have something truly amazing to build with.
I'm just the kid in the back seat asking, "Are we there yet??" Excited to get on with the adventure no matter how it turns out!!!
Yeah, I guess it would save you one instruction in the switching sequence, but you'd probably still need the two hub cycles to save/restore the task info to/from hub, if that is where the data is going.
Actually, could such a SWPTASK instruction be run by the user task itself while the task is still running? If so, not only could a scheduler task wake up and swap you out, but in some other type of application a task could also choose to swap itself out, with new state info already set up in a WIDE. That might allow all sorts of weird and wonderful code and ideas to flourish, for example letting us use it as a breakpoint instruction that swaps directly to some debugger context. I like it.
Interesting ideas! You could definitely swap your state out, but you'd better not have anything else of yours in the pipeline, or there could be a schizophrenic episode. This is kind of a brain-bender to contemplate. I don't know if this would be useful. I'm kind of thinking maybe we should stick to separate instructions, since it's cleaner to think about. Keep thinking, though.
All this context saving accommodates "surprise" of the thread being switched. If you are doing this to your own task under software control, there's no need for all this surprise-proof circuitry. This circuit wants to sneak up on things. First it smothers the task in ether, then transports it to the mother ship, and back again. Only some vague sense of missing time, perhaps, is experienced by the thread.
If a thread can do a passcnt or jp, and then prevent further thread swapping and take over the cog for a short period of time, it's basically doing the same thing as an interrupt service routine. This is independent of the task swapping feature. That just allows it to add more tasks than can be supported by the 4 threads. Please correct me if this statement isn't correct. I haven't been following the last 100 posts very closely.
With separate instructions, the dispatcher is in control and may decide NOT to swap the currently executing task. There could be cases where the current task is the best candidate to keep on running. With a combined instruction, you go through a potentially unneeded context switch and then another one to put the self-swapped task back in control. I think it's better to let the master task decide what is and isn't needed. You never want the workers to try and outguess the master.
I'm not sure you did mention it, but having the mask right there in D is nice, easy, and as you say, atomic. Thanks for the update!
Yes, David is right. Let's take a breather and get a summary. Perhaps things look too messy because of all the mixed discussion of applicability as opposed to core functionality.
Re: Atmel adding spiffy new debug. THEY HAVE TO! They have to because their processor model is a PITA, and I would really like to not see this one end up the same way. "Another tool in the toolbox" is appealing, but just because something can be done doesn't mean it should be. Interrupts are just another tool, and do we want those? Really? Given what we know can happen when there are a few different choices and how those impact things overall? Given what we learned on P1? We've got a real-time state dump! That, and a little bit of software, can do amazing things. If we want to, a kernel can be written that steps through all kinds of stuff, and it can even be used in tandem with a real-time one and the state dump, at the same time, using the same chip for a video display to show it all, even. Sheesh. Doing that on an ATMEL chip requires writing a whole pile of software, and it's a mess. We can do it with relative ease, interactive if we want, using only some glue software and a few smarts.
Customers expect that stuff, because they don't expect the things we can do. Keep that in mind.
Please summarize the current state of this, as you see it right now Chip. Maybe write it like you would the final doc. That style. I think it will help.
I did outline it pretty well a few posts ago. There's just not that much to say about it. I've been up all night making way for this addition. After I sleep, I'll implement RDTASK/WRTASK.
One thing I see we need is a way to set INDA/INDB using a variable. This is especially important for hub exec code, as it can't self-modify. These new SETINDA D and SETINDB D instructions will need a two-instruction gap before they take effect. Those are going to be very simple to implement.
As we start coding in hub exec mode, it may be apparent that some new instructions are needed to make things flow smoothly. I think we've got it mostly covered, already, but there may be a few things.
There's a mechanism which can read/write a task's critical state information from/to the WIDEs.
By starving a task of time slots, it settles into a quiet mode where its state becomes static and it can be completely altered. When given time slots again, the task resumes execution.
This is realized by two instructions:
RDTASK D/# 'read state of task D/# into WIDEs
WRTASK D/# 'write state of task D/# from WIDEs
These instructions can be coordinated with WRWIDE/RDWIDE to quickly save and restore task state data to/from hub memory.
EDIT: Perhaps we can just have a single instruction SWPTASK D/# which swaps the task's data with WIDEs. That might be more efficient.
And this:
SETTASK  D/#           'starve the other task of time slots (it had 15/16)
                       'starting on the next instruction, we'll have all the slots
SETPTRA  save_addr     'point to 'save' area
WRWIDE   PTRA[0]       'save current WIDEs
RDTASK   D/#           'read task's state into WIDEs, along with DCACHE-address and DCACHE-valid bits
WRWIDE   PTRA[1]       'save the entire task state
SETPTRA  restore_addr  'point to 'restore' area
RDWIDE   PTRA[1]       'read new task state
WRTASK   D/#           'establish new task state, along with DCACHE-address and DCACHE-valid bits
RDWIDEQ  PTRA[0]       'set the new WIDEs without affecting DCACHE-address or DCACHE-valid (RDWIDEQ = quiet)
SETTASK  D/#           'give the task 15/16 time slots to turn it back on in its new state
                       '<wait for some time, update pointers, loop>
I don't know about others, but I did see more complexity here than adding two instructions.
One thing I see we need is a way to set INDA/INDB using a variable. This is especially important for hub exec code, as it can't self-modify. These new SETINDA D and SETINDB D instructions will need a two-instruction gap before they take effect. Those are going to be very simple to implement.
Yes, that would be great! I was playing with some P1 code that I ported to P2 that used MOVS and MOVD to modify MOV instructions. I changed them to SETS and SETD, and then had to add a couple of extra NOPs to handle P2's longer pipeline versus P1. This worked OK in cog code, but not in hub code. So then I thought about using SETINDA, but realized that it only works with immediate values. The new SETINDA D and SETINDB D instructions will be very useful, and the two-instruction gap is better than the three-instruction gap with SETS/SETD.
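Going by Chip's description above, the usage would presumably look something like this; the register names are made up, and any useful instructions could fill the two-instruction gap instead of the NOPs:

            MOV      temp, #table      'compute the cog address of the entry to update
            ADD      temp, index
            SETINDA  temp              'point INDA at it from a variable (no self-modifying code needed)
            NOP                        'two-instruction gap before INDA takes effect...
            NOP                        '...per Chip's note above
            MOV      INDA, newval      'write through INDA, replacing the old MOVD-patched MOV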
All this context saving accommodates "surprise" of the thread being switched. If you are doing this to your own task under software control, there's no need for all this surprise-proof circuitry. This circuit wants to sneak up on things. First it smothers the task in ether, then transports it to the mother ship, and back again. Only some vague sense of missing time, perhaps, is experienced by the thread.
Great metaphors! Gentleman, walnut farmer, and poet.
I think the final documentation will need an "Instruction Etymology" section to tell us where all these off the wall instructions came from and what problems they're intended to solve. I certainly get confused too while following this thread (although that's probably due, in part, to everybody having different names for the same thing).
After chewing over Chip's state save/restore with WIDEs, it occurs to me that we may not even need an instruction to single-step!
Consider:
SETTASK  D/#           'starve the other task of time slots (it had 15/16)
                       'starting on the next instruction, we'll have all the slots
SETPTRA  save_addr     'point to 'save' area
WRWIDE   PTRA[0]       'save current WIDEs
RDTASK   D/#           'read task's state into WIDEs, along with DCACHE-address and DCACHE-valid bits
WRWIDE   PTRA[1]       'save the entire task state
Instead of stopping the other task, give it 1/16th of the cycles.
Keep saving its state. If the debugger gets 15/16ths of the cycles, it has enough time to watch the debuggee change state, cycle by cycle (perhaps watch TASKTRACE?).
Watch the PC in the saved state; as soon as you see it increment, stop the task being watched. (We may need to keep track of the pipeline stage, or of when an instruction completes.)
Bingo, you have stepped one instruction!
A variation on this can implement variable-speed run modes, data breakpoints, and address breakpoints.
You can even change the PC between instructions, to short-circuit code or work around a bug... and you can even hot-patch the code being debugged!
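A rough sketch of that stepping loop, using the instruction names from this thread; the slot masks and the offset of the PC within the saved state are pure assumptions, since that layout hasn't been spelled out yet:

step        SETTASK  run_mask         'debugger keeps 15/16 of the slots, debuggee (task 1) gets 1/16
:wait       RDTASK   #1               'snapshot task 1's state into the WIDEs
            WRWIDE   PTRA[0]          'park the snapshot in hub (PTRA assumed to point at a save area)
            RDLONG   pc_now, pc_addr  'fetch whichever long holds the saved PC (offset assumed)
            CMP      pc_now, pc_last  wz
    if_z    JMP      #:wait           'PC unchanged: the watched instruction hasn't completed yet
            SETTASK  stop_mask        'PC moved: starve task 1 again - one instruction stepped
            MOV      pc_last, pc_now  'remember where it stopped, ready for the next step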
I have garnered a lot of experience working with multiple threads on the P1. In fact, ALL of the code I write is co-operative multi-threading. It works great, albeit it could work even better/faster if there were an interrupt. But there isn't, so we make do.
An efficient task switch mechanism will greatly enhance my current switching functionality by making it faster and code-wise shorter.
My perception here with the P2 is that the "hardware" side of things allows us to efficiently (if, as yet, in a rather complicated way due to some pipeline caveats) swap processor states. The concept is good, but until I better understand it, there seems to be a lot of "hair" to the implementation, and that scares me. Some experience will surely settle a bunch of that down. I hope new users will be able to reach the same conclusion before being put off.
What I cannot get my head wrapped around is the "time sliced" concept. Seemingly, one task runs for a while and then, arbitrarily or round-robin, yields the processor to some other task without consideration for the needs of that other task. One of these tasks seems to be a scheduler controlling timings. This is a whole new and strange world from my experience.
What I am used to in the P1 is that threads and tasks are the same thing. I thought I would mimic that way of doing things in the P2, but with better/faster instructions and architecture. The basic philosophy here is that the scheduler does NOT control the timing. Task (thread) swap timing is determined by each individual thread, according to its own needs. The scheduler, which itself can be considered just another thread, simply keeps track of which thread is next to run, and dispatches (swaps in from a WAITCNT state) that thread at the appropriate instant. If the processor is still busy with another thread, the swap waits until the running thread releases (yields) the processor, and that causes some jitter. That is also why infinite loops need to have co-operative yields in them... trivial, simply a JMPRET back to the scheduler.

As each thread yields, it tells the scheduler when it wants to run again, and the scheduler puts that future time in a sequenced list of threads. When the next WAITCNT "event" expires, that next thread is launched. When that thread again yields, its next runtime is again stored by the scheduler, along with its continuation address. Very, very simple, and very, very effective. This method also lets you have any number of threads, not just 4 based on time slicing. Threads are SUSPENDED by never allowing a "match" to the system clock. Threads are RESUMED by the scheduler forcing an immediate (or delayed) match. Threads have a one-clock-cycle timing granularity, but may experience some jitter, depending on how busy the whole system becomes.
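As a minimal sketch of that yield/dispatch handshake on the P1, stripped down to a single thread (labels are mine, and a real scheduler keeps a sorted list of wake times and continuation addresses rather than just one):

entry       or       dira, led_mask       'example setup: drive an LED pin
            mov      wake, cnt            'seed the first wake time
            add      wake, period
scheduler   waitcnt  wake, #0             'sleep until the thread's requested wake time
            jmpret   sched_pc, thread_pc  'dispatch: resume the thread where it last yielded
            jmp      #scheduler           'thread has yielded back; wait for its next wake time

thread      xor      outa, led_mask       'the thread's work (toggle the LED)
            add      wake, period         'tell the scheduler when to run us again
            jmpret   thread_pc, sched_pc  'yield: save our continuation, return to the scheduler
            jmp      #thread              'when resumed, loop the thread body

thread_pc   long     thread               'first dispatch enters the thread at its start
sched_pc    long     0                    'filled in by the thread's first yield
wake        long     0
period      long     80_000               '1 ms at 80 MHz
led_mask    long     1<<16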
Having software threading leaves 100% of the processor available for computation versus hardware threading where some amount is continually consumed by the hardware.
What I find interesting here, in drawing a comparison between the two methods, is that for many intended driver-type applications, such as a UART, the hardware approach will effectively still need to wait for the bit-edge event to occur. But now it does a whole bunch of time-sliced spinning instead of just waiting in a yielded state, so I don't think there is much gain.
Anyhow, as I get to better understand the new world, I'm sure I will embrace the new concepts and deploy the appropriate elements to my advantage. And maybe even discover new ideas and methods! I really DO like the new P2 and look forward to its cordic capabilities.
Native cog mode (single tasking) is great for super-hard-real time. (1080p drivers etc)
Hardware tasks are great for a lot of applications (one cog runs four tasks each more powerful than a 50MHz P1 cog)
Cooperative multi-threading is great for a lot of applications.
Pre-emptive multi-threading is great for a lot of applications.
There are overlap cases where you could use any method; however, there are also cases where one or the other is preferable.
The four P2 hardware tasks make it very easy to write drivers that do not have to precisely count cycles of where to place calls to the scheduler - as a matter of fact there will be no calls to the scheduler, saving memory.
Ozpropdev's Invaders brilliantly demonstrates how well this works on a Nano.
The threading being discussed is another model that makes large numbers of pre-emptive threads possible (thus no need for sprinkling calls to the scheduler in the code)
The brilliant thing is that on the P2 you can "pick your poison":
1) one task per cog, identical to P1
2) 2-4 tasks per cog, interleaved on a clock-cycle basis, which basically turns a cog into multiple baby cogs (similar to what XMOS does)
3) one scheduler task, and 1-3 tasks capable of running multiple pre-emptive threads per task (also provides powerful debugging)
For any of the above, the code can be old-style "cog mode", or large model "hub-exec mode"
Any cog / task could run cooperative threads. Heck, potentially pre-emptive threads could run multiple cooperative threads!
Is the P1 simpler? YES
Is the P2 FAR more powerful than the P1? YES
The complication is well worth it. People who do not want to use the new features can ignore them - until such a time as they need them.
Bill, I'm not in disagreement with you. I also love what's coming down the road, but, unlike yourself, do not yet adequately comprehend what that all looks like. And my comments are made based on my experience with the P1.
Well, my thought is this: Having the choices we did on P1 taught us a lot of things. A whole lot of where P2 is today is based on optimizing those things we learned about.
It remains possible to employ the P1 philosophy on P2. It is now possible to take things in ways more aligned with "those standard ways" many of us enjoy not dealing with on the P1.
So then, we will see a split in how things are done. A few splits, actually. Personally, I think opening the door too wide on things like enabling an RTOS, as opposed to writing in ways that do not require it, will set other expectations that end up diluting the primary strengths we all know so well.
"me too" kind of thing, if I were to put it in very general terms.
So then, we will compete. May the best factions see great success! Perhaps this is good. Hope so.
This isn't the first time you've expressed a concern about expectations vis-a-vis the P2. We all come at life from different perspectives so it shouldn't seem odd that I don't understand this worry.
If one guy wants to write and market an RTOS for the P2, and some other guy wants to buy it, I would have to ask, "What's the big deal?" I won't be buying one, but I don't feel the slightest anxiety that someone else may.
This is no more of a split than the use of SPIN vs PASM vs C vs FORTH vs BASIC vs whatever, which is a perpetual part of the Propeller world. In fact, one could consider TF to be an RTOS for the P1, and its existence certainly hasn't weakened anything!
While I may never purchase an RTOS in my life, I'll be using preemptive multitasking the same day Chip uploads the relevant .jic file.
All software bloats to the point where it can send e-mail.
Maybe all CPU's bloat until they feature interrupts.
My worry is dilution of the real strength. As things bloat, they lose focus, it becomes harder to adopt them, harder to support them and harder to use them.
When overall share is small, resources are limited, and the number of mature competitors is high, risk is also high. Differentiation is a primary determining factor in a product's success entering a market and getting enough share to make the business sustainable. Diluting things generally takes the discussion to a simple price/feature comparison, and that is a loser for the P2, IMHO. The discussion needs to be about how the unique part competes with and adds more value to the buyer's goals, not how similar it is to what the buyer is already doing. The former is a discussion about value, the latter mostly about price.
The reason it needs to be there is that when a new product is a "me too" product rather than a genuinely new product, the primary competition ends up being "do nothing" instead of "do it different and/or better".
Not that it is a bad chip. I think it is brilliant. It's not about that.
Diluting the core differentiator reduces the ability to set expectations aligned with the core strengths, and it does so by validating them. Once validated, the thing which was a core strength now becomes a nice option, and differences now become sales exceptions. No JTAG? How come interrupts are so goofy on your chip? What do you mean I have to do X, Y, Z in C to make it work? ETC...
Worse, special features become costs, and that too becomes an exception, all of which favors the established players, not the upstart looking to own a niche.
I've been around the block a few times on various new technology launches, and I've seen this play out enough times to be concerned, so I'm going to express that concern because I like everybody here, think Chip is brilliant, Parallax awesome, and so forth. The people here are sharp as all get-out too. I learn something excellent nearly every day.
I sure am not going to return the favor by failing to share my experiences in like kind.
What happens, happens, but it won't have happened eyes closed. I value people here too much for that.
After chewing over Chip's state save/restore with WIDEs, it occurs to me that we may not even need an instruction to single-step!
...
Watch the PC in the saved state; as soon as you see it increment, stop the task being watched. (We may need to keep track of the pipeline stage, or of when an instruction completes.)
I think this is supported, and even simpler. When I asked Chip some ways back about the precise phase of task handovers, I think his reply implied that a write to change the task map effectively resets the scanner too, so you can know/choose exactly which one of the 16 cycles after the write you give to the step task.
The pipeline may move the execute point, but that will be stable.
Just for fun, this is what 4 individual co-operative LED flashers look like. Expandable to as many as you like!

CON
  _clkmode = xtal1 + pll16x     'standard clock mode * crystal frequency = 80 MHz
  _xinfreq = 5_000_000

  Pause    = $12
  Sentinel = $33
  mSec     = 1000

DAT
SchedNoPoll2
              File    "SchedNoPoll2.binary"

LEDTimer1     long    0
LEDVector1    long    LEDLoad1<<23 + LEDCode1<<9 + LEDVector2
LEDLoad1      long    110 * mSec
LEDCode1      xor     outa,LEDBit1
              jmp     #Pause                  'return to scheduler
LEDBit1       long    1<<16

LEDTimer2     long    0
LEDVector2    long    LEDLoad2<<23 + LEDCode2<<9 + LEDVector3
LEDLoad2      long    120 * mSec
LEDCode2      xor     outa,LEDBit2
              jmp     #Pause                  'return to scheduler
LEDBit2       long    1<<17

LEDTimer3     long    0
LEDVector3    long    LEDLoad3<<23 + LEDCode3<<9 + LEDVector4
LEDLoad3      long    130 * mSec
LEDCode3      xor     outa,LEDBit3
              jmp     #Pause                  'return to scheduler
LEDBit3       long    1<<18

LEDTimer4     long    0
LEDVector4    long    LEDLoad4<<23 + LEDCode4<<9 + Sentinel+1
LEDLoad4      long    140 * mSec
LEDCode4      xor     outa,LEDBit4
              jmp     #Pause                  'return to scheduler
LEDBit4       long    1<<19

PUB Null
  coginit(0,@SchedNoPoll2,0)
The discussion needs to be about how the unique part competes with and adds more value to the buyer's goals, not how similar it is to what the buyer is already doing.
I agree completely with this. Maybe what we see differently is whether the current feature under discussion, preemptive multitasking a la Propeller, makes the Propeller just like an ARM or AVR, or whether it "...adds value to the buyer's goals..."
Obviously this is a judgement call, and your opinion is clearly valuable. You deal with big systems and system integration, and have a huge amount of experience of that sort. My perspective is that of a classic embedded control engineer: Build a box, draft a board, write a few thousand lines of code, and then turn it all over to manufacturing for replication.
From my perspective, the P2 is in no danger of becoming an ARM.
Still, if I were to point to a recent development that makes it a little more like an ARM, it would be hub execution. Even with that, though, it's still wildly different, and in a very good way.
Comments
Since the scheduler is software based you could implement 'soft interrupts'...
Do I smell a pot of tar...
C.W.
If you start with a signal coming in off a pin that triggers a soft interrupt, does that become a firm interrupt?
Yup, or testing to see if a UART object received a character, or...
Oh, the tar is boiling...
And why are all those chickens naked?
C.W.
And hopefully soon.
Keep on truckin', Chip, we're sure you can do it!
Cheers,
Peter (pjv)