AMD, with Bulldozer, went a bit further implementation-wise: in reality, they pack two integer "cores" that share some aspects of dispatch etc. and one "FPU", and call that two cores.
I do not mind the terminology of core, as that is effectively what the programmer 'sees', provided there is some distinction that it is not a separate set of physical cores. It comes with some caveats.
The Silicon certainly should minimize those caveats, (and remove the nasty ones) but it cannot eliminate all of them.
So I think some (small) modifier is needed, vCore, (virtual-core) or fCore (Flexible core) or tsCore (time Sliced core) or sCore (Sliced Core) or mCore (time-mapped core).. or (add your own modifier here...)
So that's a lot of industry training, and nomenclature to flip on its head...
P2 time-sliced threads are not exactly the same thing, but they are sharing the pipeline, and done at the lowest hardware level, so there will be no complete inversion of training involved here.
...until you get to threads (or pthreads) at the next layer of abstraction, the software. Then you can throw out the definitions coined by Intel and talk about the lightweight implementations of software constructs that are similar to processes (which are sometimes called tasks) but you need to keep processes and processors distinct since one is software and one is hardware. And then, threads can run within processes on some systems, so you can have a thread running inside a process that is currently executing in a hyperthread within a core within a processor which could be one of many processors within a server (or a system) that can be a node or part of a node or part of a cluster within a node.....turtles all the way down!
So the whole thing is one confusing mess even when you read through the definitions and if you don't disambiguate (sorry, I've been Googling today) in your documentation and context, it doesn't matter what you call them. Introducing new terms sounds like a marketing trick and leads to more confusion until you learn the new terms.
You could go COG, VCOG, SoftCOG but that seems silly.
You could go COG, COG Slice and Code Slice but that is even sillier.
When I hear thread, my mind doesn't race to HyperThreads and Intel - I go to switched software execution units under the control of some scheduler...and usually, that scheduler is just another thread because it is switchable on most systems because it can get pre-empted. In our case, I don't think we can let the scheduler get preempted since it should run at the top level.
If you rename the task, then you need to rename the SETTASK instruction (which is a minor annoyance at this point).
You are over-complicating it, and your proposed terminology is far too similar, far too easy to confuse.
Much more confusing than cog/task/thread.
I can live with cog/vcog/thread, but the names you proposed confuse even me. Frankly, I find cog/task/thread to be the easiest and most logical.
Also, I see no reason to follow terminology Intel uses for some of its desktop processors, when software threading has been available - and called threads - for over 40 years.
Yes, which means you cannot use hyper-threading + threading, so still need a separation in terminology.
I do not mind the terminology of core, as that is effectively what the programmer 'sees', provided there is some distinction that it is not a separate set of physical cores. It comes with some caveats.
The Silicon certainly should minimize those caveats, (and remove the nasty ones) but it cannot eliminate all of them.
So I think some (small) modifier is needed, vCore, (virtual-core) or fCore (Flexible core) or tsCore (time Sliced core) or sCore (Sliced Core) or mCore (time-mapped core).. or (add your own modifier here...)
Good grief.
Over there in XMOS land they had chips with four processors (cores). Each capable of up to 8 hardware, time sliced, threads.
Some brilliant marketing guy at XMOS decided to advertise this as four "tiles" each supporting 8 "logical cores".
Two engineers walk into a bar, the first one tells the bartender, "I'll have a Propeller" and the bartender gives him a drink.
The second engineer says, "I'll have a Propeller too."
Bartender says, "Sorry, you can't"
Second engineer, "But you just gave him one!"
Bartender, "No, I didn't."
Second engineer, "You most certainly did give him a Propeller!"
Bartender, "Why, yes, I certainly did!"
Second engineer, "Well! Then I'd like a Propeller too!"
Bartender, "No, sorry, don't know how to make one of those yet."
....
and so on......
I can live with cog/vcog/thread, but the names you proposed confuse even me. Frankly, I find cog/task/thread to be the easiest and most logical.
Also, I see no reason to follow terminology Intel uses for some of its desktop processors, when software threading has been available - and called threads - for over 40 years.
If that is the case, why did we have so many talking about preemptive multitasking, for the full-software-swap topic ?
Shouldn't they have all naturally used preemptive multi threading ?
A benefit of using just one of thread/task, is less confusion, so I would favour cog/vcog/thread
Umm... that's one thread AT A TIME per task, with the thread being switched as needed by the scheduler.
Basically, see my example in #5862 in this task (hehe) that shows how I plan to use cogs/tasks/threads. I think you will find it to be a compelling case for why threads are very useful.
I'm sure we agree on the uses of a full software swap, and even on the need for Silicon support to make doing that safe and compact, no matter if it is called multitasking or multithreading up at that level.
I was trying to avoid the usage confusion that already exists (as shown on here) between what is SW-managed and what is HW-managed
I really don't think anyone finds the following confusing, except when someone mixes up task/thread:
1. cog (totally deterministic)
2. task (one of four hardware tasks, very deterministic)
3. thread (one of many threads of execution in a task, what is determinism?)
The cog/task/thread terminology works, it is simple, clear, and no need to muck it up.
Threads also correspond to software threads.
This has my vote. Simple, clean.
Re: debug and read that task register.
You want to be able to catch any unexpected change to TaskMap, between break points, for example.
Here's the thing. Right now, it's not exposed to the memory map. Part of the reason we would want read / write capability is to understand when the register is getting corrupted / written to out of hand, right?
Of course, this kind of thing happens when you expose it all to a running program, which the P2 does not do. So then, the debug is easy. Did the code run SETTASK? If so, then that's where the problem is. Done, next. Search between breakpoints, and look at self-modifying code.
We don't need read on SETTASK for debug purposes, and frankly if we can read it, then people will, and we are back to "seems like good data, until it isn't" and bad practice.
Finally, "reading registers is common on MCU's"
Yeah? So are a lot of things. This isn't a common MCU, and things can play out differently, etc... Again, those devices with all the registers exposed to the memory map, as opposed to instructions, can much more easily exhibit corruption through mistaken / subtle reads and writes. Having SETTASK work the way it does, factors that all out.
Thread/task/process has always been confusing, always was always will be. Introducing hyper-threads doesn't help. Having thread or task at that layer leads to confusion, sorry Intel.
People that understand seem to quickly level set if the terms are identified in context. If you don't have any background it is confusing.
Although vcog sounds more like marketing than any of them, it does distinguish the result of a set task instruction.
I hope Chip reads through all this blather and gets the essence of the idea behind the YIELD or PREEMPT instruction that started this latest round of blather.
"Gentlemen's agreement" and code in the same forum thread make chills run up and down my spine. I don't want my microcontroller to have the complexity of my PC. (Spent hours today getting Windows 7, cable card tuner, network adapter, and tuning adapter to work today. Not sure I have it fixed because before it would work for weeks before dying.) Object X should not be able to cause Object Y to fail.
Here's the thing. Right now, it's not exposed to the memory map. Part of the reason we would want read / write capability is to understand when the register is getting corrupted / written to out of hand, right?
Of course, this kind of thing happens when you expose it all to a running program, which the P2 does not do. So then, the debug is easy. Did the code run SETTASK? If so, then that's where the problem is. Done, next. Search between breakpoints, and look at self-modifying code.
I'm not following this debug method, as the TASK map may have been changed in another thread (eg a library)
A user may have no idea it is even happening, they see strange results...
Besides making solid sense for debug, reliable read of Slice mapping allows those being sliced to make decisions about how much resource they have to work with.
Code can be designed to be more reliable, as it can easily check it has the resource it expects/needs.
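To make that concrete, here is a minimal Python sketch of the kind of check a task could run if the slice map were readable. It assumes the map is a 32-bit value made of sixteen 2-bit slot fields, each naming one of four tasks, with slot 0 in the lowest bits. That layout, and the helper names `slots_per_task` and `has_minimum_share`, are assumptions for illustration only, not a description of the actual silicon.

```python
def slots_per_task(task_map):
    """Count how many of the 16 time slots each of the 4 tasks owns.

    ASSUMPTION: task_map is 32 bits, sixteen 2-bit fields, slot 0 in bits 1:0.
    """
    counts = [0, 0, 0, 0]
    for slot in range(16):
        task = (task_map >> (2 * slot)) & 0b11  # extract this slot's task id
        counts[task] += 1
    return counts


def has_minimum_share(task_map, task, min_slots):
    """Return True if `task` owns at least `min_slots` of the 16 slots."""
    return slots_per_task(task_map)[task] >= min_slots


# Hypothetical map: slots 0..3 go to tasks 0,1,2,3; every other slot to task 0.
example_map = 0x000000E4
shares = slots_per_task(example_map)
```

A driver that needs, say, a quarter of the cycles could call `has_minimum_share(map, my_task, 4)` at start-up and refuse to run (or fall back to a slower mode) if the answer is no.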
We don't need read on SETTASK for debug purposes, and frankly if we can read it, then people will, and we are back to "seems like good data, until it isn't" and bad practice.
I'm following this logic even less. It is simply reading a register; there is no 'good data' aspect, it is just the value that is being used by the Slice Mapper.
Giving programmers accurate information is never bad practice.
If any user has a total aversion to reading this, or using it, then they can never read it.
"Gentlemen's agreement" and code in the same forum thread make chills run up and down my spine. I don't want my microcontroller to have the complexity of my PC. (Spent hours today getting Windows 7, cable card tuner, network adapter, and tuning adapter to work today. Not sure I have it fixed because before it would work for weeks before dying.) Object X should not be able to cause Object Y to fail.
Fully agree.
The silicon should manage the details, so the programmer does not have to (almost) clone a flip-flop, and a programmer should be able to design verifiable code.
I also think at this point, all we need is a bit of hardware support for a task (or a thread within a task) yielding/exiting.
The simplest scheme I can think of only takes one instruction:
YIELD dest, #code ' code must be non-zero ie 1..511
This instruction would do the following:
- write "code" to the "dest" register
- loop to itself
A nice simple mechanism, that allows a lot of possibilities:
- the scheduler can monitor dest by "tjnz dest,#handler"
- the handler can take an action based on the contents of dest
Possible actions are: (numbers are arbitrary)
- if dest==$0FF, shut down task (ie exit())
- if dest = {1..128} wait for specified event or timeout (ie select()) - note exact mechanism would be software defined
- if dest = {129..254} we have a breakpoint 1..127, exit is just a breakpoint that never returns
- if dest = {$100..$1FF} perform a system software function {ie getch, putch, puts etc}
The nice thing is that all of the above, with the exception of the "YIELD" instruction itself (perhaps SYSTEM is a better name), is software defined.
After YIELD is starved by the scheduler, when that task/thread gets cycles again, it can resume on the next instruction (by adding 1 to the saved PC)
This is far simpler to understand, provides far more capability, and takes far fewer cycles (and less code) than decoding a task long.
And dare I say it... having this simple instruction is in the "propeller way" - simple hardware, push complexity off to software.
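Since everything after the instruction itself is software defined, the scheduler's handler boils down to a range decode. Here is a rough Python sketch using the arbitrary numbers from the list above. The function name `decode_yield_code` and the (action, argument) return shape are made up for illustration. Note that with 129..254 mapped to breakpoints and $0FF kept for exit, this particular mapping gives breakpoint numbers 1..126.

```python
def decode_yield_code(code):
    """Map a 9-bit yield code to an (action, argument) pair.

    The ranges follow the arbitrary numbering proposed in the post;
    a real scheduler could choose any other split.
    """
    if not 1 <= code <= 511:
        raise ValueError("yield code must be in 1..511")
    if code <= 128:
        return ("wait", code)               # wait for event/timeout 1..128
    if code <= 254:
        return ("breakpoint", code - 128)   # breakpoint number 1..126
    if code == 0xFF:
        return ("exit", None)               # shut the task down
    return ("syscall", code - 0x100)        # system function 0..255
```

The scheduler would call this only after it has seen a non-zero value in dest, so zero never needs a meaning of its own.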
NOTE:
This could be done as
mov dest,#code
jmp #self
but given how often this would occur in code, it would save a LOT of hub memory to have a separate instruction to do this.
The amount of gates/logic to implement this should be trivial, and it solves:
- yielding
- task/thread exiting
- breakpoints
- some system library calls
I think this YIELD instruction is the way to go. It's flexible, simple, and robust.
It seems to me, too, that two things must happen in a 'yield':
1) A value must be written to a common register to signify and define the yield to the scheduler.
2) The instruction loops until the saved PC gets incremented by the scheduler.
I think any more behavior than this is superfluous and possibly overwrought.
By the way, to implement an instruction such as this is a little beyond nothing, in terms of Verilog code and gates.
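For anyone trying to picture those two behaviors, here is a small Python model of them. It is only a toy under stated assumptions: the `Task` class, the `EMIT` pseudo-op, the register name `sysreg`, and the order in which the scheduler bumps the PC and clears the register are all invented for the sketch and say nothing about how the hardware would do it.

```python
class Task:
    """A toy task: a program counter plus a list of (op, ...) tuples."""

    def __init__(self, program):
        self.program = program
        self.pc = 0
        self.output = []


def step(task, regs):
    """Give the task one instruction's worth of cycles."""
    op = task.program[task.pc]
    if op[0] == "YIELD":
        _, dest, code = op
        regs[dest] = code   # 1) write the code to the common register
        return              # 2) PC is not advanced, so the task loops here
    if op[0] == "EMIT":
        task.output.append(op[1])
    task.pc += 1


def scheduler_poll(task, regs, dest):
    """Scheduler side: notice a yield, release the task, return the code."""
    code = regs.get(dest, 0)
    if code == 0:
        return None         # nothing pending
    task.pc += 1            # kick the stalled PC past the YIELD first...
    regs[dest] = 0          # ...then clear the request
    return code


def run_demo():
    regs = {"sysreg": 0}
    task = Task([("EMIT", "before"), ("YIELD", "sysreg", 7), ("EMIT", "after")])
    step(task, regs)        # runs EMIT
    step(task, regs)        # hits YIELD, writes 7, stays put
    step(task, regs)        # still looping on YIELD
    stalled_pc = task.pc
    code = scheduler_poll(task, regs, "sysreg")
    step(task, regs)        # resumes on the instruction after YIELD
    return stalled_pc, code, task.output, regs["sysreg"]
```

In this model the PC is incremented before the register is cleared, so a task that is still looping cannot rewrite the code after the scheduler has consumed it.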
The moment you read the register, you don't know if the data is good. It could have changed before the info can be acted on. The scheduler can very easily publish any data the task needs.
I vote for YIELD as bill described it.
Re: gentlemen's agreement.
The alternative is to not use the tasking capability. We don't have supervisor modes, which would be needed to make the agreements "binding"
In the simplest case, default to COG code, not tasks; otherwise, tasks can impact tasks.
I am glad you like it, I thought it would be easy to implement.
I think "SYS" may be better than yield, as it does much more. Here are some pasm ideas:
SYS task1reg, #EXIT ' tells the scheduler the thread is finished
SYS task1reg, #YIELD ' tells the scheduler the thread yields the time left in its normal time slice
SYS task1reg,#BREAK22 ' tells the scheduler that the thread has hit breakpoint 22
SYS task1reg,#WAITFOR+signal ' tells the scheduler to suspend the thread until 'signal' is set, where signal is a small positive integer (say 1..127)
SYS task1reg,#PUTCH
SYS task1reg,#PUTS
The reason for specifying the taskXregister is in case we want to manage multiple tasks running threads, or need more flexibility.
I really like it that you will internally increment the PC while SYS/YIELD is looping, that means there will be no need for the scheduler to increment it in the WIDE.
What is the latest on saving/restoring the task state, and single stepping?
I think this YIELD instruction is the way to go. It's flexible, simple, and robust.
It seems to me, too, that two things must happen in a 'yield':
1) A value must be written to a common register to signify and define the yield to the scheduler.
2) The instruction loops until the saved PC gets incremented by the scheduler.
I think any more behavior than this is superfluous and possibly overwrought.
By the way, to implement an instruction such as this is a little beyond nothing, in terms of Verilog code and gates.
Prior to November, ozpropdev showed us just what could be done using the simple 4 tasks available in a cog. This was fantastic.
Since then, great strides forward with HUBEXEC have been made.
However, we have added all sorts of help to the multi-tasking and now multi-threading. I am totally lost as to where it is now, and I am not interested.
No-one has put forward an argument of where this is actually going to be used, or how many extra P2 chips this is going to sell for Parallax. There are all sorts of "fluffy" ideas about what can be done with this. Originally I thought it was the GCC guys pushing for this, but they seem to be against it too.
Bill seems the only one with any ideas, and this seems to be for a full-blown micro computer system.
Time to put your cards on the table guys...
To those that are pushing for these additions (pre-emptive multi-tasking or is it multi-threading - I don't even understand anymore)...
1. How many P1's have you used/sold ?
2. What is your plan to use these features on P2 ?
3. How many additional P2's are you going to use/sell if you get these pre-emptive features ?
4. How many P2's will you use/sell without these pre-emptive features ?
It still continues to eat up silicon and time.
It's fast becoming a monster aimed at competing with ARM etc.
It's a microcontroller.
Everyone has agreed, the P2 will never be an ARM as it will never compete.
Chip's already trying to remove instructions to make space for these new instructions. He's already pruning for silicon space.
And yet we don't have the highly important SERDES yet. Not even a real discussion yet.
Something has to give...
1. SERDES does not happen
2. We don't see P2 this year
3. We stop this nonsense and move on to USB/SERDES
I really like it that you will internally increment the PC while SYS/YIELD is looping, that means there will be no need for the scheduler to increment it in the WIDE.
I think that is a typo? I think Chip was pausing the PC during the yield/pause, and then using INC to release/unpause via the scheduler.
I really like it that you will internally increment the PC while SYS/YIELD is looping, that means there will be no need for the scheduler to increment it in the WIDE.
What is the latest on saving/restoring the task state, and single stepping?
I didn't know I was going to increment the PC while SYS/YIELD looped, but that could be done. All we need is a simple instruction to kick any PC forward: INCPC D/#. Each PC has its own incrementer that can be triggered. This might let us avoid doing a save/restore in some cases.
Last night I added PTRX/PTRY to each task in preparation of doing the task save/restore circuitry, so it's coming in the next day, or two.
I think instruction names need to avoid being common words that people might want to use in their code. Instead of SYS or YIELD, maybe something like TSTALL would be good.
I didn't know I was going to increment the PC while SYS/YIELD looped, but that could be done. All we need is a simple instruction to kick any PC forward: INCPC D/#. Each PC has its own incrementer that can be triggered. This might let us avoid doing a save/restore in some cases.
That's what I thought you meant, INCPC D/# unpauses the paused thread. I think Bill mistyped something.
I think instruction names need to avoid being common words that people might want to use in their code. Instead of SYS or YIELD, maybe something like TSTALL would be good.
I'm really looking forward to being able to write large hub exec programs that can be multi-threaded. To make tools that work on the chip, this is really helpful.
This notion of old things being thrown away has been really overstated.
We will get to SERDES very soon.
We are also working on an FPGA board that will have lots of support circuits which will allow people to really exercise the chip. I want to get a few months of that underway to suss out any warts before we take the silicon approach again. This will push things out a little further, but will ensure that the next try is very likely a "go". It's so expensive and time-consuming to go through that process that I'd like to minimize the chance of another false start. I'm hoping we'll have chips by the end of the year.
Prior to November, ozpropdev showed us just what could be done using the simple 4 tasks available in a cog. This was fantastic.
Since then, great strides forward with HUBEXEC have been made.
However, we have added all sorts of help to the multi-tasking and now multi-threading. I am totally lost as to where it is now, and I am not interested.
Exactly. You are not interested, that does not mean others are also not interested.
No-one has put forward an argument of where this is actually going to be used, or how many extra P2 chips this is going to sell for Parallax. There are all sorts of "fluffy" ideas about what can be done with this. Originally I thought it was the GCC guys pushing for this, but they seem to be against it too.
See post #5858 for what extra capabilities we get, and some markets it may open up.
See post #5862 for an exact usage case, which I think many will use. P2 makes tcp/ip and usb very viable.
Bill seems the only one with any ideas, and this seems to be for a full-blown micro computer system.
There have been many other ideas, by many people
- some were great, but far too far out of possible scope for P2
- some ideas were simply bad
- some got implemented earlier
- P2 is finally gelling there, and there are fewer "low hanging fruit" to think of
Having said that, Chip will soon start on SERDES, and he may do something about recycling wasted hub slots.
Time to put your cards on the table guys...
To those that are pushing for these additions (pre-emptive multi-tasking or is it multi-threading - I don't even understand anymore)...
1. How many P1's have you used/sold ? MANY
2. What is your plan to use these features on P2 ? See #5862
3. How many additional P2's are you going to use/sell if you get these pre-emptive features ? Will find out! Definitely more than without the features!
4. How many P2's will you use/sell without these pre-emptive features ? Not nearly as many, as MANY markets/applications would not be possible
It still continues to eat up silicon and time. some, but not as much as those opposed to change would make it out to be
It's fast becoming a monster aimed at competing with ARM etc. DUH! In microcontroller space, that was part of the idea! - but not in tablet space
It's a microcontroller.
Everyone has agreed, the P2 will never be an ARM as it will never compete. May I respectfully suggest you speak for yourself? And not everyone?
Chip's already trying to remove instructions to make space for these new instructions. He's already pruning for silicon space.
And yet we don't have the highly important SERDES yet. Not even a real discussion yet.
Something has to give...
1. SERDES does not happen
2. We don't see P2 this year
3. We stop this nonsense and move on to USB/SERDES
About SERDES and USB being very important, we totally agree.
This is not nonsense.
FYI, I would appreciate it if you would make technical arguments, instead of "I don't need it, no one else does" / "it's not like the P1" / "the sky is falling"
When I make an argument, it is based on technical data, which I provide.
FYI, other microcontrollers - be they ARM, PIC, AVR, TI - often have RTOS's grafted on, precisely to handle many threads of execution.
With cog / tasks / threads P2 will do it FAR better, far more efficiently than any of them. Much lower context switch times. A good thing.
When I proposed it, I assumed the scheduler would increment the PC. It seemed simpler
I can't comment on INCPC, as I am not sure what you currently intend for saving/restoring the task state. Frankly, instead of starving a task for cycles, I think the scheduler having
PAUSE #tasknum ' saves state in WIDE scheduler can access
RUN #tasknum ' restores state from WIDE scheduler can access
may be easier to explain to people, and may make the scheduler more readable. It can (for the debugger) also have
STEP #tasknum ' can only be invoked on a PAUSE'd task, completes one instruction, scheduler task auto-loops on it until task being stepped has finished the instruction
Mind you, I am perfectly fine with starving a task, then giving it cycles to re-enable it if it is easier to implement, or takes less gates.
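To show how a scheduler might read with those three operations, here is a hedged Python model. The `TaskControl` class, its `tick` method, and the use of a plain dict to stand in for the WIDE are all assumptions made for the sketch, and the saved state is reduced to just the PC.

```python
class TaskControl:
    """Toy model of PAUSE / RUN / STEP acting on per-task state."""

    def __init__(self, num_tasks=4):
        self.pc = [0] * num_tasks
        self.paused = [False] * num_tasks
        self.saved = {}                 # stands in for the WIDE

    def tick(self, t):
        """Task t gets cycles for one instruction, unless it is paused."""
        if not self.paused[t]:
            self.pc[t] += 1

    def pause(self, t):
        """PAUSE #t: save state where the scheduler can reach it."""
        self.saved[t] = self.pc[t]
        self.paused[t] = True

    def run(self, t):
        """RUN #t: restore (possibly edited) state and let the task go."""
        if not self.paused[t]:
            raise RuntimeError("RUN on a task that is not paused")
        self.pc[t] = self.saved.pop(t)
        self.paused[t] = False

    def step(self, t):
        """STEP #t: only valid on a paused task, completes one instruction."""
        if not self.paused[t]:
            raise RuntimeError("STEP on a task that is not paused")
        self.saved[t] += 1
```

A debugger loop would then be pause, inspect `saved`, step as many times as wanted, and run.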
Hmm.. my only issue with TSTALL is that it does more than stall the task.
How about
SCHED reg,#0..511
SCHED is short for "schedule"
And of course all of the above could have a 'T' prefix... TPAUSE/TRUN/TSTEP/TSCHED
I didn't know I was going to increment the PC while SYS/YIELD looped, but that could be done. All we need is a simple instruction to kick any PC forward: INCPC D/#. Each PC has its own incrementer that can be triggered. This might let us avoid doing a save/restore in some cases.
Last night I added PTRX/PTRY to each task in preparation of doing the task save/restore circuitry, so it's coming in the next day, or two.
I think instruction names need to avoid being common words that people might want to use in their code. Instead of SYS or YIELD, maybe something like TSTALL would be good.
To those that are pushing for these additions (pre-emptive multi-tasking or is it multi-threading - I don't even understand anymore)...
There are many things being discussed, multi-threading (Full SW swap) is just one aspect.
To me, good Swap for Debug visibility is more important, but multi-threading comes with that & I can see benefits of that
I believe Chip has pretty much done this.
A couple of other items are not new features, but more clean ups of existing features.
Removal of risks like corruption & lowering the impact of multi-cycle resource sharing, on already tested threads.
Items of not much sizzle, (when done, they will be more invisible than visible), but are important to have it just work.
I'm really looking forward to being able to write large hub exec programs that can be multi-threaded. To make tools that work on the chip, this is really helpful.
I don't really see what multi-threaded adds here. IMHO it is an over-complication imposed by PCs. It does not stop us having a P2 development environment with a simple OS. Heck, I run one on my P1 now although it lacks a few features like compiling (because I haven't converted Michael Park's compiler to run under my OS version which uses Kye's SD driver). However, with the SRAM I can run almost any CPM2 program on it (using ZiCog), including transferring files to/from the CPM file system to the FAT system.
The minis I worked on between 1974-1999 did not have multi-threading although they did have multi-partitions (which are amazingly similar to multiple cogs). These ran commercial online realtime software, some with 100+ terminals scattered throughout Australia.
So I guess I don't see why the current hubexec mode and 4 task cogs cannot already do what is required to run a self-contained P2 System.
This notion of old things being thrown away has been really overstated.
Agreed it's overstated, but it did nearly happen.
We will get to SERDES very soon.
Fantastic !!!
We are also working on an FPGA board that will have lots of support circuits which will allow people to really exercise the chip. I want to get a few months of that underway to suss out any warts before we take the silicon approach again. This will push things out a little further, but will ensure that the next try is very likely a "go". It's so expensive and time-consuming to go through that process that I'd like to minimize the chance of another false start. I'm hoping we'll have chips by the end of the year.
I wholeheartedly agree with your plan. The delay is completely necessary to try and fully evaluate the P2 before you commit to silicon. I do understand the costs and time that get wasted if the silicon fails.
There have been so many positive changes that have happened since November, and we really haven't begun to test them yet.
I am actually quite concerned that all the multi-tasking and multi-threading additions may introduce bugs into the non-tasking/threading modes. When it was simple I was not concerned with this. As long as the basic modes worked, we could live with any problems and work around them. I am not so sure anymore.
I didn't know I was going to increment the PC while SYS/YIELD looped, but that could be done. All we need is a simple instruction to kick any PC forward: INCPC D/#. Each PC has its own incrementer that can be triggered. This might let us avoid doing a save/restore in some cases.
Last night I added PTRX/PTRY to each task in preparation of doing the task save/restore circuitry, so it's coming in the next day, or two.
I think instruction names need to avoid being common words that people might want to use in their code. Instead of SYS or YIELD, maybe something like TSTALL would be good.
Having the ability for the scheduler task to choose to increment or not depending on the way the YIELD/TSTALL operation was called is useful. For example, if you used it as a dynamic breakpoint which you inserted into the user task code, you would want to be able to patch the original code back and return to the original PC (so no increment required there), but if it was used as a system call like PUTCH or as a true YIELD operation, you would want to return to the next instruction after it and then need the increment. The scheduler task now has full control over what it does in response to the user task if we give it an INCPC instruction, which also avoids reading/writing the WIDE holding its task state each time.
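Roughly, the two cases look like this in a Python sketch. Code memory is modeled as a plain list, and the names `insert_breakpoint` and `resume_after` plus the reason strings are hypothetical. The only point is that the resume address depends on why the task stalled.

```python
def insert_breakpoint(code, addr, stall_instr):
    """Patch a stall instruction over code[addr]; return the original."""
    original = code[addr]
    code[addr] = stall_instr
    return original


def resume_after(code, pc, reason, original=None):
    """Return the PC the task should resume at.

    breakpoint: restore the patched instruction and re-run the same address.
    anything else (syscall, yield): continue after the stall instruction,
    which is where an INCPC-style kick would be used.
    """
    if reason == "breakpoint":
        code[pc] = original
        return pc
    return pc + 1


# Example: patch a breakpoint over address 1, then resume from it.
program = ["mov", "add", "jmp"]
saved = insert_breakpoint(program, 1, "TSTALL")
resume_pc = resume_after(program, 1, "breakpoint", saved)
```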
As to some earlier comments, know that I fully support these multi-tasking, multi-threading efforts, as I see this as adding significant extra capabilities to the P2, which only increases its potential market. But I don't think this P2 development is necessarily a marketing-driven project; it is basically engineering driven, primarily from Chip's interests/desires and bouncing ideas off people like us. That is just how it is working out, for better or worse.
No one can possibly know all the forecast sales numbers with or without this or that feature in the device. Sure, P2 won't be competing with an ARM for power and price etc, but what if it could now open up new or unforeseen applications that could really benefit from P2's hard real-time I/O, video, signal processing capabilities etc, and allow larger/richer control applications, existing C code, and people used to an RTOS to come on over to the P2 with relative ease? That could really open P2 up to a lot of things. To me it would seem like a big effort just to recreate a faster P1 with a hard limit of 8 COGs again, albeit at 200MHz or so, and thankfully that is not the case here. If it were, then you might start to restrict yourself to the same customers/market as P1 already has. That is where all this tasking etc will help expand the target market for P2 and let new possibilities arise. The best thing is it doesn't need to be used if it seems foreign and some of the existing P1 people don't quite get it all yet. It is all fully optional, and P2 can still look and operate quite a bit like a fast P1 when required.
Comments
Yes, which means you cannot use hyper-threading + threading, so still need a separation in terminology.
I do not mind the terminology of core, as that is effectively what the programmer 'sees', provided there is some distinction that it is hot a separate set of physical cores. It comes with some caveats.
The Silicon certainly should minimize those caveats, (and remove the nasty ones) but it cannot eliminate all of them.
So I think some (small) modifier is needed, vCore, (virtual-core) or fCore (Flexible core) or tsCore (time Sliced core) or sCore (Sliced Core) or mCore (time-mapped core).. or (add your own modifier here...)
...unitl you get to threads (or pthreads) at the next layer of abstraction, the software. Then you can throw out the definitions coined by Intel and talk about the lightweight implementations of software constructs that are similar to processes (which are sometimes called tasks) but you need to keep processes and processors distinct since one is software and one is hardware. And then, theads can run within processes on some systems..so you can have a thread running inside a process that is currently executing in a hyperthread within a core within a processor which could be one of many processors within a a server (or a systems) that can be a node or part of a node or part of a cluster within a node.....turtles all the way down!
So the whole thing is one confusing mess even when you read through the definitions and if you don't disambiguate (sorry, I've been Googling today) in your documentation and context, it doesn't matter what you call them. Introducing new terms sounds like a marketing trick and leads to more confusion until you learn the new terms.
You could go COG, VCOG, SoftCOG but that seems silly.
You could go COG, COG Slice and Code Slice but that is even sillier.
When I hear thread, my mind doesn't race to HyperThreads and Intel - I go to switched software execution units under the control of some scheduler... and usually, that scheduler is just another thread, because on most systems it is switchable and can get pre-empted. In our case, I don't think we can let the scheduler get preempted, since it should run at the top level.
If you rename the task, then you need to rename the SETTASK instruction (which is a minor annoyance at this point).
You are over-complicating it, and your proposed terminology is far too similar, far too easy to confuse.
Much more confusing than cog/task/thread.
I can live with cog/vcog/thread, but the names you proposed confuse even me. Frankly, I find cog/task/thread to be the easiest and most logical.
Also, I see no reason to follow terminology Intel uses for some of its desktop processors, when software threading has been available - and called threads - for over 40 years.
C.W.
Over there in XMOS land they had chips with four processors (cores). Each capable of up to 8 hardware, time sliced, threads.
Some brilliant marketing guy at XMOS decided to advertise this as four "tiles" each supporting 8 "logical cores".
Makes me sick. Let's not go there.
except my last scream wasn't silent!!
{works best if John Cleese is the bartender}
Two engineers walk into a bar, the first one tells the bartender, "I'll have a Propeller" and the bartender gives him a drink.
The second engineer says, "I'll have a Propeller too."
Bartender says, "Sorry, you can't"
Second engineer, "But you just gave him one!"
Bartender, "No, I didn't."
Second engineer, "You most certainly did give him a Propeller!"
Bartender, "Why, yes, I certainly did!"
Second engineer, "Well! Then I'd like a Propeller too!"
Bartender, "No, sorry, don't know how to make one of those yet."
....
and so on......
C.W.
If that is the case, why did we have so many talking about preemptive multitasking, for the full-software-swap topic ?
Shouldn't they have all naturally used preemptive multi threading ?
A benefit of using just one of thread/task, is less confusion, so I would favour cog/vcog/thread
Umm... that's one thread AT A TIME per task, with the thread being switched as needed by the scheduler.
Basically, see my example in #5862 in this task (hehe) that shows how I plan to use cogs/tasks/threads. I think you will find it to be a compelling case for why threads are very useful.
I'm sure we agree on the uses of a full software swap, and even on the need for Silicon support to make doing that safe and compact, no matter if it is called multitasking or multithreading up at that level.
I was trying to avoid the usage confusion that already exists (as shown on here) between what is SW-managed and what is HW-managed
This has my vote. Simple, clean.
Re: debug and read that task register.
Here's the thing. Right now, it's not exposed to the memory map. Part of the reason we would want read / write capability is to understand when the register is getting corrupted / written to out of hand, right?
Of course, this kind of thing happens when you expose it all to a running program, which the P2 does not do. So then, the debug is easy. Did the code run SETTASK? If so, then that's where the problem is. Done, next. Search between breakpoints, and look at self-modifying code.
We don't need read on SETTASK for debug purposes, and frankly if we can read it, then people will, and we are back to "seems like good data, until it isn't" and bad practice.
Finally, "reading registers is common on MCU's"
Yeah? So are a lot of things. This isn't a common MCU, and things can play out differently, etc... Again, those devices with all the registers exposed to the memory map, as opposed to instructions, can much more easily exhibit corruption through mistaken / subtle reads and writes. Having SETTASK work the way it does, factors that all out.
People that understand seem to quickly level set if the terms are identified in context. If you don't have any background it is confusing.
Although vcog sounds more like marketing than any of them, it does distinguish the result of a SETTASK instruction.
I hope Chip reads through all this blather and gets the essence of the idea behind the YIELD or PREEMPT instruction that started this latest round of blather.
John Abshier
Correct, but there are other uses too.
I'm not following this debug method, as the TASK map may have been changed in another thread (eg a library)
A user may have no idea it is even happening, they see strange results...
Besides making solid sense for debug, reliable read of Slice mapping allows those being sliced to make decisions about how much resource they have to work with.
Code can be designed to be more reliable, as it can easily check it has the resource it expects/needs.
I'm following this logic even less - it is simply reading a register. There is no 'good data' aspect, it is just the value that is being used by the Slice Mapper.
Giving programmers accurate information is never bad practice.
If any user has a total aversion to reading this, or using it, then they can never read it.
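To make the "check it has the resource it expects" idea concrete, here is a rough Python sketch of what a task could do with a readable slice map. It assumes the task map is a 32-bit value holding 16 two-bit slots, each naming one of four tasks; that layout and the names `slot_counts` and `has_share` are assumptions for illustration, not something confirmed in this thread.

```python
# Sketch only: ASSUMES a 32-bit task map made of 16 two-bit slots,
# where each slot holds the number (0..3) of the task that owns it.
# slot_counts() and has_share() are hypothetical helper names.

def slot_counts(task_map, slots=16, bits=2):
    """Count how many time slots each task owns in the map."""
    mask = (1 << bits) - 1
    counts = [0] * (1 << bits)
    for i in range(slots):
        owner = (task_map >> (i * bits)) & mask
        counts[owner] += 1
    return counts

def has_share(task_map, task, minimum_slots):
    """True if 'task' owns at least 'minimum_slots' of the slots."""
    return slot_counts(task_map)[task] >= minimum_slots

# Tasks 0 and 1 alternate; tasks 2 and 3 get nothing.
alternating = 0x44444444
# Tasks 0, 1, 2, 3 take turns in order.
round_robin = 0xE4E4E4E4
```

The objection raised elsewhere in the thread still applies to a check like this: the map could be rewritten between the read and the moment the task acts on it, so the result is a snapshot rather than a guarantee.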
Fully agree.
The silicon should manage the details, so the programmer does not have to (almost) clone a flip-flop, and a programmer should be able to design verifiable code.
I think this YIELD instruction is the way to go. It's flexible, simple, and robust.
It seems to me, too, that two things must happen in a 'yield':
1) A value must be written to a common register to signify and define the yield to the scheduler.
2) The instruction loops until the saved PC gets incremented by the scheduler.
I think any more behavior than this is superfluous and possibly overwrought.
By the way, to implement an instruction such as this is a little beyond nothing, in terms of Verilog code and gates.
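The two steps listed above can be modelled in a few lines of Python. This is a toy simulation only, not how the silicon works; the `Task` class, the `request` field standing in for the common register, and `scheduler_release` are all hypothetical names.

```python
# Toy model of the two-step yield: (1) publish a value in a common
# register, (2) stall in place until the scheduler bumps the saved PC.
# Every name here is hypothetical and exists only for illustration.

YIELD_NONE = 0

class Task:
    def __init__(self, program):
        self.program = program      # list of (op, arg) tuples
        self.pc = 0                 # this task's program counter
        self.request = YIELD_NONE   # stands in for the common register
        self.log = []               # records the work that actually ran

    def step(self):
        """Run one time slot. A yield keeps re-executing at the same PC."""
        if self.pc >= len(self.program):
            return
        op, arg = self.program[self.pc]
        if op == "yield":
            self.request = arg      # step 1: signal the scheduler
            return                  # step 2: loop, PC does not advance
        self.log.append(arg)
        self.pc += 1

def scheduler_release(task):
    """Scheduler side: read the request, clear it, increment the PC."""
    code = task.request
    task.request = YIELD_NONE
    task.pc += 1
    return code

task = Task([("work", "a"), ("yield", 7), ("work", "b")])
for _ in range(4):                  # the task stalls on the yield
    task.step()
stalled_pc = task.pc
code = scheduler_release(task)      # scheduler lets it continue
task.step()
```

The point of the sketch is that the yielding task needs no knowledge of the scheduler beyond the value it writes; everything else is decided on the scheduler side.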
The moment you read the register, you don't know if the data is good. It could have changed before the info can be acted on. The scheduler can very easily publish any data the task needs.
I vote for YIELD as bill described it.
Re: gentlemen's agreement.
The alternative is to not use the tasking capability. We don't have supervisor modes, which would be needed to make the agreements "binding"
In the simplest case, default to COG code, not tasks; otherwise, tasks can impact tasks.
I am glad you like it, I thought it would be easy to implement.
I think "SYS" may be better than yield, as it does much more. Here are some pasm ideas:
SYS task1reg, #EXIT            ' tells the scheduler the thread is finished
SYS task1reg, #YIELD           ' tells the scheduler the thread yields the time left in its normal time slice
SYS task1reg, #BREAK22         ' tells the scheduler that the thread has hit breakpoint 22
SYS task1reg, #WAITFOR+signal  ' tells the scheduler to suspend the thread until 'signal' is set, where signal is a small positive integer (say 1..127)
SYS task1reg, #PUTCH
SYS task1reg, #PUTS
The reason for specifying the taskXregister is in case we want to manage multiple tasks running threads, or need more flexibility.
I really like it that you will internally increment the PC while SYS/YIELD is looping, that means there will be no need for the scheduler to increment it in the WIDE.
What is the latest on saving/restoring the task state, and single stepping?
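For anyone trying to picture the scheduler side of the SYS idea, here is a small Python sketch of decoding the request value and acting on it. The numeric encodings (EXIT = 0, YIELD = 1, WAITFOR = 128, BREAK = 256) and the dict used for a thread are invented for the example; the post above only names the requests, it does not assign values.

```python
# Sketch of a scheduler decoding SYS-style requests. The numeric values
# below are HYPOTHETICAL; they are chosen only so that WAITFOR+signal
# (signal 1..127) and BREAK+n fit in a 9-bit immediate.

EXIT, YIELD, WAITFOR, BREAK = 0, 1, 128, 256

def decode(request):
    """Split a 9-bit request into (kind, argument)."""
    if not 0 <= request <= 511:
        raise ValueError("request must fit in 9 bits")
    if request >= BREAK:
        return ("break", request - BREAK)
    if request > WAITFOR:
        return ("waitfor", request - WAITFOR)
    if request == YIELD:
        return ("yield", None)
    if request == EXIT:
        return ("exit", None)
    return ("unknown", request)

def handle(thread, request, signals_set):
    """Update a thread record (a plain dict) according to its request."""
    kind, arg = decode(request)
    if kind == "exit":
        thread["state"] = "done"
    elif kind == "yield":
        thread["state"] = "ready"
    elif kind == "waitfor":
        thread["state"] = "ready" if arg in signals_set else "blocked"
        thread["waiting_on"] = arg
    elif kind == "break":
        thread["state"] = "stopped"
        thread["breakpoint"] = arg
    return kind

thread = {"state": "running"}
handle(thread, WAITFOR + 5, signals_set={3})
```

Requests such as PUTCH and PUTS would simply be further entries in the same decode table.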
Since then, great strides forward with HUBEXEC have been made.
However, we have added all sorts of help to the multi-tasking and now multi-threading. I am totally lost as to where it is now, and I am not interested.
No-one has put forward an argument of where this is actually going to be used, or how many extra P2 chips this is going to sell for Parallax. There are all sorts of "fluffy" ideas about what can be done with this. Originally I thought it was the GCC guys pushing for this, but they seem to be against it too.
Bill seems the only one with any ideas, and this seems to be for a full-blown micro computer system.
Time to put your cards on the table guys...
To those that are pushing for these additions (pre-emptive multi-tasking or is it multi-threading - I don't even understand anymore)...
1. How many P1's have you used/sold ?
2. What is your plan to use these features on P2 ?
3. How many additional P2's are you going to use/sell if you get these pre-emptive features ?
4. How many P2's will you use/sell without these pre-emptive features ?
It still continues to eat up silicon and time.
It's fast becoming a monster aimed at competing with ARM etc.
It's a microcontroller.
Everyone has agreed, the P2 will never be an ARM as it will never compete.
Chip's already trying to remove instructions to make space for these new instructions. He's already pruning for silicon space.
And yet we don't have the highly important SERDES yet. Not even a real discussion yet.
Something has to give...
1. SERDES does not happen
2. We don't see P2 this year
3. We stop this nonsense and move on to USB/SERDES
I think that is a typo? I think Chip was pausing the PC during the yield/pause, and then using INC to release/unpause via the scheduler.
I didn't know I was going to increment the PC while SYS/YIELD looped, but that could be done. All we need is a simple instruction to kick any PC forward: INCPC D/#. Each PC has its own incrementer that can be triggered. This might let us avoid doing a save/restore in some cases.
Last night I added PTRX/PTRY to each task in preparation of doing the task save/restore circuitry, so it's coming in the next day, or two.
I think instruction names need to avoid being common words that people might want to use in their code. Instead of SYS or YIELD, maybe something like TSTALL would be good.
That's what I thought you meant, INCPC D/# unpauses the paused thread. I think Bill mistyped something.
Or TPAUSE ?
I'm really looking forward to being able to write large hub exec programs that can be multi-threaded. To make tools that work on the chip, this is really helpful.
This notion of old things being thrown away has been really overstated.
We will get to SERDES very soon.
We are also working on an FPGA board that will have lots of support circuits which will allow people to really exercise the chip. I want to get a few months of that underway to suss out any warts before we take the silicon approach again. This will push things out a little further, but will ensure that the next try is very likely a "go". It's so expensive and time-consuming to go through that process that I'd like to minimize the chance of another false start. I'm hoping we'll have chips by the end of the year.
Exactly. You are not interested, that does not mean others are also not interested.
See post #5858 for what extra capabilities we get, and some markets it may open up.
See post #5862 for an exact usage case, which I think many will use. P2 makes tcp/ip and usb very viable.
There have been many other ideas, by many people
- some were great, but far too far out of possible scope for P2
- some ideas were simply bad
- some got implemented earlier
- P2 is finally gelling there, and there are fewer "low hanging fruit" to think of
Having said that, Chip will soon start on SERDES, and he may do something about recycling wasted hub slots.
About SERDES and USB being very important, we totally agree.
This is not nonsense.
FYI, I would appreciate it if you would make technical arguments, instead of "I don't need it, no one else does" / "it's not like the P1" / "the sky is falling"
When I make an argument, it is based on technical data, which I provide.
FYI, other microcontrollers - be they ARM, PIC, AVR, TI - often have RTOS's grafted on, precisely to handle many threads of execution.
With cog / tasks / threads P2 will do it FAR better, far more efficiently than any of them. Much lower context switch times. A good thing.
When I proposed it, I assumed the scheduler would increment the PC. It seemed simpler
I can't comment on INCPC, as I am not sure what you currently intend for saving/restoring the task state. Frankly, instead of starving a task for cycles, I think the scheduler having
PAUSE #tasknum ' saves state in WIDE scheduler can access
RUN #tasknum ' restores state from WIDE scheduler can access
may be easier to explain to people, and may make the scheduler more readable. It can (for the debugger) also have
STEP #tasknum ' can only be invoked on a PAUSE'd task, completes one instruction, scheduler task auto-loops on it until task being stepped has finished the instruction
Mind you, I am perfectly fine with starving a task, then giving it cycles to re-enable it if it is easier to implement, or takes less gates.
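A rough Python model of the PAUSE / RUN / STEP semantics proposed above, for anyone who wants to see the intended behaviour spelled out. The `saved` dict stands in for the WIDE registers, a "program" is just a list of numbers added to an accumulator, and all class and method names are hypothetical.

```python
# Toy model of the proposed PAUSE / RUN / STEP scheduler instructions.
# 'saved' stands in for the WIDE the scheduler can access. The Task and
# Scheduler classes are hypothetical and only illustrate the semantics.

class Task:
    def __init__(self, program):
        self.program = program   # toy program: numbers to accumulate
        self.pc = 0
        self.acc = 0
        self.running = True

    def execute_one(self):
        """Complete exactly one instruction."""
        self.acc += self.program[self.pc]
        self.pc += 1

class Scheduler:
    def __init__(self, tasks):
        self.tasks = tasks
        self.saved = {}          # task number -> (pc, acc)

    def pause(self, n):
        """PAUSE #n: stop the task and save its state."""
        task = self.tasks[n]
        task.running = False
        self.saved[n] = (task.pc, task.acc)

    def run(self, n):
        """RUN #n: restore the saved state and let the task run."""
        task = self.tasks[n]
        task.pc, task.acc = self.saved.pop(n)
        task.running = True

    def step(self, n):
        """STEP #n: only valid on a paused task; runs one instruction."""
        if n not in self.saved:
            raise RuntimeError("STEP is only valid on a PAUSE'd task")
        task = self.tasks[n]
        task.pc, task.acc = self.saved[n]
        task.execute_one()
        self.saved[n] = (task.pc, task.acc)

sched = Scheduler([Task([1, 2, 3])])
sched.tasks[0].execute_one()     # task runs one instruction on its own
sched.pause(0)
sched.step(0)                    # debugger single-steps the paused task
```

Whether this is cheaper in gates than starving a task of cycles is a separate question; the sketch only shows why the three operations read naturally in a scheduler.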
Hmm.. my only issue with TSTALL is that it does more than stall the task.
How about
SCHED reg,#0..511
SCHED is short for "schedule"
And of course all of the above could have a 'T' prefix... TPAUSE/TRUN/TSTEP/TSCHED
There are many things being discussed, multi-threading (Full SW swap) is just one aspect.
To me, good Swap for Debug visibility is more important, but multi-threading comes with that & I can see benefits of that
I believe Chip has pretty much done this.
A couple of other items are not new features, but more clean ups of existing features.
Removal of risks like corruption & lowering the impact of multi-cycle resource sharing, on already tested threads.
Items of not much sizzle, (when done, they will be more invisible than visible), but are important to have it just work.
The minis I worked on between 1974-1999 did not have multi-threading although they did have multi-partitions (which are amazingly similar to multiple cogs). These ran commercial online realtime software, some with 100+ terminals scattered throughout Australia.
So I guess I don't see why the current hubexec mode and 4 task cogs cannot already do what is required to run a self-contained P2 System.
Agreed it's overstated, but it did nearly happen. Fantastic !!! I wholeheartedly agree with your plan. The delay is completely necessary to try and fully evaluate the P2 before you commit to silicon. I do understand the costs and time that get wasted if the silicon fails.
There have been so many positive changes that have happened since November, and we really haven't begun to test them yet.
I am actually quite concerned that all the multi-tasking and multi-threading additions may introduce bugs into the non-tasking/threading modes. When it was simple I was not concerned with this. As long as the basic modes worked, we could live with any problems and work around them. I am not so sure anymore.
Having the ability for the scheduler task to choose to increment or not depending on the way the YIELD/TSTALL operation was called is useful. For example, if you used it as a dynamic breakpoint which you inserted into the user task code, you would want to be able to patch the original code back and return to the original PC (so no increment required there), but if it was used as a system call like PUTCH or as a true YIELD operation, you would want to return to the next instruction after it and then need the increment. The scheduler task now has full control over what it does in response to the user task if we give it an INCPC instruction, which also avoids reading/writing the WIDE holding its task state each time.
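The breakpoint versus system-call distinction can be sketched like this in Python. It is a simplified model under the assumption that code is a list of instruction strings and that the scheduler knows which kind of request it is handling; `insert_breakpoint`, `resume_pc` and the `"brk"` placeholder are made-up names.

```python
# Sketch of why the scheduler wants the choice of whether to INCPC.
# Code is modelled as a list of instruction strings; BREAK_OP and the
# two helper functions are hypothetical names for illustration only.

BREAK_OP = "brk"

def insert_breakpoint(code, addr):
    """Overwrite one instruction with a breakpoint, return the original."""
    original = code[addr]
    code[addr] = BREAK_OP
    return original

def resume_pc(code, pc, request, original=None):
    """Return the PC at which the user task should continue."""
    if request == "break":
        code[pc] = original   # patch the original instruction back
        return pc             # re-execute it, so no increment
    return pc + 1             # system call or yield: skip past it

code = ["mov", "add", "sub"]
saved = insert_breakpoint(code, 1)
patched = list(code)                      # copy taken while patched
after_break = resume_pc(code, 1, "break", saved)
after_yield = resume_pc(code, 2, "yield")
```

In both cases the saved task state itself is never touched, which is the attraction of a separate INCPC.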
As to some earlier comments know that I fully support these multi-tasking, multi-threading efforts as I see this as adding significant extra capabilities to the P2 which only increases its potential market. But I don't think this P2 development is necessarily a marketing driven project, it is basically engineering driven primarily from Chip's interests/desires and bouncing ideas off people like us. That is just how it is working out for better or worse.
No one can possibly know all the forecast sales numbers with or without this or that feature in the device. Sure, P2 won't be competing with an ARM for power and price etc, but what if it could now open up new or unforeseen applications that could really benefit from P2's hard real-time I/O, video, signal processing capabilities etc and now allow larger/richer control applications and existing C code or people used to RTOS etc to now come on over to the P2 with relative ease. That could really open P2 up to a lot of things. To me it would seem like a big effort to just basically recreate a faster P1 with a hard limit of 8 COGs again, albeit at 200MHz or so, and thankfully that is not the case here. If it were, then you might start to restrict yourself to the same customers/market as P1 already has. That is where all this tasking etc will help expand the target market for P2 and let new possibilities arise. The best thing is it doesn't need to be used if it seems foreign and some of the existing P1 people don't quite get it all yet. It is all fully optional and P2 can still look and operate quite a bit like a fast P1 when required.
We either get the tasking solid, or not. Given where we are, I vote for solid. I think we are very close.
I like what I see. Good post.
Is this the room for an argument?