As far as I know the only thing extra that is needed is a way of retrieving registers from another thread and it can all be done in software.
If 'registers' includes PC and flags, then yes, I would agree.
I think it is OK (and maybe even desirable) if any cross-task register access is limited to a 'frozen' task (i.e. the master task has used a 'give me 100%' opcode).
I see this as less of a 'multitasking' problem and more of a simple access one, with the main benefit being debug support.
What is additionally needed to realize single-stepping is just a way to read back and set a task's Z/C/lockpending/PC bits. The single step can be achieved via SETTASK, where one slot is given and then taken away by another task, so that the target task runs one instruction at a time, between context dumps.
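The give-one-slot, take-it-back, dump-context loop described above can be sketched as a toy simulation. This is only an illustration under stated assumptions: the three-instruction ISA (`add`, `sub`, `jmp`), the `TaskState` fields, and the function names are all hypothetical and are not Prop2 opcodes or registers. The point is just that if the debugger can read PC/Z/C after every single instruction, it has a single-stepper.

```python
from dataclasses import dataclass

MASK32 = 0xFFFFFFFF


@dataclass
class TaskState:
    """Hypothetical per-task context: PC, one accumulator, Z and C flags."""
    pc: int = 0
    acc: int = 0
    z: bool = False
    c: bool = False


def execute_one(state, program):
    """Run exactly one toy instruction, updating PC and flags."""
    op, arg = program[state.pc]
    state.pc += 1
    if op == "add":
        total = state.acc + arg
        state.c = total > MASK32          # carry out of bit 31
        state.acc = total & MASK32
        state.z = state.acc == 0
    elif op == "sub":
        state.c = arg > state.acc         # borrow
        state.acc = (state.acc - arg) & MASK32
        state.z = state.acc == 0
    elif op == "jmp":
        state.pc = arg                    # flags are left untouched
    else:
        raise ValueError(f"unknown op {op!r}")


def single_step(state, program, steps):
    """Debugger loop: let the target run one instruction, then dump its context."""
    dumps = []
    for _ in range(steps):
        execute_one(state, program)       # target owns exactly one slot
        dumps.append((state.pc, state.acc, state.z, state.c))
    return dumps


PROGRAM = [("add", 5), ("sub", 5), ("jmp", 0)]
DUMPS = single_step(TaskState(), PROGRAM, 4)
```

Each entry in `DUMPS` is the context the debugger would see between slots, which is all a front end needs in order to display state and decide whether to step again.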
On second thought, it would be easy to make an instruction stall the task loop for one instruction while a random task is inserted into the pipeline: TINSERT D/#.
PLEASE MAKE IT STOP!!! ... It is just me wanting to tinker with these things for my own pleasure
And that is a compelling reason to discuss the possibilities. We're all here because we like these kinds of things and are challenged and intrigued by them.
.. The single step can be achieved via SETTASK, where one slot is given and then taken away by another task, so that the target task runs one instruction at a time, between context dumps.
That sounds good. So the present SETTASK task-mapping engine can cooperate nicely with the pipeline and software to give a predictable phase for the one-slot step?
SETTASK could be used, but read my edit to post #64. There's a cleaner way possible.
Cleaner is always good
Did you mean this edit ? On second thought, it would be easy to make an instruction stall the task loop for one instruction while a random task is inserted into the pipeline: TINSERT D/#.
I'm not quite seeing how that gives single-stepping?
It would let you execute one instruction from a task so that you could extract the data afterwards, show some data, then do it again.
There are only one or two simple things missing to facilitate all this.
If there is a simple way to appease the demand for single-step debugging, then by all means please go for it.
Could this method be made to work with all instructions including the "prefix" instructions that have an effect on the next instruction(s)? Or is it limited to the normal kind of instructions?
Ahle has mentioned he'd like to make a preemptive multitasking system on the Prop2, so I've been thinking about what might be needed in the chip to facilitate it.
I can make an instruction to swap a task's Z/C/PC with a D register (SWTASK D,S/#, where S=task), but there are some limitations to what can be done with that:
REPS/REPD/TLOCK state information will be lost and not restored later, so these instructions cannot be used in code that is going to be interrupted by SWTASK. Is this a showstopper?
Also, I don't see how each task's PTRA/PTRB can be saved and restored efficiently. Now TLOCK could get us around this, but TLOCK cannot survive a SWTASK.
The trouble with TLOCK is that the instruction can only set a hidden flag to lock the process next time it shows up in stage 0 of the pipeline. If a task executes a TLOCK, but the next task in the pipeline does a SWTASK on that task, the prior task's TLOCK is interrupted before it engages. Had it engaged without interruption, it could have used REPS/REPD safely. I could store that TLOCK-pending flag in bit 29 of D (bits 31/30 = Z/C) , so that it could be restored later. I think I must do this. This way, INDA/INDB/REPS/REPD may be executed in TLOCK'd sections of code that don't get interrupted, but this increases the time granularity of task switching.
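The D-register layout suggested above (Z and C in bits 31/30, the TLOCK-pending flag in bit 29, PC in the low bits) can be illustrated with a small pack/unpack sketch. The 16-bit PC width and the function names are assumptions made for the example; the post does not state how wide the PC field is, and the Z-in-31, C-in-30 ordering is a reading of "bits 31/30 = Z/C".

```python
Z_BIT = 31              # per the post: bits 31/30 = Z/C
C_BIT = 30
TLOCK_PENDING_BIT = 29  # per the post: TLOCK-pending stored in bit 29
PC_BITS = 16            # assumption: PC width is not given in the thread
PC_MASK = (1 << PC_BITS) - 1


def pack_task_state(z, c, tlock_pending, pc):
    """Build the 32-bit word a SWTASK-style swap would leave in D."""
    if not 0 <= pc <= PC_MASK:
        raise ValueError("pc does not fit in the assumed PC field")
    return (
        (int(z) << Z_BIT)
        | (int(c) << C_BIT)
        | (int(tlock_pending) << TLOCK_PENDING_BIT)
        | pc
    )


def unpack_task_state(d):
    """Split a saved word back into (z, c, tlock_pending, pc)."""
    return (
        bool((d >> Z_BIT) & 1),
        bool((d >> C_BIT) & 1),
        bool((d >> TLOCK_PENDING_BIT) & 1),
        d & PC_MASK,
    )


saved = pack_task_state(z=True, c=False, tlock_pending=True, pc=0x1234)
restored = unpack_task_state(saved)
```

Keeping the pending flag in the same word means a scheduler that saves and restores D verbatim would restore the TLOCK-pending state without having to know about it.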
What else is to be considered regarding preemptive multitasking?
Would it be possible to "execute" TLOCK in the first pipeline stage, since it has no operands? If so, would this ensure that no other tasks get loaded into the pipeline? That would partially resolve the SWTASK issue, as well as tracking the hidden flag (because no other task would be able to mess with it).
I've been thinking the same thing. It would simplify starting and stopping a task (as in single-stepping), as it would get rid of the need to track any state information about whether a TLOCK was pending.
Once I get my setup running again (it's broken now due to some changes I made to support starting cogs in hub exec mode), I'm going to look into this. It may not be possible because of late-arriving data from the instruction cache, but we'll see soon. If this would work, it would be better. If it doesn't, it's not that big of a deal - there'd just be an extra bit to track, along with Z/C/PC.
I usually don't find myself reading Heater's posts and disagreeing. Odd. Although, I would like to back him up and say that Mac/Windows haven't had cooperative multitasking at the process level in over a decade. Support for mapping hardware preemption from processes onto user threads is OS-dependent, but the preemption mechanism is fully utilized by all modern desktop and mobile OSes. Speaking of mechanisms, I agree that hardware support for preemptive multitasking is really problematic.
But if I can suggest a mechanism that might accommodate the request, it would be for one cog to pause and then inspect the full state of another cog. This would allow one "supervisor" cog to use the built-in hardware tasks (or waitpin, or whatever you want) as a timer to preemptively freeze another cog's state (process), save it out to hub RAM or SDRAM, and swap in the state of a different frozen cog state. It would allow the definition of "running state" (whole cog?, cog and aux?) to be defined by the software that handles the swap. Software defined preemption. Feels Propeller-y to me.
As a very big bonus the pause and inspect mechanisms, depending on how they work, could also allow single-step debugging of one cog by another. Maybe this already exists in the instruction set? I'm still digesting everything that's been discussed in the past 6 months.
Still, this may be very difficult to achieve because it means being able to pause and examine *every* piece of a cog's state. That's a death by a thousand cuts, and anything that gets missed becomes a cog capability that isn't available for preemption.
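The supervisor idea above (freeze, save the state, swap in a different frozen state) can be sketched as follows. This is a toy model under assumptions: the `Cog` class, its `state` dictionary, and `preempt` are hypothetical names, and what counts as "running state" is whatever the software chooses to put in the dictionary, which is the "software-defined" part of the proposal.

```python
import copy
from collections import deque


class Cog:
    """Toy stand-in for a cog; everything that must survive a swap is in .state."""

    def __init__(self):
        self.state = {"pc": 0, "z": False, "c": False, "regs": [0, 0, 0, 0]}
        self.frozen = False


def preempt(cog, ready_queue):
    """Freeze the cog, save its state, and load the next saved process."""
    cog.frozen = True                                # nothing changes while frozen
    ready_queue.append(copy.deepcopy(cog.state))     # save the outgoing process
    cog.state = ready_queue.popleft()                # restore the incoming process
    cog.frozen = False


cog = Cog()
cog.state["pc"] = 10                                 # process A is running
queue = deque([{"pc": 200, "z": True, "c": False, "regs": [7, 0, 0, 0]}])  # process B

preempt(cog, queue)
pc_after_first_swap = cog.state["pc"]
saved_pc_of_a = queue[0]["pc"]

preempt(cog, queue)
pc_after_second_swap = cog.state["pc"]
```

Anything left out of `state` is exactly the "anything that gets missed" case: it would leak from one process into the next instead of being preserved.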
This thread is the exact reason why I am afraid of saying much in the main thread. A few words of mine all of a sudden trigger a massive thread with so many replies that I don't have the time to read them all. First of all, I want to be clear that I never asked Chip to implement any new instructions or hw support regarding PM. My question was whether there was a way of getting registers from another task. The reason I did not say why I wanted to know was that I was afraid something like this thread would happen.
Chip asked what I wanted it for and I had to reply. As I have said in the main thread... The P2 does not need a microkernel, interrupts, hw support for scheduling or memory management in hw... It is just me wanting to tinker with these things for my own pleasure. And I want to do it all in SW. As far as I know the only thing extra that is needed is a way of retrieving registers from another thread and it can all be done in software.
Btw, I feel so young when all you gurus go on about things that you did 10-20 years before I was born. But I think respect goes both ways. I have a lot to learn from you, and maybe my energy and creativity can give something to the community as well. BUT PLEASE MAKE THIS THREAD DIE!!!
/Johannes
But if I can suggest a mechanism that might accommodate the request, it would be for one cog to pause and then inspect the full state of another cog. This would allow one "supervisor" cog to use the built-in hardware tasks (or waitpin, or whatever you want) as a timer to preemptively freeze another cog's state (process), save it out to hub RAM or SDRAM, and swap in the state of a different frozen cog state.
That's what Chip is looking into. ( see #74 )
You can already freeze another task in an atomic way.
It just needs the ability to read/write the full state information (Z/C/PC/??) across tasks to be added.
That's what Chip is looking into. ( see #74 )
You can already do the Freeze another cog, in an atomic way.
it just needs the ability to read/write the full state information, (Z/C/PC/??) across cogs, to be added.
Oh hey, look at that! Really glad that solution is being investigated, it feels like a minimal mechanism that will add a lot of flexibility. And cross-cog debugging is going to be a lot more reliable than having a debug task in the same cog as the running code. I'm sure we've all experienced code that worked with the debugging code enabled, but failed without it.
Oops, I misread your post, and mangled my reply a little.
To clarify, Chip is talking about freeze of a task and using another task as the debug stub.
If that frozen task can have the PC & any flags read/written it can be debugged/single stepped, and also someone could replace all of the code in that task, if they wanted to.
I do not think the data paths exist for cross-cog debugging, but task mapping can lower the impact of same-cog debug.
Yes, I was thinking about how the chip layout would have to be to allow one cog to stop execution and then read the entire register file of another cog and it seemed like it would be too big an impact on the design.
Comments
Agreed.
If there is a simple way to appease the demand for single-step debugging, then by all means please go for it.
Could this method be made to work with all instructions including the "prefix" instructions that have an effect on the next instruction(s)? Or is it limited to the normal kind of instructions?
It would work with all instructions.
That's all sounding good, and I like the word 'simple'
If that instruction stalls the pipeline cache so that only this instruction's info is used, that might work.
I completely agree with Ahle2. Everything necessary for preemptive multitasking and single-step debugging can already be done easily in software with how tasks work now. The only instructions necessary (which were already added before this thread started, or at least significantly discussed) are GETTASK, TLOCK, TFREE (or whatever they're called now) and maybe a few other getter and setter opcodes. The Propeller shouldn't have any hardware features that can be easily done in software. The kernel, debugger, or whatever can just be a low-priority task that normally sits in a PASSCNT but TLOCKs every once in a while and does its stuff.
electrodude