What happens when you're using multiple HW tasks and/or using the new threading capability with code that uses GETMULL/H, GETDIVQ/R, etc? Do we now have a separate multiplier for each HW task? Even if we do, what happens if the scheduler decides to switch threads just before one of these instructions? Won't the new thread get the result of an instruction initiated by the old thread if they're both using that HW resource?
This is where you must use TLOCK/TFREE in the thread or task. That way, you can hog the multiplier/divider/squarerooter/CORDIC/other for the umpteen cycles it takes to load it, wait, and get the result out of it. TLOCK/TFREE make sharing of these singular resources very simple.
Consider that once a thread is TLOCK'd there is no possibility of an interruption until it executes TFREE, at which point the slots resume their cycling. So, arbitration occurs at the task level by a task simply taking control for a brief period. It's a first-come, first-served scenario that requires no other arbitration mechanism. It's serendipitously simple.
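As a minimal sketch of the pattern (assuming MUL32 starts the shared multiplier and GETMULL/GETMULH fetch the result, per the mnemonics used in this thread; operand order and wait behavior are guesses, not confirmed syntax):

```
        tlock                   ' freeze all other tasks' slots
        mul32   x, y            ' start the shared 32x32 multiply
        getmull prodlo          ' low 32 bits of the product
        getmulh prodhi          ' high 32 bits of the product
        tfree                   ' resume normal slot cycling
```

Everything between TLOCK and TFREE runs with no other task getting a slot, so no other task can touch the multiplier mid-operation.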
The docs from the release at the end of January still indicate those as single resources per cog.
Tips for coding multi-tasking programs
--------------------------------------
While all tasks in a multi-tasking program can execute atomic instructions without any inter-task conflict,
remember that there's only one of each of the following cog resources and only one task can use it at a time:
Singular resource      Some related instructions that could cause conflicts
----------------------------------------------------------------------------------------------------------
WIDE registers         RDBYTEC/RDWORDC/RDLONGC/RDWIDEC/RDWIDE/WRWIDE/SETWIDE/SETWIDZ
INDA                   FIXINDA/FIXINDS/SETINDA/SETINDS / INDA modification via INDA usage
INDB                   FIXINDB/FIXINDS/SETINDB/SETINDS / INDB modification via INDB usage
PTRA                   SETPTRA/ADDPTRA/SUBPTRA / PTRA modification via RDxxxx/WRxxxx
PTRB                   SETPTRB/ADDPTRB/SUBPTRB / PTRB modification via RDxxxx/WRxxxx
PTRX                   SETPTRX/ADDPTRX/SUBPTRX/CALLX/RETX/PUSHX/POPX / PTRX modification via RDAUXx/WRAUXx
PTRY                   SETPTRY/ADDPTRY/SUBPTRY/CALLY/RETY/PUSHY/POPY / PTRY modification via RDAUXx/WRAUXx
ACCA                   SETACCA/SETACCS/MACA/SARACCA/SARACCS/CLRACCA/CLRACCS
ACCB                   SETACCB/SETACCS/MACB/SARACCB/SARACCS/CLRACCB/CLRACCS
32x32 multiplier       MUL32/MUL32U
64/32 divider          FRAC/DIV32/DIV32U/DIV64/DIV64U/DIV64D
64-bit square rooter   SQRT64/SQRT32
CORDIC computer        QSINCOS/QARCTAN/QROTATE/QLOG/QEXP/SETQI/SETQZ
SERA                   SETSERA/SERINA/SEROUTA
SERB                   SETSERB/SERINB/SEROUTB
XFR                    SETXFR
VID                    WAITVID/SETVID/SETVIDY/SETVIDI/SETVIDQ/POLVID
Block repeater         REPS/REPD
CTRA                   SETCTRA/SETWAVA/SETPHSA/ADDPHSA/SUBPHSA/GETPHZA/POLCTRA/CAPCTRA/SYNCTRA
CTRB                   SETCTRB/SETWAVB/SETPHSB/ADDPHSB/SUBPHSB/GETPHZB/POLCTRB/CAPCTRB/SYNCTRB
PIX                    (not usable in multi-tasking; requires single-task timing)
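To see why these count as conflicts, here is a hedged sketch of two unprotected tasks colliding on the multiplier (interleaving shown in comments; mnemonics as used in this thread):

```
' task A:  mul32   a1, a2       ' A starts its multiply
' task B:  mul32   b1, b2       ' slot switch: B restarts the one shared multiplier
' task A:  getmull r            ' A reads back B's product, not its own
```

The same interleaving hazard applies to every singular resource in the table above.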
Looks like a good reason to add those extra locks I requested...
You could wrap the usage of those instructions within a lock.
Hmmm, I wonder if we need intra-cog locks...that way we wouldn't need to have locks used internal to a COG incur a hubop...
Yes, I imagine they probably will collide in such a case where they are sharing a common resource. That is why we require protection methods such as task locking when doing multitasking/multithreading. This can be done either at the time-slicing level with TLOCK, or within the scheduler task itself using a software method to suspend user-thread switching temporarily until the result is complete and consumed by the thread using it. Multithreading always adds its own set of complexities when you need to protect shared resources.
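One possible shape for that software method, as a sketch (everything here is hypothetical: the critical flag, the label names, and the idea that the scheduler polls the flag at its switch points):

```
' User thread: mark a critical section instead of using TLOCK.
        mov     critical, #1    ' ask the scheduler not to swap us out
        mul32   x, y            ' use the shared multiplier
        getmull prodlo
        getmulh prodhi
        mov     critical, #0    ' preemption is safe again

' Scheduler task, at each switch point:
'       tjnz    critical, #skip_switch   ' thread is mid-operation; retry next tick
```

Unlike TLOCK, this keeps the other hardware tasks running; only the software thread switch is deferred.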
Thanks. I guess that probably means another code generation option in GCC to indicate if you expect the code to run in a threaded or tasking environment. You wouldn't want the TLOCK/TFREE instructions to be generated in single tasking code since it would waste both space and time. What would this mean for the thread switcher? Would it be prevented from stopping a thread that was TLOCKed?
Other instructions like SETWIDE,SETWIDZ,RDWIDE,WRWIDE have to be used carefully with multi-tasking too.
Okay, this introduces yet another code generation option:
1) single tasking code with no worries about shared HW resources
2) multi-tasking code that uses TLOCK/TFREE to manage access to shared HW resources
3) multi-threaded code that uses software in the scheduler to manage access to shared resources
I suppose some combination of 2 and 3 would be necessary if you have HW tasks running in addition to the scheduler and the time-sliced task. Maybe it would be easier if the HW self-looped for the SET instructions as well as the GET instructions. If a task/thread sets up a multiply or divide, then every other task/thread would self-loop on the SET instructions until the operation completes in the first task/thread. Would that work?
The WIDEs will be saved and restored under preemptive multitasking, but SETWIDE's configuration (address of WIDEs) ought to be universal for a particular multitasking scheme.
Cases 2 and 3 are the same. They are tasks. So, you only need to know if other tasks are running, and, if so, issue TLOCK/TFREE around singular resource usages.
But what happens if one thread running under a scheduler does TLOCK and then the scheduler swaps in another thread? Then the other thread tries to setup another multiply which trashes the one that the first thread setup. I think there needs to be some software mechanism in addition to TLOCK and TFREE for scheduled threads. Or am I missing something?
It would work. But you can probably use the same TLOCK solution for both cases 2) and 3) in order to keep things consistent and reduce the number of approaches. It will just introduce slightly more jitter in case 3) than might otherwise be required, but at least it would work and is independent of the scheduler task implementation. The downside is that a debugger controlled by another task would not be able to stop inside a TLOCK/TFREE section.
Ha! That's the beauty of the whole thing. The scheduler CAN'T swap in another thread because the old thread is hogging the pipeline due to TLOCK. Only after the thread executes the TFREE will the scheduler execute another instruction. There are no interrupts - only software executing in tasks. Any task in a TLOCK is the only thing running.
Okay, I see what you mean. The scheduler is in another task, so it can't interrupt a task between TLOCK and TFREE. I guess that would work then. So GCC only needs to support two modes, single-tasking and multi-tasking, and the multi-tasking mode needs to wrap all of these non-atomic operations in TLOCK/TFREE.
Yes, I see that now. The only downside, as rogloh mentioned, is that a debugger will not be able to stop the thread in those sections.
Exactly.
Preemptive threading is just a use case of multitasking.
Chip here. I'm on my dad's computer and didn't know he was logged in. I assumed it was my account.
I was wondering about that CHIPKEN username. Now I know! I guess you can tell that I'm finally getting around to reading the last instruction list that you published. Looking forward to the new one with all of this tasking stuff in it.
Chip,
The only issue with using a TLOCK for these cases is that it halts the other threads from doing anything while waiting on the multiplier/CORDIC, etc., when all we really want to do is make sure they don't try to use the same resource.
It would be nice to have some locks within each cog that are purely locks; they wouldn't change anything with the threading and, being a cog resource, wouldn't incur a hub operation.
These would be general purpose locks similar to the existing locks, they would just be a cog resource.
C.W.
They'd have to be implemented in a way where you'd request one and then automatically loop until you get it. Otherwise, those umpteen cycles would be burned up in monkey motion vying for the lock. I'll have to think about how those locks could work. There'd need to be agreement within a multitasking scheme what locks were meant for what resources.
Hey, those locks could really hang things up if you were waiting for one, you got it, but then your thread got switched out before you could take care of business and release the lock. Other tasks and threads vying for that same lock would hang until your thread was reactivated and released the lock. Any idea on how to overcome this dilemma?
TLOCK/TFREE are briefly piggish, but keep things very simple. I mean, a divider operation only takes 16 clocks. That's somewhat significant for multitasking, but insignificant for preemptive threading.
I started to type up another answer, and then rechecked the forum this time. I think I said almost exactly the same thing as you just did Chip. We seem to be on the same wavelength right now and overlapping our posts so I'll pause and keep quiet for a bit. For small overall critical section intervals TLOCK/TFREE are going to be reasonably efficient, at least to begin with.
I was just telling my dad about all the excellent ideas and input provided by people on the forum, and how half of them were from Australia and New Zealand. We're actually on the same clock, I think.
Yeah, I guess holding a lock while being deactivated would be an issue. It would work well for tasks, but not with the preemptive threading.
I just noticed that all of the CALLx/RETx instructions save and restore C and Z as well as PC. I guess that means that it is impossible to write a function that returns a value in either C or Z. Well, not exactly impossible but it would require modifying the function's return address on the stack. I don't think C/C++ will need to return values in C or Z but is that something commonly done in PASM code?
So subtle and smart I'm not sure I know what it means.
I thought preemptive scheduling was a method of achieving multitasking, not using it.
"Preemptive", "hardware scheduled", "cooperative", "multi-processor" are all ways one can achieve multitasking.
All but cooperative have the issue of how to safely share resources, data or hardware. That's where locks come in.
Unless you do it the "piggy" way and just disable interrupts, thus switching off preemption, for the period of time a resource is in use. Or in this case use TLOCK if I understand correctly.
All CALL instructions save the flags on the stack in bits 17 and 16, with the address in bits 15..0. All RETs can optionally load Z and C from those same bits, but only if WZ/WC follow the RET instruction. Otherwise, the current Z/C are unchanged, so they CAN be function results.
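So a flag-returning function is straightforward. A sketch, assuming CALLX/RETX behave as described above and the usual if_c conditional prefix applies (label and register names are invented):

```
below   cmp     value, limit wc ' C = borrow = (value < limit), unsigned
        retx                    ' no WC/WZ on the RET, so C survives the return

' Caller:
        callx   #below
if_c    jmp     #handle_below   ' branch on the C returned by the function
```

Had the RETX been written with WC, C would instead be restored from bit 17 saved on the stack by the CALLX, discarding the comparison result.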
Preemptive threading is just a use case of multitasking.
That you wrote a pile of posts ago! And it slipped right by... Subtle, and smart.