What happens when you're using multiple HW tasks and/or using the new threading capability with code that uses GETMULL/H, GETDIVQ/R, etc? Do we now have a separate multiplier for each HW task? Even if we do, what happens if the scheduler decides to switch threads just before one of these instructions? Won't the new thread get the result of an instruction initiated by the old thread if they're both using that HW resource?
This is where you must use TLOCK/TFREE in the thread or task. That way, you can hog the multiplier/divider/squarerooter/CORDIC/other for the umpteen cycles it takes to load it, wait, and get the result out of it. TLOCK/TFREE make sharing of these singular resources very simple.
Consider that once a thread is TLOCK'd there is no possibility of an interruption until it executes TFREE, at which point the slots resume their cycling. So, arbitration occurs at the task level by a task simply taking control for a brief period. It's a first-come, first-served scenario that requires no other arbitration mechanism. It's serendipitously simple.
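As a minimal sketch of the pattern (assuming MUL32 starts the shared multiplier and GETMULL/GETMULH fetch the result, per the mnemonics used in this thread; operand order and wait behavior are guesses, not confirmed syntax):

```
        tlock                   ' freeze all other tasks' slots
        mul32   x, y            ' start the shared 32x32 multiply
        getmull prodlo          ' low 32 bits of the product
        getmulh prodhi          ' high 32 bits of the product
        tfree                   ' resume normal slot cycling
```

Everything between TLOCK and TFREE runs with no other task getting a slot, so no other task can touch the multiplier mid-operation.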
The docs from the release at the end of January still indicate those as single resources per cog.
Tips for coding multi-tasking programs
--------------------------------------
While all tasks in a multi-tasking program can execute atomic instructions without any inter-task conflict,
remember that there's only one of each of the following cog resources and only one task can use it at a time:
Singular resource      Some related instructions that could cause conflicts
----------------------------------------------------------------------------------------------------------
WIDE registers         RDBYTEC/RDWORDC/RDLONGC/RDWIDEC/RDWIDE/WRWIDE/SETWIDE/SETWIDZ
INDA                   FIXINDA/FIXINDS/SETINDA/SETINDS / INDA modification via INDA usage
INDB                   FIXINDB/FIXINDS/SETINDB/SETINDS / INDB modification via INDB usage
PTRA                   SETPTRA/ADDPTRA/SUBPTRA / PTRA modification via RDxxxx/WRxxxx
PTRB                   SETPTRB/ADDPTRB/SUBPTRB / PTRB modification via RDxxxx/WRxxxx
PTRX                   SETPTRX/ADDPTRX/SUBPTRX/CALLX/RETX/PUSHX/POPX / PTRX modification via RDAUXx/WRAUXx
PTRY                   SETPTRY/ADDPTRY/SUBPTRY/CALLY/RETY/PUSHY/POPY / PTRY modification via RDAUXx/WRAUXx
ACCA                   SETACCA/SETACCS/MACA/SARACCA/SARACCS/CLRACCA/CLRACCS
ACCB                   SETACCB/SETACCS/MACB/SARACCB/SARACCS/CLRACCB/CLRACCS
32x32 multiplier       MUL32/MUL32U
64/32 divider          FRAC/DIV32/DIV32U/DIV64/DIV64U/DIV64D
64-bit square rooter   SQRT64/SQRT32
CORDIC computer        QSINCOS/QARCTAN/QROTATE/QLOG/QEXP/SETQI/SETQZ
SERA                   SETSERA/SERINA/SEROUTA
SERB                   SETSERB/SERINB/SEROUTB
XFR                    SETXFR
VID                    WAITVID/SETVID/SETVIDY/SETVIDI/SETVIDQ/POLVID
Block repeater         REPS/REPD
CTRA                   SETCTRA/SETWAVA/SETPHSA/ADDPHSA/SUBPHSA/GETPHZA/POLCTRA/CAPCTRA/SYNCTRA
CTRB                   SETCTRB/SETWAVB/SETPHSB/ADDPHSB/SUBPHSB/GETPHZB/POLCTRB/CAPCTRB/SYNCTRB
PIX                    (not usable in multi-tasking; requires single-task timing)
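To see why these count as conflicts, here is a hedged sketch of two unprotected tasks colliding on the multiplier (interleaving shown in comments; mnemonics as used in this thread):

```
' task A:  mul32   a1, a2       ' A starts its multiply
' task B:  mul32   b1, b2       ' slot switch: B restarts the one shared multiplier
' task A:  getmull r            ' A reads back B's product, not its own
```

The same interleaving hazard applies to every singular resource in the table above.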
Looks like a good reason to add those extra locks I requested...
You could wrap the usage of those instructions within a lock.
Hmmm, I wonder if we need intra-cog locks...that way we wouldn't need to have locks used internal to a COG incur a hubop...
Yes, I imagine they probably will collide in such a case where they are sharing a common resource. That is why we require protection methods such as task locking when doing multitasking/multithreading. This can be done either at the time-slicing level with TLOCK, or within the scheduler task itself using a software method to suspend user-thread switching temporarily until the result is complete and consumed by the thread using it. Multithreading always adds its own set of complexities when you need to protect shared resources.
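One possible shape for that software method, as a sketch (everything here is hypothetical: the critical flag, the label names, and the idea that the scheduler polls the flag at its switch points):

```
' User thread: mark a critical section instead of using TLOCK.
        mov     critical, #1    ' ask the scheduler not to swap us out
        mul32   x, y            ' use the shared multiplier
        getmull prodlo
        getmulh prodhi
        mov     critical, #0    ' preemption is safe again

' Scheduler task, at each switch point:
'       tjnz    critical, #skip_switch   ' thread is mid-operation; retry next tick
```

Unlike TLOCK, this keeps the other hardware tasks running; only the software thread switch is deferred.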
Thanks. I guess that probably means another code generation option in GCC to indicate if you expect the code to run in a threaded or tasking environment. You wouldn't want the TLOCK/TFREE instructions to be generated in single tasking code since it would waste both space and time. What would this mean for the thread switcher? Would it be prevented from stopping a thread that was TLOCKed?
Other instructions like SETWIDE,SETWIDZ,RDWIDE,WRWIDE have to be used carefully with multi-tasking too.
Okay, this introduces yet another code generation option:
1) single tasking code with no worries about shared HW resources
2) multi-tasking code that uses TLOCK/TFREE to manage access to shared HW resources
3) multi-threaded code that uses software in the scheduler to manage access to shared resources
I suppose some combination of 2 and 3 would be necessary if you have HW tasks running in addition to the scheduler and the time-sliced task. Maybe it would be easier if the HW self-looped for the SET instructions as well as the GET instructions. If a task/thread sets up a multiply or divide, then every other task/thread would self-loop on the SET instructions until the operation completes in the first task/thread. Would that work?
The WIDEs will be saved and restored under preemptive multitasking, but SETWIDE's configuration (address of WIDEs) ought to be universal for a particular multitasking scheme.
Cases 2 and 3 are the same. They are tasks. So, you only need to know if other tasks are running, and, if so, issue TLOCK/TFREE around singular resource usages.
But what happens if one thread running under a scheduler does TLOCK and then the scheduler swaps in another thread? Then the other thread tries to setup another multiply which trashes the one that the first thread setup. I think there needs to be some software mechanism in addition to TLOCK and TFREE for scheduled threads. Or am I missing something?
It would work. But you can probably use the same TLOCK solution for both cases 2) and 3) in order to keep things consistent and reduce the number of approaches. It will just introduce slightly more jitter in case 3) than might otherwise be required, but at least it would work and is independent of the scheduler task implementation. The downside is that a debugger controlled by another task would not be able to stop inside a TLOCK/TFREE section.
Ha! That's the beauty of the whole thing. The scheduler CAN'T swap in another thread because the old thread is hogging the pipeline due to TLOCK. Only after the thread executes the TFREE will the scheduler execute another instruction. There are no interrupts - only software executing in tasks. Any task in a TLOCK is the only thing running.
Okay, I see what you mean. The scheduler is in another task, so it can't interrupt a task between TLOCK and TFREE. I guess that would work then. So GCC only needs to support two modes, single-tasking and multi-tasking, and the multi-tasking mode needs to wrap all of these non-atomic operations in TLOCK/TFREE.
Yes, I see that now. The only downside, as rogloh mentioned, is that a debugger will not be able to stop the thread in those sections.
Exactly.
Preemptive threading is just a use case of multitasking.
Chip here. I'm on my dad's computer and didn't know he was logged in. I assumed it was my account.
I was wondering about that CHIPKEN username. Now I know! I guess you can tell that I'm finally getting around to reading the last instruction list that you published. Looking forward to the new one with all of this tasking stuff in it.
Chip,
The only issue with using a TLOCK for these cases is that it halts the other threads from doing anything while waiting on the multiplier/CORDIC, etc., when all we really want to do is make sure they don't try to use the same resource.
It would be nice to have some locks within each cog that are purely locks; they wouldn't change anything with the threading and, being a cog resource, wouldn't incur a hub operation.
These would be general purpose locks similar to the existing locks, they would just be a cog resource.
C.W.
They'd have to be implemented in a way where you'd request one and then automatically loop until you get it. Otherwise, those umpteen cycles would be burned up in monkey motion vying for the lock. I'll have to think about how those locks could work. There'd need to be agreement within a multitasking scheme what locks were meant for what resources.
Hey, those locks could really hang things up if you were waiting for one, you got it, but then your thread got switched out before you could take care of business and release the lock. Other tasks and threads vying for that same lock would hang until your thread was reactivated and released the lock. Any idea on how to overcome this dilemma?
TLOCK/TFREE are briefly piggish, but keep things very simple. I mean, a divider operation only takes 16 clocks. That's somewhat significant for multitasking, but insignificant for preemptive threading.
I started to type up another answer, and then rechecked the forum this time. I think I said almost exactly the same thing as you just did Chip. We seem to be on the same wavelength right now and overlapping our posts so I'll pause and keep quiet for a bit. For small overall critical section intervals TLOCK/TFREE are going to be reasonably efficient, at least to begin with.
I was just telling my dad about all the excellent ideas and input provided by people on the forum, and how half of them were from Australia and New Zealand. We're actually on the same clock, I think.
Yeah, I guess holding a lock while being deactivated would be an issue. It would work well for tasks, but not with the preemptive threading.
I just noticed that all of the CALLx/RETx instructions save and restore C and Z as well as PC. I guess that means that it is impossible to write a function that returns a value in either C or Z. Well, not exactly impossible but it would require modifying the function's return address on the stack. I don't think C/C++ will need to return values in C or Z but is that something commonly done in PASM code?
So subtle and smart I'm not sure I know what it means.
I thought preemptive scheduling was a method of achieving multitasking, not using it.
"Preemptive", "hardware scheduled", "cooperative", "multi-processor" are all ways one can achieve multitasking.
All but cooperative have the issue of how to safely share resources, data or hardware. That's where locks come in.
Unless you do it the "piggy" way and just disable interrupts, thus switching off preemption, for the period of time a resource is in use. Or in this case use TLOCK if I understand correctly.
All CALL instructions save the flags on the stack in bits 17 and 16, with the address in bits 15..0. All RETs can optionally load Z and C from those same bits, but only if WZ/WC follow the RET instruction. Otherwise, the current Z/C are unchanged, so they CAN be function results.
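So a flag-returning function is straightforward. A sketch, assuming CALLX/RETX behave as described above and the usual if_c conditional prefix applies (label and register names are invented):

```
below   cmp     value, limit wc ' C = borrow = (value < limit), unsigned
        retx                    ' no WC/WZ on the RET, so C survives the return

' Caller:
        callx   #below
if_c    jmp     #handle_below   ' branch on the C returned by the function
```

Had the RETX been written with WC, C would instead be restored from bit 17 saved on the stack by the CALLX, discarding the comparison result.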
Preemptive threading is just a use case of multitasking.
That you wrote a pile of posts ago! And it slipped right by... Subtle, and smart.