Propeller II update - BLOG

Sapieha · 2014-03-07 01:35

Hi Chip.

NOT me to ----> BUT.

In that case, even if someone wrote a program like this:

Some code
......
TPAUSE
TRESUME
........
Some code

That would still be no problem for the programmer, or for any P2 task.

cgracey wrote: »

I guess I could set a flag on TRESUME that would wait until the target task was out of the pipeline before incrementing its PC and clearing itself. I don't know if that would be a good idea, or not, though.

Sapieha · 2014-03-07 01:48

Hi Chip.

One more thing.

That flag could apply to all 3 instructions, and then the statement at the end would not be a problem.

cgracey wrote: »

T3SAVE 'save state of task 3 into WIDEs
T3LOAD 'load state of task 3 from WIDEs
TPAUSE D,S/# 'write S/# to D and loop in place
TRESUME D/# 'increment PC of task D/#

T3SAVE, T3LOAD, and TRESUME must be executed when the target task is no longer executing (out of the pipeline and no time slots given to it).

jmg · 2014-03-07 02:04

cgracey wrote: »

T3SAVE 'save state of task 3 into WIDEs
T3LOAD 'load state of task 3 from WIDEs
TPAUSE D,S/# 'write S/# to D and loop in place
TRESUME D/# 'increment PC of task D/#

T3SAVE, T3LOAD, and TRESUME must be executed when the target task is no longer executing (out of the pipeline and no time slots given to it).

TPAUSE is executed by a thread task to alert a scheduler task that it has reached a certain point and will eventually need a TRESUME to continue.

Above the T3Save and T3Load have the task number (3) in the opcode word, but the TRESUME seems to have it as a parameter ?

Also, the TPAUSE is reserved for slave threads only, whilst the other 3 are used solely in the 'master' thread.
That could get confusing, if they all use the same root ?

Also, what happens if TRESUME is fired on a slave thread that is not actually PAUSED ?

There may be cases where threads have varying completion times, and in some, PAUSE may not be reached.
I would guess a random bonus PC inc, could cause some very hard to find effects, given it is across-thread.

Perhaps TRESUME could have a simple gate, to have it ONLY INC the PC on a thread that was executing a PAUSE.
If the only exit from PAUSE is TRESUME then a simple S-R FF would do this.
That would mean any other TRESUME would be just ignored, as the slave thread is not paused.
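The gate described above can be modelled in a few lines. This is only a software sketch, not the P2 hardware: the `Task` class, the `paused` attribute (standing in for the S-R flip-flop), and the register dictionary are hypothetical names introduced for illustration.

```python
class Task:
    """Toy model of one hardware task (hypothetical, for illustration only)."""

    def __init__(self, pc=0):
        self.pc = pc
        self.paused = False   # stands in for the proposed S-R flip-flop

    def tpause(self, regs, d, s):
        """TPAUSE D,S/#: write S to register D, then loop in place."""
        regs[d] = s
        self.paused = True    # 'set' input of the latch

    def tresume_ungated(self):
        """TRESUME as currently described: always increments the PC."""
        self.pc += 1
        self.paused = False

    def tresume_gated(self):
        """Gated variant: increments the PC only if the task is paused."""
        if not self.paused:
            return False      # stray TRESUME is ignored
        self.pc += 1
        self.paused = False   # 'reset' input of the latch
        return True


regs = {}

running = Task(pc=10)                 # never executed TPAUSE
assert running.tresume_gated() is False and running.pc == 10

waiting = Task(pc=20)
waiting.tpause(regs, d="flag", s=1)   # "flag" is a made-up register name
assert waiting.tresume_gated() is True and waiting.pc == 21
```

With the ungated form, the first case would have bumped the PC of a task that was still running, which is the hard-to-find effect being discussed.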

jmg · 2014-03-07 02:15

cgracey wrote: »

I guess I could set a flag on TRESUME that would wait until the target task was out of the pipeline before incrementing its PC and clearing itself. I don't know if that would be a good idea, or not, though.

Seems worth checking into.
On a 'correctly paced' pgm (one with enough delays as coded) that flag would be invisible.
It would only be acted on, when someone did a TaskMap change, with RESUME too quickly thereafter.
Code could then be packer-free and inherently safe

SETTASK SlaveOffMap
TRESUME
SETTASK SlaveOnMap
{slave Task may get in one or more jump-to-self before pipeline flushes, then slave Task is Frozen, master INCS PC, and restarts slave task.}
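A rough software model of the deferred flag may help picture the sequence above. It is a sketch under stated assumptions: the `Task` class, the `in_pipeline` counter, and the pipeline depth of three are all hypothetical, and the model assumes the target task receives no new slots while the flag is pending.

```python
class Task:
    """Toy model of a task with a deferred-TRESUME flag (hypothetical)."""

    def __init__(self, pc):
        self.pc = pc
        self.in_pipeline = 0         # instructions of this task still in flight
        self.resume_pending = False  # the flag set by TRESUME

    def tresume(self):
        """Request a resume; the PC is not touched yet."""
        self.resume_pending = True

    def clock(self):
        """Advance one clock: drain one stage, then apply the flag if safe."""
        if self.in_pipeline > 0:
            self.in_pipeline -= 1
        if self.resume_pending and self.in_pipeline == 0:
            self.pc += 1                 # the deferred PC increment
            self.resume_pending = False  # flag clears itself


slave = Task(pc=100)
slave.in_pipeline = 3     # assumed: three instructions still in flight
slave.tresume()           # issued right after the task map change
clocks = 0
while slave.resume_pending:
    slave.clock()
    clocks += 1
```

In this model the increment lands only once the counter reaches zero, so a TRESUME issued too early is delayed rather than lost or misapplied.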

cgracey · 2014-03-07 02:32

jmg wrote: »

Above the T3Save and T3Load have the task number (3) in the opcode word, but the TRESUME seems to have it as a parameter ?

Also, the TPAUSE is reserved for slave threads only, whilst the other 3 are used solely in the 'master' thread.
That could get confusing, if they all use the same root ?

Also, what happens if TRESUME is fired on a slave thread that is not actually PAUSED ?

There may be cases where threads have varying completion times, and in some, PAUSE may not be reached.
I would guess a random bonus PC inc, could cause some very hard to find effects, given it is across-thread.

Perhaps TRESUME could have a simple gate, to have it ONLY INC the PC on a thread that was executing a PAUSE.
If the only exit from PAUSE is TRESUME then a simple S-R FF would do this.
That would mean any other TRESUME would be just ignored, as the slave thread is not paused.

Any task can do a TPAUSE or a TRESUME, or have a TRESUME done to it.

Only Task 3 has the special task context save/load circuitry to facilitate multi-threading. You don't need the task context save/load to use TPAUSE/TRESUME with any tasks. T3LOAD/T3SAVE should be thought of separately from TPAUSE/TRESUME - it's just a facility to help task 3 support rich-context thread switching.

And remember that you would only execute a TRESUME on a target task that was already evidently in a TPAUSE because you noticed a value in some register that the TPAUSE instruction put there. So, you just wouldn't be doing out-of-sync TRESUME's any more than you would be doing random JMPTASK's.

As far as preemptive thread switching goes with task 3, you would starve it of slots, let it empty through the pipeline, and then do the T3SAVE, then a T3LOAD, and give it some time slots to start it back up. In that case, task 3 would only use TPAUSE to signal to the scheduler task that perhaps it was ready to retire early, or something. So, TPAUSE/TRESUME are not required, at all, for pre-emptive thread switching. They are just there as extra tools, if needed.
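The pre-emptive sequence (starve, drain, T3SAVE, T3LOAD, restore slots) can be sketched as a small software model. Everything here is an assumption made for illustration: the `Context` fields, the `Cog` class, and the idea that the scheduler moves the WIDEs to and from a table of saved contexts between the save and the load.

```python
from dataclasses import dataclass, replace


@dataclass
class Context:
    """Per-thread state moved by T3SAVE/T3LOAD (field list is an assumption)."""
    pc: int = 0
    z: bool = False
    c: bool = False


class Cog:
    """Toy model of a cog with a scheduler task and task 3 (hypothetical)."""

    def __init__(self):
        self.task3 = Context()       # live state of task 3
        self.wides = Context()       # stands in for the WIDE registers
        self.task3_has_slots = True

    def settask(self, task3_has_slots):
        self.task3_has_slots = task3_has_slots

    def t3save(self):
        if self.task3_has_slots:
            raise RuntimeError("T3SAVE while task 3 still has slots")
        self.wides = replace(self.task3)   # copy live state into the WIDEs

    def t3load(self):
        if self.task3_has_slots:
            raise RuntimeError("T3LOAD while task 3 still has slots")
        self.task3 = replace(self.wides)   # copy the WIDEs into live state


def switch_thread(cog, saved, current_id, next_id):
    """Pre-empt task 3: park the current thread and start the next one."""
    cog.settask(task3_has_slots=False)   # starve task 3 of slots
    # A real scheduler would wait here until task 3 has left the pipeline.
    cog.t3save()
    saved[current_id] = cog.wides        # assumed: scheduler stores the WIDEs
    cog.wides = saved[next_id]           # assumed: scheduler fetches next context
    cog.t3load()
    cog.settask(task3_has_slots=True)    # give task 3 its slots back
```

Note that TPAUSE and TRESUME do not appear anywhere in `switch_thread`, which matches the point that they are optional extras for pre-emptive switching.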

jmg · 2014-03-07 02:43

cgracey wrote: »

And remember that you would only execute a TRESUME on a target task that was already evidently in a TPAUSE because you noticed a value in some register that the TPAUSE instruction put there.

- but that is adding more code semaphores and management, and does not address what happens should TRESUME fire on something not paused.
Having TRESUME conditional on being paused, helps ensure 'it just works', because any exception is safely skipped.

jmg · 2014-03-07 02:45

cgracey wrote: »

Only Task 3 has the special task context save/load circuitry to facilitate multi-threading.

I'd forgotten that detail.

cgracey · 2014-03-07 03:28

jmg wrote: »

- but that is adding more code semaphores and management, and does not address what happens should TRESUME fire on something not paused.
Having TRESUME conditional on being paused, helps ensure 'it just works', because any exception is safely skipped.

The whole point of TPAUSE/TRESUME is to provide a semaphore mechanism, right? Or do you see some other purpose? I just can't figure why someone would do a TRESUME without perfect confidence that the target task was in a TPAUSE. I mean, do we accidentally do COGNEWs and COGSTOPs, or do we have reasons?

mindrobots · 2014-03-07 03:32

jmg wrote: »

I'd forgotten that detail.

That is happening to all of us on a lot of these details. As Heater pointed out, that is a concern now, as we are starting to propose, debate, and redesign on partially remembered implementations... not a productive situation!

mindrobots · 2014-03-07 03:34

cgracey wrote: »

The whole point of TPAUSE/TRESUME is to provide a semaphore mechanism, right? Or do you see some other purpose? I just can't figure why someone would do a TRESUME without perfect confidence that the target task was in a TPAUSE. I mean, do we accidentally do COGNEWs and COGSTOPs, or do we have reasons?

If you do, it's a bug, and bugs must be killed. I don't think the silicon can stop you from coding and logic bugs....none has ever stopped me!!

Heater. · 2014-03-07 03:40

Only Task 3 has the special task context save/load circuitry to facilitate multi-threading

Yep. Yet another violation of the KISS principle, or the principle of least surprise, breaking orthogonality with special cases.

Sapieha · 2014-03-07 03:48

Hi Chip.

I see all ones that are NOT confident in programming

cgracey wrote: »

The whole point of TPAUSE/TRESUME is to provide a semaphore mechanism, right? Or do you see some other purpose? I just can't figure why someone would do a TRESUME without perfect confidence that the target task was in a TPAUSE. I mean, do we accidentally do COGNEWs and COGSTOPs, or do we have reasons?

cgracey · 2014-03-07 03:57

Heater. wrote: »

Yep. Yet another violation of the KISS principle, or the principle of least surprise, breaking orthogonality with special cases.

This is not so much about least surprise as it is about least silicon. There's no reason to support context save/load on more than one task.

cgracey · 2014-03-07 03:58

Sapieha wrote: »

Hi Chip.

I see all ones that are NOT confident in programming

I think we are all getting a little punch drunk and irritable.

Ariba · 2014-03-07 04:26

I think these new instructions are good enough to do the following

TPAUSE Threading,#Done

   TRESUME #SERDES
   TRESUME #USB

Andy

Heater. · 2014-03-07 04:28

Andy,

+ 1 +

mindrobots · 2014-03-07 04:31

Ariba wrote: »
I think these new instructions are good enough to do the following
TPAUSE Threading,#Done

   TRESUME #SERDES
   TRESUME #USB
Andy

Haha!! ...and since we don't support interrupts, Chip should be good to go until he yields... I mean does a TPAUSE!

cgracey · 2014-03-07 04:55

It took me a while to figure out the joke. Very funny!

We'll be there soon. Just another day or two of this pre-emptive threading matter.

Bill Henning · 2014-03-07 09:13

Thanks Chip,

That has started to un-confuse me

With what you said, the last p2 docs, and other messages in this thread, my updated understanding is:

(I also added some suggestions including proposed best practices)

Task Related Instruction Summary

At cog startup, the tasks are initialized as follows, with only task 0 enabled:

task  Z  C  PC
0     0  0  $0000
1     0  0  $0001
2     0  0  $0002
3     0  0  $0003

Given that task 0 is the default task, convention should be that task0 is used as the scheduler and/or debugger

SETTASK S/#

- controls slot assignment, 16 two-bit fields
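As an illustration of "16 two-bit fields", the sketch below packs a list of 16 task numbers into one 32-bit value. The helper name `pack_slots` is hypothetical, and the field order (slot 0 in the two least significant bits) is an assumption, not something confirmed in this thread.

```python
def pack_slots(slots):
    """Pack 16 task numbers (0-3) into a 32-bit slot map.

    Assumed layout: slot 0 occupies bits 1:0, slot 1 bits 3:2, and so on.
    """
    if len(slots) != 16:
        raise ValueError("expected exactly 16 slots")
    value = 0
    for i, task in enumerate(slots):
        if not 0 <= task <= 3:
            raise ValueError("task numbers are 0 to 3")
        value |= task << (2 * i)
    return value


all_task0 = pack_slots([0] * 16)              # only task 0 runs
round_robin = pack_slots([0, 1, 2, 3] * 4)    # each task gets 4/16 of the slots
```

Under that assumed layout, `round_robin` comes out as `0xE4E4E4E4` and `all_task0` as `0`.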

JMPTASK D/#,S/#

- force PC's in mask D/# to address S/#
- may have been changed to JMP tasknum,S/#
- only takes effect when a task has been assigned slots by SETTASK

TPAUSE D,S/#

- write S/# to D and loop in place
- can be used by any user task, not to be used by a scheduler task (as scheduler services TPAUSE's)
- once the scheduler detects D is non-zero, it starves the task that performed TPAUSE

TPAUSE can be used for:

- cooperative multi-threading
- pre-emptive multi-threading
- breakpoints
- waiting for signals/events
- system calls such as read/write etc
- threads/tasks exiting

Scheduler instructions (for use only by scheduler and/or debugger task)

TRESUME D/#

- increment PC of task D/#
- used to resume execution after a TPAUSE initiated by a task
- knows what task to resume by knowing which task pause register it was monitoring when it detected the TPAUSE

T3SAVE

- save state of task 3 into WIDEs
- to be used when task 3 is not executing (has been starved of cycles by SETTASK)
- only task 3's state can be saved in order to save flip-flops
- task 3 must be used as the task being debugged, so only one task can be debugged at once

T3LOAD

- load state of task 3 from WIDEs
- to be used when you want to re-start task 3 while it is not executing (has been starved of cycles by SETTASK)
- only task 3's state can be loaded in order to save flip-flops
- task 3 must be used as the task being debugged, so only one task can be debugged at once

How To Use these instructions:

To Single Step

- starve the task of cycles with SETTASK
- give the task one cycle of execution by setting the debugger to 15/16, task to 1/16, then immediately giving the debugger 16/16 cycles
- you can single step every task except the debugger, however you can only examine the PC/Z/C of task 3
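The single-step recipe can be pictured with a very simplified slot simulation. This sketch assumes one instruction per slot and ignores the pipeline entirely; `run_slots`, `single_step`, and the PC dictionary are hypothetical names, and the step map is run for one full 16-slot pass rather than being swapped out immediately.

```python
def run_slots(slot_map, pcs, n_slots):
    """Run n_slots time slots; each slot advances its task's PC by one."""
    for i in range(n_slots):
        pcs[slot_map[i % len(slot_map)]] += 1


def single_step(pcs, debugger=0, target=3):
    """Give `target` exactly one slot, then hand every slot to the debugger."""
    step_map = [target] + [debugger] * 15   # target 1/16, debugger 15/16
    run_slots(step_map, pcs, 16)            # one pass through the map
    return [debugger] * 16                  # debugger 16/16 again


pcs = {0: 0x100, 3: 0x200}
park_map = single_step(pcs)
```

After the call, the target's PC in this model has advanced by exactly one, and the returned map leaves it with no slots until the next step.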

Cooperative Threads

- have a very simple scheduler that:
  - notices when task three has invoked TPAUSE
  - swaps the thread process
  - issues TRESUME
- this style does not pre-empt, and requires sprinkling TPAUSEs in the cooperative threads

Pre-Emptive Threads

- more complex scheduler
- threads can still voluntarily yield with TPAUSE
- scheduler can pre-empt task three by taking its cycles, T3SAVE, T3LOAD, add cycles to task 3

For cogs that run multiple threads (cooperative or pre-emptive) task 0 should be the scheduler, and task 3 should be the threaded task.

Best Practices:

- tasks 1 & 2 could be drivers, but for best thread performance, only tasks 0 & 3 should be used when running multiple threads (due to there being only one line of dcache)
- if needed, tasks 1 & 2 could be used for low speed drivers, but ideally should not be used on threaded cogs when running multiple threads

Ok, that sums up my current understanding. Chip, any changes to this?

cgracey wrote: »

Oh, yeah, we are all mixed up with the names, but I think we've got the concepts straight. Here are the four instructions that govern this thread business:

T3SAVE 'save state of task 3 into WIDEs
T3LOAD 'load state of task 3 from WIDEs
TPAUSE D,S/# 'write S/# to D and loop in place
TRESUME D/# 'increment PC of task D/#

T3SAVE, T3LOAD, and TRESUME must be executed when the target task is no longer executing (out of the pipeline and no time slots given to it).

TPAUSE is executed by a thread task to alert a scheduler task that it has reached a certain point and will eventually need a TRESUME to continue.

Single-stepping Task 3 can be done by giving it a single time slot, then taking it away, so that one instruction executes in task 3.

Are we missing anything?

pedward · 2014-03-07 09:38

I feel I should respond to mindrobots' joke.

The standard nomenclature for threading and such is to YIELD.

You YIELD your time slot to another.

You YIELD the floor to the speaker.

You YIELD to another car at the roundabout.

You *don't* PAUSE a thread, pause your time at the podium, or pause at the roundabout.

On that subject, there are some other notions on locking that should be clarified:

sem_wait() decrements (locks) the semaphore pointed to by sem. If the semaphore's value is greater than zero, then the decrement proceeds, and the function returns, immediately. If the semaphore currently has the value zero, then the call blocks until either it becomes possible to perform the decrement (i.e., the semaphore value rises above zero), or a signal handler interrupts the call.

sem_wait() is the standard POSIX function for semaphore locking. It's a usage count that increments and decrements in an atomic manner to control locking.

The mutex object referenced by mutex shall be locked by calling pthread_mutex_lock(). If the mutex is already locked, the calling thread shall block until the mutex becomes available. This operation shall return with the mutex object referenced by mutex in the locked state with the calling thread as its owner.

In POSIX threads, pthread_mutex_lock() is used to lock a mutex. A mutex is an opaque lock, not a usage count.

In practice, POSIX mutexes and semaphores can be used in identical ways: you can either do a trylock or do a blocking lock.

There is a third locking semantic called a spin lock, which is a busy-waiting lock instead of a blocking lock.

Here is a good explanation of spin locking: http://stackoverflow.com/questions/6603404/when-is-pthread-spin-lock-the-right-thing-to-use-over-e-g-a-pthread-mutex
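The three locking styles above can be tried out quickly in Python, whose standard `threading` module is used here purely as a stand-in for the POSIX calls (it is not the POSIX API itself). The `spin_acquire` helper is a hypothetical name added for illustration.

```python
import threading

# Counting semaphore: a usage count, as with sem_wait()/sem_post().
sem = threading.Semaphore(2)              # two units available
assert sem.acquire(blocking=False)        # count 2 -> 1, succeeds at once
assert sem.acquire(blocking=False)        # count 1 -> 0
assert not sem.acquire(blocking=False)    # count is 0; a blocking call would wait
sem.release()                             # count 0 -> 1

# Mutex: a plain lock, either held or free, with no count.
mutex = threading.Lock()
assert mutex.acquire(blocking=False)      # non-blocking attempt, like a trylock
assert not mutex.acquire(blocking=False)  # already held, so the attempt fails
mutex.release()


def spin_acquire(lock):
    """Spin lock style: busy-wait instead of blocking.

    Returns the number of failed attempts before the lock was taken.
    """
    spins = 0
    while not lock.acquire(blocking=False):
        spins += 1
    return spins
```

The spin version keeps the caller running while it waits, which is closer in spirit to a task that loops in place than to a thread that is put to sleep.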

If we are going to support all of these advanced threading and tasking options, I strongly encourage Chip to give them canonical names which are aligned with their standardized counterparts.

Locks should have canonical names which match their standardized counterparts as well, to help differentiate their function between a semaphore, mutex, or spinlock.

It is surprising how important using known conventions is to avoid software bugs and allow the unwashed masses to use the new chip.

Bill Henning · 2014-03-07 09:54

(pedward meant to write mindrobots' joke... actually, I looked back, and it was Ariba's joke)

I originally called it YIELD, Chip prefers TPAUSE. It's his chip, he gets to call the instructions whatever he wants.

sem_wait(), posix threads etc can easily be implemented with YIELD/TPAUSE.

pedward wrote: »

I feel I should respond to Bill's joke.

The standard nomenclature for threading and such is to YIELD.

You YIELD your time slot to another.

You YIELD the floor to the speaker.

You YIELD to another car at the roundabout.

You *don't* PAUSE a thread, pause your time at the podium, or pause at the roundabout.

On that subject, there are some other notions on locking that should be clarified:

sem_wait() decrements (locks) the semaphore pointed to by sem. If the semaphore's value is greater than zero, then the decrement proceeds, and the function returns, immediately. If the semaphore currently has the value zero, then the call blocks until either it becomes possible to perform the decrement (i.e., the semaphore value rises above zero), or a signal handler interrupts the call.

sem_wait() is the standard POSIX function for semaphore locking. It's a usage count that increments and decrements in an atomic manner to control locking.

The mutex object referenced by mutex shall be locked by calling pthread_mutex_lock(). If the mutex is already locked, the calling thread shall block until the mutex becomes available. This operation shall return with the mutex object referenced by mutex in the locked state with the calling thread as its owner.

In POSIX threads, pthread_mutex_lock() is used to lock a mutex. A mutex is an opaque lock, not a usage count.

In practice, POSIX mutexes and semaphores can be used in identical ways: you can either do a trylock or do a blocking lock.

There is a third locking semantic called a spin lock, which is a busy-waiting lock instead of a blocking lock.

Here is a good explanation of spin locking: http://stackoverflow.com/questions/6603404/when-is-pthread-spin-lock-the-right-thing-to-use-over-e-g-a-pthread-mutex

If we are going to support all of these advanced threading and tasking options, I strongly encourage Chip to give them canonical names which are aligned with their standardized counterparts.

Locks should have canonical names which match their standardized counterparts as well, to help differentiate their function between a semaphore, mutex, or spinlock.

It is surprising how important using known conventions is to avoid software bugs and allow the unwashed masses to use the new chip.

whicker · 2014-03-07 10:24

pedward,

Except that TPAUSE doesn't exactly yield.

When someone yields the podium, they don't end up standing there with a fixed expression until someone else picks them up and forcibly takes them down from the stage.

At the roundabout, TPAUSE would be for the car to continue circling around until another car bumped it into the grass area in the middle.

Ym2413a · 2014-03-07 10:58

whicker wrote: »

At the roundabout, TPAUSE would be for the car to continue circling around until another car bumped it into the grass area in the middle.

This made me laugh. lol. : ]

jmg · 2014-03-07 11:15

cgracey wrote: »

The whole point of TPAUSE/TRESUME is to provide a semaphore mechanism, right? Or do you see some other purpose? I just can't figure why someone would do a TRESUME without perfect confidence that the target task was in a TPAUSE. I mean, do we accidentally do COGNEWs and COGSTOPs, or do we have reasons?

The ideal is simplicity.

If the user still has to manage all the semaphore hand overs in SW, then what are the opcodes doing, exactly ?

ie In SW now, a slave task can Set-an-agreed-flag, then loop-until-the-adjacent-flag-changes.

The master task sees the Waiting flag, removes the slot allocation, and then flips the Go flag, before giving back slots.

All of that without special opcodes, and no risk of any external INC in PC when it was not expected ?
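The software-only handshake described above can be sketched as a deterministic simulation. This is a toy model, not P2 code: the slave is a Python generator where each `next()` stands for one time slot, the flag names `waiting` and `go` are made up, and removing and returning the slots is represented only by log entries.

```python
def slave(flags, log):
    """Slave task as a generator; each next() call is one time slot."""
    log.append("phase 1")
    seen_go = flags["go"]
    flags["waiting"] = True            # set the agreed flag
    yield
    while flags["go"] == seen_go:      # loop until the adjacent flag changes
        yield
    flags["waiting"] = False
    log.append("phase 2")


def run(max_slots=10):
    """Master loop: hands out slots and services the Waiting flag."""
    flags = {"waiting": False, "go": False}
    log = []
    task = slave(flags, log)
    for _ in range(max_slots):
        if next(task, "done") == "done":   # give the slave one slot
            break
        if flags["waiting"]:
            log.append("master: slots removed")
            flags["go"] = not flags["go"]  # flip the Go flag
            log.append("master: slots returned")
    return log


events = run()
```

No PC is ever modified from outside the slave in this model; the slave leaves its own wait loop when it sees the flag change.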

jmg · 2014-03-07 11:21

cgracey wrote: »

This is not so much about least surprise as it is about least silicon. There's no reason to support context save/load on more than one task.

That makes sense, saving silicon is always good.

Is there a significant cost to allowing run-time choice of which task uses that single SWAP resource ?

A problem is, as coded now, a user has to do a rebuild and task remap, to debug a different task.

Also, can a user still 'see' the PC and Flags of the other tasks - ie how much DEBUG visibility is possible, on the other tasks ?

potatohead · 2014-03-07 11:33

That is assuming they even choose to employ complicated, in depth, every single action, single step, etc... debug tools. Not everybody is going to need / want this. For those that do, they have sufficient understanding to map the task they want to debug into task 3.

For everybody else, having a standard task for that use case makes good sense, simple, consistent, etc... We don't need another instruction or register for this. And I would argue every single case where we can hide it, or prune down an instruction is a darn good case right now due to the large number of instructions and registers. We need 'em, don't get me wrong. But we don't need any more than the minimums, where we can leave it to software, or the use case can be simplified in a way that just eliminates instructions and registers, as it does with this task 3.

pedward · 2014-03-07 12:57

Bill Henning wrote: »

What joke?

Oh... you are trying to be sarcastic.

Presumably you did not really intend to denigrate my posting by calling it a joke, as it is not nice to put down voluntary (no cost) technical contributions.

I originally called it YIELD, Chip prefers TPAUSE. It's his chip, he gets to call the instructions whatever he wants.

sem_wait(), posix threads etc can easily be implemented with YIELD/TPAUSE.

It was mindrobots' post where he joked about YIELD and TPAUSE. You seem a little jumpy Bill, everything okay?

Bill Henning · 2014-03-07 13:10

I was responding to what I read... fyi, I looked back, it was actually Ariba's little joke.

After bothering to spend the time to document what the latest is on tasks/threads clearly, when I should have been working on RoboPi, I was annoyed that it was (at that time) called a joke. Now I understand you just typed the wrong name. It happens.

Now that it is corrected... everything is cool.

pedward wrote: »

It was mindrobots' post where he joked about YIELD and TPAUSE. You seem a little jumpy Bill, everything okay?

potatohead · 2014-03-07 13:29

Sometimes it really is like a zoo in here. (Thanks for your nice summary Bill)

mindrobots · 2014-03-07 13:39

pedward wrote: »

It was mindrobots' post where he joked about YIELD and TPAUSE. You seem a little jumpy Bill, everything okay?

Great, the ONE time somebody actually reads one of my posts and it causes a big brouhaha!!! :frown:
