
Propeller II update - BLOG


Comments

  • ctwardell Posts: 1,716
    edited 2014-02-26 07:33
    Any chance of having the task switch rate selectable?

    If not, then I guess we need to determine how many cycles "a number of cycles" ends up being and what % of cycles is tolerable to give up to task switching.

    Say it ends up being 20 cycles; then with the 100 kHz rate we lose 1% to task switching with a 200 MHz clock. That seems reasonable to me.

    If I'm figuring it correctly, we could even go to a 1 MHz switching rate and only give up 10%; again, that assumes 20 cycles per task switch.
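
    A quick worked version of those figures (same assumptions as above: a 20-cycle switch and a 200 MHz clock):

        overhead = (cycles per switch * switch rate) / clock rate
        at 100 kHz: (20 * 100,000)   / 200,000,000 = 1%
        at 1 MHz:   (20 * 1,000,000) / 200,000,000 = 10%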

    C.W.
  • mindrobots Posts: 6,506
    edited 2014-02-26 07:35
    cgracey wrote: »
    I'm thinking about this task redirecting issue. It will take a number of cycles to switch a task's context, but in preemptive multitasking, how often does that need to occur? I think probably 100KHz would be plenty fast, wouldn't it? If you need communication protocols fast, then use a whole task, but for general-purpose computing where you just want to write a big program without a bunch of cooperative yields, kilohertz is below human perception. Am I missing something?

    If a "task timer" expiring is your pre-emption mechanism, then 100Khz should be OK and the context switch overhead is just built into (or a penalty to) the tasks allocated time slice. If your are being pre-empted by pseudo interrupts, then then you need to be able to context switch quickly enough to handle the pre-empting event. This eiteher takes quick context switching or alternative context to run in until the event is captured adequately.

    It can just be a documented constraint for pre-emption.

    If 100 kHz isn't fast enough for the folks asking for it, or if it is too much effort, then it becomes a P3 item.

    All, IMHO, of course.
  • cgracey Posts: 14,133
    edited 2014-02-26 07:43
    I'm assuming the task context switching can be done by a 1:16-slot task that monitors CNT for switch times. When it's time to switch, it takes all the time slots for itself, starving the task to be switched, so that it can grab its state, possibly including a swath of cog RAM, and then swaps everything out with the next context, then gives itself 1:16 slots and waits for the next switch time. Only two hardware tasks are required to do this - one for the switcher and one for the active context. This doesn't take any more hardware than an instruction to get and set the task state.
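
    A rough C-style sketch of that switcher loop (purely illustrative; read_cnt(), set_task_slots(), save_context() and load_context() are hypothetical stand-ins for whatever mechanism ends up in the hardware, not real P2 instructions):

        #define SLICE_CYCLES  2000            /* e.g. 100 kHz slices at a 200 MHz clock */
        #define NUM_CONTEXTS  4

        typedef struct { unsigned pc, flags, ptra, ptrb; } context_t;
        static context_t ctx[NUM_CONTEXTS];   /* saved states, possibly plus swaths of cog RAM */

        enum { SWITCHER_1_OF_16, SWITCHER_ALL };
        extern unsigned read_cnt(void);               /* read the free-running CNT */
        extern void set_task_slots(int mode);         /* reassign the cog's time-slot map */
        extern void save_context(context_t *c);       /* capture the active context's state */
        extern void load_context(const context_t *c); /* restore a saved context */

        void switcher(void)
        {
            unsigned next_switch = read_cnt() + SLICE_CYCLES;
            int cur = 0;                               /* index of the loaded context */

            for (;;) {
                while ((int)(read_cnt() - next_switch) < 0)
                    ;                                  /* idle in the 1:16 slot until switch time */

                set_task_slots(SWITCHER_ALL);          /* take every slot, starving the active task */
                save_context(&ctx[cur]);               /* grab its state */
                cur = (cur + 1) % NUM_CONTEXTS;
                load_context(&ctx[cur]);               /* swap the next context in */
                set_task_slots(SWITCHER_1_OF_16);      /* hand 15/16 of the slots back */
                next_switch += SLICE_CYCLES;
            }
        }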
  • Bill Henning Posts: 6,445
    edited 2014-02-26 07:45
    On most systems, tasks are scheduled every 1/100th to 1/1,000th of a second. 1/10,000th would be a very fine grain; any smaller grain would risk most of the time being spent in scheduling, not running threads.

    Please take a peek at

    http://forums.parallax.com/showthread.php/154167-What-s-needed-for-preemptive-multitasking?p=1245394&viewfull=1#post1245394

    Two instructions, 1/16th of a cog for scheduling, and no need for a task timer. It would also allow very flexible ways of waiting for events and waking up. No interrupts either :)
    cgracey wrote: »
    I'm thinking about this task redirecting issue. It will take a number of cycles to switch a task's context, but in preemptive multitasking, how often does that need to occur? I think probably 100KHz would be plenty fast, wouldn't it? If you need communication protocols fast, then use a whole task, but for general-purpose computing where you just want to write a big program without a bunch of cooperative yields, kilohertz is below human perception. Am I missing something?
  • Bill Henning Posts: 6,445
    edited 2014-02-26 07:47
    Oh good!

    I did not see this message until after my previous reply.

    That's a great way to go, very flexible, and we can write different sorts of schedulers for different applications.

    Yes, when a thread's slice expires, if there is another thread that is ready to run, the scheduler task will switch threads.

    It would be useful to be able to retrieve the other task's private pointers (PTRA/B, etc.) and to be able to set them.
    cgracey wrote: »
    I'm assuming the task context switching can be done by a 1:16-slot task that monitors CNT for switch times. When it's time to switch, it takes all the time slots for itself, starving the task to be switched, so that it can grab its state, possibly including a swath of cog RAM, and then swaps everything out with the next context, then gives itself 1:16 slots and waits for the next switch time. Only two hardware tasks are required to do this - one for the switcher and one for the active context. This doesn't take any more hardware than an instruction to get and set the task state.
  • cgracey Posts: 14,133
    edited 2014-02-26 07:53
    Bill Henning wrote: »
    Oh good!

    I did not see this message until after my previous reply.

    That's a great way to go, very flexible, and we can write different sorts of schedulers for different applications.

    Yes, when a thread's slice expires, if there is another thread that is ready to run, the scheduler task will switch threads.

    It would be useful to be able to retrieve the other task's private pointers (PTRA/B, etc.) and to be able to set them.

    Yes, we need to be able to get those PTRA/PTRB values. We could do that in software, but it would be a lot more elegant to be able to get them without becoming that task.

    During the time when the 1:16 slot switcher is waiting for the next switch time, it could be preparing the next context in another task, so that it ping-pongs between two hardware tasks for the main execution.
  • Bill Henning Posts: 6,445
    edited 2014-02-26 08:07
    Exactly!

    We also need to be able to set them.

    Just to confirm: INDA/INDB/PTRX/PTRY are not unique per task, correct?

    I wonder if it might not be easier to simply set PTRA/PTRB to be common to all tasks when in multi-threaded mode?

    Btw,

    I strongly recommend using "tasks" to describe the hardware tasks, and "threads" to describe the threading model.
    cgracey wrote: »
    Yes, we need to be able to get those PTRA/PTRB values. We could do that in software, but it would be a lot more elegant to be able to get them without becoming that task.

    During the time when the 1:16 slot switcher is waiting for the next switch time, it could be preparing the next context in another task, so that it ping-pongs between two hardware tasks for the main execution.
  • cgracey Posts: 14,133
    edited 2014-02-26 08:14
    Bill Henning wrote: »
    Exactly!

    We also need to be able to set them.

    Just to confirm: INDA/INDB/PTRX/PTRY are not unique per task, correct?

    I wonder if it might not be easier to simply set PTRA/PTRB to be common to all tasks when in multi-threaded mode?

    Btw,

    I strongly recommend using "tasks" to describe the hardware tasks, and "threads" to describe the threading model.


    You're right about INDA/INDB/PTRX/PTRY not being unique to different tasks. They are all shared.

    It might be easiest just to temporarily "become" the task in question, in order to read and write PTRA/PTRB. This gets around the bugaboo of needing to convey which set of PTRA/PTRB you're dealing with.

    Agreed about "threads" versus "tasks".

    Thanks for your help, Bill.
  • Bill Henning Posts: 6,445
    edited 2014-02-26 08:30
    You are most welcome!

    Regarding becoming the thread:

    STOPTASK #task1..3, storereg (stops the currently running thread, saves PC, C, Z, lock state, etc. in a long; the scheduling task "becomes" the user thread for PTRA/PTRB purposes)

    STARTTASK #task1..3, restorereg (starts the thread from the saved state in restorereg, restores the scheduler's normal PTRA/PTRB access)

    I am assuming task0 will be the scheduler.

    Tasks 1-3 could each run multiple threads; however, caching performance would be better if only one task ran threads.

    One usage case that comes to mind is:

    task 0 - scheduler
    task 1 - multi-threaded
    task 2 - driver
    task 3 - driver

    But highest threaded performance would be:

    task 0 - scheduler
    task 1 - multi-threaded

    Which is likely the case I will use :)
    cgracey wrote: »
    You're right about INDA/INDB/PTRX/PTRY not being unique to different tasks. They are all shared.

    It might be easiest just to temporarily "become" the task in question, in order to read and write PTRA/PTRB. This gets around the bugaboo of needing to convey which set of PTRA/PTRB you're dealing with.

    Agreed about "threads" versus "tasks".

    Thanks for your help, Bill.
  • potatohead Posts: 10,260
    edited 2014-02-26 08:42
    BTW: For interactivity, 30 Hz is human "slightly drunk" perception. Moving a mouse and pointing at something would be an example of a task where 30 Hz is crappy and noticeable.

    60 Hz generally works, and few people notice.

    100 Hz will be out of the range of perception for nearly all people, though there are a few who will pick up on the difference between 60, 80, 100, 120 Hz, etc.

    Above that, eye/movement-type interactions appear fluid. Ears are another matter. Best be above 20 kHz so irregularities aren't perceived. 100 kHz is outside perception overall.

    I got the mouse metrics by working with various Unix workstations over the years. Sun chose 30 Hz, and working on those desktops was always crappy. SGI had a variable rate, and 60 Hz was the norm. Some advanced interactive environments required higher rates to avoid motion sickness and fatigue.
  • Heater. Posts: 21,230
    edited 2014-02-26 08:53
    potatohead
    100 Hz will be out of the range of perception for nearly all people,
    Except: run an LED off 50 Hz, as in a Christmas light string. Now wave that LED around. You won't see streaks; you will see dots. That 100 Hz shows up really easily and is annoying as you glance around.
    Similar strobe effects turn up in computer and video displays. Try moving your mouse around quickly and see how many cursors you can make out on the screen.

    Anyway, not much on topic.

    Doesn't the Linux timer tick run at 1 kHz nowadays?
  • SRLM Posts: 5,045
    edited 2014-02-26 08:58
    potatohead wrote: »
    BTW: For interactivity, 30 Hz is human "slightly drunk" perception. Moving a mouse and pointing at something would be an example of a task where 30 Hz is crappy and noticeable.

    60 Hz generally works, and few people notice.

    100 Hz will be out of the range of perception for nearly all people, though there are a few who will pick up on the difference between 60, 80, 100, 120 Hz, etc.

    Above that, eye/movement-type interactions appear fluid. Ears are another matter. Best be above 20 kHz so irregularities aren't perceived. 100 kHz is outside perception overall.

    I got the mouse metrics by working with various Unix workstations over the years. Sun chose 30 Hz, and working on those desktops was always crappy. SGI had a variable rate, and 60 Hz was the norm. Some advanced interactive environments required higher rates to avoid motion sickness and fatigue.

    What about measuring human motion? How "fast" is the human body in "Hz"?
  • potatohead Posts: 10,260
    edited 2014-02-26 09:02
    You are right about LEDs! In that case, two things impact how noticeable it is. One is the duty cycle: increase it, and the frequency required for perception goes down some; decrease it, and the required frequency goes up. And of course the motion vector impacts all of that.

    I don't have metrics for that case, other than what I've seen out in the wild. The most annoying case is the tail lights of cars. It's seriously annoying to glance in the mirror, only to glance back and have to sift the road and cars from the dots left over... a public safety hazard, if you ask me. And there is a clue to our brain's image processing there too. The truth is, that same glance with a very high frequency or constant light source results in a smear, which our brains factor out near instantly. The blinking LED manifests as a lot of new objects. Both persist, and one can see this in the dark with a light easily. However, it's far easier for us to filter the smears out quickly, whereas the dots persist and are seen as entities of some kind for a considerably longer time.

    Human factors interest me. Now you've kicked off that blinking LED project. :) Need those metrics.
  • potatohead Posts: 10,260
    edited 2014-02-26 09:12
    How "fast" is the human body in "Hz"?

    Which part?

    You will want to look at this: http://en.wikipedia.org/wiki/Nerve_conduction_velocity just to get started. Really understanding that requires a lot of detail on the various senses, and there is quite a bit about perception we don't know and just use empirical metrics for.

    Ever stub your toe, know about it, and wait for the hurt? Yeah, that's your slower body being made known to your much faster mind.

    The eye has the fastest muscles; a blink is about 150 ms, just for reference.

    A lot of basic feedback metrics can be had by understanding nerve signaling speed, doubling it to form a closed loop, and then allowing for some perception time. Busy brains respond much slower than excited, focused or fearful ones do. These speeds vary with age and genetics. Some people have double the nerve speed of others. You don't want to go boxing with them, because they literally can see a move start, calculate their response and execute it in real time relative to the slower person.

    The various senses differ too. Hearing runs from 10-15 Hz through something north of 20 kHz for the younger, better humans. My own was 22 kHz until about age 17 or so. But touch? Touch responds to higher frequencies than that, triggering things like the tickle, erotic, itch, etc. responses. Things with textures end up triggering touch-sense cells at high frequencies, whereas the basic pressure/temperature sense is much slower and less distinct.

    People have a hard time differentiating extreme heat from cold when suddenly exposed, for example. A sustained exposure will identify which; this effect is on the order of seconds.

    There is a ton of this stuff, and sometimes our limits are surprising!
  • Heater. Posts: 21,230
    edited 2014-02-26 09:39
    SRLM,
    How "fast" is the human body in "Hz"?
    No idea, but most of my body hurts most of the time nowadays.
  • whicker Posts: 749
    edited 2014-02-26 10:03

    Bill Henning wrote: »
    Regarding becoming the thread:

    STOPTASK #task1..3, storereg (stops the currently running thread, saves PC, C, Z, lock state, etc. in a long; the scheduling task "becomes" the user thread for PTRA/PTRB purposes)

    STARTTASK #task1..3, restorereg (starts the thread from the saved state in restorereg, restores the scheduler's normal PTRA/PTRB access)

    ...

    I would personally prefer the use of TPAUSE and TRESUME mnemonics due to their side-effects.

    Although even then it's more like "task invade" or "task retreat", but that would raise more than a few eyebrows.
  • jmg Posts: 15,155
    edited 2014-02-26 10:47

    Bill Henning wrote: »
    Tasks 1-3 could each run multiple threads; however, caching performance would be better if only one task ran threads.

    One usage case that comes to mind is:

    task 0 - scheduler
    task 1 - multi-threaded
    task 2 - driver
    task 3 - driver

    This raises a detail: halting task 1 to swap everything might not want to also steal cycles from task 2 / task 3.
    I guess in that case you would use the atomic ME=100% setting, then immediately re-adjust to reshuffle tasks 0, 2, 3 (most to 0), do the swap, and then restore the old tasking levels.
  • Bill Henning Posts: 6,445
    edited 2014-02-26 10:57
    whicker:

    TPAUSE/TRESUME sound good to me as the names.

    jmg:

    I was hoping only the specific task that was stopped would be affected;

    however, I believe the likeliest usage scenario is task 0 = scheduler, task 1 = multithreaded, tasks 2/3 unused.
  • pjv Posts: 1,903
    edited 2014-02-26 12:35
    Hi Chip, ctwardell, mindrobots, Bill -and others-
    cgracey wrote: »
    I'm thinking about this task redirecting issue. It will take a number of cycles to switch a task's context, but in preemptive multitasking, how often does that need to occur? I think probably 100KHz would be plenty fast, wouldn't it? If you need communication protocols fast, then use a whole task, but for general-purpose computing where you just want to write a big program without a bunch of cooperative yields, kilohertz is below human perception. Am I missing something?

    Just catching up on this thread, and perhaps I can add some practical experience perspective with P1 multi-threading.

    Typically up to 8 co-operative threads will happily coexist, provided only one of them continually requires a fast response time. A task switch consumes just over 1 µs (at 80 MHz), and takes somewhat longer as the number of threads is increased. One can run 100 or so trivial threads if the threads are simple and not response-demanding, such as flashing individual LEDs each at their own pace up to several kHz, and still be able to service one 38,400 baud UART link in the fray. Even with all of that going on, the time granularity is one clock cycle, although due to conflicting demands, the actual delivery (accuracy) of the timing depends on how close the threads' timings are set to each other. For LEDs this is obviously not an issue. Even though there may be jitter due to threads competing with each other for processor cycles, there is no loss in the long-term timing accuracy of any thread.

    For more realistic applications such as simultaneous multiple UARTs, I2C, keyboard, etc., where a reasonable response is required, 8 threads is probably the limit, and task switching is based on the dynamic (as in time-variable) need of every thread, and does not occur at specific fixed intervals. At this point I don't pretend to understand the proposed P2 switching mechanism, but I hope that it is not only on (selectable) fixed time increments, because that could be detrimental to performance, albeit an improvement in jitter. There seems (at least from my simplistic viewpoint) to be a lot of complexity proposed that I don't yet understand, so hopefully that is all taken care of.

    My estimate from experience would suggest that 4 co-operative tasks could run simultaneously at a 100 kHz switch rate on a P1. On a P2, obviously much faster.

    As far as pre-emptive switching goes, I really wouldn't know yet, as I'm unsure of the details of how all that works; but it sure will be a wonderful new world to explore!

    Hope that gives some insight.

    Cheers,

    Peter (pjv)
  • pedward Posts: 1,642
    edited 2014-02-26 12:36
    I want an instruction called TROUSER.
  • Bill Henning Posts: 6,445
    edited 2014-02-26 12:45
    Peter,

    This thumbnail summary of the scheme being discussed may help.

    The idea is to have two (hardware) tasks in a cog:

    1) A scheduler, running one cycle out of 16

    2) Multiple threads running as one task, 15 out of 16 cycles

    The scheduler can pause / resume the thread task, and is responsible for saving/restoring the thread state.

    Currently two new instructions are proposed for use by the scheduler:

    TPAUSE savereg,#1-3 ... save the PC, C, Z, etc. state of task 1, 2, or 3 into savereg after halting it; maps that task's PTRA/B into the scheduler's PTRA/B

    TRESUME loadreg,#1-3 ... restore the PC, C, Z, etc. state of task 1, 2, or 3 from loadreg and restart it; restores the scheduler's PTRA/PTRB

    How the scheduler operates is entirely up to the code running in the scheduler task - this gives great flexibility in how threads are scheduled. It could be as simple as waiting for X cycles to elapse, or waiting until some flag says the data the thread is waiting for is ready.

    Presumably the scheduler will also pause a thread waiting for I/O if the thread indicates it wants to yield.

    The most common scenario I see is:

    task 0 - scheduler
    task 1 - controlled by the scheduler, runs threads
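
    A compact sketch of how such a scheduler might look, assuming the proposed (not yet final) TPAUSE/TRESUME instructions; tpause()/tresume() below are hypothetical C wrappers around them, and everything else is illustrative:

        #define THREAD_TASK   1          /* task 1 runs the threads; task 0 is this scheduler */
        #define NUM_THREADS   4
        #define SLICE_CYCLES  2000       /* e.g. 100 kHz slices at a 200 MHz clock */

        typedef struct { unsigned state; } thread_t;    /* packed PC/C/Z/... long; PTRA/B etc. as needed */
        static thread_t thread[NUM_THREADS];            /* assumed pre-filled with each thread's entry state */

        extern unsigned read_cnt(void);
        extern unsigned tpause(int task);               /* halt the task, return its saved-state long */
        extern void     tresume(int task, unsigned s);  /* restart the task from a saved-state long */

        void scheduler(void)
        {
            int cur = 0;
            tresume(THREAD_TASK, thread[cur].state);    /* launch the first thread */

            for (;;) {
                unsigned deadline = read_cnt() + SLICE_CYCLES;
                while ((int)(read_cnt() - deadline) < 0)
                    ;                                    /* or wait on an event/flag instead of time */

                thread[cur].state = tpause(THREAD_TASK); /* also maps that thread's PTRA/B in */
                cur = (cur + 1) % NUM_THREADS;           /* simple round robin */
                tresume(THREAD_TASK, thread[cur].state); /* restores the scheduler's own PTRA/B */
            }
        }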
  • Bill Henning Posts: 6,445
    edited 2014-02-26 12:49
    All,

    Threads will need a way of voluntarily suspending themselves until some event happens (i.e., SD data loaded, socket has data).

    The simplest way I can think of is to set a "YIELD" bit in a cog location and loop until that bit is clear.

    The scheduler can pause the thread when it sees the YIELD bit, and when the waited-upon event happens, it can resume the thread after clearing the YIELD bit.

    The beauty of this is that no new instructions or logic are required to implement it.

    (Think "select" in *nix)
  • ctwardell Posts: 1,716
    edited 2014-02-26 13:12
    Peter brings up some good examples where higher thread-switching frequencies are useful; "GUI speed" isn't always enough.

    As Bill points out, this can be highly customized since the scheduler is software-controlled.

    The time slices also won't necessarily need to be equal; depending on the situation, that can reduce the time spent switching.

    There will be a balance point somewhere where a user is going to need to decide whether threading or separate tasks are appropriate.

    C.W.
  • potatohead Posts: 10,260
    edited 2014-02-26 13:20
    To be clear, I only mentioned the lower bounds for reference.

    One UX example would be a haptic device operating in the kHz range to provide "touch fluid" interaction...

    Personally, I would love to ensure higher frequencies are on the table, regardless of their context-switching time cost.
  • Bill Henning Posts: 6,445
    edited 2014-02-26 13:26
    I love the idea of such a small, simple and flexible scheduler.

    And no need for any flavor of interrupts :)
  • Electrodude Posts: 1,638
    edited 2014-02-26 13:27
    Do normal tasks not work well (or at all) with hubexec, and is that why hardware preemptive multitasking is being implemented? I probably missed something important in the recent deluge of posts.

    Can there be an option to control the task switch rate with an NCO?

    electrodude
  • pjv Posts: 1,903
    edited 2014-02-26 14:10
    Bill;

    Thanks for trying to help me understand better... not sure it's working, though. I'm unsure what you mean by the difference between threads and tasks, as well as what a HARDWARE task is.

    As far as the thread SUSPENDing itself, yes, that is an absolute must, but any other thread suspending a thread is also required. RESUMing is (currently on the P1) done via a single long in the cog's mailbox location. For efficient operation there needs to be a fair bit of compact signalling going into/out of the mailbox. I use the mailbox's lower 3 (or 4) bits to signal the scheduler (one in every cog) what the scheduler is to do to any thread whose ID is encoded into the next higher 5 (or 4) bits. The scheduler commands are things like SUSPEND, RESUME, DOWNLOAD, JUMP, etc., and any others that appear appropriate for the application. The next 9 bits pass a count parameter or cog address to the thread, and the top 16 bits of the mailbox are (typically) a hub address or other parameter to be passed to the thread. This concept permits many ways of telling the scheduler how to deal with multiple threads through a single long mailbox location. The scheduler polls that location at a rate determined by a configuration value, which of course can also be changed by any thread.
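
    One possible packing of that single-long mailbox command, as a sketch in C (field positions are illustrative; Peter's "3 or 4" / "5 or 4" bit choices are taken here as 3 command bits and 4 ID bits so the four fields fill exactly 32 bits):

        /* bits  2:0   scheduler command (SUSPEND, RESUME, DOWNLOAD, JUMP, ...)   */
        /* bits  6:3   target thread ID                                           */
        /* bits 15:7   count parameter or cog address                             */
        /* bits 31:16  hub address or other parameter                             */
        enum { CMD_NOP, CMD_SUSPEND, CMD_RESUME, CMD_DOWNLOAD, CMD_JUMP };   /* illustrative values */

        #define MBOX_PACK(cmd, id, count, hub)        \
            (  ((unsigned)(cmd)   & 0x7)              \
             | (((unsigned)(id)    & 0xF)    << 3)    \
             | (((unsigned)(count) & 0x1FF)  << 7)    \
             | (((unsigned)(hub)   & 0xFFFF) << 16))

        #define MBOX_CMD(m)    ((m)         & 0x7)
        #define MBOX_ID(m)     (((m) >> 3)  & 0xF)
        #define MBOX_COUNT(m)  (((m) >> 7)  & 0x1FF)
        #define MBOX_HUB(m)    (((m) >> 16) & 0xFFFF)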

    Lots and lots of possibilities..... lots of research to be done on the new P2.

    Cheers,

    Peter (pjv)
  • jmg Posts: 15,155
    edited 2014-02-26 14:16
    Electrodude wrote: »
    Can there be an option to control the task switch rate with an NCO?

    Yes, the example Chip gave has the scheduler checking CNT to pace the switch rates.
    The present task mapping means the scheduler needs 1:16 as a minimum, but most of those single slots will just be doing a WAITxx.

    I'm not sure there is an easy way to lower that 1:16 to an even lower value, in hardware only.

    The 1:16 means the main task is given up to 15/16 = 93.75% of the CPU resource.
  • Bill Henning Posts: 6,445
    edited 2014-02-26 14:20
    pjv wrote: »
    Bill;

    Thanks for trying to help me understand better... not sure it's working, though. I'm unsure what you mean by the difference between threads and tasks, as well as what a HARDWARE task is.

    Hi Peter,

    You are welcome.

    task = one of up to four hardware cog tasks

    thread = one of N threads running in a task managed by a scheduler (a task running at 1:16 cycles)
    pjv wrote: »
    As far as the thread SUSPENDing itself, yes, that is an absolute must, but any other thread suspending a thread is also required. RESUMing is (currently on the P1) done via a single long in the cog's mailbox location. For efficient operation there needs to be a fair bit of compact signalling going into/out of the mailbox. I use the mailbox's lower 3 (or 4) bits to signal the scheduler (one in every cog) what the scheduler is to do to any thread whose ID is encoded into the next higher 5 (or 4) bits. The scheduler commands are things like SUSPEND, RESUME, DOWNLOAD, JUMP, etc., and any others that appear appropriate for the application. The next 9 bits pass a count parameter or cog address to the thread, and the top 16 bits of the mailbox are (typically) a hub address or other parameter to be passed to the thread. This concept permits many ways of telling the scheduler how to deal with multiple threads through a single long mailbox location. The scheduler polls that location at a rate determined by a configuration value, which of course can also be changed by any thread.

    The beauty of the scheme being discussed is that you could write a scheduler exactly as you described ... or many other different schedulers, using the few proposed new instructions (TPAUSE, TRESUME)
    pjv wrote: »
    Lots and lots of possibilities..... lots of research to be done on the new P2.

    Cheers,

    Peter (pjv)

    Yep, it's going to be fun!

    Bill
  • Bill Henning Posts: 6,445
    edited 2014-02-26 14:22
    I am not sure it's worth adding additional complications to reduce the scheduler overhead below ~6% (1/16 of the cog).
    jmg wrote: »
    Yes, the example Chip gave has the scheduler checking CNT to pace the switch rates.
    The present task mapping means the scheduler needs 1:16 as a minimum, but most of those single slots will just be doing a WAITxx.

    I'm not sure there is an easy way to lower that 1:16 to an even lower value, in hardware only.

    The 1:16 means the main task is given up to 15/16 = 93.75% of the CPU resource.