Looking for the best approach to parallel processing

ihmech · 2010-08-17 17:39

I am planning a winter project of overhauling a BS2 project into a Propellor project and adding LOTS of stuff to it.

I'm learning the Prop and I have been doing a lot of reading and studying. I'm loving it so far and can't wait to start writing my own code and working on my project.

My question is: What is the best way to look at tackling a project in a parallel environment such as the Propellor?

I'm assuming that I would treat each object kinda like I did with subroutines, i.e. whatever the subroutine does, write the code for it, test it, and then tweak it.

After all the subroutines/objects are done, tested, and tweaked, then start writing the main program/top object.

I understand the part about assigning another cog to take care of a task until it is done.
I'm also guessing that I would want to keep a cog active in checking sensors.

Am I correct in my thinking? I love to learn and I am open to any ideas/tips. Thanks!

Mike Green · 2010-08-17 18:19

A lot of the parallelism used in Propeller programming is implicit in the objects that you're using. For example, if you're using a copy of FullDuplexSerial to implement a serial comm channel, the object starts up a cog with the low level serial driver in it and the object handles the interfacing between your program and the driver. Similarly, if you use a copy of Servo32V2 to handle several servo motors, the object starts up a cog to actually handle the control of the servos and the object provides interface routines to communicate with the low level I/O driver. If you're using a PC compatible SD card, the low level SPI driver for the SD card runs in its own cog. If you want to use fast floating point in your program, the floating point routines will use one or two cogs, but you won't notice.
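A rough sketch of that pattern, as a desktop Python analogy rather than Spin: a thread stands in for the cog, and the class name `SerialLikeDriver`, its methods, and the `sent` list are all hypothetical. It is not how FullDuplexSerial is actually written; it only shows an object starting its own worker and exposing plain method calls to the caller.

```python
import queue
import threading


class SerialLikeDriver:
    """Hypothetical driver object: a worker thread stands in for the cog."""

    def __init__(self):
        self._outbox = queue.Queue()
        self.sent = []          # stands in for the wire, so we can inspect it
        self._worker = None

    def start(self):
        # Analogous to an object launching its low level driver in a cog.
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def tx(self, text):
        # Caller-facing method: hand the data over and return immediately.
        self._outbox.put(text)

    def stop(self):
        self._outbox.put(None)  # sentinel tells the worker to finish
        self._worker.join()

    def _run(self):
        # The "low level driver": drains the queue in the background.
        while True:
            item = self._outbox.get()
            if item is None:
                break
            self.sent.append(item)


driver = SerialLikeDriver()
driver.start()
driver.tx("hello")
driver.tx("world")
driver.stop()
print(driver.sent)
```

The calling code never touches the thread or the queue; it only calls `start`, `tx`, and `stop`, which is the sense in which the parallelism is implicit.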

There are existing objects in the object exchange that handle various sensors and analog to digital converters. Have a look at them as examples. Some of them use a separate cog to read and do basic processing of the sensor data and some do not.

It's possible, but not necessary that you apply parallelism to your actual program. Usually for that, you first write your program using a single cog, then analyze the program and its behavior looking for places where parallelism significantly improves throughput without undue complication of the program. The details of that depend heavily on your application and how it behaves.

ihmech · 2010-08-17 18:48

Mike Green wrote: »

There are existing objects in the object exchange that handle various sensors and analog to digital converters. Have a look at them as examples. Some of them use a separate cog to read and do basic processing of the sensor data and some do not.

It's possible, but not necessary that you apply parallelism to your actual program. Usually for that, you first write your program using a single cog, then analyze the program and its behavior looking for places where parallelism significantly improves throughput without undue complication of the program. The details of that depend heavily on your application and how it behaves.

Thanks Mike, as always you are a big help! What you said makes sense. I'm guessing writing my program to first use one cog would help prevent writing sloppy programs that use up too many resources? I never thought about it like you described. Thanks!

Mike Green · 2010-08-17 20:16

It's more a matter that running portions of your program in parallel can introduce bugs that are very very hard to deal with and are unique to parallel processing. First get your program to run in a single processor, then you can figure out where parallel processing can help and you can introduce it a piece at a time, always with a single processor version to fall back to.

I don't count the implicit use of multiple processors where they're used for low level I/O since those objects are presumably debugged already, stable, and they hide the parallelism so it shouldn't affect your code.
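The "single processor first, parallel later, with a fallback" workflow can be sketched like this. It is a desktop Python analogy, not Spin, and the function names and the sum-of-squares workload are made up for illustration. The point is only that the single-worker version stays around as the reference that the parallel version is checked against.

```python
from concurrent.futures import ThreadPoolExecutor


def checksum_serial(data):
    # Reference version: one worker, easy to reason about and debug.
    return sum(x * x for x in data)


def checksum_parallel(data, workers=4):
    # Same job split into chunks; each chunk is handed to its own worker.
    size = max(1, (len(data) + workers - 1) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(checksum_serial, chunks))


samples = list(range(1000))
expected = checksum_serial(samples)    # known-good single-worker result
actual = checksum_parallel(samples)    # parallel version under test
print(expected == actual)
```

If the two results ever disagree, the parallel split is the suspect, and the serial version is still there to fall back to.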

Ale · 2010-08-18 01:33

Why do people keep calling the Propeller "Propellor"? ... It comes up too often.

On topic:

As Mike said, find out where parallel processing can help and then go for it:

What needs to run faster? How can I divide the input data in a way where multiple processors help? How can I later combine the results?

There is another scenario: where more than one task has to be done simultaneously, and the latency of interrupts or the lack of processing power makes two or more processors a viable option (like video generation!).

If you just need 1 faster thread... then the propeller is maybe not the best option...

Graham Stabler · 2010-08-18 04:20

Yes it should be Propellore!

You can think too deeply about "parallel processing" with respect to the Propeller when you first start out. When you work as a team you are working like the Propeller does. "YOU, keep an eye on this sensor and make sure this variable is updated, YOU, read that variable and control the motor based on it" etc etc.

That's the basic idea and when you decide to do some crazy hardcore programming you can look at divide and conquer type approaches where multiple processors work on the same overall task. Then after that the world is your lobster!
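The "YOU watch the sensor, YOU drive the motor" split can be sketched as two workers sharing one variable. This is a desktop Python analogy, not Spin: the threads stand in for cogs, the shared dictionary stands in for a variable in hub memory, and the readings, the threshold of 50, and all names are invented for illustration.

```python
import threading
import time

latest = {"reading": 0}     # shared variable both workers can see
lock = threading.Lock()
done = threading.Event()
commands = []               # what the motor worker decided, in order


def watch_sensor(readings):
    # "YOU, keep an eye on this sensor and make sure this variable is updated."
    for value in readings:
        with lock:
            latest["reading"] = value
        time.sleep(0.001)
    done.set()


def drive_motor():
    # "YOU, read that variable and control the motor based on it."
    while True:
        finished = done.is_set()    # check first, so the last read is final
        with lock:
            value = latest["reading"]
        commands.append("forward" if value < 50 else "stop")
        if finished:
            break
        time.sleep(0.001)


sensor = threading.Thread(target=watch_sensor, args=([10, 30, 70],))
motor = threading.Thread(target=drive_motor)
sensor.start()
motor.start()
sensor.join()
motor.join()
print(commands[-1])
```

Neither worker calls the other; they only agree on which variable carries the reading, which is all the "team" needs.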

Graham

ihmech · 2010-08-18 04:40

Ale wrote: »

Why do people keep calling the Propeller "Propellor"? ... It comes up too often.

Sorry for the misspelling... spelling and math are not strong points for me and that's why I don't do this for a living. I'm just not smart enough.

Heater. · 2010-08-18 04:43

ihmech:

The approach you outlined in your first post is definitely the way to begin.

Of course you will have an overall plan of what you want to do, so-called "top down design". At least I presume you have an idea of what you want to do and what inputs and outputs it will require. But when you start out building the thing, especially if the facilities available are all new to you, it's good to start at the bottom and build it up "brick by brick" and be sure that all your bricks are tested and working as you go along.

As others have stated, I would not worry too much about parallelism at the outset.

For example, one of the first things you may need is a serial link back to your PC so that you can get some information out of the system. So you drop in the FullDuplexSerial object or similar and get that working. Already you have two parallel tasks going on. One is your test application and the other is the guts of FullDuplexSerial. But, and here is the magic part, you don't have to worry about the parallelism here at all. FDX starts a COG to do its work but provides a simple subroutine/method interface for your application to use. It hides all the complicated stuff. You have no worries about timing or interrupts, it just works. That is the beauty of the Prop's multi-core architecture.

And so you will probably continue with other objects, a Real Time Clock driver, say, or PWM interface, whatever. As long as you have COGs free for them to use you don't have to think about it. Just build your app as a big loop as normal.

Now as you go along I suspect you will be looking into the code of those objects you are using so as to gain some idea as to how they work. In doing so you will be learning how to create your own custom objects/drivers should you ever need them.

And finally, when the whole thing is assembled you may find that the thing could really do with running twice as fast, say. At that point you will have a clear idea how to split it up to run on two or more COGS.

However, this seems unlikely. The general model of Propeller applications is:
1) A big application loop running in Spin in one COG.
2) All the other COGS being used for device drivers.

Has anyone here got a good example of an application, not a driver, that has had to be split to run on two or more COGs?

Good luck and have fun.

Ale · 2010-08-18 07:52

heater:

The best example of parallel processing I can think of is the logic analyzer. Several COGs work simultaneously (synchronized and interleaved) to capture. Well, processing... it is just reading a pin and writing to memory, but it is something!

ericball · 2010-08-18 08:38

There are two types of parallel programming:
1. Breaking a single task into multiple tasks which can be performed at the same time. So instead of one guy building a car from start to finish, two guys build two cars side by side.
2. Assigning independent asynchronous tasks to individual processors. So now two guys build the car, but one is working on the suspension while the other works on the interior.

The Propeller excels at type 2, but is less good at type 1. In fact, because the Prop doesn't have interrupts or a lot of dedicated interface hardware, type 2 is almost a requirement for any non-trivial application. Part of the reason the Prop is less good at traditional parallel programming (which is typically type 1) is the cog startup latency.
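The two types in the list above can be put side by side in a small sketch. This is a desktop Python analogy, not Spin, with threads standing in for cogs. The car-building functions are hypothetical and only mirror the analogy in the list.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


def build_car(n):
    # One complete job from start to finish.
    return f"car-{n}"


# Type 1: the same job repeated, one copy per worker.
with ThreadPoolExecutor(max_workers=2) as pool:
    cars = list(pool.map(build_car, [1, 2]))

# Type 2: different, independent jobs, one per worker.
parts = {}


def fit_suspension():
    parts["suspension"] = "fitted"


def fit_interior():
    parts["interior"] = "fitted"


workers = [threading.Thread(target=fit_suspension),
           threading.Thread(target=fit_interior)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(cars, sorted(parts))
```

In type 1 every worker runs the same code on different input; in type 2 each worker runs its own code, which is the shape a set of driver cogs takes.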

Anyway, as the previous posters have indicated, you need to look at your project and first identify your external interfaces as each will likely require a cog. (Although in some cases there are routines which can handle multiple interfaces of the same type using a single cog.)

Once you have that done look at the rest of your project and identify areas which are asynchronous and cannot be periodically polled (i.e. edge detection instead of level detection or tight timing tolerances). Split those tasks off into separate cogs. You'll typically have one cog running your SPIN mainline coordination routines - moving data collected by one cog to the next and polling low speed interfaces.

If you have cogs left over and you have something which requires heavy processing, then you can consider breaking it into separate cogs.

ihmech · 2010-08-18 09:55

Thanks for all the input everyone.

This gives me a better understanding on how to approach a project with the propeller. I think I was making it harder for me than it was.

My main reasons for choosing a prop for upgrading my BS2 project were:

1. Needing more I/O pins (a BS2P40 would take care of this problem, but cost is a big concern)

2. I thought this would be a good way to learn the prop. I have no experience with Spin, and ASM might as well be Chinese.

3. More memory, I maxed out my BS2 memory.

I would someday like to be able to use the prop to its fullest. But I have a whole lot of learning to do before that happens.

Thanks again for all of the help!

davehein2 · 2010-08-18 10:21

The "best" approach to parallel processing really depends on your application. Some applications require multiple data streams with a data stream dedicated to each processor. Other applications are not I/O intensive, but are compute intensive. They may have multiple processors working on the same data stream or the same chunk of data in global memory. The technique used really depends on your application.

For the most part, the Prop is used as a single processor with multiple co-processors and peripherals. This makes it very flexible for developing for a wide range of applications. The OBEX contains many virtual peripherals such as drivers for serial, I2C and SPI. Co-processors include floating-point and integer math solutions.
