In lieu of a presentation in the "Early Adopters" series ...
RossH
Hi all
I feel a bit annoyed that my crappy low-bandwidth, high-latency satellite internet will probably preclude me from ever hosting a presentation in the Propeller 2 "Early Adopters" series, so I thought I would instead package up something I have been working on recently with Catalina - a tutorial on how to take advantage of the parallel processing capabilities of the Propeller in C by turning a classic sequential algorithm into a parallel algorithm. The point is to see how much benefit we can get just by throwing cogs at a problem - using Catalina, of course!
I have written a document that goes through the process step-by-step. Some of the introductory information may seem a bit lame to Early Adopters, but it is intended to be accessible to newbies. It is also intended to be usable by either Propeller 1 or Propeller 2 users.
The document mentions Catalina 4.3 (which is now available on SourceForge) but it is also usable with Catalina 4.2.
Attached is the document, and also the zip file containing all the files you need to work through the tutorial yourself (the document is also in the zip file).
Comments welcome!
Ross.
EDIT: The process described in these documents is still worth studying for application to other programs, but "parallelizing" a serial program this way has now been automated. It can now be accomplished by adding just a few "pragmas" to the original program.
For full details of the new process, including the amended Sieve program, see here.
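For anyone who wants a feel for the idea before opening the document, here is a minimal sketch of one way to split a sieve across workers. It is not the code from the tutorial and it does not use Catalina's API: it assumes POSIX threads standing in for cogs, and the names `count_primes_parallel`, `job_t`, `worker` and `MAX_WORKERS` are hypothetical. The small primes up to the square root of the limit are found sequentially, then each worker strikes out multiples in its own slice of the flag array, so no two workers ever write to the same byte. It also assumes a modest limit that fits comfortably in memory.

```c
#include <pthread.h>
#include <stdlib.h>

#define MAX_WORKERS 8 /* hypothetical cap; a Propeller has 8 cogs */

typedef struct {
    unsigned char *composite; /* shared flags, one byte per number */
    const unsigned *base;     /* primes <= sqrt(limit), read-only here */
    size_t nbase;
    unsigned lo, hi;          /* this worker's slice, inclusive */
} job_t;

/* Strike out multiples of every base prime inside one slice. */
static void *worker(void *arg)
{
    job_t *j = arg;
    for (size_t i = 0; i < j->nbase; i++) {
        unsigned long long p = j->base[i];
        /* first multiple of p at or above lo, but never below p*p */
        unsigned long long start = ((j->lo + p - 1) / p) * p;
        if (start < p * p)
            start = p * p;
        for (unsigned long long m = start; m <= j->hi; m += p)
            j->composite[m] = 1;
    }
    return NULL;
}

/* Count the primes in 2..limit using up to nworkers threads. */
unsigned count_primes_parallel(unsigned limit, unsigned nworkers)
{
    if (limit < 2)
        return 0;
    if (nworkers < 1)
        nworkers = 1;
    if (nworkers > MAX_WORKERS)
        nworkers = MAX_WORKERS;

    unsigned root = 1; /* becomes floor(sqrt(limit)) */
    while ((unsigned long long)(root + 1) * (root + 1) <= limit)
        root++;

    unsigned char *composite = calloc((size_t)limit + 1, 1);
    unsigned *base = malloc(((size_t)root + 1) * sizeof *base);
    if (!composite || !base) {
        free(composite);
        free(base);
        return 0;
    }

    /* Sequential part: ordinary sieve over 2..root. */
    size_t nbase = 0;
    for (unsigned p = 2; p <= root; p++) {
        if (composite[p])
            continue;
        base[nbase++] = p;
        for (unsigned m = p * p; m <= root; m += p)
            composite[m] = 1;
    }

    /* Parallel part: split root+1..limit into contiguous slices. */
    pthread_t tid[MAX_WORKERS];
    job_t job[MAX_WORKERS];
    int threaded[MAX_WORKERS];
    unsigned span = limit - root;
    unsigned chunk = (span + nworkers - 1) / nworkers;
    unsigned used = 0;
    unsigned lo = root + 1;
    while (used < nworkers) {
        unsigned hi = (limit - lo < chunk) ? limit : lo + chunk - 1;
        job[used] = (job_t){ composite, base, nbase, lo, hi };
        threaded[used] =
            pthread_create(&tid[used], NULL, worker, &job[used]) == 0;
        if (!threaded[used])
            worker(&job[used]); /* could not start a thread: run inline */
        used++;
        if (hi == limit)
            break;
        lo = hi + 1;
    }
    for (unsigned w = 0; w < used; w++)
        if (threaded[w])
            pthread_join(tid[w], NULL);

    unsigned count = 0;
    for (size_t n = 2; n <= limit; n++)
        if (!composite[n])
            count++;

    free(composite);
    free(base);
    return count;
}
```

On a hosted system this needs to be linked with the threads library (for example `-pthread` with gcc). Calling `count_primes_parallel(100, 4)` returns 25, the number of primes up to 100. Giving each worker a contiguous slice keeps the workers from needing any locks, at the cost of doing the small sequential sieve first.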
Comments
I always wondered if 'standard' C is enough to exploit the parallelism of the P1/P2. XMOS had to create their own 'keywords' for doing that on their chips.
I think it could be a great learning experience if someone continued that example with an 'Appendix A' showing a full assembly listing, or an assembly listing of the critical section, and tricks on how to improve speed.
And since I think there are also some debuggers out there, it would be great if someone could continue that example with an 'Appendix B' showing how it could be debugged, and whether there is any 'profiling' tool to check which functions are called most and how much time each cog spends on those calls.
I will update the zip file and document.
Ross.
Since fixing this requires all the libraries to be recompiled, I will instead release 4.3 early.
Ross.
I thought about adding new keywords to C, or possibly just some preprocessor directives indicating which parts of an algorithm could be "parallelized" - but in the end I decided that if you are going to do that you may as well design a whole new language.
It is on my "to do" list to make Catalina's BlackBox source-level debugger work with threads - but before I do, I need to spend some time investigating the new debugging capabilities of the P2.
Ha! Famous last words!
I have decided (after exploring several alternatives) that the appropriate method is indeed to just add a few new preprocessor directives to C.
It turns out that this is fairly easy to do, and makes the whole process described in this thread completely unnecessary - it can now be accomplished using just 3 or 4 lines added to the original program source code.
For full details of the new process, including the new Sieve program, see here.
Some might be tempted to call this entire exercise a complete waste of time, but it wasn't - until I had "parallelized" a few different programs, I didn't realize how the new factory/worker paradigm (as described in this thread) would be such a game-changer!
Ross.
Indeed. If only we had the foresight to see how things would look in hindsight, we could probably all save ourselves an awful lot of effort!