Communicating Sequential Processes by C.A.R. Hoare — Parallax Forums

Communicating Sequential Processes by C.A.R. Hoare

prof_brainoprof_braino Posts: 4,313
edited 2011-05-22 19:43 in Propeller 1
Is anybody interested in discussing

Communicating Sequential Processes by C.A.R. Hoare

http://www.usingcsp.com/

http://www.usingcsp.com/cspbook.pdf

Sal Sanci (the propforth guy) has advised that this would be a good resource for learning the groundwork for parallel distributed processing on the prop. It's from the olden times when the transputer was new; supposedly it's not necessarily the bible on parallel programming, but it's very close.

The propforth team might have the inter-prop communications channels implemented in the next couple weeks, and I would like to make an actual application that uses multiple cogs on multiple props working in concert to perform a task in a distributed arrangement.

Although my work will mostly focus on propforth, the techniques in the discussion should be applicable to assembler, spin, and anything else.

The material is a little beyond me, so any input would be appreciated

Comments

  • LeonLeon Posts: 7,620
    edited 2010-10-24 11:23
    I've mentioned Tony Hoare's Communicating Sequential Processes several times here. Apart from his ideas being used for the transputer, the chip which must not be named also uses CSP. I'm not all that sure how feasible it would be to implement CSP on the Propeller, the architecture doesn't look all that suitable. Both the transputer and the chip which must not be named were designed to implement CSP as efficiently as possible, of course. Here is a nice presentation by David May, who designed the transputer and the chip which must not be named:

    http://async.org.uk/async2008/async-nocs-slides/Wednesday/Keynote/DavidMay-Keynote1-NOCS.pdf

    That should give you some idea of the difficulties.

    Implementing channels efficiently on the Propeller will be the biggest problem. They really need to be implemented in hardware, as was done on the aforementioned devices.
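    Go's unbuffered channels happen to be a faithful software rendering of the CSP channel being discussed here: a send blocks until a receiver is ready, which is exactly the rendezvous the transputer implemented in silicon. A minimal sketch (Go is used purely for illustration; nothing here is Propeller code, and `rendezvous` is an invented name):

```go
package main

import "fmt"

// rendezvous sends three values over an unbuffered channel.
// Each send blocks until the receiver is ready -- the CSP
// rendezvous that the transputer implemented in hardware.
func rendezvous() []int {
	ch := make(chan int) // unbuffered: sender and receiver must meet
	go func() {
		for i := 0; i < 3; i++ {
			ch <- i // blocks here until the receive below happens
		}
		close(ch)
	}()
	var got []int
	for v := range ch {
		got = append(got, v)
	}
	return got
}

func main() {
	fmt.Println(rendezvous())
}
```

    The point of the sketch is the blocking send: the two processes synchronize at the moment of communication, with no shared memory involved.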
  • prof_brainoprof_braino Posts: 4,313
    edited 2010-10-24 12:59
    Leon wrote: »
    I've mentioned Tony Hoare's Communicating Sequential Processes several times here. Apart from his ideas being used for the transputer, the chip which must not be named also uses CSP. I'm not all that sure how feasible it would be to implement CSP on the Propeller, the architecture doesn't look all that suitable. Both the transputer and the chip which must not be named were designed to implement CSP as efficiently as possible, of course. Here is a nice presentation by David May, who designed the transputer and the chip which must not be named:

    http://async.org.uk/async2008/async-nocs-slides/Wednesday/Keynote/DavidMay-Keynote1-NOCS.pdf

    That should give you some idea of the difficulties.

    Implementing channels efficiently on the Propeller will be the biggest problem. They really need to be implemented in hardware, as was done on the aforementioned devices.

    Thanks for the link Leon.

    I don't have time to read all the threads, sorry that I did not run across yours. I am not familiar with many other micro-controllers, so I am not familiar with the chip you are not talking about.

    You are right, the hardware is probably unsuitable: we only got about 5,500 characters per second of throughput on each of the eight application serial channels. But this is only for fun.

    I think the key point from the presentation you linked might be on the last page: "Emphasis on process structure will replace emphasis on data structures". That sounds really fun! This presentation is a couple of years old; is that still the prediction being made?
  • LeonLeon Posts: 7,620
    edited 2010-10-24 13:26
    Yes, in a parallel processing context.

    Follow up David's affiliation on the first page of that presentation (it begins with X) for details of the chip I was referring to. Parallax doesn't like it to be mentioned.
  • pharseidpharseid Posts: 192
    edited 2010-10-24 14:50
    One of the choices you make when creating a MIMD computer is whether coordination of processors is going to be based on shared memory or communications channels. I read CSP back in the early 90's when I was doodling around with transputers and I think it's an excellent book and would give Prop users something to think about, but when you're given an architecture based on shared memory, it probably doesn't have direct application. Maybe for multiProp systems. One thing I can say based on my limited experience, I expected the CSP model to allow me to create a whole new set of bugs in my programs, but that wasn't the case. I don't recall experiencing a single problem with communications channels.

    I don't think many Prop users are clamoring for massively parallel systems, but if they were (and it's probably worth thinking about since the technology is going to allow putting more and more on a chip with time), I'd probably stick with shared memory and go the way of the dominant direction of supercomputers, with hierarchical memory, like local memory, cluster memory, global memory. I think the main programming paradigm for supercomputers today is distributed shared memory and that maps very well to hierarchical memory systems.

    If you get to the point where there are so many cpu's in the system that you don't want to deal with them individually, I think you need to go to something like Linda, where you just define the job to be done and the system decides which processor is going to work on which part. Something like that also allows you to abstract away from the underlying architecture, so you don't have to write different programs for systems based on shared memory or communications channels. Of course you burn cpu cycles to make things easier for the programmer, but how far are we from that point? Like, "I don't want to explicitly program 20 different processors." Not far, really. I would think the people with multiProp systems are already there.

    -phar
  • RossHRossH Posts: 5,519
    edited 2010-10-24 16:07
    Hi prof_braino,

    I'm not a big fan of CSP, but for those who are - or for anyone else who wants to experiment with parallel programming on the Propeller at a higher level of abstraction than is currently possible with either PASM or SPIN - I am in the process of adding multithreading capabilities to Catalina.

    This is currently discussed (with examples) in http://forums.parallax.com/showthread.php?t=126253. I'll add a full description - and a new release of Catalina - shortly. For dedicated CSP-ers, such software emulation via multi-threading may not be the ideal - but unless you want to buy up a stock of antique transputers, it is a very cost-effective way to get started with concurrent programming.

    My intention with Catalina multi-threading is to take advantage of the Prop's existing concurrency capabilities as far as possible (i.e. each thread executing on its own cog) but also add simulated concurrency (i.e. multiple threads executing on the same cog) as seamlessly as possible for programs where 8 concurrent threads is simply not enough.

    Catalina can currently create about 170 concurrent C threads in the 32kb of a normal Propeller, executing across multiple cogs. I could increase this to around 200 with some internal kernel changes, but I am reluctant to do this as it would break compatibility with existing code (such as BlackCat debugger) and it also doesn't seem worthwhile for just a few extra threads.

    While Catalina's thread support is intentionally "lean and mean", it could also be used to re-create either a full POSIX pthreads library (described here - http://www.kernel.org/doc/man-pages/online/pages/man7/pthreads.7.html) or a more CSP-like libthread library (described here - http://plan9.bell-labs.com/magic/man2html/2/thread). The problem with such "heavy" threading libraries is that because the Propeller is so limited in memory space, using such libraries would leave you with so little space left that you could probably only create a few dozen concurrent C threads - barely more than there are cogs!

    As well as supporting the Prop I, I am targeting Catalina multithreading at the Prop II, where it should be possible to execute nearly 800 concurrent C threads - that should be enough for quite a few interesting projects in concurrency.
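    The programming model being described, hundreds of cheap cooperating threads, is easy to sketch with Go's goroutines. This is only an illustration of the model, not of Catalina itself; `spawn` is an invented name and the numbers are arbitrary:

```go
package main

import (
	"fmt"
	"sync"
)

// spawn launches n concurrent workers and waits for all of them to
// finish.  Goroutines stand in for Catalina's C threads here: the
// only point is that hundreds of threads are practical when the
// per-thread cost is small.
func spawn(n int) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	finished := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			finished++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return finished
}

func main() {
	fmt.Println(spawn(170)) // the thread count mentioned above
}
```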

    Ross.
  • prof_brainoprof_braino Posts: 4,313
    edited 2010-10-25 04:42
    Leon wrote: »
    Yes, in a parallel processing context.

    Follow up David's affiliation on the first page of that presentation (it begins with X) for details of the chip I was referring to. Parallax doesn't like it to be mentioned.

    I found the chip you mentioned. I know I found the correct one, your name appears on most of the discussion pages for it that I saw.

    Do you know if they ever got tforth (transputer forth) ported to it? If it doesn't have a version of forth already running on it, I can't use it for my experiments. I had thought to use tforth, but I could never find anything beyond discussion.

    My thought is to take PropForth in the direction of parallel forth: http://www.ultratechnology.com/4thpar.html

    Up till now I have not had a pool of processors with a method of coordination, this should be available in a couple weeks.
  • LeonLeon Posts: 7,620
    edited 2010-10-25 04:51
    I think that there is a Forth for it, and Peter Jakacki was considering implementing Forth on it. It can't use the transputer Forth, the chip architectures are completely different.
  • pharseidpharseid Posts: 192
    edited 2010-10-25 14:23
    Professor Braino;

    Have you looked at Jeff Fox's F*? Its primary advantage would seem to be that it presents a very simple model to the programmer. It has its limitations, but I believe the com ports on the chip which must not be named have support for direct memory access of other cpu's. I'm sure Leon will tell us if this is the case or not. If so, it would appear possible to turn F*'s shared memory inside out and directly access memory on other cpu's designated as global.

    -phar
  • mparkmpark Posts: 1,305
    edited 2010-10-25 15:15
    If you guys want to talk about porting Forth to X-chip, please take it to the X-forums.
  • prof_brainoprof_braino Posts: 4,313
    edited 2010-10-26 06:22
    @mpark

    Actually, the discussion is still about CSP. In my case, with regards to PropForth.

    We've just established that the version of forth most similar to the topic under discussion is not available on the other micro-controller, and so it is not a suitable reference for this work.

    That out of the way, the discussion should return to the main topic of CSP on the prop.

    Right guys? (@Leon)

    @pharseid

    Actually, I'm looking to go the route of parallel forth
    http://www.ultratechnology.com/4thpar.html

    This seems to be the furthest along work of Fox et al.

    Now that I have available to me the proper multi-core processor, and the inter-processor communications channels needed, I can finally get started on this.

    Has anyone actually read the CSP material and want to discuss it?
  • pharseidpharseid Posts: 192
    edited 2010-10-26 12:25
    I read CSP many years ago, but I don't remember anything specifically in the book. I remember comments at the time that the approach in the book was fairly conservative, given a lot of what was going on then to try to exploit parallelism, like functional and dataflow languages. Probably says a lot about Hoare's judgement.

    -phar
  • prof_brainoprof_braino Posts: 4,313
    edited 2010-10-26 17:55
    The first chapter is all about defining mathematical abstractions for the interactions between a system and its environment. It talks about defining a process in advance of implementation by describing a trace of the operations it performs. At least that's what the introduction says, the actual text makes me dizzy. It has a few hundred simple definitions of Logic, Sets, Functions, Traces, Special Events, Processes, Algebra, Graphs, and candy machines. I'm surprised this stuff didn't catch on. Most folks are sweet on candy machines.

    Right away it starts getting into examples in LISP. LISP strikes me as needing more memory rather than less; the list of languages for the prop talks about running out of memory after a few lines of LISP. Probably won't switch to LISP for this project.

    I DO like the stuff about processes. Understanding and controlling processes is pretty much the key to everything it seems.
  • prof_brainoprof_braino Posts: 4,313
    edited 2010-10-26 18:18
    pharseid wrote: »
    comments at the time that the approach in the book was fairly conservative, given a lot of what was going on then to try to exploit parallelism, like functional and dataflow languages. Probably says a lot about Hoare's judgement.

    -phar

    I wonder what that means for us now? Was it too conservative, in that it was bogged down in an unsuitable paradigm; or was it thorough, methodical analysis that was simply too far ahead of its time? Lately I've run across discussions of parallel processing, but this is about concurrent processing, which might be more useful. Some of the material looks like it could have been written with the Prop in mind. But then again, everything looks like a nail to a guy with a hammer. I tend to think of projects in terms of what's in my tool kit.
  • ErNaErNa Posts: 1,752
    edited 2010-10-26 23:33
    When I first came in touch with the transputer, I had the feeling that all the problems I faced were solved. No more AD 9511 floating point processor, no more incompatibilities even between number representations in ONE operating system covering 8 and 16 bit (8080 and 8086/87) microcontrollers; everything gone! But then reality came, and when I tried to parallelize a thousand processes each doing a single instruction, even the hardware scheduler of the transputer did surrender ;.)

    Life is a compromise, and so all ideal ideas are limited when there is a real implementation. But we can try to make the best of it. If it is not allowed to eat an apple, why not invent apple juice?
    With the propeller I found a tool that gave me back a part of my life, and I am still very enthusiastic, even without P][. I have sometimes tried to propose a mechanism which allows one to transparently distribute tasks in a network of processors, even if these are located in different chips. I'm also willing to read Hoare's book, gain a better understanding of the mathematical and logical background, and find relations between the theory and a real, physical implementation. Further, I am designing a board, "coreprop", intended to put propellers together in a unified way, but I have to change the connectors, because they are not reliable and too expensive. So, very basic issues. It's always the horseshoe nail. And the time available. And the tools used.

    So I am very interested in starting up such a project. And I don't even hesitate to use the X-word, because we should always do the best with what we have!
  • prof_brainoprof_braino Posts: 4,313
    edited 2010-10-27 18:34
    In my case, I don't need floating point, I usually find scaled integer is more accurate and way faster for what I'm doing (but that's just me, I'm not very good at math). I don't have plans for any hardware scheduler, the experiments seemed to say it isn't needed.

    The goal here is not to parallelize per se; that is, I don't want to have a bunch of cores doing the exact same thing in lock step. Rather, the goal is to take a large problem, such as calculating a 1024x768 fractal, break it into subtasks, distribute the work over N cogs, and have them all work concurrently. The one that finishes first gets a new chunk to work on, and so on. Two cogs should be faster than one, ten should be faster than two, and twenty should be faster than ten, until we get to a point where adding more cogs doesn't decrease wall-clock execution time. If this point is not reached on 8 cogs, maybe we would get Humanoido to try it on his tower or something.

    This is what I have in mind.

    I'm up to chapter two so far, the material does not read quite like L4D2 "The Sacrifice"
  • ErNaErNa Posts: 1,752
    edited 2010-10-28 02:40
    You misunderstood part of what I was saying. Parallelism is a concept, and this concept must fit a certain task. If a task is strictly serial, it cannot be parallelized. And then there are problems which are inherently parallel, like vector operations; then you need special hardware. What I wanted to say is that Hoare's theory gives some guidelines on how to synchronize parallel running tasks using elements of a programming language especially designed to do this. In the Occam language these elements are channels, and in Spin/assembler we have locks. But when doing this job, it has to be transparent how the processes are distributed onto the processors, and the program should not care about the localization; this has to be "parallel". So what we need is a simple and effective transportation mechanism for information. One cog should have access to another cog's shared variables, even if that cog is in a different chip. With only 8 cogs, you very soon reach a stop when parallelizing.
  • SapiehaSapieha Posts: 2,964
    edited 2010-10-28 07:10
    Hi ErNa.

    That is why I mentioned to Chip, as one side step of my ideas, very fast serialization possibilities in Prop II and internal ports to communicate between COGs.

    ErNa wrote: »
    You misunderstood part of what I was saying. Parallelism is a concept, and this concept must fit a certain task. If a task is strictly serial, it cannot be parallelized. And then there are problems which are inherently parallel, like vector operations; then you need special hardware. What I wanted to say is that Hoare's theory gives some guidelines on how to synchronize parallel running tasks using elements of a programming language especially designed to do this. In the Occam language these elements are channels, and in Spin/assembler we have locks. But when doing this job, it has to be transparent how the processes are distributed onto the processors, and the program should not care about the localization; this has to be "parallel". So what we need is a simple and effective transportation mechanism for information. One cog should have access to another cog's shared variables, even if that cog is in a different chip. With only 8 cogs, you very soon reach a stop when parallelizing.
  • ErNaErNa Posts: 1,752
    edited 2010-10-28 10:27
    Yes, I'm curious what P][ will do. But that doesn't release us from finding the right usage, like always. And surely there will be good reason to have a uniform mechanism. By the way: could you give some support for the redesign of the coreprop board?
    ErNa
  • pharseidpharseid Posts: 192
    edited 2010-10-28 17:34
    You can break parallelism down into different types: data parallelism, in which you apply the same steps to each item; specialist parallelism, where each processor is dedicated to some specific task, as in pipelining; and agenda parallelism, where problems pop up in random fashion and are handled case by case, as in banking software, where random (as seen by the program) customers request random operations on their accounts. While you're not trying to do things in lockstep in your fractal example, I would still look at it as data parallelism, because you're applying the same steps to different data.
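    Agenda parallelism can be sketched as a pool of workers pulling mixed requests from a queue: requests of different kinds arrive in no particular order and any free worker handles the next one case by case. Go is used purely as illustration here; the `request` type and operation names are invented for the sketch:

```go
package main

import (
	"fmt"
	"sync"
)

// request is a toy banking operation: the kind of work is carried
// with the item, unlike data parallelism where every item gets the
// same treatment.
type request struct {
	op     string // "deposit" or "withdraw"
	amount int
}

// serve lets nWorkers pull requests off the channel as they arrive
// and handle each one case by case.  It returns the final balance.
func serve(reqs <-chan request, nWorkers int) int {
	var mu sync.Mutex
	balance := 0
	var wg sync.WaitGroup
	for w := 0; w < nWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for r := range reqs { // any free worker takes the next item
				mu.Lock()
				switch r.op {
				case "deposit":
					balance += r.amount
				case "withdraw":
					balance -= r.amount
				}
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return balance
}

func main() {
	reqs := make(chan request)
	go func() {
		reqs <- request{"deposit", 100}
		reqs <- request{"withdraw", 30}
		reqs <- request{"deposit", 5}
		close(reqs)
	}()
	fmt.Println(serve(reqs, 4))
}
```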

    -phar
  • prof_brainoprof_braino Posts: 4,313
    edited 2010-10-28 20:05
    ErNa wrote: »
    misunderstood a part. Parallelism is a concept.

    there are problems which are inherently parallel, like vector operations. Then you need special hardware.

    In Occam language these elements are channels and in Spin/ assembler we have locks.

    But when doing this job, it has to be transparent how the processes are distributed onto the processors, and the program should not care about the localization; this has to be "parallel". So what we need is a simple and effective transportation mechanism for information. One cog should have access to another cog's shared variables, even if that cog is in a different chip. With only 8 cogs, you very soon reach a stop when parallelizing.

    Ok, we might have transparent channels. Right now it looks like accessing a cog on the next prop is the same as accessing a cog on this prop.

    instead of the one prop version to execute a command on a neighbor cog:
    c" 27 35 MyCommand" <cog#> cogx
    

    the multiprop version is
    c" 27 35 MyCommand" <prop#> <cog#> cogx
    

    There are eight channels, so there can be full duplex communication with 4 props, or eight half duplex channels to eight props, max. (Or I guess there could be eight parallel channels to one prop; I hadn't thought about that till just now.) I'm sure we'll try experiments with a mix of both full and half duplex channels.

    I'm curious about the special hardware statement. If we already have the channels, and don't claim a need to keep all the props synchronized beyond what is needed for the channels (which is achieved by the standard crystal), do we still need special hardware, or can it be implemented in software?

    Right now it looks like there might be a nice way to have eeprom-less slaves, wherein the master wakes up, notices it is master, assigns the slaves their IDs, and uploads their executable images to them.
  • prof_brainoprof_braino Posts: 4,313
    edited 2010-10-28 20:37
    pharseid wrote: »
    You can break parallelism down into different types: data parallelism, in which you apply the same steps to each item; specialist parallelism, where each processor is dedicated to some specific task, as in pipelining; and agenda parallelism, where problems pop up in random fashion and are handled case by case, as in banking software, where random (as seen by the program) customers request random operations on their accounts. While you're not trying to do things in lockstep in your fractal example, I would still look at it as data parallelism, because you're applying the same steps to different data.

    -phar

    I need to get my head around the special considerations for data parallelism vs specialist parallelism vs agenda parallelism.

    So data parallelism is the fractal; whatever the data, the same steps are applied. I think this is the easy one, at least to understand the concept.

    Specialist parallelism is pipelining; so this is, for example, where a particular unit in the bunch is assigned to handle a specific piece of hardware. This does sound applicable, as the I/O pins not used as communications channels are intended to be used as general I/O to devices, as on a single prop. So a bunch of eight props with all communications channels assigned would have 8 * 24 = 192 pins for devices. That should be enough for a little while. I'm thinking something like a GPS, compass, accelerometer, temp, humidity, photo-diode, SD card, extended ram, VGA, keyboard, mouse, mic, audio, servos, and ESC, and whatever else I have in this box here, all hooked up at once on dedicated pins and running full speed. I don't know WHY I would do this, but it seems fun. Having specialist parallelism doesn't seem to necessarily contradict having data parallelism; there could be a pool of cogs processing data for a variety of dedicated hardware. Cool enough so far!
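    The pipeline idea reads naturally as a chain of specialist processes joined by channels, each doing one job and passing results downstream, much like cogs wired to each other. A sketch in Go with invented stage names (`generate`, `square`, `sum`); this is illustration, not Propeller code:

```go
package main

import "fmt"

// generate is the first specialist: it produces the numbers 1..n.
func generate(n int) <-chan int {
	out := make(chan int)
	go func() {
		for i := 1; i <= n; i++ {
			out <- i
		}
		close(out)
	}()
	return out
}

// square is the middle specialist: it squares whatever arrives.
func square(in <-chan int) <-chan int {
	out := make(chan int)
	go func() {
		for v := range in {
			out <- v * v
		}
		close(out)
	}()
	return out
}

// sum is the last specialist: it totals the stream.
func sum(in <-chan int) int {
	total := 0
	for v := range in {
		total += v
	}
	return total
}

func main() {
	// Three stages run concurrently, joined only by channels.
	fmt.Println(sum(square(generate(4)))) // 1+4+9+16 = 30
}
```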

    Now this agenda parallelism. I don't see the prop in this banking example, mostly because I don't see the application software I would write for it. Is agenda parallelism different from the other two in that the problem domain is not common to all tasks (did I say that right?)? If many bank customers choose different transactions from the same menu, is this still agenda parallelism? Could you give another example for this one? I've confused myself. Or is this the type where we start thinking about heuristics and AI, etc.?

    This is good discussion. Thanks for contributing this; it's what I was hoping for. I can feel my brain getting exercise.
  • ErNaErNa Posts: 1,752
    edited 2010-10-28 23:29
    It is always good to know what someone is talking about! ;-)

    If you have data and instruction as two independent items, and single and multiple as two properties, you can easily create this set:

    single instruction, single data
    single instruction, multiple data
    multiple instruction, single data
    multiple instruction, multiple data,

    or, in a shorter representation: 00, 01, 10, 11.

    As it seems the simplest case, science first looked at single instruction, multiple data. That is mostly pure number crunching: inverting matrices, vector operations, and so on. This just needs specialized hardware.

    The propeller is a tool to solve 11-problems. Because in history we only had serial computers, we were taught to serialize a parallel problem, and because we are proud of being able to do this, we do not see that the more natural solution is processing in parallel. As in everyday life, we can divide our workload into small chunks and waste time organizing the work flow. That is exactly the way a scientist does research ;-))
    OK, he does it as an example and searches for the best way, so necessarily there is a lot of nonsense done in gaining a result. No criticism.

    I use the propeller to control a motor and run different tasks in parallel; one manages the display. So I have to pass parameters from task to task, and I do this in a non-ideal way, but always following one scheme. I do not call different processes with a different number of parameters; instead I start a process and pass it a list of addresses, where the number of elements is in the list. That works very well.
    Now I have found that having access to the global memory via the physical address allows me to access single values, arrays, pointers, and pointers to arrays. That makes the software really flexible and powerful. But if you own such an instrument, it still takes exercise to keep the mechanism present and to keep an overview.
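    This one-scheme parameter passing, handing a started process a single list of addresses whose length travels with the list, can be sketched in Go with a slice of pointers. `startTask` is an invented name, and note this is the shared-memory style being described here, not CSP:

```go
package main

import "fmt"

// startTask hands a worker one list of addresses (pointers) rather
// than a varying argument list.  The slice itself carries its
// element count, so every task is started the same way.
func startTask(params []*int, done chan<- bool) {
	go func() {
		for _, p := range params {
			*p *= 2 // the worker reaches the caller's variables through the pointers
		}
		done <- true
	}()
}

func main() {
	a, b, c := 1, 2, 3
	done := make(chan bool)
	startTask([]*int{&a, &b, &c}, done)
	<-done // wait for the worker to finish
	fmt.Println(a, b, c)
}
```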
    I have the strong feeling that, knowing the physical address of a spin variable in global memory, it should even be possible to exchange data from spin to spin without using a global read/write. That may be confusing: local access to global memory?
    How is that? Normally we have a main spin program, and this program starts cogs and passes pointers to variables to them. So the started process knows the address of the variable in the main program, the "global" memory. If the process wants to hold a "local" copy of that variable, and more of them are running, then they each have such local copies and cannot directly access the others' copies, except by using global data sharing.
    Now, eight processes is a very limited amount if there is a real-life problem; e.g., for home automation we need a distributed system. But I do not want to change the software when the configuration changes. It needs some networking without a multi-level protocol to connect processes. That is surely not simple, but worth thinking about.
    By the way: since I see your avatar continuously, I have problems ordering a pizza frutti di mare. Don't you have a better alternative?
    ErNa
  • Heater.Heater. Posts: 21,230
    edited 2010-10-29 01:22
    Erna, pharseid, prof_braino:

    There seems to be some confusion going on here between hardware and software/algorithmic aspects of parallelism.

    Down at the hardware level we have such ideas as SIMD, MIMD as outlined by Erna. These are really aspects of processor design.
    Do we operate on one pair of integers/floats at a time with one instruction, as in ADD X, Y, as in a normal CPU?
    Or do we operate on two, four, eight... pairs of operands simultaneously, as in Intel's SSE instructions (http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions)?
    Or do we operate on large arrays of operands in one go, as in vector processors (http://en.wikipedia.org/wiki/Vector_processor)?

    At a lower level there is even pipelining and the parallel execution of the different instruction phases: fetch, decode, operate, write result, etc.

    Then there can be hardware support for shared memory systems or support for channel communications between processes.

    Meanwhile, up at the more abstract software/algorithm level, there are different ways of breaking up problems for parallelism.
    Do you take the CSP approach (no shared memory between processes, channel communications between them), or do you take the shared memory approach?

    CSP itself is somewhat abstract: a way of looking at algorithms. By removing the possibility of shared memory it makes it possible to reason about algorithms more easily. It becomes possible to transform algorithms like you can transform mathematical formulae.

    The terms "specialist parallelism", "pipelining" and "agenda parallelism" are used at the higher, abstract level of thinking about and implementing parallel algorithms. These basic paradigms for parallelism are defined nicely here: http://www.lindaspaces.com/book/chap2.htm
    (although that chapter uses "result parallelism", "agenda parallelism" and "specialist parallelism").

    These "basic paradigms" are applicable to all software problems whether they are actually running on hardware that supports parallelism or not. And perhaps without regard to whether the hardware supports shared memory or processes and channels.

    I see no reason why result, agenda or specialist approaches are not all applicable to the Propeller (or multiple Props.)






  • ErNaErNa Posts: 1,752
    edited 2010-10-29 03:02
    Heater, you are right. In the past I often had discussions about what "parallel" really means. It is not simple, and it takes a lot of experience to think in "abstractions". Whatever color, if only black ;-) Working with a transputer, people tried to convince me that this is not "true parallel". But that was before the MATRIX; now everybody knows that we are only a sequential simulation. To me the answer is: something is running in parallel if there is no way to determine that it is not running in parallel. So, if you call a function, you just wait until the function returns a result. In a parallel environment you actively have to wait (or do something else); in a serial one you just do not exist until the function returns. And CSP is just a way to wait and do something else in between: you send a message, and as long as there is room in the buffer, you don't care about waiting; or a scheduler uses your waiting to have someone else running. The question to me is: how can a person who perfectly manages his life delegate responsibility to a software engineer who is taught to strictly think SEQUENTIAL ;-)))

    OK, the propeller brings a different experience to all the people and so there is hope!
    But let's go on. I MUST have something to present at the next UPEW.
  • Heater.Heater. Posts: 21,230
    edited 2010-10-29 04:31
    Erna,

    Nothing hard about "parallel". In the context of this discussion it's just "doing more than one thing at any given time". Of course that is a very general statement and applies at all levels of detail, from the flip-flops that make up the storage cells in your CPU registers to the Google search engine farm. Pitching it at the right level for any given discussion might be a bit more tricky.
    Working with a transputer, people tried to convince me, that this is not "true parallel".

    I wonder what they meant and what they thought "true parallel" means. Clearly a single Transputer is only operating sequentially, at the program level, even if your program contains many threads (say, using PAR in Occam). But that's not the point: the Transputer, for the first time, made it trivially easy to hook up many processors that would then operate in parallel. The Occam language made it easy to write a single program that would be executed on many processors. When you have a hundred CPUs all munching away on the Mandelbrot set, or whatever, from a single program, how could they say it is not "true parallel"?

    ...CSP is just a way to wait and do something else in between...

    No. One can "wait and do something else in between" without CSP. In a single processor shared memory system with a multitasking operating system it happens all the time.

    CSP is a bit more rigorous. If it were not, there would be no point in having it. For example, it introduces the rule that a process is not allowed to access the data space used by any other process. No matter if those processes are executing on a sequential machine with a scheduler, on multiple CPUs with shared memory, or on totally separate CPU/memory systems.

    Why make such a rule?

    Consider the evolution of programming languages...

    Originally we had assembler and primitive languages like BASIC. They made programming easier and had few restrictions. All variables/data were probably global, meaning that any part of the program could read/write any part of the data. There was JMP/GOTO, such that all points in your program were essentially global too, meaning that at any point in the code it was possible to transfer control to any other point in the code.

    This enables one to get programs written more easily, but without care it leads to chaos and makes it very hard to reason about a program or test whether it works or not.

    For example, say I write a subroutine "A" which gets input, provides output, and uses its own data on the way. I can read it and think it probably works. I can test it and find that it does work for a lot of test cases. But now I write a routine "B" that uses the same data as "A". Here comes a problem: if the combined system sometimes fails because the data is messed up, is it "A" or "B" going wrong?

    When we get to routines "C", "D", "E", etc., this becomes impossible to work with. As a result, languages introduced the idea of local variables, the idea that functions should not have side effects, the idea that global data is not good, and so on.

    Similarly, GOTO is frowned upon. It's very useful, but it turns out it's better to partition your code into isolated chunks that you can't just jump into or out of at any place. This makes code more easily relied upon, reasoned about, tested, and reused.

    CSP carries on in that tradition of organizing software design by introducing some more simple restrictions for the situation where things can be done in parallel and on multiple processors.
    ...to a software engineer, who is taught to strictly think SEQUENTIAL.

    Is that really true? I have no idea what they teach in CS departments nowadays. I did not do CS, but back in the 70's my friends who did were well into getting stuff to happen in parallel. That's what operating systems have been about for a long time. The problem has always been one of hardware availability; computers were big and expensive. It's only relatively recently that we are getting the luxury of multi-core PCs, multi-core microcontrollers (thank you, Parallax) and the means to build parallel systems for the masses of software engineers to tinker with.
  • prof_braino Posts: 4,313
    edited 2010-10-29 05:39
    Sapieha wrote: »
    It is for that I mentioned to Chip ---> one side step of my ideas:
    very fast serialization possibilities in Prop II and internal ports to communicate between COGs

    Hi Sapieha

    If this is something you are working on, would you be willing to give feedback on the implementation when it is available? The earliest would be in a couple of weeks.

    Of course, it's only on the Prop 1, and it's only several single serial channels between separate Props. But I would be interested to know how close it comes to achieving your target and how it differs from your concept.
  • Heater. Posts: 21,230
    edited 2010-10-29 05:57
    I think both Sapieha and I have suggested this from time to time. Both inspired by the Transputer I guess.

    The Prop I has very fast serialization output possibilities. That is the video hardware. Sadly it does not have any corresponding fast input serialization.

    So the idea is that the Prop II should have more flexible output serializer hardware and the matching input serialization. We are talking about shifting bits in and out at the chip's clock frequency.

    As this is on a COG-by-COG basis, it opens the possibility of having fast serial channels between COGs on a Propeller, or from COGs on one Propeller to COGs on another.

    So then, run out of COGs? No problem, just slap on another Prop chip and hook up the serial links. All of a sudden it does not matter if that FullDuplexSerial or Mandelbrot-set process is on this Prop or some other one.
  • ErNa Posts: 1,752
    edited 2010-10-29 06:14
    Hi Heater, again I mostly agree and just have some remarks: When I worked with transputers, I always had to explain that Occam is a language based on parallel running processes, and that it doesn't matter if there is one CPU or a thousand. And what about the dual-, quad-, octa-, ...multicore hype? What about seminars on debugging multi-core applications? When you leave university, you know about loops, arrays, ... Sure, you were taught much more, but you passed the examination, and that's it. If "normal" engineers had a deeper understanding of what's going on, the Propeller would not be known only to the enlightened ;-) OK, I do not hesitate; I hope we find a way to make the cogs' communication better than today's standard.
  • pharseid Posts: 192
    edited 2010-10-29 07:05
    To really understand the classification system I mentioned, you should go to the original paper I got it from, "How to write parallel programs: a guide to the perplexed" by David Gelernter. I think the most familiar example of agenda parallelism is task management in a multitasking operating system. You have a bunch of tasks, you may (or may not) have a bunch of cores to run them on, and no core is dedicated to a particular task; the operating system decides. If all those tasks were generated by the same program, you would have an example of agenda parallelism.

    Over the years it has struck me that this is an excellent way of looking at parallelism, because we have proven examples of hardware/software specializing in each type. The most familiar example of data-parallel hardware is probably the MMX instruction set. SIMD parallelism on a grander scale was employed in the massively parallel SIMD machines of the late 80's, like the Massively Parallel Processor and the Connection Machine. The languages on those machines would seem to meet ErNa's wish to be able to perform the same operation on many data items with one instruction, but they do not handle the other types of parallelism well. There was an article in The Journal of Forth Application and Research on MPP Forth; I would imagine Professor Braino would find that interesting. I met a guy who programmed in MPP Forth on NASA's MPP years ago; he got me interested in SIMD machines, and I eventually built a simple SIMD machine with PLDs and memory chips. One of my bucket-list items is to learn to program FPGAs and build a bigger SIMD machine someday.

    The Propeller/Spin is a good example of an architecture for implementing specialist parallelism. You create specific tasks and assign them to specific cogs. You don't have the simplicity of dealing with a processor array as a single unit like a SIMD machine, but it's much more flexible.

    Finally, as an example of an agenda machine, you have Linda and the various architectures that it runs on. It eases the burden on the programmer by removing the requirement of delegating tasks to specific processors; you just dump data and tasks into tuple space and let the system sort out what does what. As I mentioned earlier, if we get massively parallel MIMD machines, I think you need something like this, but how do you handle situations where you want to use hardware available on a particular CPU, like a specific I/O port?

    Of course, a language running on any MIMD machine should be able to utilize all the software constructs of all these examples, but would that make it the ultimate parallel programming language, or a bloated monstrosity that encourages programmers to write in styles that run poorly on their particular machines? I've thought about this, but trying to implement it isn't even on my bucket list (maybe in my next incarnation?)

    -phar
  • Leon Posts: 7,620
    edited 2010-10-29 07:40
    The transputer and the chip which must not be named are typical MIMD machines, and allow I/Os on any core (and thread) to be accessed.