Program Bouncing in Multi-Cog Systems
Humanoido
Posts: 5,770
I have an interest in program bouncing: propagation by means of many small programs that get loaded into specific cogs through a variable cog loader, which has access to a waiting array of small exampling programs.
"Program Bouncing" is an expression coined by Ari at the Propeller mini-super computer thread in post number 18. http://forums.parallax.com/showthread.php?129131-Propeller-mini-super-computer
Arrays of these tiny programs can load across the board, get used, then be replaced by other program bouncers. I've done some experiments, with programs numbering in the thousands, which appear to have great potential, though I'm still working toward a concrete development app that fully harnesses the effect.
The power of such a system could come from high numbers of small exampling programs, perhaps even "subroutines," each contributing to the whole. Are the limitations simply a matter of memory storage, or can the expansions involve equally small yet self-modifying code, or the complete replacement of code once it finishes its work?
The MCP can dish out small programs across multi-cog systems, make use of each program's objective, and bounce in new ones, perhaps from a waiting array of programs. The bouncing comes from the availability of a "slot": when any number of programs complete their duty cycles and vanish, space opens for the next. As the size of the machine increases, the leverage of this method becomes more pronounced.
I'm working with 170 processors for this effect. If this all seems a little uncommon, it's because I'm working on a Propeller-based brain and want to move and recycle thought patterns, in the form of small programs, in and out of the Propeller collective.
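The slot-based bouncing described above can be sketched as a host-side simulation. This is plain Python rather than Spin, with a made-up duty-cycle model and invented names, purely to make the slot/queue mechanics concrete:

```python
from collections import deque

COGS_PER_PROP = 8

def bounce(waiting, num_cogs=COGS_PER_PROP, max_steps=1000):
    """Simulate slot-based program bouncing.

    waiting: sequence of (name, duration) 'thought programs'.
    A slot (cog) frees when its program finishes its duty cycle
    and vanishes; the next waiting program then bounces in.
    """
    slots = [None] * num_cogs          # None = free slot
    completed = []
    queue = deque(waiting)
    for _ in range(max_steps):
        # Retire finished programs: their slots become free
        for i, prog in enumerate(slots):
            if prog is not None:
                name, remaining = prog
                if remaining <= 0:
                    completed.append(name)   # program vanishes
                    slots[i] = None
                else:
                    slots[i] = (name, remaining - 1)
        # Bounce waiting programs into any free slots
        for i in range(num_cogs):
            if slots[i] is None and queue:
                slots[i] = queue.popleft()
        if not queue and all(s is None for s in slots):
            break
    return completed
```

With 20 one-tick programs and 8 cogs, all 20 complete in three waves of loading, which is the whole point: far more programs pass through than there are cogs.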
Comments
By itself, a cog cannot singly solve the task - it can only contribute. It can only load in (bounce in), do its work, and get out, even deleting itself to make room for the next one.
Remember, the hub RAM is currently 32K, and a single chip with 8 cogs, each running a 2K Spin interpreter, must be utilized in extremely efficient ways - by recycling.
We know the app is a machine thought. The thought process must be carried out and followed through. This following through is the task. The task is like the game of Chess, in looking forward at the possibility of conditions.
The small exampling contributory programs don't need to constantly monitor a sensor input; rather, they can sample the input data one reading at a time, based on a flag from another conditional requirement.
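The flag-gated sampling idea - sample once per condition flag instead of polling continuously - can be sketched like this; `flag_source`, `sensor`, and `sink` are invented names for illustration:

```python
def sample_when_flagged(flag_source, sensor, sink):
    """Take exactly one sample per raised flag, rather than
    continuously monitoring the input.

    flag_source: iterable of condition flags set by other programs
    sensor:      callable that reads one input datum
    sink:        list collecting the sampled values
    """
    for flag in flag_source:
        if flag:                     # another program's condition is met
            sink.append(sensor())    # sample once, then wait again
```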
This input data need not be external. Data can come from internal sources, i.e. generated as a parameter from another program (or thought).
There are thousands of these needed because the artificial brain "thinks" of many conditions. It's that "thought" which is a process like looking forward in a Chess game that must unfold with consideration of all the possibilities.
But, we know these possibilities are fuzzy, they depend on the changing conditions of the moment, of the thought, experienced by parameters in that moment (to make it more simple).
What do I know about this so far, and what is working? The Propeller collective is handling a density of one thousand of these small contributory thought programs for every eight cogs - that amounts to 21 boards with 168 cogs and a total of 21,000 thought programs, which fits our established model range.
The challenge is the overall collective mechanism of knowing the schema of thought solving. I believe it depends on the multi-algorithm model, or a kind of established "thought solving" similar to Conway's Life rule establishment and the aforementioned game of Chess with pre-existing rules.
Like Conway's cell of life, either the thought or the supported bouncing program can live (contribute to the task), grow (take on data), and die (delete or recycle itself).
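The live/grow/die lifecycle can be written down as a tiny state machine. A Python sketch with invented names, purely to pin down the analogy:

```python
from enum import Enum

class Phase(Enum):
    LIVE = "live"    # contributing to the task
    GROW = "grow"    # taking on data
    DIE  = "die"     # delete or recycle itself

class ThoughtProgram:
    """Hypothetical lifecycle of one bouncing 'thought program',
    modeled on the Conway-style live/grow/die analogy."""
    def __init__(self, work_units):
        self.phase = Phase.LIVE
        self.work_units = work_units
        self.data = []

    def feed(self, datum):
        self.phase = Phase.GROW      # growing: taking on data
        self.data.append(datum)

    def step(self):
        self.work_units -= 1
        if self.work_units <= 0:
            self.phase = Phase.DIE   # its slot is now free for the next
        else:
            self.phase = Phase.LIVE
        return self.phase
```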
As for bouncing. What's the big deal?
When I program my Prop a binary image of a program gets "bounced" from my PC into the Prop and then it runs.
When my Zog wants to use a UART it "bounces" a binary image of FullDuplexSerial into a COG and then it runs. That binary image could well be read from an SD card or fetched over a network. That COG can be stopped and another image bounced in.
So for "bouncing" code into COGs you need:
a) A Loader.
This will fetch or be sent images from other Props or computers over some network. Or it could get them from SD or whatever.
It will load the received image into a COG where it will run.
b) A sender
This will forward images to loaders to be run as and when required.
c) A communications medium for all this to work.
d) A bunch of useful images that can be run anywhere whenever required
e) A means of deciding which image to run where and when.
a), b), c) describe a typical PC-to-MCU download system, like a Prop and the Prop Tool.
d) and e) are the hard part, and so far we have no idea what functionality is required there.
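Parts a) and b) amount to a small image-transfer protocol. A minimal host-side sketch in plain Python, with an invented 6-byte frame format - this is not Parallax's actual loader protocol:

```python
import struct

def frame_image(cog_id, image):
    """Sender side: wrap a COG binary image in a 6-byte header
    (1-byte cog id, 4-byte little-endian length, 1-byte checksum)."""
    checksum = sum(image) & 0xFF
    return struct.pack("<BIB", cog_id, len(image), checksum) + image

def parse_frame(frame):
    """Loader side: validate a frame and return (cog_id, image).
    On real hardware this is where the image would be handed to
    a cog-start routine; here we just unpack and verify it."""
    cog_id, length, checksum = struct.unpack("<BIB", frame[:6])
    image = frame[6:6 + length]
    if len(image) != length or (sum(image) & 0xFF) != checksum:
        raise ValueError("corrupt image frame")
    return cog_id, image
```

The same framing works whether the transport is a serial link, SD card file, or network socket - the loader only needs to recover the image bytes and the target cog.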
When you have all this in place you will probably find that it takes enough time to send a COG image across the net and run it on a remote Prop that you might as well have run it locally:)
That is to say, if there is a lot of bouncing going on then there is little time for actually computing.
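That overhead is easy to put numbers on. Assuming a full COG image of 496 longs and a 115,200-baud serial link with 8N1 framing (plausible but assumed figures):

```python
COG_LONGS = 496                  # usable longs in one Propeller COG
IMAGE_BYTES = COG_LONGS * 4      # 1984 bytes per full COG image
BAUD = 115_200                   # assumed serial link rate
BYTES_PER_SEC = BAUD / 10        # 8N1 framing: 10 bits on the wire per byte

transfer_ms = IMAGE_BYTES / BYTES_PER_SEC * 1000
print(f"{transfer_ms:.0f} ms per COG image")   # ~172 ms per bounce
```

At roughly 170 ms per bounce, any program that runs for less time than that would indeed have been cheaper to run locally.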
Example \Ex*am"ple\, v. t. [imp. & p. p. Exampled; p. pr. & vb. n. Exampling.] To set an example for; to give a precedent for; to exemplify; to give an instance of; to instance. [Obs.] ``I may example my digression by some mighty precedent.'' --Shak.
Back home in England they have a better word starting with B for this:)
This is not any new concept; it's simply a multi-threaded "nested" process....I have never attempted to write such a program....although I have used some similar concepts to control 3 digital potentiometers, each connected to a leg of an RGB LED....then simply calling values for each pot produces a hue change, and at the same time altering duty cycle (PWM) allows for a change in luminance....
So essentially a program that is interdependent with another program (this requires calling an external clock ref)....since duty cycle and impedance switching run in the same do loop...and a variation in either property can equal a hell of a lot of variation....like an RGB pixel on steroids....
Most modern displays run at a 32-bit color gamut....the Propeller is that gamut x8....it just depends on how patient you are with debugging all of it.
Same concept for the bouncing program....it relies on being a nested calling of objects....distributed compute.
when a task or series of tasks finishes, so can multiple sub-sets....just depends on how much memory you have on hand (at what speed) and how elegant your code is....
When doing English translation work in Asia, it's always important to know what the corporate client requires, American English or British English.
Yes...basically (pun?) just write the management program. Then break that program up into chunks. Save those chunks as callable objects (not one single object). Then nest the chunks in another program/task that has been broken up in the same way. Then load the chunks, containing chunks of other objects, into multiple cogs. Synchronize the cogs to run in a linear fashion, so when all cogs finish one process, many other processes can also complete.
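That chunk-and-synchronize scheme reads like phases separated by a barrier: no phase starts until every chunk of the previous phase has finished. A minimal Python sketch of that lockstep structure (the phase/chunk shape is assumed, not taken from real Spin code):

```python
def run_phases(phases, num_cogs=8):
    """Run chunks in lockstep phases.

    phases: list of phases; each phase is a list of zero-argument
    callables ('chunks'), one per cog. A phase completes only when
    every chunk in it has finished (the synchronization barrier),
    then the next phase bounces in.
    """
    results = []
    for phase in phases:
        if len(phase) > num_cogs:
            raise ValueError("more chunks than cogs in one phase")
        # barrier: collect every chunk's result before advancing
        results.append([chunk() for chunk in phase])
    return results
```

This also exposes the fragility noted below: if any chunk in a phase fails, the barrier never clears and everything downstream stalls.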
The problem with this idea I am presenting, though, is that it must be run on a linear timeline. If one single cog were to fail or the sequence were to get out of alignment, ALL of the nested tasks would fail. Also, you would have to dedicate at least one or more cogs to "strapping" the program into other cogs. My guess is that the more complex the nested objects became, the more cogs you would have to employ for the loading and timing process.
That was kind of why I had called the whole concept "pointless" in the other thread. You would just be duplicating a "multithreaded" process. There are much cheaper and faster solutions on the market for doing that on a large scale. The bottom of the scale would be an Intel chip (especially one with SpeedStep, as it uses nested processes to manage power consumption). This is a marriage between chip and software.
It would be much easier to write such a program in Java or C, since there are so many add-ons and existing objects for both languages. You could just strip apart some Linux code to see how these nested processes take place.
Like I said, I certainly haven't attempted to do such a thing. I am just as much a neophyte in SPIN as anyone else. I do understand the concepts though. I personally like BASIC a heck of a lot more than SPIN, although I understand the reason for both. Neither one is even going to hold a candle to Java Scripting for callable objects. The only downside to Java is that it's a resource hog (and some could argue that it poses a massive security risk).
I would look into high-precision timing ICs, and ditch the crystal clock. It has a drift of a few PPM per minute. That is enough to screw up a distributed task where many Propellers rely on one timing reference. The tighter your timing ref is, the less you would have to call an offset command.
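The drift concern can be quantified. Assuming a 10 ppm mismatch between two crystals and an 80 MHz Propeller clock (typical figures, but assumptions here):

```python
PPM_MISMATCH = 10         # assumed crystal tolerance mismatch between two Props
CLOCK_HZ = 80_000_000     # Propeller running at 80 MHz

skew_us_per_s = PPM_MISMATCH                           # 10 ppm = 10 us/second
skew_clocks_per_s = CLOCK_HZ * PPM_MISMATCH / 1_000_000
print(f"{skew_clocks_per_s:.0f} clock ticks of skew per second")
```

Eight hundred clocks of relative skew accumulating every second is why two "synchronized" Props drift apart without a shared reference or periodic offset correction.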
I am fascinated by the concept that you proposed in the other thread, especially the part about the cluster being self-healing and auto-distributing. Those are the unique ideas you have introduced. The other stuff is just distributed computing. I think you could use fuzzy logic to help limit the complexity of some of the code, but how you would get nested objects within nested objects to call (without a central process management "hub") is something I would have to think about a bit more.
Again, I think the biggest concept that you have presented is that these distributed tasks would be non-linear. Oh wait I just had an epiphany!!! This is almost exactly how a digital non-linear reverb works (I am an audio/acoustics engineer by trade). I think we could probably find you a working example of this code you and I are describing, by ripping apart the driver algo. from a simple non-lin reverb plugin. This is a device that relies on a program that runs out of sequence to create a unique sound effect....hmmmmm......
with proper nesting, the bouncing would be the computing.....
And I think you are 100% correct....line item e) (in your description) is the tough part. How to run a non-centralized loader and timing ref is a bear to get your head around *scratches head*.
I think calling the objects from a large RAM pool or memory device would be simple. The memory device could be dedicated to the object resource pool. As you said, though, how many I/O resources and cogs would ultimately be required to interface with the object pool could defeat the entire purpose and make it more efficient to run a self-contained program locally.
I think it would be a fun experiment to try to do this with 2 Propellers, sharing an object pool....and a high-precision clock ref dictating timing to both.
The real key in timing will be to make sure that your wiring is symmetrical in resistance, and that your power source is HIGHLY regulated. Even the smallest drift in timing ref could cause the whole process to become non-linear....
I had brought up the issue (in the other thread) of most people not having the equipment to analyze the timing cycles across hundreds of cogs and make sure they lay over each other with minimal drift....you would need a MEGA PropScope/ViewPort....one capable of overlaying hundreds of plots, just to be sure your offsets are correct....also you would need to invest in some really tight-tolerance components....I wonder if Parallax would do a "matched set" of Prop chips....since Parallax has never printed a proper spec sheet for the Propeller, I don't know what their acceptable tolerances from chip to chip are...
P.S. I am more of a hardware guy than software, so please excuse my over-simplification of the software side of things....I often tend to build hardware sub-systems that require software that is well beyond my own scope (yes I am full of puns)