I was wondering what had happened to him; perhaps we've convinced him that there isn't much point in what he's trying to do. I told him a long time ago to forget about using Propellers and simply simulate the whole thing on a PC.
Many years ago I supervised a student MSc project using an ANN to recognise sign-language gestures produced by someone wearing a DataGlove, interfaced to a PC. I gave the student some C code for a simple ANN that I had previously typed in from a book to get her started.
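For readers who haven't seen one, here is a minimal sketch in C of what such a simple ANN forward pass looks like: one hidden layer with sigmoid activations. The layer sizes and weights below are placeholder values for illustration only, not the code from that project; a real gesture recogniser would load trained weights and feed in DataGlove sensor readings.

/* Minimal feed-forward ANN sketch: one hidden layer, sigmoid activations.
   All weights are made-up placeholders; inputs stand in for glove sensors.
   Build with: gcc ann.c -lm */
#include <stdio.h>
#include <math.h>

#define N_IN  3   /* a few sensor readings        */
#define N_HID 4   /* hidden units                 */
#define N_OUT 2   /* two candidate gesture scores */

static double sigmoid(double x) { return 1.0 / (1.0 + exp(-x)); }

int main(void)
{
    double in[N_IN] = { 0.2, 0.7, 0.5 };              /* one sensor frame  */
    double w_ih[N_HID][N_IN] = {                      /* input -> hidden   */
        {  0.5, -0.2,  0.1 }, { -0.3,  0.8,  0.4 },
        {  0.7,  0.1, -0.6 }, {  0.2, -0.5,  0.9 }
    };
    double w_ho[N_OUT][N_HID] = {                     /* hidden -> output  */
        {  0.6, -0.1,  0.3,  0.2 }, { -0.4,  0.7, -0.2,  0.5 }
    };
    double hid[N_HID], out[N_OUT];

    for (int h = 0; h < N_HID; h++) {                 /* hidden layer      */
        double sum = 0.0;
        for (int i = 0; i < N_IN; i++) sum += w_ih[h][i] * in[i];
        hid[h] = sigmoid(sum);
    }
    for (int o = 0; o < N_OUT; o++) {                 /* output layer      */
        double sum = 0.0;
        for (int h = 0; h < N_HID; h++) sum += w_ho[o][h] * hid[h];
        out[o] = sigmoid(sum);
    }
    for (int o = 0; o < N_OUT; o++)
        printf("gesture %d score: %.3f\n", o, out[o]);
    return 0;
}

Training (backpropagation, for example) is what the code typed in from the book would presumably add on top of a forward pass like this.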
Humanoido is not one to give up, especially after all the research and work he's put into this project -- a true labor of love. My hope is that he's busy working on some demo code that he can share with us. Until then, I prefer to reserve judgment on the merits of the project itself. It's really not fair, at this point, to say whether or not his methods and objectives are realistic, since there's so little concrete info to go on. I'm sure he will find that, when he transitions to sharing schematics and source code, the community will rally to support his efforts, as it has in his other, less speculative threads.
One thing is undeniable; this is one well-viewed thread at 21,500 hits. If Humanoido had gotten paid for each hit, he would be a rich man and could have a fully-funded research lab. Something to consider for future posts...
I told him a long time ago to forget about using Propellers and simply simulate the whole thing on a PC.
I think whether or not it could be done better with a PC is beside the point. I'm guessing Humanoido enjoys working with the Propellers. I know I do.
Ever since seeing Humanoido's tower of Props on the Boe-Bot, I've been looking for a problem to solve with many Props. Not because I think many Props could do it better or cheaper than using a PC but because I think it would be fun.
I'm waiting for Jazzed to sell his TetraProp boards so I can experiment with 20 Propellers (I plan to buy five boards).
I doubt I'll pursue a neural network to solve my yet-to-be-determined problem. I have been thinking about something Jazzed once mentioned: using an eight-bit data bus between Props. I'm hoping to implement something like that myself.
While I still haven't decided on a problem to solve, I suspect it will probably deal with machine vision and/or some sort of display technology (pretty vague, I know). It's been fun to think about.
I believe Dave Hein's Lerner program was inspired by Humanoido's work. That makes at least two of us.
I agree with Phil. I hope Humanoido continues his work. (Not that most of you are suggesting he doesn't.)
Duane
I think Humanoido should be allowed to continue "his brand of progress" without interference from me or others.
So I'm inclined to stop posting in this thread and move the "functional" discussion to another place, such as the TetraProp thread, after this response if that makes sense.
We've all heard that there are left-brain (logic/precision/fact tending) and right-brain (emotion/approximation/creativity tending) thinkers. Humanoido appears to be more of the creative type, so asking him to switch modes is probably wasteful.
... So yes I was thinking an RPC mechanism would be used to enable the matrix operations. However, a distributed system like this is a whole lot of overhead for a math problem. ...
Very nice higher-level methodology description in your post. Yes, an array is a much simpler organization mechanism. I can see where a simple set of RPCs can be a benefit for the math operations with respect to a centralized "change" dispatch for the distributed system.
I think part of the problem better solved by a distributed network is one of locality. An "intelligent" and adaptable robot will require a distribution mechanism, since its mechanical controls and sensors will not all be connected to a central core.
Neural nets will probably never achieve any "intelligence" in isolation; it is likely that multiple roles would need to be assigned. I would like to study Genetic Algorithms (GA) more, since there may be some behavioral glue in that. There are micro language interpreters (Urban Muller's BrainF*** for example) that can fit in a COG with lots of room to spare and can execute small programs that could be written and modified by a GA.
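As a sense of scale, a complete Brainfuck interpreter really is tiny. Here is a minimal sketch in C for a PC; the in-COG version would of course be PASM or Spin, and a GA would be the thing generating or mutating the small program string it runs. The sample program and the 256-cell tape are arbitrary choices for illustration.

/* Minimal Brainfuck interpreter sketch. The sample program prints 'A'. */
#include <stdio.h>

int main(void)
{
    const char *prog = "++++++++[>++++++++<-]>+.";   /* 8*8+1 = 65 = 'A' */
    unsigned char tape[256] = { 0 };
    int p = 0;                                        /* data pointer     */

    for (const char *ip = prog; *ip; ip++) {
        switch (*ip) {
        case '>': p++;              break;
        case '<': p--;              break;
        case '+': tape[p]++;        break;
        case '-': tape[p]--;        break;
        case '.': putchar(tape[p]); break;
        case ',': tape[p] = (unsigned char)getchar(); break;
        case '[':                                     /* skip loop if cell is 0  */
            if (!tape[p]) {
                int depth = 1;
                while (depth) { ip++; if (*ip == '[') depth++; else if (*ip == ']') depth--; }
            }
            break;
        case ']':                                     /* loop back if cell != 0  */
            if (tape[p]) {
                int depth = 1;
                while (depth) { ip--; if (*ip == ']') depth++; else if (*ip == '[') depth--; }
            }
            break;
        }
    }
    putchar('\n');
    return 0;
}

The interesting experiment would be letting a GA rewrite that program string against some fitness function and re-running it, which is cheap precisely because the interpreter is this small.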
Finally, two questions. Is it worth populating the center of an array with props since it will be difficult to bring those pins to the edges for use as I/O? Can they be used for processing data?
Perhaps the vision ideas you mention are high enough priority to have more connections in the central array. Maybe higher level connectivity does make sense. I have an idea where a Super HUB can have operations manipulated in a round-robin fashion as problem and messaging storage.
Indeed it does. And a Spinneret can be a peripheral that provides services, for example.
Three new posts and several office interruptions since I started this response .... will get to that as I feel the urge.
Duane ... I haven't forgotten you. I have some things to wrap up first before shooting new boards.
Jazzed, I agree that moving this discussion to another thread is a good idea so I will start one titled "Simulating neurons with an array of props" in the propeller forum and put my response there.
The Journey of the Big Brain
Is it the journey or the destination?
The Big Brain project is one of gained new experiences, knowledge and understanding. It's a new project path taken in this new day and age with new technology (and old technology applied in new ways - which is perfectly acceptable).
Much of the project is about what we can develop and initiate and discover along the way. It can be compared to a great voyage. When Columbus set sail, it was to find a faster route to the Indies and not to discover America. America just popped up along the way. Who knew? I propose that we too are on a great journey and know not yet what we will learn, find, and discover along the way.
It's a big field and no one person can be expected to do it all, or have everything required to accomplish all the goals of everyone's opinion. To make such great strides in this world, we need left-brained people as well as right-brained people, and combined diverse talent and skills for emerging understanding and accomplishment.
When NASA Mars rovers Spirit and Opportunity explored the surface of Mars it was because of grand cooperation of thousands of scientists across the world - it led to Opportunity celebrating its 7th birthday at Santa Maria crater, a name embedded in Columbus' great journey.
In this great world, we need a diversity of people who are all important in the grand scheme of life - who can dig ditches, operate taxis, build bridges and computers, run companies, design tall skyscrapers and elevators, manufacture and design chips, and check out food in grocery stores, just as much as we need auto technicians, artists, philosophers, thinkers, engineers, secretaries, dreamers and programmers.
It is when we can draw upon and collectively bring together such people resources that great things begin to happen; people working together in positive, constructive ways can make a big difference in this world. Do you think the New World would have been discovered if Columbus' crew had mutinied halfway there? No! It would have held back progress and the science of world trade and exploration.
So is there a killer app? We don't know. Will the journey lead to one, or will one develop along the way? We do not know. I do know thus far, from my own perspective, that the path is very rich, full of new learning, discovery and ideas, with new ways of looking at things - and what could be more valuable than this, or than making new friends along the way?
In communications with Parallax Forum member Michael O'Brien, a man who I consider to be a great thinker of our time, in reference to ongoing development of his 80-Propellers Parallax Machine, he explained, "My hardware is a form of relaxation and necessary creativity...even if it takes me years to complete a project."
So I propose that some projects are about the journey and not the destination. You see, when the destination is reached, the journey is ended, over, and nothing more is gained. So be careful what you wish for.
In the words of poets and inspirational philosophical thinkers:
"The road of life twists and turns and no two directions are ever the same. Yet our lessons come from the journey, not the destination. -Don Williams, Jr. (American Novelist and Poet, b.1968)
Focus on the journey, not the destination. Joy is found not in finishing an activity but in doing it. -Greg Anderson (American best-selling Author and founder of the American Wellness Project., b.1964)
Success is a journey...not a destination. -Ben Sweetland
Inspirational Words of Wisdom/ It's The Journey That's Important
A gentle reminder to us all that life is about living. We can at times find ourselves rushing and forgetting the important things that life has to offer. The poet John McLeod expresses these thoughts in his words of wisdom throughout this poem. http://www.wow4u.com/journey/index.html
It's The Journey That's Important...
Poet: John McLeod
Life, sometimes so wearying
Is worth its weight in gold
The experience of traveling
Lends a wisdom that is old
Beyond our 'living memory'
A softly spoken prayer:
"It's the journey that's important,
Not the getting there!"
Ins and outs and ups and downs
Life's road meanders aimlessly?
Or so it seems, but somehow
Leads us where we need to be,
And being simply human
We oft question and compare....
"Is the journey so important
Or the getting there?"
And thus it's always been
That question pondered down the ages
By simple men with simple ways
To wise and ancient sages....
How sweet then, quietly knowing
Reaching destination fair:
"It's the journey that's important,
Not the getting there!"
Improving Neural Injector Data Line Signal Waveforms
During reconstruction of a new supporting Quick Brain (another 5-prop brain used for quick testing but with added enumeration test interface), it was discovered that touching the data leads produced a change in signal. If you did not touch the data leads, the machine would not inject (this would happen only with some specific chip). If you touched the leads and the body acted as a ground, the signal improved and injection functioned.
There appeared to be a greater sensitivity to this effect in some Propeller chips over others, related to the Propeller chip internals, the chip's physical nature, the length of the wiring, the solderless breadboard, or some combination of these. Replacing the chip with another could sometimes improve the signal waveform, but not always.
For additional analysis, the solderless breadboard in question was totally disassembled to reveal its hidden construction and determine whether it contributed to this effect. These breadboards do appear to affect circuit behavior directly. More information will be provided about this analysis at a later date.
To improve the circuit and make it very stable, route a CT4-0805-Y5V (63V-104-M) 10 nF multilayer ceramic capacitor (obtained from Nanjing Chiyang Electronics) from each chip's orange injection data line to ground. Additional filtering capacitors were also installed: a 3300 uF electrolytic and a 10 uF across each power bus. Decoupling capacitors were installed on BOTH sides of each P8X32A-D40.
[insert schematic here] The Neural Injector has these wiring improvements to improve the reliability of its data signal waveforms.
Propeller supercomputer hardware questions
http://forums.parallax.com/showthread.php?125614-Propeller-supercomputer-hardware-questions/page11
Originally Posted by Mike G: I'm having a hard time understanding ...sounds like Dr. Mario or Humanoido know how to interface the Prop with an AMD Radeon HD video card. Dr. Mario or Humanoido, can you or have you interfaced the Prop with an AMD Radeon HD video card?
Looking back, a post is dedicated to this topic. Bringing it up to date: yes, both Dr. Mario and I have developed differing ideas for using GPU cards with large Propeller-based projects - in particular the massive Dendou Oni Supercomputer being designed by Dr. Mario, and the emerging Propeller-based Big Brain.
I'm currently working towards providing a common platform between the Big Brain Propeller collective and AMD Radeon. To answer your question, is this completed? No. Can I do it? Yes, I have faith it will be accomplished. However, it's going to take some time as I'm not in any kind of hurry and there could be numerous pre-processes required to get up to speed.
I think this will be an exciting project because I worked on something similar in the past which was very successful. There are incredible gains from merging GPUs with Cogs, such as speed, number of core computers, etc. so it's worth looking at it.
Also remember that our Props were working in the realm of GFLOPS (as an approximation of FP to FLOPS - see other posts) with big machines like the UltraSpark 40; however, a technological merge with AMD could result in a quantum leap up to the tera-FLOP realm! The next level is peta-FLOPS, where the fastest supercomputers reside today.
The Props in the Big Brain with partitions come out to 24,000 MIPS using the Parallax data sheet. You can equate that to 24 billion instructions per second and make any additional conversions you want.
More information about Radeon is found here: page 33 post 650
The Big Brain Speed Doodle is found here:
http://forums.parallax.com/showthread.php?124495-Fill-the-Big-Brain/page42
Additional thoughts at the Propeller Supercomputer Hardware Thread:
http://forums.parallax.com/showthread.php?125614-Propeller-supercomputer-hardware-questions&p=1008502&viewfull=1#post1008502
EDIT: Please change GFLOPS to MFLOPS
http://forums.parallax.com/showthread.php?125614-Propeller-supercomputer-hardware-questions&p=1008454&viewfull=1#post1008454
AMD provides diagrams, technical details for micro programming, operations, and interfacing. This is currently the best source.
http://www.amd.com/us/products/technologies/Pages/technologies.aspx
I don't have a drawing program yet. A block diagram would show three Big Brain Partitions to the left, an adjoining interface, a computer host for software programming, interpretation, etc., and a high-end GPU card divided into an array. Also to be diagrammed: channeling for software, wiring done in software, the hard NMI and Enumerator, neural processors, and soft versions of the NMI and Enumerator.
Also remember that our props were working in the realm of GFLOPS (as an approximation of FP to FLOPS-see other posts) with big machines like the UltraSpark 40,
What? I really do not believe this.
I've just been looking over Lonesock's F32 floating point object. It's the fastest, smallest floating point object we have, I believe. For a floating point addition, F32 has to run through approximately 100 PASM instructions. If the Prop had nothing else to do we have 8 COGS at 20 MIPS = 160 MIPS available. That is then 1.6 MFLOPS. Let's be generous and call it 2 MFLOPS.
A collection of 40 Props is then achieving 80 MFLOPS, which is an order of magnitude less than a GFLOP.
Given that in order for such a machine to do any useful work it has to get data in, put data out, and generally communicate and coordinate amongst its processors, we can safely divide that 80 MFLOPS by 10 again, I would imagine.
Conclusion: 40 Props gets you about 8 MFLOPS.
Of course, here we are only talking addition and subtraction; when you get onto multiply and divide we can divide by 10 again, I might guess. Say 1 MFLOP.
What am I missing from this crude analysis?
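That estimate as a worked calculation, using the rough figures from the post (not measurements); the post then generously rounds 1.6 up to 2 MFLOPS per Prop to quote 80 MFLOPS for 40 of them.

/* Reproduce the back-of-the-envelope FLOPS estimate above. */
#include <stdio.h>

int main(void)
{
    double mips_per_cog   = 20e6;   /* ~20 MIPS per COG                    */
    double cogs_per_prop  = 8.0;
    double instr_per_fadd = 100.0;  /* ~100 PASM instructions per F32 add  */
    double props          = 40.0;

    double mips_per_prop  = mips_per_cog * cogs_per_prop;    /* 160 MIPS    */
    double flops_per_prop = mips_per_prop / instr_per_fadd;  /* ~1.6 MFLOPS */
    double flops_total    = flops_per_prop * props;          /* ~64 MFLOPS  */
    double flops_usable   = flops_total / 10.0;              /* comms overhead guess */

    printf("per Prop                  : %5.1f MFLOPS\n", flops_per_prop / 1e6);
    printf("40 Props, peak            : %5.1f MFLOPS\n", flops_total    / 1e6);
    printf("40 Props, ~10x comms tax  : %5.1f MFLOPS\n", flops_usable   / 1e6);
    return 0;
}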
How many MIPS are you actually getting out of it? With what program?
Leon, aside from the theoretical quote, I don't know. I hope to know in the future. Actual speed in Props is based on numerous factors like the programming language, the wiring interface (different interfacing may be used at different times), the actual instruction size and time to execute, etc. There are numerous posts on this topic for single Props, including the benchmark thread. Perhaps the information from benchmarking multiple cogs can be applied to multiple Props.
One project I'm interested in is developing hardware and software to extract the maximum amount of speed and processing power. I'm interested in exceeding the quoted chip speed ratings without changing out the crystal, super-cooled hypering, or overclocking. This would be an academic experiment for the pure enjoyment of exploring the results. There were some posts about this topic a while back. It will be some time before I get to this phase.
Drats! My error! I meant MFLOPS and not GFLOPS.
I've just been looking over Lonsocks F32 floating point object. It's the fastest, smallest floating point object we have I believe. Now for a floating point addition F32 has to run through approximately 100 PASM instructions. If the Prop had nothing else to do we have 8 COGS at 20MIPS = 160MIPS available. That is then 1.6 MFLOPS. Let's be generous and call it 2 MFLOPS. A collection of 40 Props is then achieving 80 MFLOPS which is an order of magnitude less that a GFLOP.
http://forums.parallax.com/showthread.php?123828-40-Props-in-a-Skyscraper&p=922390&viewfull=1#post922390
Yep, took it into account. Floating point is implemented as PASM code on the Prop, and is on the order of 20x-100x slower than integer code. Actually, even integer multiplication and division has to be implemented as PASM code :) Still, 6.4 BIPS ~= 64 MFLOPS - still quite respectable (look up early supercomputers such as a VAX as proof).
Trying to execute floating point out of order may improve the speed a bit; I may implement branch-prediction statistics to cut cycles (to do it in a shorter time). But still, I would imagine we might get up to 96 MFLOPS, while integer may speed up from 640 to 700~800 MIPS. I don't know yet, as I may have to take my DIP-40 Propeller and try it out (writing kernel software that would do the trick is a bit tricky...).
There's always a difference between in-order and out-of-order processors in terms of speed, even with the same integer unit slices (where the ALUs are - the simplest out-of-order processors may have at least two ALUs or two separate identical integer units) and core frequency. The Cogs in the P8X32A are in-order processors, though (Prop II's Cogs smell like out-of-order processors, but we will know soon...). There are a few methods to force out-of-order execution, though each comes at a price.
Dr. Mario, can you briefly define or describe the in-order and out-of-order technique? Thanks.
In-order processing is basically what it says: the program executes its machine code in steps (instruction by instruction), just as the programmer wrote it.
Out-of-order processing, once again, is what it says: the processor can skip certain instructions or ditch duplications, cutting down on processing cycles by cutting out unnecessary work, or simply delaying what isn't needed right now.
And, to outline the instruction ordering (kinda like the firing order on a car engine):
In-order = 1, 2, 3, 4, 5, 6, 7, 8
Out-of-order = 1, 3, 4, 2, 5, 6, 8, 7
Also, an out-of-order processor can handle more than two threads and do it much quicker. A VLIW processor is an in-order processor, but it's easy to force it to run like an out-of-order processor much more cheaply (in processor die cost) by introducing delayed issue into the compiler, moving the machine code ordering back or forward depending on how it is programmed.
HP has been researching out-of-order execution techniques in VLIW processors since the early 1990s, it seems: http://www.hpl.hp.com/techreports/93/HPL-93-52.pdf
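A software analogy for the gain being described here (ordinary C on a PC, not Propeller code): a single accumulator forms one long dependency chain, so every add has to wait for the previous one, while two independent accumulators give an out-of-order core (or a reordering compiler) independent work it can overlap.

/* Dependency-chain analogy: the same sum computed two ways. The second    */
/* loop exposes two independent add chains that hardware issuing           */
/* instructions out of order can overlap.                                   */
#include <stdio.h>

#define N 1000000

int main(void)
{
    static double a[N];
    for (int i = 0; i < N; i++) a[i] = 1.0 / (i + 1);

    double s = 0.0;                                    /* one long chain     */
    for (int i = 0; i < N; i++) s += a[i];

    double s0 = 0.0, s1 = 0.0;                         /* two shorter chains */
    for (int i = 0; i < N; i += 2) { s0 += a[i]; s1 += a[i + 1]; }

    printf("single chain sum = %.6f\n", s);
    printf("two-chain sum    = %.6f\n", s0 + s1);
    return 0;
}

On an in-order Cog the second form only helps if the programmer (or compiler) writes it that way; an out-of-order core finds the overlap by itself, which is the whole point of the technique.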
Dr. Mario, thanks for these details. This may open doors to new algorithmic approaches for the Propeller. The HPL-93-52 report is particularly useful. We can see where the sequential firing and sharing of multiple data piles, even sections of program constructs that act like subroutines, can be conducive to faster operations - minimizing redundancy and latency and opening up more memory - a very good technique to exploit in a parallel Prop Big Brain and D-Supercomputer.
From HPL-93-52: "VLIW processors are viewed as an attractive way of achieving instruction-level parallelism because of their ability to issue multiple operations per cycle with relatively simple control logic. They are also perceived as being of limited interest as products because of the problem of object code compatibility across processors having different hardware latencies and varying levels of parallelism. In this paper, we introduce the concept of delayed split-issue and the dynamic scheduling hardware which, together, solve the compatibility problem for VLIW processors and..."
"The rigid assumptions built into the program about the hardware are viewed as precluding object code compatibility between processors built at different times with different technologies and, therefore, having different latencies."
Dr. Mario wrote: In-order processing is basically what it says: the program executes its machine code in steps (instruction by instruction), just as the programmer wrote it. Out-of-order processing, once again, is what it says: the processor can skip certain instructions or ditch duplications, cutting down on processing cycles by cutting out unnecessary work, or simply delaying what isn't needed right now. To outline the instruction ordering (kinda like the firing order on a car engine): In-order = 1, 2, 3, 4, 5, 6, 7, 8; Out-of-order = 1, 3, 4, 2, 5, 6, 8, 7. Also, an out-of-order processor can handle more than two threads and do it much quicker. A VLIW processor is an in-order processor, but it's easy to force it to run like an out-of-order processor much more cheaply (in processor die cost) by introducing delayed issue into the compiler, moving the machine code ordering back or forward depending on how it is programmed.
However, for my firmware, I would prefer to have the code analyzed for branch-prediction violations or anything else that would slow the Cogs down, while retaining the way we intended to code the software.
Also, the Hub will need to be left alone - or more precisely, separate counter threads should be started, using the on-die finite-state-machine timer / counter, to count down until a Cog needs to be ready for its Hub access window, without violating the Hub access rules or messing up the executing software threads.
Here I can name a few out-of-order execution methods for this chip: branch-prediction analysis and reshuffling, dynamic JITC (just-in-time compilation), and delayed issuing. All of these need to be done within the Cogs themselves so that the round-robin rules aren't violated - and of course the pinout electrical conflict / contention rules too.
I'm currently working towards providing a common platform between the Big Brain Propeller collective and AMD Radeon. To answer your question, is this completed? No. Can I do it? Yes, I have faith it will be accomplished. However, it's going to take some time as I'm not in any kind of hurry and there could be numerous pre-processes required to get up to speed.
I think this will be an exciting project because I worked on something similar in the past which was very successful. There are incredible gains from merging GPUs with Cogs, such as speed, number of core computers, etc. so it's worth looking at it.
I see how to write C++ code using the AMD SDK (OpenCL) on a PC. That's pretty straightforward.
Given that you had success in the past, can you describe how to merge GPUs with Cogs? Is the Big Brain Propeller collective physically connected to the AMD Radeon?
I think you could have a lot of fun writing straightforward programs. Are you planning to have a go at it?
In the past, the project used a preconfig host. This time around, the idea is that access will be passed through a similar preconfig host that fits the requirements set forth by AMD and therefore there is a physical connection between Propellers, GPUs and all preconfig hosts.
However, perhaps it's not that simple. Some connections anticipated are soft wiring, as mentioned in another post. So GPUs can soft wire to Cogs and Propellers. Is this a direct connection? Yes. Is it a direct physical hard connection? No.
In the scheme of things, is it possible to merge Cogs with GPUs? Yes. Can they collectively work together in parallel? Yes. Can soft wiring result in the desired arrays and trans-neural matter? Yes, I believe this is possible.
AMD has Developers Central with Tools, SDKs, Libraries, Samples & Demos, Docs, Zones, Community Forums and Support: http://developer.amd.com/pages/default.aspx
Popular Downloads
AMD Accelerated Parallel Processing (APP) SDK
Aparapi
ACML-GPU
AMD CodeAnalyst Performance Analyzer
x86 Open64 Compiler Suite
Featured Community Submissions
OpenCL_FFT (Apple)
bdt_nbody (Brown Deer Technology)
bdt_seismic3d (Brown Deer Technology)
Developing for APUs
The APU (Accelerated Processing Unit) gives software developers the power to unleash their imaginations.
AMD Accelerated Parallel Processing (APP) SDK
AMD Fusion Developer Summit
AMD's High Performance Computing (HPC) solution stack delivers powerful performance advantages through a software ecosystem tuned for the hardware that it runs on.
AMD (to Parallax Propeller) Development Boards Source
You can find various development boards from AMD for various connections with Parallax Propellers. It's up to you to make the connections. Development boards can often shave months off development time.
http://www.amd.com/us/products/embedded/develop-and-design/Pages/development-boards.aspx
For Big Brain Propeller-to-GPU development, here's a PC option for development with AMD GPU boards, in addition to a Mac option:
http://www.xbitlabs.com/news/other/display/20110127155809_AMD_s_Software_Development_Kit_Now_Supports_Fusion_Chips.html
AMD APP SDK Support for APUs, Cayman GPUs
01/27/2011 03:58 PM
Advanced Micro Devices on Thursday released the updated AMD accelerated parallel processing (APP) software development kit (SDK) v2.3 with full support for the first AMD Fusion accelerated processing units (APUs), OpenCL 1.1 and AMD Radeon HD 6900-series graphics cores. The new SDK will allow software designers to utilize advantages of APUs as well as latest graphics processing units (GPUs).
AMD APP SDK v2.3 empowers software developers to write new applications that can take full advantage of the parallel processing power of heterogeneous computing platforms, such as those based on new AMD E-series and C-series APUs that combine a multi-core CPU and DirectX11-capable GPU on a single die. The new SDK also offers improved runtime performance and math libraries for OpenCL.
“When developers harness the power of parallel processing within our APU designs, they can fundamentally change the PC experience to help not only make it faster, but also to create new possibilities in software,” said John Taylor, director of client product and software marketing, AMD.
I think you could have a lot of fun writing straight forward programs. Are you planning to have a go at it?
I compiled and ran the AMD SDK Hello World demo using Visual Studio. I guess I could give it a go as suggested ... but I have no clue how to connect the Propeller logic to the video card; hardware or software. Please advise.
Apparently the AMD card is installed in a Windows PC. This is the 1st step. Before getting to the point of connecting Propellers, one should be able to program the GPUs. Do you have OpenGL installed? The sources listed above should be helpful in reaching this objective.
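For reference, the host side of such an SDK sample is only about a page of code. Below is a minimal sketch of OpenCL 1.1 host code for a vector add, in the spirit of the APP SDK samples; error checking is omitted for brevity, and it says nothing about the Propeller side of the question - it only shows how work gets from the PC to the GPU.

/* Minimal OpenCL 1.1 host sketch: vector add on the first GPU device.    */
/* Link against OpenCL (e.g. -lOpenCL or OpenCL.lib).                      */
#include <stdio.h>
#include <CL/cl.h>

static const char *src =
    "__kernel void vadd(__global const float *a, __global const float *b,\n"
    "                   __global float *c)\n"
    "{ int i = get_global_id(0); c[i] = a[i] + b[i]; }\n";

int main(void)
{
    enum { N = 1024 };
    float a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

    cl_platform_id plat;
    cl_device_id   dev;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);

    cl_context       ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q   = clCreateCommandQueue(ctx, dev, 0, NULL);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "vadd", NULL);

    cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof a, a, NULL);
    cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof b, b, NULL);
    cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, sizeof c, NULL, NULL);

    clSetKernelArg(k, 0, sizeof(cl_mem), &da);
    clSetKernelArg(k, 1, sizeof(cl_mem), &db);
    clSetKernelArg(k, 2, sizeof(cl_mem), &dc);

    size_t global = N;
    clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, dc, CL_TRUE, 0, sizeof c, c, 0, NULL, NULL);

    printf("c[10] = %.1f (expect 30.0)\n", c[10]);

    clReleaseMemObject(da); clReleaseMemObject(db); clReleaseMemObject(dc);
    clReleaseKernel(k); clReleaseProgram(prog);
    clReleaseCommandQueue(q); clReleaseContext(ctx);
    return 0;
}

Presumably whatever the Propeller ends up contributing would sit around host code like this: the Props feed data to (or take results from) the PC, and the PC hands the heavy math to the GPU.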
Givin' it a shot. I have an AMD Radeon HD 4670, so it's all jolly and good.
I also have a Phenom II Deneb in the computer I'm writing this on. It's mostly to deal with heavy code, such as high-end IC design CAD and maybe some 3D work, with PCB creation running in other software in the background.
BTW, do you know which is the best benchmark program for the Propeller? I may want some of the source code so I can write an out-of-order execution demo for the P8X32A chip.
Dr. Mario, this is fantastic - with three of us working on this, sharing notes and ideas, we can make some great progressive strides.
The AMD Radeon HD 4670 has 320 streaming processors and has come down in price, ranging from $50 to $25 street. You could build one massive multiple-core machine with 320 cores with only one AMD card and a few Propellers at the front end.
http://www.anandtech.com/show/2616
With an investment of $500, get 20x320 = 6,400 processors. A thousand dollars yields a harvest of 12,800 processors.
With 514 million transistors, this board has a 2,000 MHz data rate.
http://www.amd.com/us/products/desktop/graphics/ati-radeon-hd-4000/hd-4600/Pages/ati-radeon-hd-4600-overview.aspx
Specs: http://www.amd.com/us/products/desktop/graphics/ati-radeon-hd-4000/hd-4600/Pages/ati-radeon-hd-4600-specifications.aspx
Apparently the AMD card is installed in a Windows PC. This is the 1st step. Before getting to the point of connecting Propellers, one should be able to program the GPUs. Do you have OpenGL installed? The sources listed above should be helpful in reaching this objective.
Humanoido, I targeted the newer stuff, OpenCL. You might want to take a closer look at the source you posted and do some research yourself. Secondly, how is OpenGL, or OpenCL in this case, related in any way to connecting a Propeller to a video card? How does the data go from the Prop to the video card and back?
And, Mike G - you can do that; I think it's mentioned in the datasheet for the Radeon HD 2000 series: http://developer.amd.com/media/gpu_assets/ATI_Radeon_HD_2000_programming_guide.pdf - otherwise, how would we use the GPU to re-encode our home video?
And there's one problem: if we use PCIe x4, we may need maybe 120 Props to satisfy the throughput requirements. PCIe x1 should be no problem to snake the data through, while PCIe x16 is a huge problem (it would need 320 Propeller I chips, or 24 Propeller II chips, to keep up with that kind of bandwidth) - though if we use a Cyclone III FPGA to do the heavy lifting, that would solve the bandwidth problem. And beware: the Altera Quartus II tools are HUGE (I am glad I have a 500GB hard drive here); I have them installed on my workstation already.
And running OpenCL on Propeller II is doable. All that's needed is for the libraries to be copied and converted into PASM, then saved onto the boot firmware flash (or FeRAM, or even SPI-compatible magnetic RAM) and run. OpenCL is supposed to be barebones (a very small library, all by itself).
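As a rough check on those counts: taking the usual PCIe 1.x usable rate of about 250 MB/s per lane (an assumption; the actual card generation may differ), the Prop counts quoted above imply each Propeller would have to sustain roughly 8-12 MB/s.

/* Back-of-the-envelope check on the PCIe remark above. The lane rate is    */
/* the common PCIe 1.x usable figure; the Prop counts are the ones quoted   */
/* in the post, so the last column is the per-Prop throughput they imply.   */
#include <stdio.h>

int main(void)
{
    double lane_MBps = 250.0;   /* assumed PCIe 1.x usable rate per lane */
    struct { int lanes; int props; } cfg[] = { { 4, 120 }, { 16, 320 } };

    for (int i = 0; i < 2; i++) {
        double link_MBps = lane_MBps * cfg[i].lanes;
        printf("PCIe x%-2d = %6.0f MB/s over %3d Props -> %4.1f MB/s per Prop\n",
               cfg[i].lanes, link_MBps, cfg[i].props, link_MBps / cfg[i].props);
    }
    return 0;
}

Sustaining even that per-chip rate is itself a challenge for a Propeller I, which is presumably why an FPGA in the middle looks attractive.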