As I posted previously, I have a lot of experience with the hardware and communications aspect of a design. Where I come up short is on the neural net side of things. I had some interest in the research being done several years ago and followed it for a while. I do understand the basic concepts, but my understanding of the field as it is at present could best be described as fuzzy. What are your opinions on how practical it would be to run a neural net on an array of props? Your opinions along with the reasons for them would be appreciated.
@All:
I have proposed an array of props and multiple buses, and asked for alternate suggestions as well as comments/criticism of that proposal. So far I have not seen any alternate suggestions or comments on the hardware. Should I take this as meaning there are no better suggestions?
The hardware connections are important, but the software should allow any hardware within reason.
I've already recommended using a bidirectional ring approach with redundant ring connections. Not sure why that has been ignored. It is the simplest and safest approach.
The hyper-cube design is interesting and may be fast, but it seems unnecessarily complicated. The tree design is fine since it allows the shortest number of hops for a destination, but it requires the comm protocol to route packets and any node is a single point of failure.
I wasn't suggesting that you use it, but the hypercube has several advantages. It isn't complicated. Seeing how other people implement things can be useful, and can save a lot of time and effort.
I've just noticed that Scilab includes an ANN toolbox. I installed the toolbox and tried one of the demos; it seems to work OK.
Most neuron code examples are useless for PASM except for the general algorithms.
I've already started designing neuron COG code. Maybe your COG code would be better?
Sorry jazzed, I forgot you suggested that in the big brain thread. Any estimate of the bandwidth and latency of your proposal? Let's assume 64 props for the array.
I know nothing about neural nets but that does not stop me making an off the wall suggestion.
The connection between neurons only has to carry a single value. So rather than use some complex communication protocol between Props that eats up a cog, why not use the counters to transmit and receive that connection value over a pair of pins? Represent the value as a frequency, rather like real neurons are said to do. Or perhaps pulse-width modulation. This way the counter hardware does the comms protocol for you and the cog is free for calculating.
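For what it's worth, the encode/decode arithmetic behind that counter idea is simple on both ends. Here's a quick host-side sketch (Python, not Prop code); the base frequency, step size, and gate time are invented for illustration:

```python
# Sketch (not Prop code) of sending one neuron activation as a frequency.
# A transmitting counter would generate BASE_HZ + value * STEP_HZ; the
# receiver counts edges over a fixed gate window and inverts the mapping.
# All three parameters below are illustrative assumptions.

BASE_HZ = 10_000      # frequency representing value 0
STEP_HZ = 100         # extra Hz per unit of activation
GATE_S  = 0.1         # receiver's counting window in seconds

def encode(value):
    """Value -> output frequency the transmitting counter would generate."""
    return BASE_HZ + value * STEP_HZ

def decode(edge_count):
    """Edges counted in one gate window -> recovered value."""
    freq = edge_count / GATE_S
    return round((freq - BASE_HZ) / STEP_HZ)

# Round trip: value 42 becomes 14200 Hz; in a 0.1 s window the receiving
# counter sees 1420 edges and recovers 42.
edges = int(encode(42) * GATE_S)
assert decode(edges) == 42
```

Note the trade-off this makes explicit: resolution is STEP_HZ * GATE_S counts per unit of activation, so a finer value range means either a longer gate time or a wider frequency span.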
@kwinn
I'm assuming a conservative 115200bps asynchronous rate, 10 bytes per message (2 byte destination id + 8 bytes data), and a simple FullDuplexSerial repeater pair on every node in the ring (UPSR). NOP or other messages are always active.
More aggressive TDM ("Time Division Multiplexing") communications schemes like those in SONET should be able to hit 8Mb/s fairly easily. TDM would be a perfect application for the Propeller's determinism.
A message would take about 5ms to traverse 64 nodes in a conservative 115200bps unidirectional configuration (UPSR), and a maximum of about 2.5ms for a BLSR, where the rings are bidirectional.
While that is relatively slow, the advantages are: hardware simplicity, redundancy, software controlled connections, and minimal software footprint.
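Those latency figures check out if each repeater forwards byte-by-byte (cut-through) rather than buffering whole 10-byte messages. A quick back-of-envelope in Python, assuming 8N1 async framing:

```python
# Back-of-envelope check of the ring latency figures above, assuming each
# repeater forwards byte-by-byte (cut-through) so per-hop delay is one
# byte time, not one message time. All figures approximate.

BAUD  = 115_200
NODES = 64
BITS_PER_BYTE = 10          # 8N1 async: start + 8 data + stop bits

byte_time      = BITS_PER_BYTE / BAUD        # ~86.8 us per hop
unidirectional = NODES * byte_time           # worst case around a UPSR ring
bidirectional  = (NODES // 2) * byte_time    # BLSR: take the short way round

print(f"per-hop byte time : {byte_time * 1e6:.1f} us")
print(f"UPSR worst case   : {unidirectional * 1e3:.2f} ms")   # ~5.6 ms
print(f"BLSR worst case   : {bidirectional * 1e3:.2f} ms")    # ~2.8 ms
```

With store-and-forward of whole messages instead, each hop costs a full 868 us and the 64-node traversal balloons to roughly 55 ms, so the cut-through assumption matters.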
@Heater
I really like your idea for a small rigidly configured network. Hardware is forever pin constrained though.
Regarding PASM code and implementations. There are many types of neural networks that can be used for different purposes. Eventually a distributed infrastructure should be created that would easily support any type. The choice of type to be used is up to the user.
I plan to implement neural node types in steps from simplest to more complicated as my bandwidth permits. I hope other programmers can join with productive effort.
Below is an overly simplified McCulloch-Pitts Model that I rolled for the Big Brain forum. After I wrote the "Brute Force" code, I realized that if I had a bunch of these guys I might be able to recognize simple 2D shapes.
CON
  _clkmode = xtal1 + pll16x
  _xinfreq = 5_000_000

DAT

pattern byte %10101100
output  byte $00

OBJ

  pst : "Parallax Serial Terminal.spin"

PUB Main | i

  pst.Start(115_200)
  waitcnt((clkfreq / 1_000 * 2_000) + cnt)

  SampleInput(0, $AC)
  SampleInput(1, $BB)
  SampleInput(2, $AC)
  SampleInput(3, $BB)
  SampleInput(4, $AC)
  SampleInput(5, $BB)
  SampleInput(6, $AC)
  SampleInput(7, $BB)

  pst.bin(output, 8)
  pst.char(13)

  SampleInput(0, $BB)
  SampleInput(1, $AC)
  SampleInput(2, $AC)
  SampleInput(3, $AC)
  SampleInput(4, $AC)
  SampleInput(5, $BB)
  SampleInput(6, $AC)
  SampleInput(7, $BB)

  pst.bin(output, 8)

' Check input byte and output a 1 if
' the pattern matches, otherwise output a 0
PUB SampleInput(id, input) | tmp

  tmp := |< id

  if(input == pattern)
    output |= tmp
  else
    tmp := !tmp
    output &= tmp
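For comparison, here is roughly the same neuron sketched in Python (not Spin/PASM), with the two pieces the hard-coded version leaves out: per-input weights and a threshold on the weighted sum. The weights and threshold used here are made up for illustration:

```python
# Host-side sketch (Python, not Spin/PASM) of a McCulloch-Pitts neuron
# with the pieces the hard-coded Spin version omits: per-input weights
# and a threshold on the weighted sum. Parameters are illustrative.

def mcculloch_pitts(inputs, weights, threshold):
    """Classic MP neuron: fire (1) iff the weighted sum reaches threshold."""
    total = sum(i * w for i, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# With every weight set to 1, the neuron fires when enough input bits are
# active -- a softer cousin of the exact pattern match in SampleInput above.
bits    = [1, 0, 1, 0, 1, 1, 0, 0]    # the %10101100 pattern as bits
weights = [1] * 8
print(mcculloch_pitts(bits, weights, 4))   # weighted sum is 4, so it fires
```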
Thanks Mike. I'll probably add summation to a PASM version.
BTW, I guess you noticed Google paid tribute to Les Paul Thursday the 9th. http://www.google.com/logos/2011/lespaul.html
Mike, somehow I expected the code for even a very simple neuron to be more complex than that. Unless other neurons are orders of magnitude more complex I get the feeling it should be possible to write a "neuron interpreter" program in PASM and instantiate neurons in hub ram with a table. Any thoughts on that idea?
The Google thing was very cool... played around with it for a while.
kwinn and all: what's missing from my very simple McCulloch-Pitts Model is teaching and weighted inputs. I just hard-coded a pattern, each input with a weight of 1. But absolutely, we should be able to code a PASM interpreter that reads and writes RAM. As already discussed, the hard part is getting the data in and out. I imagine input data needs to follow some formatting rules. There's also the teaching part, which takes more RAM.
Honestly, I'm not totally dedicated to this neuron stuff... I have the Spinneret thing going on... but I do believe there is some merit to going down this road. I'm willing to help time permitting.
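As a sketch of the "neuron interpreter" idea, here's what walking a neuron table could look like, written host-side in Python the way a PASM cog could walk a table in hub RAM. The record layout is invented for illustration, not an agreed format:

```python
# Sketch of a table-driven "neuron interpreter": one loop walks a table
# of neuron records the way a single PASM cog could walk a table in hub
# RAM. The record layout (input indexes, weights, threshold) is invented.

table = [
    ((0, 1), ( 1,  1), 2),   # AND of inputs 0 and 1
    ((0, 1), ( 1,  1), 1),   # OR of inputs 0 and 1
    ((2,),   (-1,),    0),   # NOT of input 2
]

def step(table, signals):
    """One pass over the table; returns each neuron's 0/1 output."""
    out = []
    for idxs, weights, thresh in table:
        total = sum(signals[i] * w for i, w in zip(idxs, weights))
        out.append(1 if total >= thresh else 0)
    return out

print(step(table, [1, 1, 0]))   # -> [1, 1, 1]
```

The appeal of this shape is exactly what kwinn suggested: the interpreter code is fixed and small, and the network itself is just data, so adding neurons costs hub RAM rather than cog code.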
In other words, instead of trying to come up with an optimum connection scheme for current neuron software and attempting to fit the Prop to that, let's come up with an optimum connection scheme for an array of Propellers and write neuron software that takes advantage of the Prop's strengths.
Have you given more thought to this?
One idea I've tossed around is to have a Super-HUB for message passing and data storage. There are advantages and disadvantages to this approach, but some issues can be resolved by inter-cluster communications.
A Super-HUB is essentially an external memory extension of the Propeller's current architecture. That is, each Propeller would have access to external memory in a round-robin way, like COGs have access to on-chip HUB RAM today. The access slot can be set by a pin counter from a master controller, which can access the Super-HUB and do some common memory-transaction setups. Two or more such Super-HUB clusters can be used to add capabilities.
Some advantages to a Super-HUB would be simple hardware and fast access to messages and data. Hardware can be connected on a parallel bus and each Propeller would have equal access. Each propeller gets "atomic" access to the hub for reading and depositing data. Locks could be implemented as semaphores in the master's memory space to allow for sequential (non atomic) access to a resource.
Some disadvantages to the Super-HUB scheme are single points of failure (one master, one clock, and one RAM) and limited number of devices possible on a bus. While the disadvantages seem to be a deal killer especially with the limited number of devices, it may be possible to connect a number of Super-HUB clusters together.
The programming model for a Super-HUB would be about the same complexity as any other scheme for a single cluster, but it would require more infrastructure for connecting multiple clusters.
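The slot arithmetic for such a Super-HUB is the same as the Prop's existing hub rotation, just scaled up to whole chips. A toy model in Python, where the slot count and numbering are purely illustrative:

```python
# Toy model of the Super-HUB round-robin idea: time is divided into slots
# and Propeller n may touch shared RAM only when slot % N_PROPS == n,
# mirroring how cogs get hub access today. Numbers are illustrative.

N_PROPS = 8

def owner(slot):
    """Which Propeller owns a given access slot."""
    return slot % N_PROPS

def wait_slots(prop, current_slot):
    """How many slots a Propeller must wait from current_slot for its turn."""
    return (prop - current_slot) % N_PROPS

assert owner(17) == 1                          # slot 17 belongs to prop 1
assert wait_slots(prop=3, current_slot=17) == 2  # prop 3's turn is slot 19
```

The worst-case wait is N_PROPS - 1 slots, which is the "atomic access" guarantee mentioned above: every chip gets a deterministic, bounded turn at the shared RAM.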
@Leon. An interesting product. I did not see any examples of real world applications. Are you aware of any, or is this product so new that there are none at this time?
@jazzed. I have not had much time to think about it. This is the busiest time of year for me, and I will not get much free time until September. I am lucky to have internet access at some customer sites so at least I can read or answer a post or two while waiting for compiles, calibrations, or diagnostics to complete.
Having said that however, and having been in field service for so many years I find myself favoring the rugged and simple designs that offer some form of redundancy. So far I have not seen any proposals that would perform better or more reliably than a simple cubic array of props with multiple x, y, and z buses.
I do like your idea of external memory, but would prefer something more reliable than a single Super-HUB. Perhaps a shared memory chip between a Prop and each of its neighboring Props. That might even work as a replacement for, or addition to, the bus system.
There are several, including people counting and tracking (I used to work for a company doing that) and facial recognition. See the list on the left of the page.
Well, perhaps it's not workable for a lot of connections. I just thought COGs can swap signals via LONGs in HUB, which allows a lot of possible connections, whereas multiple Props have fewer connection possibilities.
Again, I know little of ANNs; it's a long time since I read up a little on them. I get the idea that running the ANN is quite simple, but the training is much more complex and requires a lot more horsepower. So I imagine a setup where:
1) The Prop/Props are collecting input data from whatever system this is working in.
2) They forward this data to your PC where an ANN resides (perhaps written in C or whatever) and all the training is done.
3) When you have a suitably trained network you download the weights to the Prop, which now runs the same ANN algorithms, this time in PASM.
So yes, the end result is a rigidly configured network that gets on with the job it is trained to do.
Of course this setup is a bit more complicated where the training involves, say, ensuring that a balancing bot stays upright. In that case the ANN being trained on the PC would need to return results to drive the physical bot, to see if it falls over or not, and hence learn how to drive the thing. Or you have to include an accurate physics simulation of the bot on the PC.
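Steps 2 and 3 can be sketched in a few lines on the PC side. Here's a minimal perceptron trained on AND, with the learned weights dumped as fixed-point integers a Prop could consume at run time; the training problem and the x256 scaling are invented for illustration:

```python
# PC-side sketch of "train here, download weights to the Prop": a
# perceptron learns AND, then the weights are dumped as fixed-point
# integers. The problem and the x256 scaling are illustrative choices.

samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b, lr = [0.0, 0.0], 0.0, 0.1

for _ in range(20):                          # perceptron learning rule
    for (x0, x1), target in samples:
        y = 1 if w[0] * x0 + w[1] * x1 + b > 0 else 0
        err = target - y                     # -1, 0, or +1
        w[0] += lr * err * x0
        w[1] += lr * err * x1
        b    += lr * err

# All four cases are now classified correctly
assert all((1 if w[0] * x0 + w[1] * x1 + b > 0 else 0) == t
           for (x0, x1), t in samples)

# "Download the weights": integers the rigid PASM network would use
print([int(v * 256) for v in w + [b]])
```

This is only the easy half, of course: as noted above, anything where the training loop has to drive a physical bot (or a physics simulation of one) needs a feedback path back to the PC as well.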
The usual approach is a single fast processor with fast RAM. Parallel processing can be useful, on the right hardware.
I suggested choosing a simple problem to start with, like recognising hand-drawn numerals, and implementing that.
Sorry, your constant X references appear to be nothing more than a sales pitch to me and others.
It's really hard to tell when you are offering something of value in that respect.
I'm not against using other CPUs, but there is a time and place for everything.
Please have some respect for people in this place.
Has anyone apart from me actually implemented one?
Here is some C code I've found that does facial recognition:
http://www.cs.cmu.edu/afs/cs.cmu.edu/usr/mitchell/ftp/faces.html
http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-8/faceimages/code/
Implement something on Propeller for us and show that you are capable of adding value.
Why don't you want to look at some code that actually works?
You're not helping; you're just doing what you always do.
I've studied ANN for years and have seen many code examples.
The McCulloch-Pitts Model is simplest and will likely be produced first.
http://wwwold.ece.utep.edu/research/webfuzzy/docs/kk-thesis/kk-thesis-html/node12.html
A simple Perceptron is a more complicated variation.
http://en.wikipedia.org/wiki/Perceptron
There are of course other more valuable and more complicated models. This is just a beginning.
http://www.general-vision.com/products/chips-and-modules/CM1K-Chip/index.html
It has 1024 neural elements operating in parallel.
Interesting stuff anyway.