Use of the Propeller Processor

amveronis · 2011-05-20 18:19

I am so disappointed at the way the Processor is being used and at the articles that describe it. To begin, I am a new subscriber and do not know if the same comments have been registered by someone else. If they have, so be it.

The Propeller is a parallel processor and a rather powerful one at that. I have been researching the topic of parallel processing as well as teaching it for many years, since the time the Connection Machine, the MasPar Parallel Processor, and other similar beasts were alive. I even followed the Beowulf project with interest. I expected some serious articles about using a number of Propellers connected together to form a parallel processor. And yet, all I have been reading about is hooking up the Propeller to turn an LED on and off or to make the Propeller drive a speaker! Mundane things that an 8-bit PICAXE chip or an Arduino can do without much effort. There are a couple of books out, but, again, very little, if anything at all, is said about parallel processing. Isn't there anyone out there who can write some serious stuff about this truly remarkable and beautiful chip, stuff like sorting a database with a million items in it in a matter of a couple of minutes? If so, please let me know so that I can communicate further with that person.

Dr. Andrew Veronis

doggiedoc · 2011-05-20 18:50

Dr Veronis, let me be the first to welcome you to the forums. I am sure you will find more information once you've had time to meet some of the veteran Propeller users here. There are several threads on multiple-Propeller projects that may be helpful for you.

This is one that comes to mind and may be a good starting point for other links on parallel processing with propellers.

Others will surely chime in.

Again, welcome!
Paul

Heater. · 2011-05-20 20:09

amveronis,

Yes, welcome to the forum. You will find a lot of very enthusiastic, helpful and talented Propeller users on this forum. It's a constant surprise to find out the amazing things that have been done with this chip.

There is a running joke around here that you need only say that such and such a task is impossible on the Prop and someone will almost immediately pop up and show you how it is done. Making a USB host controller for example.

There are many here that bemoan the fact that the Prop gets little media coverage compared to other more traditionally architected micros. But I guess that is up to us users to shout more about it.

There are also many who have toyed with the ideas of parallel processing on the Prop and the tantalizing idea of multiple parallel propellers. I'm sure they will comment here soon.

However from a practical perspective things like:

...using a number of Propellers connected together to form a parallel processor.

and

...sorting out a database with a million items in it in a matter of a couple of minutes

are not reasonable applications for the Propeller and can almost certainly be done faster and cheaper with a single traditional processor.

Why is that?

1) Whilst a Prop can execute instructions at 20 MIPS, that is only within the confines of the 496-instruction space of a COG. As soon as you need access to a lot of data in HUB RAM or elsewhere, your execution rate drops. As soon as you need a bigger code space, fetching instructions from HUB with overlays or the LMM technique, your execution rate drops to less than 5 MIPS.

2) Memory limitations: 496 COG registers and 32KB of HUB RAM are not very much for serious computing work.

3) No provision for high speed communication between multiple Props.
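A rough back-of-the-envelope check of the figures in points 1 and 2, as a small Python sketch. The 80 MHz clock, 4 clocks per COG instruction, and one hub slot per COG every 16 clocks are assumptions based on the usual Propeller 1 setup, not something stated above, and the names are made up for illustration.

```python
# Back-of-the-envelope Propeller 1 figures (every constant is an assumption).
CLOCK_HZ = 80_000_000        # assumed system clock
CLOCKS_PER_INSTR = 4         # assumed cost of a typical COG instruction
HUB_WINDOW_CLOCKS = 16       # assumed: each COG gets one hub slot per 16 clocks
HUB_RAM_BYTES = 32 * 1024    # 32KB HUB RAM, as quoted in point 2

def cog_mips():
    """Peak rate for code running entirely inside one COG."""
    return CLOCK_HZ / CLOCKS_PER_INSTR / 1e6

def hub_fetch_mips():
    """Upper bound when every instruction must first be fetched from HUB RAM."""
    return CLOCK_HZ / HUB_WINDOW_CLOCKS / 1e6

def hub_longs():
    """Number of 32-bit longs that fit in HUB RAM."""
    return HUB_RAM_BYTES // 4

print(cog_mips())        # 20.0
print(hub_fetch_mips())  # 5.0
print(hub_longs())       # 8192
```

Under these assumptions the 5 MIPS figure is a ceiling for hub-fetched code; a real LMM loop spends extra instructions on bookkeeping, which is consistent with "less than 5 MIPS".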

There, I said it: it's impossible for the Prop to outperform a regular CPU for serious parallel processing tasks, or at least not for the same cost and without a lot of extra complexity. Perhaps someone will now prove me wrong.

This is not to discourage you from using the Prop; it's just that I don't feel it is suited to the tasks you propose.

If you like a challenge I have a Fast Fourier Transform for the Prop that runs in one COG. The FFT can conceptually be easily distributed over multiple parallel processors. So the challenge is to speed up that FFT by distributing over 2, 4 or more COGS:)
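To make the challenge concrete, here is a minimal Python sketch of the usual way a radix-2 FFT splits in two: the even-indexed and odd-indexed samples are transformed independently and then joined by one butterfly stage. This is not the Spin/PASM FFT mentioned above; the function names are hypothetical, and the two threads only stand in for two COGS (Python threads will not actually run this faster).

```python
import cmath
from concurrent.futures import ThreadPoolExecutor

def combine(even, odd):
    """Butterfly stage joining two half-size FFTs into one full-size FFT."""
    n = 2 * len(even)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    if len(x) <= 1:
        return [complex(v) for v in x]
    return combine(fft(x[0::2]), fft(x[1::2]))

def fft_two_workers(x):
    """Hand each half-size FFT to its own worker, then combine the results."""
    if len(x) <= 1:
        return [complex(v) for v in x]
    with ThreadPoolExecutor(max_workers=2) as pool:
        even = pool.submit(fft, x[0::2])
        odd = pool.submit(fft, x[1::2])
        return combine(even.result(), odd.result())

def dft(x):
    """Naive O(n^2) DFT, used only as a reference to check the FFT."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

signal = [1, 2, 3, 4, 0, 0, 0, 0]
split = fft_two_workers(signal)
reference = dft(signal)
print(all(abs(a - b) < 1e-9 for a, b in zip(split, reference)))
```

The same split can be applied once more to use four workers, at the cost of one more combine stage that has to wait for all of them.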

potatohead · 2011-05-20 20:11

Welcome. It is a beautiful design. Warning: Very addictive. That said, jump in and have fun. As has been written already, there are a lot of very sharp people here, and a synergy that's compelling and productive.

Cluso99 · 2011-05-20 21:16

I too will add my welcome words. The Prop is very addictive because of its architecture and instruction set.

Large databases using multiple cores and chips are not necessarily its best use, depending of course on the application. For instance, using external SRAM as a fast store could certainly be a way. But this does not permit multiple cogs to access the SRAM concurrently, because any locking would be too much overhead.

Any code that can be truly shared between cores should shine as long as the code is small or LMM (large memory model where the pasm code is fetched from hub to be executed in cog) and its sacrifice in speed is acceptable. The advantage of the prop is that each core can execute its code without fear of interrupts. Of course this requires a different way of thinking, but in the end it becomes a pleasure not having to deal with the complexities of interrupts. You can simply debug small modules of code and add them together. This speeds up development time enormously.

We also use the cores to implement intelligent peripherals, rather than have them in hardware to begin with. This means they can use virtually any pins. And we only have one chip to deal with, no matter if we want 8 UARTs or 6 SPI or VGA and TV, etc etc.

Once again, welcome. Feel free to ask anything you like - this is a fantastic forum.

amveronis · 2011-05-21 11:42

I thank everyone who has welcomed me to the forum. I now realize that I will be dealing with serious people, yet friendly. I like that!

I have not played with the Propeller a lot as yet, but, you are right, everything about this beautiful chip is addictive. The easy-to-use IDE, the included examples, ALL of which work, and the innovative architecture. I also realize that the overall capabilities are somewhat limited to do, say, sorting of a very large database in a couple of minutes, but nevertheless the Propeller is indeed a parallel processor on a small scale. There is nothing else around. I did run the example with the three LEDs, each manipulated by a different cog, and saw parallel processing. Whoever has designed this processor and its languages needs to be entered in the Science Hall of Fame.

I bought the Gadget Gangster board, and this one is also a thing of beauty!

I will be using the Prop in some teaching and would be grateful to all of you for any answers to my questions. For starters, how can I invoke the Serial Terminal so that it is activated automatically when a program has output to show at the terminal?

Andrew

potatohead · 2011-05-21 11:55

The designer is Chip Gracey, and he's self-taught. Real, live, fun to talk to, genius who will very likely be hacking around tonight with all the propheads that made the trek to Rocklin, CA, and who is also very humble about it too. Great guy, and his spark for this stuff is catchy as hell.

If you look at the live stream thread, you can probably see him speak at some point today. Prop II goodies, we hope! Anyway, Chip's design brought me back to doing this stuff, and I've had a good time so far.

(wish I could have gone this year)

Prop I is entirely custom, down to the last polygon, basically designed by one person from start to finish. It shows. http://www.parallax.com/dl/docs/article/WhythePropellerWorks.pdf

Mike Green · 2011-05-21 11:57

There is no built-in mechanism for what you want (automatic invocation of something like the Serial Terminal when data is sent or to be sent). Such a mechanism exists for the Stamp Editor in that, when the Stamp Editor is running and a DEBUG statement is executed, the Stamp Editor will detect this and open a DEBUG window. BST (a free 3rd party Spin compiler / assembler / IDE) allows you to open a terminal window and properly shares the serial line between the PC and the Propeller. When you compile and download a program, BST disables the terminal window, downloads the program, then re-enables the terminal window.

Granz · 2011-05-21 14:13

Again, welcome.

I believe that Heater hit it directly; the Propeller is a control system and not a data processing system. It was never in the game plan for the Prop to be handling huge databases, or continent-wide weather prediction. On the other hand, a Cray X-MP is kind of overkill for controlling a robot or most of the other control jobs for which we have tasked the Prop. And you can get a whole lot of Prop systems for the price of a decent PC.

That said, take a look at Humanoido's parallel-processing projects. Some of those look pretty powerful and may do what you are looking for. (Take a look at: http://forums.parallax.com/showthread.php?124495-Fill-the-Big-Brain)

wjsteele · 2011-05-21 16:27

Take a look at my Wingman on page 8 of the Parallax manual. It utilizes the Prop the most of any program I've ever seen. I have 7 cogs running PASM and one running SPIN, utilizing most of the available RAM. We even use the LMM in one of the cogs to support additional functions by swapping them in from the upper 32k of the 64k EEPROM.

The parallel tasks we schedule include processing the IMU data (gyro, accel and magneto), GPS NMEA parser, dual-serial driver, SD/EEPROM driver, data filtering, LCD/I2C driver, Graphics library and a "Miscellaneous Functions" processor.

We're down to the point that squeezing any more functionality into the core requires us to find bytes one at a time in our optimizations. By the way, the Wingman is so complete, it even has 2 gig of on-board flash that we use to reference terrain, obstacle and navigational data for flight planning (including rendering highway in the sky).

Our next design actually uses a secondary Prop to help offload some of the swapped functions from the LMM model to speed it up. We're breaking it up into graphics/LCD/data and inertial reference/GPS/main loop functions between the two Props.

The power of the Prop is simply amazing. All the tricks we pulled in the Wingman's code are available right here on this forum... it's just a matter of keeping your eyes open and pulling it all together.

Bill

Dr_Acula · 2011-05-21 17:34

And yet, all I have been reading is hooking up the propeller to turn an LED on and off or to make the propeller drive a speaker! Mundane things that an 8-bit PICAXE chip or an arduino can do without much effort.

Welcome to the forum. Speaking for myself, I'm using the Propeller because PICAXE and Arduino cannot do the things I want to do. With $10 in my pocket I can get more out of a Prop than out of other chips.

Flashing an LED is just a beginner's project. The Prop can do more - run a complete emulation of computers from the 1970s to about the mid 80s. Fly a quad copter. Run a music synthesizer. Play movies. http://www.smarthome.jigsy.com/propeller Run a Windows lookalike GUI. http://forums.parallax.com/showthread.php?131741-A-GUI-for-the-Propeller

It is the parallel nature of the Propeller that makes much of this possible - e.g. it can scan a keyboard, scan a mouse, run four serial ports and run a VGA or TV display, all at the same time. It is not possible to do that with a PICAXE or Arduino because you would need so many interrupts that the display would never work.

There is much to explore with parallel processing with multiple chips. I don't think we have even scratched the surface yet.

prof_braino · 2011-05-21 17:38

amveronis wrote:

I am so disappointed at the way the Processor is being used

... researching the topic of parallel processing ... that person.

Hi Dr Veronis, and Welcome

A couple folks were discussing parallel processing on this thread:
http://forums.parallax.com/showthread.php?126589-Communicating-Sequential-Processes-by-C.A.R.-Hoare

I am also interested in parallel processing and how to do some on the prop.

From the discussion it turns out there are several different classes of parallel processing, some of which are not so applicable to the prop, while others might be. What kind of task are you looking at performing?

My team has a method for connecting an arbitrary number of Props with minimum hardware (seven wires, 5 I/O lines, and three resistors per additional Prop). We are looking at how to arrange problems so they can be addressed by our configuration. That's a big part of the trick right there.

I'm working on a variation of the Sieve of Eratosthenes to test where additional cores or Props have an impact. But the effort got sidetracked due to adding extra memory and refactoring etc.

I would be interested in hearing more about your investigations.
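For readers who want to see how a sieve can be divided among cores at all, here is a minimal Python sketch of a segmented Sieve of Eratosthenes. It is not our Prop code: the function names are hypothetical, and the plain loop over segments stands in for handing each segment to a separate cog or Prop. The only shared input each worker needs is the short list of base primes.

```python
from math import isqrt

def base_primes(limit):
    """Plain Sieve of Eratosthenes up to and including limit."""
    if limit < 2:
        return []
    flags = [True] * (limit + 1)
    flags[0] = flags[1] = False
    for p in range(2, isqrt(limit) + 1):
        if flags[p]:
            for m in range(p * p, limit + 1, p):
                flags[m] = False
    return [i for i, is_prime in enumerate(flags) if is_prime]

def sieve_segment(lo, hi, primes):
    """Primes in [lo, hi), given every prime up to sqrt(hi)."""
    flags = [True] * (hi - lo)
    for p in primes:
        # First multiple of p inside the segment, but never p itself.
        start = max(p * p, ((lo + p - 1) // p) * p)
        for m in range(start, hi, p):
            flags[m - lo] = False
    return [lo + i for i, is_prime in enumerate(flags) if is_prime and lo + i >= 2]

def segmented_sieve(n, workers):
    """Primes below n, found by splitting [0, n) into `workers` segments."""
    primes = base_primes(isqrt(n))
    step = -(-n // workers)                 # ceiling division
    found = []
    for w in range(workers):                # each pass could be its own core
        lo, hi = w * step, min((w + 1) * step, n)
        if lo < hi:
            found.extend(sieve_segment(lo, hi, primes))
    return found

print(segmented_sieve(30, 4))   # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Because the segments do not write to each other's flags, no locking is needed between workers; the cost is that every worker repeats the walk over the base primes.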

amveronis · 2011-05-22 17:31

I am happy to know the name of this excellent designer. I would also like to know what Chip had in mind when he came up with the Prop. Two of the reasons parallel processing systems, like the Connection Machine, failed are that (a) they were expensive and (b) their designers thought that the units could replace general computing. Wrong. I am very impressed with the Prop, and, yes, I have started getting addicted to it. What I would like to do is write a workbook for the newbies who have no idea what parallel processing is and use the Prop as the vehicle for this workbook. Unfortunately, I have to depart from mundane tasks that an Arduino or a PICAXE can do, such as turning an LED on and off or having a DC motor turn left and right. I would like to use the Prop on a small FFT project, etc. I would be ever so grateful to Chip and to all of you if you can guide me as to what I should include in this workbook. Yes, there is the Fundamentals kit and workbook, but I want to go a bit further. Believe it or not, some university profs still want to teach parallel processing, and the Prop is one of the best ways to do it. Yes, the Prop does not have enough memory, but you can demonstrate simple examples.

Andrew

amveronis · 2011-05-22 17:57

Heater

I would be very grateful to you if you would send me your FFT code. As for doing serious parallel processing work using a single processor of those currently available, I have serious doubts.

(a) I do not know of any currently available processor that is equipped internally with a number of sub-processors, or COGS. Therefore, any parallel processing will have to be done by connecting the processors externally. This will be too much overhead. We can hook up several Arduinos on an I2C bus and use the Master-Slave approach, but I am not too crazy about using a number of processors on the same bus running at 16MHz!

(b) The command set of a currently available single processor must contain facilities that handle semaphores, etc. for communication between processors. There are naturally other ways, albeit costly.

(c) The physical size should also be a consideration.

(d) Power requirements should also be a consideration.

(e) Danny Hillis' machine contained 64,000 single elements but occupied a fair amount of space and required a good amount of cooling.

We all need to promote the Prop and bring it to the level that it so deserves. Yes, there is the ARM, but that is a different animal. I just read a post where a group of engineers has run the Prop at a speed of 114MHz. That is not bad at all.

Andrew

amveronis · 2011-05-22 18:00

Dr Acula

I fully agree with the last sentence of your post. I would be ever so grateful to you for ideas or code that I could include in my proposed workbook, whenever I get around to writing it.

Andrew

amveronis · 2011-05-22 18:08

Hi Mike

I noticed that one of the projects in the Fundamentals Kit uses the serial monitor for output display. How does this monitor get activated? I did download it but don't know how to bring it into a project.
Andrew

bill190 · 2011-05-22 18:58

I did something interesting with the Propeller...

That was to use one Propeller chip to develop and, at the same time, test an IR transmitter.

One cog was used to create a 38 kHz carrier.
Another cog was used to modulate the carrier (or send the bits).
And a third cog was used to receive the IR signal and display on a PC what I was sending.

So on my breadboard, I had an IR LED transmitter pointing at an IR receiver and both connected to the same chip!
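For anyone wanting to try the same split, here is a rough Python sketch of the arithmetic behind the first two cogs. It is not my Propeller code: the 80 MHz system clock is an assumption, the function names are made up, and the modulation shown is plain on-off keying, which may differ from what a particular IR protocol expects.

```python
def half_period_ticks(clock_hz, carrier_hz):
    """Clock ticks between carrier pin toggles, rounded to the nearest tick."""
    return round(clock_hz / (2 * carrier_hz))

def modulate(bits, cycles_per_bit):
    """On-off keying: carrier cycles (1, 0) for a 1 bit, silence (0, 0) for a 0 bit."""
    wave = []
    for bit in bits:
        for _ in range(cycles_per_bit):
            wave.extend([1, 0] if bit else [0, 0])
    return wave

# Assumed 80 MHz clock and the 38 kHz carrier mentioned above.
print(half_period_ticks(80_000_000, 38_000))   # 1053
print(modulate([1, 0], 2))                     # [1, 0, 1, 0, 0, 0, 0, 0]
```

On the chip, the carrier cog only has to toggle a pin every `half_period_ticks` clocks, and the modulating cog gates that output bit by bit, so neither loop has to know about the other's timing.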

madrfskills · 2011-05-22 23:28

Welcome to the forum;

As others have pointed out, the Propeller architecture is not what you are looking for if you are interested in massive data set manipulation or performing complex mathematical operations parallelized across a structure. What I would suggest you look into is the Cell Broadband Engine Architecture (http://en.wikipedia.org/wiki/Cell_(microprocessor)). I'd love to have the time to play with these!

That said, I've been able to do crazy things with a Prop I that would be exceptionally difficult on a single core (albeit multithreaded) CPU. The key is to make the mental shift away from interrupt-driven processes and embrace memory- and semaphore-linked concurrent processes. What the Prop I brings to the table is the ability to control multiple processes at the same time with absolute determinism. It also excels at handling events occurring at much different rates - for example, waiting for human input while simultaneously performing fast-as-you-can-go data acquisition and processing - requirements that are hard to balance using interrupt handlers in conventional hardware.
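To illustrate what "memory- and semaphore-linked" means in practice, here is a minimal Python sketch of two concurrent processes that talk only through shared memory guarded by a lock. It is an analogy, not Propeller code: the dictionary stands in for main (HUB) RAM, `threading.Lock` stands in for a hardware lock, the threads stand in for cogs, and all names are hypothetical.

```python
import threading

def run(values):
    """Feed `values` through an acquisition worker and a processing worker."""
    hub = {"samples": [], "total": 0}   # stands in for shared main RAM
    hub_lock = threading.Lock()         # stands in for a hardware lock

    def acquire():
        # Fast worker: store each reading in shared memory as it arrives.
        for v in values:
            with hub_lock:
                hub["samples"].append(v)

    def accumulate():
        # Second worker: poll shared memory and process whatever is new.
        seen = 0
        while seen < len(values):
            with hub_lock:
                pending = hub["samples"][seen:]
            for v in pending:
                with hub_lock:
                    hub["total"] += v
                seen += 1

    workers = [threading.Thread(target=acquire),
               threading.Thread(target=accumulate)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return hub["total"]

print(run([1, 2, 3, 4, 5]))   # 15
```

Neither worker is interrupted by the other; each runs its own loop at its own pace and the only coupling is the shared buffer, which is the shift in thinking described above.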

Case study - three years ago I did a control system for a frequency-hopping RF transmitter using the Prop I.

Cog 1 was dedicated to generating the frequency hop pattern and commanding the RF synthesizer hardware to tune as appropriate. This cog was also responsible for blanking output during tuning. Very simple to write this one.

Cog 2 performed power metrology on the transmitted waveform using two 16-bit A/D's, some Analog Devices log detectors, and lots of other hardware. This cog would sense when cog 1 tuned to a new channel and transmitted a burst, would run the A/D repeatedly, apply calibration correction factors to the data, and store the resulting power information in main RAM. Uses integer arithmetic.

Cog 3 performed closed loop feedback automatic gain control (2nd-order loop, implemented in integer arithmetic) based on the data obtained in cog 2. From a math perspective this was the most "fun", but the implementation was easy.

Cog 4 would perform the floating point math needed to convert the power measurements from cog 2 into something human-readable and would pass the information to cog 5. Modified some code from the community.

Cog 5 would update an LCD display and serial interface (RS-422) with the operator station and other equipment (transmitter set).

Cogs 6 and 7 would run a NMEA string parsing service for a GPS receiver, to establish time sync for the hop pattern. Modified some code from the community.

Cog 8 did some housekeeping chores for A/D conversion and also ran a D/A converter and some other stuff for diagnostic purposes.

The bottom line is that we were able to build an extremely effective, high end control system on the order of $500, packaged. This was after an otherwise very good team utterly FAILED to achieve the same using very high end DSP hardware - our Prop effort was an unimportant, back-burner, backup development effort done as 1) a curiosity and 2) just in case the "real" effort had problems. The system is still up and running well.

The $12 Prop I probably saved my job that week...

V/R
Mike

Heater. · 2011-05-23 07:37

amveronis,

You can find my attempt at the FFT for the Propeller in Spin and PASM here:

http://forums.parallax.com/showthread.php?128292-Heater-s-Fast-Fourier-Transform.&highlight=heater_fft

I suspect the thing could be optimized some more, but as it stands the Spin version is written for utmost simplicity of understanding, as a pseudo code if you like, and the PASM version was supposed to be a direct and simple translation of that. The whole thing was the result of my trying to understand how on earth an FFT works anyway and a long discussion about it here: http://forums.parallax.com/showthread.php?127306-Fourier-for-dummies-under-construction&highlight=heater_fft.

I was amazed to find that I could finally write an FFT from scratch, no peeking at existing implementations just the maths in hand. I quite like the idea of splitting it over two or four COGS to get the speed up. Perhaps just two is optimum if you actually want the Prop to be useful for anything else at the same time.

There is another FFT implementation in PASM by Ale here: http://propeller.wikispaces.com/FFT. I have yet to have a play with that one.

(a) I do not know of any currently available processors that is equipped internally with a number of sub-processors, or COGS.

Well, I could tell you of such a device: 4 cores with 8 independent hardware threads each and 64K RAM for each core, running off a 400MHz clock. But then I would be lynched (again) by the forum members here:)

Leon · 2011-05-23 09:22

I mentioned it on another thread of his about parallel processing, and nothing happened.

Heater. · 2011-05-23 09:37

Leon,

Strangely enough it is currently the first in the list of similar threads that I am seeing now. That's despite the fact it has not been mentioned by name in this thread at all.

This forum system obviously knows us too well:)

RiJoRi · 2011-05-23 13:38

amveronis --
A few years ago I wrote some code for a Prop Starter kit to control some LEDs for what started as a Halloween display but wound up as a Christmas Tree display. (I'll tell you tomorrow whether or not I procrastinate!)

One cog does the PWM-type stuff to make the topmost, white LED brighten and dim while flickering; the main code drives 8 color LEDs in an almost-random pattern. Because of the limited outputs on the Starter kit, I used a 74LS138 as a driver.

I'm mentioning this because it is an intermediate-style project; more than flashing an LED, but less than many other projects here. I looked at the code when I was finished, and was amazed at how concise the SPIN code is!

--Rich

magdrop · 2011-05-25 15:01

A good project for taking a stab at real-time parallel processing might be to try and create a low-light-level camera system. Just as a proof of concept, it is possible to obtain cheap, low resolution, digital cameras from China, such as this one from DealExtreme (http://www.dealextreme.com/p/mini-2-in-1-digital-camera-and-webcam-300kpixel-6656). It has 640x480 pixels and it's supposed to have 4x16Mb of SDRAM built in.

The question is whether or not one of these things can be hacked in such a manner that the SDRAM can be used as several re-circulating buffers, in order that horizontal and vertical filtering can be performed on the fly, in addition to implementing a scan-to-scan FFT function on each filtered pixel. This scheme should achieve an image at very, very low light levels. The idea is that the deterministic timing of the Prop can be utilized to execute one (incredibly large) continuous calculation in a pipeline fashion, on each pixel as it comes in. Then, several clock cycles later, do it all again on the next pixel. It would take several seconds for the pipeline to fill up initially, but then it could process a whole lot of raw video on the fly.

Assuming a video frame rate of 5-to-10 frames/sec and a pixel rate in the neighborhood of 2 or 3 MHz, there would probably be plenty of processing speed for the digital filtering, but I have no idea how much overhead would be used in order to implement the FFT. My guess is that more than one Prop will be necessary, but I have no way of knowing that at this point. My arithmetic skills are not up to the task of figuring any of that out. And, of course, the real challenge with this type of processing isn't so much about how to add more processors, but how to keep the data synchronized so that it remains one coherent equation throughout the whole pipeline. In other words, as each pixel comes in it has to be factored against all of the previous readings from that particular pixel, in addition to being compared to its neighbors. (This proposed approach to implementing a low-light-level camera would not be very useful for moving video. At these slow frame rates the thing would have to stare for several seconds (or minutes) in order for the transform to do its magic.)

If it's possible to get something like this working well with the Prop, then when the Prop II arrives the road for developing more of these real-time signal processing techniques will be better traveled. For $15, I'm ordering one of these cameras tonight, in any event, just so I can tear it apart and see what's in it. I'm going to go try and muddle through the thread on Fourier Transforms for Dummies while I'm waiting for it to get here.
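A first pass at that arithmetic, as a small Python sketch. The 20 MIPS per cog figure is the one quoted earlier in this thread and is treated here as an assumption, as are the hypothetical function names; the sketch only counts instruction slots per pixel and says nothing about memory access or the FFT itself.

```python
def pixel_rate(width, height, fps):
    """Pixels arriving per second for a given frame size and frame rate."""
    return width * height * fps

def instructions_per_pixel(cog_mips, width, height, fps):
    """Instruction slots one cog has available for each incoming pixel."""
    return cog_mips * 1_000_000 / pixel_rate(width, height, fps)

for fps in (5, 10):
    rate = pixel_rate(640, 480, fps)
    budget = instructions_per_pixel(20, 640, 480, fps)
    print(fps, rate, round(budget, 1))
# 5 1536000 13.0
# 10 3072000 6.5
```

So 640x480 at 5 to 10 frames/sec works out to roughly 1.5 to 3 million pixels per second, and under the 20 MIPS assumption each cog gets somewhere around 6 to 13 instructions per pixel, which gives a feel for how the work would have to be spread across cogs or Props.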
