Can someone explain in layman's terms the real difference of P8X32A and say 328AVR? — Parallax Forums

D_D_S_B Posts: 8
edited 2013-11-05 06:38 in General Discussion
I am familiar with Arduino, Netduino, Teensy and .NET C#, so maybe not totally "layman's terms". I am currently on the quest for a new micro for my next project and I've been eyeing the Due, but I would very much NOT want to deal with the 144-pin BGA packages, as my project requires very small PCBs. So the P8X32A offers 8 cogs running at 80 MHz; does that actually mean each cog will react the same as an ARM running at 80 MHz?

Thank you...

Comments

  • kwinn Posts: 8,697
    edited 2013-10-17 22:53
    Yes... sort of. The cogs are 32-bit CPUs, so in some ways they are much more powerful than an 8-bit CPU at the same frequency. The Propeller does not have dedicated peripherals (serial, I2C, SPI, etc.), but a cog can perform most of those functions in software. The cogs have only 2 KB of RAM (512 x 32 bits), so they are limited in the size of programs they can run; however, this is somewhat offset by having 32 KB of hub RAM. The hub RAM can hold much larger programs in Spin, an interpreted language optimized for coordinating programs running in individual cogs. It is a unique and very powerful architecture that is difficult to describe.
  • Mike Green Posts: 23,101
    edited 2013-10-17 23:13
    "react the same as an ARM running at 80MHz" ... not really. We're talking apples and grapes here. They're both fruits, but very different. The Propeller has 8 identical 32-bit RISC (Reduced Instruction Set Computer) processors, each with its own 2K byte memory, counter/timer pair, video generator, and I/O registers with access to all 32 I/O pins. There's a single shared 32K byte RAM and 32K byte ROM and some other shared resources, including a mechanism for the 8 processors to share these resources (the hub). It normally uses an external EEPROM, minimally 32K, although it can use a 64K or 128K EEPROM. This is used to store the program and can be used to store data as well.

    The 328AVR has a single RISC processor, but it only works on 8-bits at a time. The 20MHz system clock is roughly equivalent to a Propeller cog at 80MHz since the Propeller takes 4 clock cycles to execute an instruction while the 328AVR can do one instruction usually in one clock cycle. The Propeller's instructions are more powerful if you're using 16 or 32 bit values since the 328AVR has to handle them 8 bits at a time. The 328AVR has a variety of peripheral function blocks to do things like serial I/O while the Propeller dedicates a cog to do the same functions although the Propeller is more flexible. For example, a single cog can provide 4 serial UARTs. A single cog can provide a video output with a text display 40 columns x 16 lines. One cog can run a speech synthesis vocal tract model and you can run 4 of these at the same time in 4 cogs to get 4 voices singing in harmony. A 5th cog can combine these by using delays along a stereo left to right axis to form a chorus.
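As a rough illustration of the clock comparison above, the arithmetic can be sketched in Python. The clock rates and cycles per instruction are the ones quoted in the post. The figure of four AVR instructions per 32-bit add (one ADD plus three ADC) is an assumption for the simplest register-to-register case, and the function and constant names are made up for this example.

```python
def instructions_per_second(clock_hz, cycles_per_instruction):
    """Peak instruction rate for a core with a fixed cycles-per-instruction cost."""
    return clock_hz / cycles_per_instruction

# Figures quoted in the post above.
prop_cog_ips = instructions_per_second(80_000_000, 4)  # one Propeller cog
avr_ips = instructions_per_second(20_000_000, 1)       # 328AVR, most instructions

# Assumption: a 32-bit add costs the 8-bit AVR four instructions (ADD + 3x ADC),
# while a 32-bit cog does it in a single instruction.
AVR_INSTRUCTIONS_PER_ADD32 = 4
prop_add32_per_s = prop_cog_ips
avr_add32_per_s = avr_ips / AVR_INSTRUCTIONS_PER_ADD32

print(f"Instruction rate: cog {prop_cog_ips / 1e6:.0f} MIPS, AVR {avr_ips / 1e6:.0f} MIPS")
print(f"32-bit adds/s: cog {prop_add32_per_s / 1e6:.0f} M, AVR {avr_add32_per_s / 1e6:.0f} M")
```

Under these assumptions the two chips tie on raw instruction rate, and the cog only pulls ahead once the data is wider than 8 bits.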
  • Heater. Posts: 21,230
    edited 2013-10-18 00:01
    D_D_S_B,

    You are not familiar with the Propeller architecture, and I do understand that if one takes a brief look at the documentation one might immediately think "What on earth good is that?". I mean, you will see that each of its 8 processors can only directly execute 512 instructions from its private memory space. What use is that? The Propeller has no dedicated hardware support for UART or SPI or I2C etc. Hopeless. And so on.

    Direct comparison to other MCUs is not really possible.

    But the Propeller has its own great advantages:

    1) Most of those expected peripheral features (UART, SPI, etc.) can be done in software running on a COG (core). Software is ready made and freely available to do that. It is very easy to drop such functionality into your application and use it. That makes the Propeller very flexible. If you want 7 UARTs or 13 PWM servo drivers or odd combinations of peripherals, you can do that.

    2) The 512 instruction limit on native code size is not so bad. It's enough to make those peripherals. Larger programs are handled by the Spin language, which is compiled to byte codes that are run by an in-COG interpreter. That all sounds complex but you don't see that; it's built in and very easy to use. Or you can program in C for a gain in speed, although it generates bigger code.

    3) Because there are 8 COGs (cores) it's easy to run code in parallel. There are no interrupts to worry about. They cannot "steal" time from each other. They just run independently. This makes mixing and matching your code and code objects from others (see OBEX) very easy. Running things in parallel is not hard on the Prop; in fact it is just the natural thing to do.

    4) Any pin on the Prop can be used for any purpose (except perhaps the two loader pins and two EEPROM pins). This means you never have to worry about which pin is used for which hardware peripheral or what alternatives the pins can be set to. PCB layout can become easier as you can move functions to any pins convenient in the layout.

    5) The Prop can do video if you like.

    Best would be if you could say something about your planned project: What peripherals and interfaces will it need? How much code might be in there. How fast does it need to go. Then we can get an idea if the Prop is a reasonable solution and suggest how it might be done.
  • LoopyByteloose Posts: 12,537
    edited 2013-10-18 02:15
    In layman's terms, the differences between any two microprocessor architectures are best appreciated by actual use over an extended period of time.

    I doubt that there are any of us that have NOT been in the position where we wished that we could acquire a simple comparison before getting involved.
    But I also think the general consensus amongst those that really know is that learning to fully use any processor is better than general speculation of pros and cons without a specific task.

    If you have a specific application and you hit upon limitations, that is generally when comparisons between microprocessors become lucid. IOW, features related to your desired application are very important.

    So, the criteria for preliminary comparison may not be so obvious. Quality of the support and ease of learning may actually be of foremost importance for someone that does not have a lot of experience.

    With that in mind, the Propeller offers support for color VGA and NTSC/PAL video with keyboard and mouse.. many other devices do not.

    The Propeller1 requires that you build your ADC and DAC externally, while a lot of other products have these built in. The lack of internal ADC and DAC may seem at first a big negative if your application needs those, but the Propeller will actually teach you more about ADC and DAC by creating them in firmware. You may actually end up doing more with less. And if you prefer, you can use a wide selection of ADC and DAC chips as your project requires.

    Apples and Grapes? That is always the case. Since the 1970s there have been endless benchmarking debates for digital processors... and they are still endless.

    I believe that each cog will run continuously at 80 MHz if each is deployed in an independent task and not passing data to other cogs (but the pipeline is upset by each branch or test in your program logic). When you start passing data through the hub RAM or the I/O pins, some delays are inherent. There are also other features in each cog (two counters that can run independently at 80 MHz).

    Is this quest for speed a genuine need or just an attempt to create yet another benchmark?

    I suspect the real Propeller1 advantage versus a 328AVR is that of having 32-bit processing combined with 80 MHz, over the 328AVR having 8-bit processing at 20 MHz, but that is a very broad brush and the devil is always in the details.
  • prof_braino Posts: 4,313
    edited 2013-10-18 04:16
    D_D_S_B wrote: »
    maybe not totally "layman's terms"

    Right tool for the right job: do you need to respond in a consistent (realtime) manner to multiple 12.5 ns events? If yes, consider the Prop. Also, it's way cool. A very nice bit of tech to have in the tool box.
  • LoopyByteloose Posts: 12,537
    edited 2013-10-18 10:35
    Well, to grossly oversimplify...
    If you take the Propeller at 80 MHz and 32 bits and compare it to the 328AVR at 8 bits and 20 MHz, you roughly have one processor faster by a factor of 16 (4 times the clock and 4 times the word width). But since the pipeline can be 4 times slower on the Propeller in a worst case, we can reduce the factor to 4 times faster.

    After that, you take the 4 times faster factor and multiply by 7 cogs (8 minus one for communications) and you get 28 times faster.

    ++++++++++++++++++
    This is gross over-simplification... so let us just say you might find the Propeller twice as fast on a good day.

    (Maybe we should just ban laymen to avoid these kinds of questions... arrgh.)
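The back-of-envelope estimate in this post can be written out step by step. This is only a restatement of the post's own numbers in Python, with variable names chosen for the example. It assumes perfect scaling across cogs, which is exactly the over-simplification the post warns about.

```python
# Raw ratios from the post: 80 MHz vs 20 MHz, 32 bits vs 8 bits.
clock_ratio = 80 / 20
width_ratio = 32 / 8
raw_factor = clock_ratio * width_ratio          # 4 * 4 = 16

# The post's worst case: the Propeller needs 4x as many clocks per instruction.
cycles_penalty = 4
per_cog_factor = raw_factor / cycles_penalty    # 16 / 4 = 4

# Seven worker cogs (eight minus one reserved for communications).
worker_cogs = 8 - 1
all_cogs_factor = per_cog_factor * worker_cogs  # 4 * 7 = 28

print(raw_factor, per_cog_factor, all_cogs_factor)
```

The 28x figure is an upper bound under these assumptions; real workloads rarely split evenly across seven cogs or use the full 32-bit width on every instruction.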
  • Invent-O-Doc Posts: 768
    edited 2013-10-18 11:55
    I think the answer for you depends on what you are trying to do with the microcontroller. Oh, don't forget that the Propeller now has GCC for C and C++, which is pretty nice and has faster execution than Spin. There are not a lot of objects for it yet, but Spin/PASM objects can be converted with a tool called spin2cpp and a little bit of work.
  • D_D_S_B Posts: 8
    edited 2013-10-18 17:05
    Great response guys! Thank you!
    So I think I should get a fairly good idea of how my program is going to work, then weigh out the best option, but right off the bat I can already see the advantages.
    Example:
    1. Cog - Bluetooth connectivity to smart phone
    2. Cog - Led strip register setter
    3. Cog - Pixel generator
    4. Cog - IR remote handler
    5. Triple-axis accelerometer handler
    6. Triple-axis gyro handler

    hmm ;>
  • D_D_S_B Posts: 8
    edited 2013-10-18 17:19
    Heater. wrote: »
    D_D_S_B,

    Best would be if you could say something about your planned project: What peripherals and interfaces will it need? How much code might be in there. How fast does it need to go. Then we can get an idea if the Prop is a reasonable solution and suggest how it might be done.

    It's a pretty complex LED project (LPD8806), SPI (just data and clock).

    Bluetooth connectivity to smart phone app.
    12 Pixel generators, possible to run all at once (each frame).
    8 Color generators which control pixel color, every layer will have their own (each frame)
    Accelerometer to alter pixel generator algorithm
    Gyro to alter pixel generator algorithm

    All of which is a framework to "display" user created setting from my WPF .NET GUI.
    Wireless bluetooth from computer - setting uploader
    USB RS232 - re-flash any updates

    And I need to be able to achieve a LOT of frames per second (at least 150 - 200). I am a fanatic about FPS and LED strips; I can't stand slow fps. Smooth, smooth, smooth. Off the bat I am getting over a thousand with the Due, but of course that will dramatically drop once I have the framework implemented.

    My biggest gripe with the due is the sheer size of the micro, and I will need to use the BGA package for my pcb.
    1. - I don't want to have to route that in sparkdesign
    2. - I sure don't want to reflow those.

    I absolutely need a beast to handle this program in order to achieve the fps I require. The first problem with that is that a beefier chip equals a beefier footprint, which is why the P8X32A is very appealing.
  • jmg Posts: 15,173
    edited 2013-10-18 17:28
    D_D_S_B wrote: »
    And I need to be able to achieve a LOT of frames per second (at least 150 - 200). I am a fanatic about FPS and LED strips; I can't stand slow fps. Smooth, smooth, smooth. Off the bat I am getting over a thousand with the Due, but of course that will dramatically drop once I have the framework implemented.

    How many pixels/bits per frame, and where do you need to store that information ?
  • D_D_S_B Posts: 8
    edited 2013-10-18 17:35
    200 pixels (LEDs) max, 32 bits per LED. Will probably need to store the information for each setting in EEPROM.
  • D_D_S_B Posts: 8
    edited 2013-10-18 17:38
    Of course I will have to drop any bitmaps and pixel maps into RAM from EEPROM :innocent: That may be a breaker.

    Typically a bitmap I would allow would be (50 * 50) * 3 bytes = 7,500 bytes (7.324219 kilobytes). Each COG only offers 2 KB. break; err
  • Invent-O-Doc Posts: 768
    edited 2013-10-18 21:23
    You can hold the bitmap in hub RAM; it will still be fast. There is an LPD8806 driver in OBEX if you don't want to make your own.
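The memory budget behind this advice can be checked with a few lines of Python. The sizes come from the thread (512 x 32-bit cog RAM, 32 KB hub RAM, a 50 x 50 x 3-byte bitmap, 200 LEDs at 32 bits each). The helper name `fits` is made up for this sketch, and it ignores the fact that hub RAM also has to hold the program itself.

```python
COG_RAM_BYTES = 512 * 4      # 512 longs of 32 bits per cog = 2048 bytes
HUB_RAM_BYTES = 32 * 1024    # shared hub RAM = 32768 bytes

bitmap_bytes = 50 * 50 * 3   # 50 x 50 pixels, 3 bytes per pixel
frame_bytes = 200 * 4        # 200 LEDs at 32 bits each

def fits(size_bytes, capacity_bytes):
    """Return True if a buffer of size_bytes fits in capacity_bytes."""
    return size_bytes <= capacity_bytes

print(f"bitmap: {bitmap_bytes} bytes ({bitmap_bytes / 1024:.2f} KB)")
print("fits in one cog:", fits(bitmap_bytes, COG_RAM_BYTES))
print("bitmap + frame buffer fit in hub:", fits(bitmap_bytes + frame_bytes, HUB_RAM_BYTES))
print("hub bytes left over:", HUB_RAM_BYTES - bitmap_bytes - frame_bytes)
```

So a single bitmap is far too big for cog RAM but takes up less than a quarter of hub RAM, before accounting for code and stack.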
  • jazzed Posts: 11,803
    edited 2013-10-18 21:28
    Maybe a little confusion here.

    Why would you want to store data in a COG? The main SRAM (32 KB hub RAM) is better suited for data.

    COGs are suited for code tasks at 20 MIPS each and for peripheral implementations.

    Why do you need to store anything in EEPROM?
  • Kye Posts: 2,200
    edited 2013-10-20 09:22
    Use a 6 MHz crystal instead of the standard 5 MHz crystal in your design. The extra 20% speed bump to 24 MIPS per cog really helps.
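For reference, the 20% figure follows from the crystal frequency. The sketch below assumes the usual 16x PLL setting and the 4 clocks per instruction mentioned earlier in the thread. The function name and default arguments are illustrative and not part of any Propeller tool.

```python
def cog_mips(crystal_mhz, pll_multiplier=16, clocks_per_instruction=4):
    """Estimate per-cog MIPS from the crystal frequency.

    Assumes the system clock is crystal * PLL multiplier and that a
    typical instruction takes a fixed number of system clocks.
    """
    system_clock_mhz = crystal_mhz * pll_multiplier
    return system_clock_mhz / clocks_per_instruction

standard = cog_mips(5)   # 5 MHz crystal -> 80 MHz system clock
faster = cog_mips(6)     # 6 MHz crystal -> 96 MHz system clock

print(f"5 MHz crystal: {standard:.0f} MIPS per cog")
print(f"6 MHz crystal: {faster:.0f} MIPS per cog")
print(f"speed-up: {(faster / standard - 1) * 100:.0f}%")
```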

    ...

    One thing, to really unlock the full power of the prop chip you have to either program it in ASM or use the GCC cogc feature to generate the cog asm code for you.

    GCC cogc is very powerful, however: one of our forum members implemented the Pong game (with video output to a screen) in just one cog. So, you can fit about a 300-500 line driver in the cog RAM. Then interface that with your main processor executing code from the hub at a slower speed that does the business logic. There are some examples of doing this in the Propeller GCC distribution. Use the SimpleIDE tool to get started.

    ...

    You can get 20 MHz writes with the Prop chip. The code to do that is a little bit tricky, but 1-2 MHz speeds are possible by just making a loop that toggles the pins manually.
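To relate those write speeds to the frame-rate target stated earlier in the thread (200 LEDs, 32 bits per LED, 150 to 200 fps), here is a rough Python estimate. It assumes one data bit per SPI clock and ignores latch bytes, protocol overhead and the time needed to compute each frame, so the results are upper bounds. The function names are made up for this sketch.

```python
LEDS = 200
BITS_PER_LED = 32
BITS_PER_FRAME = LEDS * BITS_PER_LED   # 6400 bits per frame

def required_bits_per_second(fps):
    """Raw data rate needed to push full frames at the given frame rate."""
    return BITS_PER_FRAME * fps

def max_fps(spi_clock_hz):
    """Upper bound on frame rate, assuming one data bit per SPI clock."""
    return spi_clock_hz / BITS_PER_FRAME

print(f"200 fps needs {required_bits_per_second(200) / 1e6:.2f} Mbit/s")
for clock_hz in (1_000_000, 2_000_000, 20_000_000):
    print(f"{clock_hz / 1e6:.0f} MHz SPI clock -> up to {max_fps(clock_hz):.1f} fps")
```

On these numbers a simple 1 MHz bit-banged loop lands right at the low end of the 150 to 200 fps target, while 2 MHz or more leaves headroom.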
  • LoopyByteloose Posts: 12,537
    edited 2013-11-05 02:16
    I wasn't aware of CogC, but having a C compiler that targets optimized cog code really makes a lot of sense... after all, each cog is indeed independent in all aspects except calls to hub RAM, and the actual machine-instruction programs are rather short.
  • Martin_H Posts: 4,051
    edited 2013-11-05 06:38
    Something I hadn't considered until recently is that although the 32 KB Flash program space on the 328AVR is large, its RAM is only 2 KB. This constrains the amount of state information a program can hold (e.g. a video buffer, serial buffer, stack and heap space). You won't notice that for simple C applications, but it starts to become noticeable as your experience with both MCUs grows. Also, that's Flash, not EEPROM, so the number of reprogramming cycles is theoretically more limited as well. That said, I haven't had the Flash or EEPROM go bad on any of my chips yet.