Yet another Arduino vs Propeller - what am I doing wrong? (Parallax Forums)

Konstantin Posts: 4
edited 2012-11-17 09:12 in General Discussion
Hi all,
I am trying to build an application that transmits sensor data to a PC via serial over USB.
It is supposed to transmit a lot of data,
so I am comparing the performance of different boards.
An Arduino Mega with a 16 MHz clock transmits ~13000 readings in 10 sec.
A Propeller board with an 80 MHz clock transmits 8700 readings in 10 sec.
I somehow expected better speed from the Propeller.
Is this correct, or am I doing something wrong?
Here is Arduino code:
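// random output speed test

int x;

void setup()
{
  Serial.begin(115200);        // open the serial port at 115200 baud
  randomSeed(analogRead(0));   // seed the generator from an analog reading
}

void loop()
{
  x = random(300);             // pseudo-random value in 0..299
  Serial.print("DATA");
  Serial.println(x);           // println appends CR+LF
}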

and here is Propeller code:


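CON
  _clkmode = xtal1 + pll16x      ' 5 MHz crystal x 16 PLL = 80 MHz system clock
  _xinfreq = 5_000_000

OBJ
  pst : "Parallax Serial Terminal"

PUB main | rand
  rand := 300                    ' seed for the pseudo-random operator
  pst.Start(115200)              ' start the serial driver at 115200 baud
  repeat
    pst.str(string("DATA"))
    pst.dec(||(?rand) // 300)    ' absolute value of the next random number, modulo 300
    pst.NewLine                  ' sends CR
    pst.LineFeed                 ' sends LF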
PS: it has nothing to do with the different random number implementations; I get the same results when transmitting just a hardcoded number.

Comments

  • Bill Henning Posts: 6,445
    edited 2012-11-15 20:36
    Apples vs. Oranges,

    - The cogs actually run at 20 MIPS, as instructions (with a few exceptions) take four clock cycles on the Prop
    - Spin compiles to an interpreted byte code, while the Arduino compiles C to the native instruction set
    - using PropGCC would be a fairer comparison
    - the maximum speed for both is limited by the relatively low serial bit rate; try using 1 Mbps
  • Duane C. Johnson Posts: 955
    edited 2012-11-15 20:42
    Hi Konstantin;

    My guess is this is a bit of an unfair comparison.

    As I recall the Arduino has a dedicated serial output buffer.
    (If this is not true then "Never Mind".)
    So there is little overhead coming up with the next character to send.

    Your program uses the same cog to both send the character and find the next character.

    I would use 2 cogs, 1 for the main routine and a second to send the character.

    Duane J
  • Mike Green Posts: 23,101
    edited 2012-11-15 21:52
    I would second Bill Henning's comments, particularly that using GCC would be a fairer comparison. Spin is interpreted and the interpretive code is optimized for space, not speed. In addition, the Spin code for the serial I/O routines, particularly the conversion from decimal to characters and the code that puts characters in the output buffer, is not at all optimized. I'm more surprised that the Propeller did as well as it did given these limitations.
  • jmg Posts: 15,140
    edited 2012-11-16 02:36
    Konstantin wrote: »
    Arduino Mega with 16 MHz clock speed transmits ~13000 readings in 10 sec.
    Propeller board with clock speed of 80MHz transmits 8700 readings in 10 sec.

    You can reality-check the baud limit along these lines (assuming both value prints average 5 chars in length):
    (115200/10)*10/(4+5) = 12800 in 10 sec
    (115200/10)*10/(4+5+2) = 10472 in 10 sec

    Notice the second case shows the impact of 2 additional chars.
    It seems the first code is not sending the NewLine/LineFeed pair, as the rate is close to a continuous stream of 9-byte messages,
    so it seems close to half of the skew is down to message-length variation?
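The baud-limit estimate above can be reproduced with a few lines of C. This is only a sketch: `max_readings` is a hypothetical helper name, and it assumes 8N1 framing (10 bits on the wire per byte, which is what the division by 10 implies) and a line that never sits idle.

```c
#include <assert.h>

/* Upper bound on the number of readings that fit on the wire in `seconds`.
 * Assumes 8N1 framing: 1 start bit + 8 data bits + 1 stop bit = 10 bits per byte.
 * Hypothetical helper for illustration; not part of any library. */
long max_readings(long baud, int bytes_per_reading, int seconds)
{
    assert(baud > 0 && bytes_per_reading > 0 && seconds >= 0);
    long bytes_per_second = baud / 10;
    return bytes_per_second * seconds / bytes_per_reading;
}
```

With these assumptions, `max_readings(115200, 9, 10)` gives 12800 and `max_readings(115200, 11, 10)` gives 10472 (integer division), matching the two figures in the post. Raising the baud rate raises the ceiling proportionally.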

    If you want to minimise the SW overhead, use larger packets.
    It also looks like the 2nd case is being pulled down by SW speed, whilst the first one is at Baud Ceiling, and so could run even faster at higher baud rates.

    Also note the AVR has a hardware UART, so while a byte is being sent the core can be doing other tasks. A Prop has no hardware UART, so a single-cog design has to bit-bang the byte and THEN get the next one.

    That means a Prop will benefit from the highest possible BAUD speed, as that saves bit-bang time.

    Your 8700 can be modeled as two delays, a Char-send, plus a SW overhead,
    1/((1/26303)+(1/13000)) = 8700.0737
    ie if you can shrink that comms time to zero, by fastest-possible-baud, (or using a second COG), then the SW overhead can approach 26303 readings sent.
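The two-delay model can be checked numerically. The sketch below assumes, as the post does, that each reading costs a character-send delay followed by a software delay, so the per-reading times add. The function names are hypothetical, and the rates are in readings per 10 seconds.

```c
#include <assert.h>

/* Combined rate when each reading pays two delays back to back:
 * 1/combined = 1/rate_a + 1/rate_b. */
double combined_rate(double rate_a, double rate_b)
{
    assert(rate_a > 0.0 && rate_b > 0.0);
    return 1.0 / (1.0 / rate_a + 1.0 / rate_b);
}

/* Inverse of the above: given the measured combined rate and one known
 * component rate, recover the rate implied for the other component. */
double implied_rate(double combined, double known)
{
    assert(combined > 0.0 && known > combined);
    return 1.0 / (1.0 / combined - 1.0 / known);
}
```

`combined_rate(26303, 13000)` comes out at roughly 8700, and `implied_rate(8700, 13000)` at roughly 26300, which is where the 26303 figure for the software-only ceiling comes from.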

    Look at FT232H devices, which can run up to 12MBaud, and it looks like 4MBaud is the highest legal 80MHz fraction.
  • LoopyByteloose Posts: 12,537
    edited 2012-11-16 04:12
    You can do a lot better on the Propeller, not sure you can do better on the Arduino. It is all about how fast your serial i/o is going.
  • prof_braino Posts: 4,313
    edited 2012-11-16 05:01
    As they said above. If you want to compare something to Spin on the Prop, you would be closer using some type of BASIC on the Arduino. BASIC would most likely be interpreted and a more equal comparison to Spin.
  • Martin_H Posts: 4,051
    edited 2012-11-16 06:12
    If you used cog memory model with GCC I bet the Propeller would mop up the floor with the Arduino.
  • prof_braino Posts: 4,313
    edited 2012-11-16 06:45
    Martin_H wrote: »
    If you used cog memory model with GCC I bet the Propeller would mop up the floor with the Arduino.

    If all we are concerned with is being fast, try Tachyon Forth. Peter reports 3 Mbps over Bluetooth. Can't beat that with a stick.
  • LoopyByteloose Posts: 12,537
    edited 2012-11-16 06:54
    Martin_H wrote: »
    If you used cog memory model with GCC I bet the Propeller would mop up the floor with the Arduino.

    As I suspected.... Is this comparison done?
  • Konstantin Posts: 4
    edited 2012-11-16 08:12
    As I suspected.... Is this comparison done?
    Yep, I tried the same with GCC and it performed almost as well as the Arduino: 11500 readings vs 13000.
    But I will need to use I2C to get real data. As I understand it, there are ready-to-use libraries in Spin, but how do I handle it in GCC?
    Do I need to write the whole I2C protocol engine from scratch? (no way :( )
    I was totally unaware there is no hardware UART on the Propeller... well, if it has to implement the RS232 protocol programmatically then that explains everything. Thank you for the replies!
  • Mike Green Posts: 23,101
    edited 2012-11-16 08:23
    There is no hardware UART on the Propeller. On the other hand, the Parallax Serial Terminal I/O driver contains an assembly language program that's launched in another cog and operates completely in parallel with your program (and the Spin subroutines in the I/O driver), doing the "bit-banged" serial I/O. In terms of performance, it's equivalent to a hardware UART and has a 64-byte buffer.

    There is an I2C library driver included with GCC. I don't know what devices it supports other than memory (EEPROMs). You'll have to look at it. Most likely it has the necessary low-level functions to support nearly any I2C device, but you'll have to write the high level operations. For best performance, you may want to have the I2C I/O routines run in their own cog in parallel with your main program and communicate through a buffer of some sort. It all depends on what kind of I2C device and data you're talking about.
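A buffer shared between a main program and a driver running in parallel, as described in this post, is commonly built as a single-producer, single-consumer ring buffer. The sketch below is a generic host-side illustration and not the code of the Parallax Serial Terminal object or of any PropGCC library; `ring_t`, `ring_put` and `ring_get` are hypothetical names, and the 64-byte size simply mirrors the figure mentioned above.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 64u   /* power of two, so the free-running counters wrap cleanly */

typedef struct {
    volatile uint8_t  data[RING_SIZE];
    volatile uint32_t head;   /* advanced only by the producer (main program) */
    volatile uint32_t tail;   /* advanced only by the consumer (driver) */
} ring_t;

/* Producer side: returns 1 if the byte was queued, 0 if the buffer is full. */
int ring_put(ring_t *r, uint8_t byte)
{
    assert(r != 0);
    if (r->head - r->tail == RING_SIZE)
        return 0;
    r->data[r->head % RING_SIZE] = byte;
    r->head++;
    return 1;
}

/* Consumer side: returns 1 and stores the oldest byte in *out, 0 if empty. */
int ring_get(ring_t *r, uint8_t *out)
{
    assert(r != 0 && out != 0);
    if (r->head == r->tail)
        return 0;
    *out = r->data[r->tail % RING_SIZE];
    r->tail++;
    return 1;
}
```

Because each index is written by only one side, the producer never has to wait for the consumer unless the buffer is actually full. On real hardware the memory-ordering guarantees between the two sides would need to be checked for the target; that is outside the scope of this sketch.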
  • Heater. Posts: 21,230
    edited 2012-11-16 08:26
    Konstantin,

    Welcome to the forum by the way.

    ...if it has to implement RS232 protocol programmatically then it explains everything.

    Not exactly. A cog-based serial object, i.e. one written in PASM like PST or FullDuplexSerial, can drive the serial line at 115200 baud just as well as a hardware UART. I have even written a software UART in C that runs in a COG and can do 115200 using propgcc.

    In your case you are using the pst.str, pst.dec etc. methods in Spin, and I suspect that is where all the slowdown is.

    Moving to GCC has a substantial speed benefit over Spin, as you see.

    It's a design choice that there are no hardware UART, I2C, or other such blocks in silicon on the Propeller; all such things are to be "soft". This makes for maximum flexibility of the device, even at the expense of a little speed here and there.
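To make the "soft" UART idea concrete, here is a small host-side sketch of the two things a bit-banged transmitter has to get right: the frame layout and the bit timing. It is not the code of PST, FullDuplexSerial, or the C UART mentioned in this post. The function names are hypothetical and 8N1 framing is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Build the 10-bit 8N1 frame for one byte. Bit 0 of the result goes on the
 * wire first: start bit (0), then the data bits LSB first, then stop bit (1). */
uint16_t uart_frame_8n1(uint8_t byte)
{
    return (uint16_t)(((uint16_t)byte << 1) | (1u << 9));
}

/* System clock cycles available per serial bit (integer division). */
uint32_t cycles_per_bit(uint32_t clock_hz, uint32_t baud)
{
    assert(baud > 0);
    return clock_hz / baud;
}
```

At 80 MHz and 115200 baud, `cycles_per_bit` gives 694 cycles per bit, or roughly 170 instructions at four cycles each, which is consistent with the claim that a dedicated cog can keep up with this rate.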
  • Martin_H Posts: 4,051
    edited 2012-11-16 09:25
    As I suspected.... Is this comparison done?

    Loopy, I'm nowhere near a Propeller chip, but the code below compiles, so it might do the trick.
    /**
     * @file SpeedTest.c
     * This is the main SpeedTest program start point.
     */
    #include <stdio.h>
    #include <propeller.h>
    
    // snippet from C stdlib
    static unsigned int next = 1;
    
    int rand(void)
    {
        next = next * 1103515245 + 12345;
        // return (unsigned int)(next / 65536) % 32768;
        return (unsigned int)(next >> 16) & 32767;  // mask to 15 bits (0..32767), same result as the line above
    }
    
    void srand(unsigned int seed)
    {
        // And you *should* get a warning if sizes dont match.
        next = seed;
    }
    
    /**
     * Main program function.
     */
    int main(void)
    {
        // Initialize seed with the time. Note: time isn't a good seed, reading an input is better.
        srand(CNT);
    
        while(1)
        {
            printf("DATA%d\n", rand());
        }
        return 0;
    }
    
  • Konstantin Posts: 4
    edited 2012-11-16 11:49
    Actually, I did it with the 'Step one' tutorial for GCC:
    #include <stdio.h>
    #include <propeller.h>
    
    int main(void)
    {
        while(1) {
            printf("DATA%d\n", 100);
        }
        return 0;
    }
    
    At 115200 this code produced, as I said, 11500 readings in 10 secs.
    Memory model COG, optimization O2
    Arduino code (see first message) produced 13000.
    Arduino code does include CRLF (this is what println is for).
  • Bill Henning Posts: 6,445
    edited 2012-11-16 12:00
    A prop has 8 cogs, you are using one for a serial driver, one for the code... and you still have 6 cogs left :)
    Konstantin wrote: »
    Actually, I did it with the 'Step one' tutorial for GCC:
    #include <stdio.h>
    #include <propeller.h>
    
    int main(void)
    {
        while(1) {
            printf("DATA%d\n", 100);
        }
        return 0;
    }
    
    At 115200 this code produced, as I said, 11500 readings in 10 secs.
    Memory model COG, optimization O2
    Arduino code (see first message) produced 13000.
    Arduino code does include CRLF (this is what println is for).
  • Konstantin Posts: 4
    edited 2012-11-16 12:09
    A prop has 8 cogs, you are using one for a serial driver, one for the code... and you still have 6 cogs left :)
    What's more, my board has plenty of buttons, switches, LEDs, VGA & PS/2 ports... and they are all going to waste. I am really sorry for giving such a mighty contraption so primitive a task. :)
  • jazzed Posts: 11,803
    edited 2012-11-16 12:29
    If all we are concerned with is being fast, try Tachyon Forth. Peter reports 3 Mbps over Bluetooth. Can't beat that with a stick.

    Maybe you guys can port that to Arduino so that those folks can have a fair comparison and fall in love with the easiness and greatness of Forth ... if anyone needs to be Forth-proselytized it's those Arduino folks.
  • jazzed Posts: 11,803
    edited 2012-11-16 12:31
    A prop has 8 cogs, you are using one for a serial driver, one for the code... and you still have 6 cogs left :)

    Actually in the COG case quoted it is one and only one COG doing all the work.
  • Bill Henning Posts: 6,445
    edited 2012-11-16 13:10
    He is using the "Parallax Serial Terminal" object, which cognew's the low-level serial code... so it appears to use two cogs.

    Mind you, he could change to the quad serial driver, and handle four serial ports... Arduino would not fare as well.
    jazzed wrote: »
    Actually in the COG case quoted it is one and only one COG doing all the work.
  • jazzed Posts: 11,803
    edited 2012-11-16 13:32
    He is using the "Parallax Serial Terminal" object, which cognew's the low-level serial code... so it appears to use two cogs. ....

    Not according to this:
    Konstantin wrote: »
    Actually, I did it with the 'Step one' tutorial for GCC:
    #include <stdio.h>
    #include <propeller.h>
    
    int main(void)
    {
        while(1) {
            printf("DATA%d\n", 100);
        }
        return 0;
    }
    
    At 115200 this code produced, as I said, 11500 readings in 10 secs.
    Memory model COG, optimization O2
    Arduino code (see first message) produced 13000.
    Arduino code does include CRLF (this is what println is for).
  • Bill Henning Posts: 6,445
    edited 2012-11-16 15:24
    You are correct about the C version, I was talking about his original Spin version:
    CON
    _clkmode=xtal1+pll16x
    _xinfreq=5_000_000
    OBJ
      pst: "Parallax Serial Terminal"
    PUB main |rand
    rand:=300
    pst.Start(115200)
    repeat
      pst.str(string("DATA"))
      pst.dec(||(?rand)//300)
      pst.NewLine
      pst.LineFeed
    

    My basic point to him was that his original comparison kept the Arduino 100% busy, while using 25% of a Prop.
    jazzed wrote: »
    Not according to this:
  • Martin_H Posts: 4,051
    edited 2012-11-16 16:07
    I'm a bit surprised that an Arduino was faster than a single cog. I suppose the built-in UART helped. It would be interesting to compare the Arduino's software serial to the Propeller to get a like-for-like comparison.

    As Bill says, the Arduino was going full speed while most of the Propeller was still idle. A better benchmark would entail interrupts versus cogs and some tasks requiring 32 bit arithmetic. An ISR would take cycles from the main loop, while two cogs can run concurrently, so that should work decisively for the Propeller.
  • Bill Henning Posts: 6,445
    edited 2012-11-16 17:00
    I wonder if the propgcc version would perform better if printf was not used...
  • jazzed Posts: 11,803
    edited 2012-11-16 17:55
    You are correct about the C version, I was talking about his original Spin version: ...

    Well, I was responding to what you had specifically quoted. :)
    Anyway, I grokked the comparison and the need for a reasonable one.

    As for printf, yes, it's pretty big and slow except in the COG case.
    Still using a separate full duplex serial COG with simple wrappers should be faster - see attached.
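The attachment referred to above is not reproduced here. As a separate illustration of what avoiding printf can look like, the sketch below formats one complete reading into a buffer with a hand-written decimal conversion, so that a whole message can be handed to a serial driver in one call. `format_reading` is a hypothetical name, and the code assumes `unsigned` is at most 32 bits wide.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write "DATA<value>\r\n" plus a terminating NUL into buf.
 * Returns the message length without the NUL, or 0 if cap is too small.
 * Assumes unsigned is at most 32 bits (at most 10 decimal digits). */
size_t format_reading(char *buf, size_t cap, unsigned value)
{
    char digits[10];
    size_t n = 0, len = 0;

    assert(buf != 0);
    do {                                  /* digits come out least significant first */
        digits[n++] = (char)('0' + value % 10u);
        value /= 10u;
    } while (value != 0u);

    if (cap < 4 + n + 2 + 1)              /* "DATA" + digits + CR LF + NUL */
        return 0;

    memcpy(buf, "DATA", 4);
    len = 4;
    while (n > 0)                         /* copy the digits back in print order */
        buf[len++] = digits[--n];
    buf[len++] = '\r';
    buf[len++] = '\n';
    buf[len] = '\0';
    return len;
}
```

For example, `format_reading(buf, sizeof buf, 100)` fills `buf` with `"DATA100\r\n"` and returns 9. Whether this actually beats printf on a given memory model would have to be measured.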
  • rod1963 Posts: 752
    edited 2012-11-16 19:58
    Martin

    A more accurate benchmark would be to compare the Prop to the PIC32 or the mbed (LPC1768), since these are 32-bit processors. Comparing to an 8-bitter is apples vs. oranges.
  • jazzed Posts: 11,803
    edited 2012-11-16 20:46
    rod1963 wrote: »
    Martin

    A more accurate benchmark would be to compare the Prop to the PIC32 or the mbed (LPC1768), since these are 32-bit processors. Comparing to an 8-bitter is apples vs. oranges.

    I don't think comparing Propeller to PIC32 is a useful comparison either. They are certainly very different.
  • Ariba Posts: 2,682
    edited 2012-11-16 20:49
    My basic point to him was that his original comparison kept the Arduino 100% busy, while using 25% of a Prop.

    I'd say the Arduino is waiting 90% of the time for the UART-TX-READY flag, and can do other things in this time, especially if the UART transmit is done with interrupts.
    This code cannot be seen as a benchmark; the limiting factor here is the baud rate and not the execution speed of the processor / language.

    Andy
  • Martin_H Posts: 4,051
    edited 2012-11-17 03:29
    rod1963 wrote: »
    Martin

    A more accurate benchmark would be to compare the Prop to the PIC32 or the mbed (LPC1768), since these are 32-bit processors. Comparing to an 8-bitter is apples vs. oranges.

    I don't use Pic32's or mbed's, but I use both the Arduino and the Propeller. Both are about the same cost and I would be genuinely curious which had more oomph. Frankly the 8 versus 32 bits isn't as important as oomph per $.
  • Heater. Posts: 21,230
    edited 2012-11-17 04:52
    A certain kind of "oomph" can be measured with benchmarks. Try running fft_bench on both the Prop and the Arduino.
    http://forums.parallax.com/showthread.php?129972-fft_bench-An-MCU-benchmark-using-a-simple-FFT-algorithm-in-Spin-C-and-...
    And don't forget that fft_bench only uses one COG to do its work. I'm working on a parallel FFT to see how far we can push performance there.

    Another kind of "oomph" is a bit more tricky to compare: response to multiple external real-time events. Clearly, having a COG handle each such event can be a winner over using interrupts on a single CPU.

    A less obvious kind of "oomph", which does not immediately hit you from reading the data sheets and manuals, is programmer productivity. With the Prop one can easily throw together a bunch of objects from OBEX or elsewhere, add a little of your own secret spice, and you have a working project. No worries about real-time tasks fighting for interrupt or thread priority, slowing each other down, or introducing odd timing glitches.
  • LoopyByteloose Posts: 12,537
    edited 2012-11-17 05:08
    jazzed wrote: »
    Maybe you guys can port that to Arduino so that those folks can have a fair comparison and fall in love with the easiness and greatness of Forth ... if anyone needs to be Forth-proselytized it's those Arduino folks.

    Forth already has a fully functional ATmega port, complete with a pre-written Arduino template image. The only catch is that it overwrites the boot loader, as the flash memory needs to be written one byte at a time for the Forth dictionary. You would have to reinstall the Arduino boot loader to get back to the nest.

    Why not use Tachyon Forth, as Peter has gotten some pretty fast benchmarks that may knock the socks off an Arduino? The Propeller will trump the ATmega's serial in many different programming languages.

    The main thing is that Konstantin is being introduced to the fact that you just don't need a hardware USART to get a good serial port. Assembly language can optimize the speed. It is programming at its finest.