Shop OBEX P1 Docs P2 Docs Learn Events
Empty Loop Benchmarks - Page 2 — Parallax Forums

Empty Loop Benchmarks

2»

Comments

  • David BetzDavid Betz Posts: 14,516
    edited 2011-07-16 20:04
    RossH wrote: »
    Hi all,

    I agree with Heater - many C compilers would simply optimize the empty loop out altogether, which makes this a particularly unreliable benchmark for comparing real-world program performance. However ...

    Catalina: 751 microseconds.
    #include <catalina_hmi.h>
    #include <catalina_cog.h>
    
    void main() {
       long a, b;
       int i = 1;
       int j = 1000;
     
       a = _cnt();
       for (; i <= j; i++);
       b = _cnt();
       t_printf("Ticks = %d\n", b - a);
       t_printf("Microseconds = %d\n", (b - a)*1000/(_clockfreq()/1000));
       while (1);
    }
    
    Compiled for a C3 with the command:
    catalina loop.c -lci -D C3 -O3
    
    Ross.

    Is that with code in hub memory or in flash?
  • RossHRossH Posts: 5,519
    edited 2011-07-16 20:11
    David Betz wrote: »
    Is that with code in hub memory or in flash?

    Hi David,

    751 microseconds is executing from Hub RAM.

    When recompiled to execute from XMM it takes 5210 microseconds (5.2ms) - still about twice as fast as Spin. The command I used was:
    catalina loop.c -lci -D C3 -O3 -D LARGE
    
    EDIT: Sorry, David - the above results are for executing from SPI RAM. not SPI Flash. To execute from Flash I have to use the cache, which slows things down a bit further - 9156 microseconds - i.e. about the same speed as Spin. The command I used for this was:
    catalina loop.c -lci -D C3 -O3 -D LARGE -D FLASH -D CACHED_1K
    
    Ross.
  • CarlJacobsCarlJacobs Posts: 8
    edited 2011-07-16 23:04
    For JDForth the results for the empty loop are:
    1000 0 DO LOOP
    
    in 1.4 milliseconds.
    1000 0 DO 1 +LOOP
    
    in 2.4 milliseconds.

    Another simple test is calculating Fibonacci numbers:
    For Spin:
    PRI Fibo(n) : val
      if n =< 1
        val := n
      else
        val := SpinFibo(n-1) + SpinFibo(n-2)
    
    Calculates Fibo(25) in 6773 milliseconds

    For JDForth:
    : Fibo ( n -- n )
      DUP 1 > 
      IF DUP 1- Fibo1 SWAP 2- Fibo1 + THEN
    ;
    
    Calculates 25 Fibo in 1529 milliseconds.
  • prof_brainoprof_braino Posts: 4,313
    edited 2011-07-17 07:10
    jazzed wrote: »
    I can't imagine maintaining a substantial sized program in forth. :smile:

    I just noticed this comment being specific to forth. Not to subvert the thread (I'll remove the post if need be) but this is kind of important in the grand scheme of things:

    Any program or application with sufficient process (documentation through testing) can be maintainable;
    Any program or application with insufficient process is more likely to be a nightmare.

    I wrote the of a user interface for a building control application (control points, alarm point monitoring) running forth, which spanned multiple buildings on multiple campuses on multiple sites. Very huge and complex for its time, couldn't be done using any other existing tools in the time allotted, never had a problem to my knowledge, and was maintained by the client.

    I've done SQA work (much more recently) for many clients with products NOT written in forth (C mostly, but only because its so common) some of which have more bugs than functions, and "fixing" one function breaks another. This is mostly due to conflicting requirements, and poor process management, and neglect of suitable tests. This has NOTHING to do with the choice of language or environment.

    While a particular language may "lend" itself well to a particular class of application, the language ultimately has very little to anything regarding the success of the effort. It all comes down to the skill of the people managing and engineering the project. Poor process yields result inversely proportional to the complexity of the project. Neither C nor forth nor Erlang can save us from this. I don't know much, but I'm pretty sure about this process stuff.
  • jazzedjazzed Posts: 11,803
    edited 2011-07-17 07:31
    While a particular language may "lend" itself well to a particular class of application, the language ultimately has very little to anything regarding the success of the effort. It all comes down to the skill of the people managing and engineering the project. Poor process yields result inversely proportional to the complexity of the project. Neither C nor forth nor Erlang can save us from this. I don't know much, but I'm pretty sure about this process stuff.

    I agree with most of what you say. Still knowing the language is key to successfully using it :)
    I've zero interest in learning a language that is so different from all the others languages I know.
  • David BetzDavid Betz Posts: 14,516
    edited 2011-07-17 08:00
    jazzed wrote: »
    I agree with most of what you say. Still knowing the language is key to successfully using it :)
    I've zero interest in learning a language that is so different from all the others languages I know.

    I have almost the opposite opinion. Why bother learning a language that is almost the same as one you already know unless you *need* to learn it to get something done. The languages I find the most interesting are the ones that take a different approach and are nothing like what I'm used to.
  • prof_brainoprof_braino Posts: 4,313
    edited 2011-07-17 09:50
    Ariba wrote: »
    The Problem with an EEPROM benchmark is that the faster languages are too fast for the EEPROM. They need some delays in the read loop to not violate the max. EEPROM read clock. So the benchmark result is more for the EEPROM than for the language.

    Oh, I see now. Thanks for clarifying. I'll get the hang of this yet.
    I've zero interest in learning a language that is so different from all the others languages I know.

    Me, I'm just lazy. Although I might eventually try J, which is Ken Iverson's "corrections" to to his APL, this might be fun. But not today.

    As a benchmark, FFT is still too difficult for me, I haven't gotten my head around the fourier for dummies thread.
    Nick Lordi did Fast Hartly, supposed to be similar to FFT but doesn't use "complex number"; and is already posted in forth and spin. Is anybody care to take a stab at that?

    The fibonacci looks simple enough for me to try, I'm looking into that. I'm interested to compare to JDForth, which I considered but never tried due to costs.

    Oh, thanks for the Catalina number, that is really interesting.

    Since cnt is updated every cycle, would it be better to list the results in cycles instead of milliseconds or micro seconds? That way, we don't have to worry if somebody has a different crystal, etc?
  • jazzedjazzed Posts: 11,803
    edited 2011-07-17 11:49
    Since cnt is updated every cycle, would it be better to list the results in cycles instead of milliseconds or micro seconds? That way, we don't have to worry if somebody has a different crystal, etc?
    The standard crystal shipped on Parallax products is 80MHz. Let's stick with that.

    FIBO is interesting, but it is more a measure of stack performance than anything for recursive functions. Most "language people" here have declared it not very interesting, but it is much more interesting than loop 1000.

    I agree that FFT is difficult, but there are a few examples from SPIN/PASM that seem to work fine.

    It all comes down to being able to do something useful in a reasonable amount of time. If folks like Forth, then that's the language they will choose. Same goes for C or some variation of BASIC.
  • prof_brainoprof_braino Posts: 4,313
    edited 2011-07-17 15:41
    jazzed wrote: »
    It all comes down to being able to do something useful in a reasonable amount of time. If folks like Forth, then that's the language they will choose. Same goes for C or some variation of BASIC.

    Jazzed, you give the impression that someone is trying to change your mind. Nobody wants you to do anything you don't want to do, please stick to what you do best and are comfortable with.

    The purpose of this thread is to get a feel for what each tool is capable of. If a kid asks "Can I do X on the prop in basic?", it is handy to know that the overhead for just the loop is 33 ms. If a kid asks "can I do X in less than 4 microseconds on the prop? ", its handy to know that the overhead for the loop in pure assembler is 5 microseconds. Its all about being able to select the appropriate tool. In a case where no tool is declared the "favorite, forsaking all others", this might be useful information. But nobody wants to force anything you. Chill, baby! We're cool, dig the rebop.

    Nobody really cares what you tool you use personally, as long as you do neat stuff.
  • mindrobotsmindrobots Posts: 6,506
    edited 2011-07-17 15:43
    This was never intended to be a serious language benchmark. Just a fun little example to see the same code in a bunch of different languages.

    The various discussions that broke out were interesting (as always).

    Benchmarking is always challenging to make sure you are benchmarking the proper components and comparing like to like (PC way to avoid the apples/apples, oranges/oranges controversy).

    As jazzed stated, the goal is to write code to do something useful - loop 1000 is not useful and the ultimate optimization is to just not do it!

    As for learning languages, since I'm not a programmer to put bread on the table, I like trying/learning different language families to play and expand my tiny head!
  • RossHRossH Posts: 5,519
    edited 2011-07-17 15:50
    jazzed wrote: »
    FIBO is interesting, but it is more a measure of stack performance than anything for recursive functions.

    Fibonacci is mainly interesting for highlighting the fact that some prop compilers don't fully implement recursion. :frown:

    While proposing new benchmarks is fine, we should also point out for newcomers that we have already had several threads on this topic, with results available for many languages and compilers.

    We have results for all of the following benchmarks, which are (or perhaps "were" is more accurate) accepted as "industry standard":
    • Dhrystone
    • Whetstone
    • FFT
    For more details (and results) see here and here.

    Ross.
  • jazzedjazzed Posts: 11,803
    edited 2011-07-17 21:24
    Since FIBO and FFT have been mentioned ...

    Here are some new numbers for xbasic on C3 and SpinSocket-Flash modules.

    HUB memory 80MHz summary:
    • loop 1000 15ms
    • fibo(25) 6482ms
    • FFT (no PASM) 3101ms
    External C3* 1MB Flash 80MHz summary:
    • loop 1000 21ms
    • fibo(25) 8643ms
    • FFT (no PASM) 4574ms
    HUB memory 96MHz summary:
    • loop 1000 12ms
    • fibo(25) 5401ms
    • FFT (no PASM) 2584ms
    External SpinSocket 4MB Flash 96MHz summary:
    • loop 1000 17ms
    • fibo(25) 7202ms
    • FFT (no PASM) 3820ms

    SPIN running from HUB at 80MHz
    • loop 1000 9ms
    • fibo(25) 6069ms
    • FFT (no PASM) 1787ms

    FFT is heater_fft_2.0 with SPIN BUTTERFLIES

    *C3 has 1MB of code space, SS or SpinSocket-Flash has 4MB of code space.
    C3 Flash is simple SPI flash. SpinSocket-Flash is a byte-wide SPI flash.

    The xbasic language is designed for BASIC users to run large programs on cheap hardware.
    xbasic is a compiled BASIC language dialect that doesn't use line numbers.

    C3 and SpinSocket-Flash keeps programs in external Flash memory and data in HUB.
    Hardware variations are added to xbasic by specifying a compliant driver in the xbasic.cfg file.
    VM and driver files are loaded at startup and do not consume HUB code space.
  • prof_brainoprof_braino Posts: 4,313
    edited 2011-07-18 05:41
    RossH wrote: »
    Fibonacci is mainly interesting for highlighting the fact that some prop compilers don't fully implement recursion. :frown:

    I must doing Fibonacci wrong. I don't know how to implement recursion, though I have read that it is do-able in forth; usually there isn't enough memory to do much memory, which is the big reason why forth was chosen in the first place. What i was doing as similar to CarlJacobs:

    Start with two Fibonacci's (on the stack)
    Copy the top one
    Add the two original numbers
    Leave the result (on the top of the stack, with the previous top one under it)

    It only uses two stack memory locations, and calculates up to the 47th Fibonacci, which is the last one that fits in 32bits signed long.

    Am I doing this wrong?

    (Sorry, didn't get to post code or results, kids had to go to the pool)

    Also, do we really care about compiler benchmarks? I thought execution benchmarks were what matters. Or do I understand this incorrectly?
  • Heater.Heater. Posts: 21,230
    edited 2011-07-18 06:19
    The fibo test as we have seen here in C and Spin many times is purposely written to use recursion.
    That is to say for any element of the fibionacci series the fibo() function is going to add two numbers. Each one of those two numbers is itself a member of the fibonacci series so fibo() calls itself twice to get them. Until you get down to the beginning of the series and fibo() can just return 1.
    Of course fibo() can be written to not use recursion but that is not what we want here.
    Whilst this is a very recursive algorithm it doesn't not use such a huge amount of stack.
    I leave it as an exercise for the reader to determine how much stack fibo(26) uses, for example:)
  • Heater.Heater. Posts: 21,230
    edited 2011-07-18 06:28
    prof_braino,
    Also, do we really care about compiler benchmarks? I thought execution benchmarks were what matters. Or do I understand this incorrectly?

    I'm not sure what you mean here.

    When benchmarking compilers people sometimes worry about how quickly the compiler converts source code into executable binary. For large programs on slow machines that can be a real issue as anyone who developed software for early micros will remember and makes your development experience miserable.

    However here we don't care, Prop programs are small and the PC's we develop on are fast. I can't see anyone getting frustrated by the compile times.

    So what we are talking about is execution speed of the resulting binary on the target hardware.

    If my compiler generates twice as much code to do the same job as yours it may well execute my program half as fast on the same machine. So you may well want to benchmark your compilers. Can I do this application in C++ using GCC and the Zog interpreter, or do I need to go to LMM and Catalina C or do I need to go to some BASIC generating raw PASM or do I have to hand craft the PASM myself?
  • David BetzDavid Betz Posts: 14,516
    edited 2011-07-18 07:33
    jazzed wrote: »
    The xbasic language is designed for BASIC users to run large programs on cheap hardware.
    xbasic is a compiled BASIC language dialect that doesn't use line numbers.

    If anyone is interested in this strange xbasic language that Steve keeps talking about you can find it checked into Google Code. The project name is 'xbasic'. The loader currently only works under Windows because I haven't debugged the serial port code for Mac OSX or Linux yet. However, the compiler itself will run on any platform that supports an ANSI C compiler. I've even run it on the Propeller compiled with Catalina C but it was very slow because it requires external memory and the SPI flash/SRAM on the C3 is pretty slow.

    http://code.google.com/p/xbasic/
  • jazzedjazzed Posts: 11,803
    edited 2011-07-18 07:38
    Here's the SPIN FIBO code I've been using. Translate it to your favorite language.
    {{
    FiboTest.spin
    }}
    con
      _clkmode = xtal1 + pll16x ' 5MHz crystal
    ' _clkmode = xtal1 + pll8x ' 10MHz crystal
      _clkfreq = 80_000_000
    
    obj
    sx : "FullDuplexSerial"
    
    pub main
      sx.start(31,30,0,115200)
      waitcnt(_clkfreq+cnt)
      sx.str(string($d,"Spin FIBO Test",$d))
      repeat
        'sx.rx ' press enter to continue
        fibotest(0,25)
    
    pub fibotest(begin,end) | n, t1, t2
    
      repeat n from begin to end
       
        sx.str(string($d,$a,"FIBO "))
        sx.dec(n)
        
        t1 := cnt
        result := fibo(n)
        t2 := cnt-t1
    
        sx.str(string(" = "))
        sx.dec(result)
        sx.tx(" ")
        sx.dec(t2/(clkfreq/1000))
        sx.str(string("ms"))
         
        sx.tx($d)
         
    pub fibo(num)
      if (num < 2)
        return num
      else
        return fibo(num-1) + fibo(num-2)
    

    Here it is translated to xbasic.
    REM ===============================================
    REM basic fibo test
    REM ===============================================
    
    REM ------------------------------
    REM Need a deeper stack for fibo
    REM
    option stacksize=256
    
    include "print.bas"
    include "propeller.bas"
    
    def fibo(n)
        if n < 2 then
            return n
        else
            return fibo(n-1) + fibo(n-2)
        end if
    end def
    
    def test(count, endcount)
        dim result
        dim ms, ticks
        dim start
        dim passed
    
        do while (count <= endcount)
            print "FIBO( ";count;" )",
            start = CNT
            result = fibo(count)
            passed = CNT
            ticks = (passed-start)
            ms = ticks/(clkfreq/1000)
            print " = ", result, " ("; ticks; "  ", "ticks)", " ("; ms; " ", "ms)"
            count = count + 1
        loop
    end def
    
    
    REM ===============================================
    REM main program loop
    REM
    
    print
    do
        dim ticks
        ticks = CNT
        test(0,25)
        ticks = CNT-ticks
        print "Total run time: "; ticks/(clkfreq/1000); " ms"
    loop
    
    REM ===============================================
    

    My original xbasic fibo tests used a slightly different algorithm that produced much better results.
Sign In or Register to comment.