Benchmarks — Parallax Forums

Benchmarks

hippy Posts: 1,981
edited 2008-01-04 19:57 in Propeller 1
How slow is Spin compared to PASM? It's a question which crops up fairly regularly, directly or indirectly, and it would be nice to have a reasonably definitive, meaningful ballpark answer that we could agree on, even if there is no absolute answer.

Does anyone have ideas for creating a sensible benchmark, or suite of benchmarks, for the Propeller chip, or is anyone interested in doing any of that?

Nothing, such as multiply or floating point, is available in Spin that isn't also available in PASM (both ultimately use similar PASM code to achieve the same result), so we are not necessarily looking at which delivers the most FLOPS or is fastest at math... or maybe we are?

I believe we need one or more fairly straightforward (that is, simple to implement) programs which test a wide range of operations standalone, so they can run on any Propeller development platform.

My thought is to choose such a program, code it functionally (without fine-tuning) in Spin, then translate it line-by-line to PASM with no real optimisation. Finally, optimise both the Spin and the PASM as much as possible. Optimisation can continue after the first round of results, but we will always have a base, un-tuned Spin and PASM version to reference against. Those references can also be used as benchmarks for other Propeller programming languages, again with both line-by-line and optimised conversions.

From a PASM perspective, hub access and register-indirect addressing are more expensive than other operations, so they should be a necessary part of any benchmark test, but a PASM program shouldn't be deliberately crippled by them. Having to fit PASM into the 496 longs of Cog memory is a limitation for any benchmark.

Benchmarking is not my field so any thoughts appreciated.

Comments

  • Ym2413a Posts: 630
    edited 2008-01-03 19:42
    I'm interested in trying this out as well. :)
    A good benchmark is really a bunch of tests.

    Dhrystone 1.1
    Whetstone
    Sieve

    And maybe the time it takes to do things like Sine, Add, Square Root and Multiply.
    With standard tests... we could even compare with other processors and interpreters!

    A good test between SPIN and PASM would make use of the math functions and memory functions together. :)
    Maybe building a lookup table of prime numbers would be a good place to start.
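    A minimal sketch of the prime lookup-table idea, written in Python for readability: each candidate is tested by division against the primes already stored, so the loop exercises both arithmetic (multiply, modulo) and memory (table reads and writes). The function name `build_prime_table` is a hypothetical illustration, not part of any existing benchmark.

```python
def build_prime_table(count):
    """Return a list of the first `count` primes, built by trial division.

    Each candidate is divided only by primes already in the table,
    stopping once p * p exceeds the candidate, so both the arithmetic
    and the table accesses are exercised on every iteration.
    """
    table = []
    candidate = 2
    while len(table) < count:
        is_prime = True
        for p in table:
            if p * p > candidate:
                break              # no divisor at or below sqrt(candidate)
            if candidate % p == 0:
                is_prime = False
                break
        if is_prime:
            table.append(candidate)
        candidate += 1
    return table


print(build_prime_table(10))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

    The table size is the natural knob: a larger table means more hub reads per candidate, which is where Spin and PASM would be expected to differ most.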

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Share the knowledge: propeller.wikispaces.com
    Lets make some music: www.andrewarsenault.com/hss

  • deSilva Posts: 2,967
    edited 2008-01-03 20:10
    Benchmarking is a part of computer science! It is also the most barefaced lie in marketing. Hippy might have felt it himself when, between the lines, he argued not to penalize any of the candidates... A good start for a benchmark :)

    If you want to have a look: I used the computation of the 29th Fibonacci number as a reference for a very SPIN-friendly benchmark (http://forums.parallax.com/showthread.php?p=601870, near the end of page 2).
    It took some time to implement that in PASM, so it would not meet Hippy's constraints...
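    The recursive Fibonacci benchmark can be sketched as below (Python, for readability; the timings it prints say nothing about Spin or PASM). The naive doubly recursive form is deliberately inefficient: its cost is dominated by call/return overhead, which is what makes it a test of an interpreter's procedure calls and stack handling. The convention fibo(0) = 0, fibo(1) = 1 and the call counter are assumptions of this sketch; deSilva's actual Spin and PASM code is in the linked thread.

```python
import time

calls = 0  # counts invocations so the size of the workload is explicit


def fibo(n):
    """Naive doubly recursive Fibonacci with fibo(0) = 0, fibo(1) = 1."""
    global calls
    calls += 1
    if n < 2:
        return n
    return fibo(n - 1) + fibo(n - 2)


start = time.perf_counter()
result = fibo(29)
elapsed = time.perf_counter() - start
print(f"fibo(29) = {result} in {calls} calls, {elapsed:.3f} s")
```

    The call count, 2 * fibo(n + 1) - 1 (1664079 for n = 29), is independent of the language, so time per call is a fairer figure to compare than total time.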

    Just adding "synthetic constructs" is of very little use... but this is my opinion, though not mine alone.

    A much more useful approach is "natural benchmarking", i.e. define a real problem constructed around one (or more) parameters and analyse which part of the parameter space is accessible to a given implementation. The simplest parameter is run time, but the memory used is nearly as important. You can do the most unbelievable things with masses of memory, e.g. sort in linear time... The best-known application of this trade-off is loop unrolling...
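    A minimal illustration of trading memory for time: a counting sort runs in linear time, but only because it spends one counter per possible key value. The key range `max_key` is an assumption the caller must supply; with 32-bit keys the table would be far too large for the Propeller's 32 KB of hub RAM, which is exactly the kind of limit "natural benchmarking" would expose.

```python
def counting_sort(values, max_key):
    """Sort non-negative integers <= max_key in O(n + max_key) time.

    Time is linear in len(values), but memory grows with max_key:
    one counter per possible key value.
    """
    counts = [0] * (max_key + 1)
    for v in values:
        counts[v] += 1             # one pass to tally each key
    result = []
    for key, n in enumerate(counts):
        result.extend([key] * n)   # one pass over the key range to emit
    return result


data = [7, 3, 0, 3, 9, 1]
print(counting_sort(data, max_key=9))  # [0, 1, 3, 3, 7, 9]
```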

    OK, I would suggest defining 3 or 4 such real problems, perhaps ones that are already available:
    (1) Asynchronous serial transmission. Parameter: bits/second in transmission and reception.
    (2) Calculate fibo(29) using a recursive procedure. Parameter: time.
    (3) Remove all spaces from a given text (around 10 kB). Parameter: time.
    (4) In a given text (a "template"), substitute all occurrences of a special pattern (e.g. all numbers) with a given set of different strings, some longer and some shorter than the original pattern.
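    Problems (3) and (4) can be pinned down as reference versions like the ones below (Python sketch). For (4), the reading that "a given set of different strings" is used in turn, cycling through the set, is an assumption, as is matching "numbers" as runs of decimal digits; the function names are hypothetical.

```python
import re


def remove_spaces(text):
    """Problem (3): return `text` with every space character removed."""
    out = []
    for ch in text:                # explicit per-character loop, like a Spin/PASM version
        if ch != " ":
            out.append(ch)
    return "".join(out)


def substitute_numbers(template, replacements):
    """Problem (4): replace each run of digits with the next string
    from `replacements`, cycling through the list."""
    index = 0

    def next_replacement(match):
        nonlocal index
        s = replacements[index % len(replacements)]
        index += 1
        return s

    return re.sub(r"[0-9]+", next_replacement, template)


text = "Motor 1 at 250 rpm, motor 2 at 75 rpm"
print(remove_spaces(text))                              # Motor1at250rpm,motor2at75rpm
print(substitute_numbers(text, ["one", "X", "fast"]))   # Motor one at X rpm, motor fast at one rpm
```

    Because replacements are both longer and shorter than the matched digits, an in-place implementation in hub RAM has to shift text in both directions, which is what makes (4) harder than (3).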

    Post Edited (deSilva) : 1/3/2008 8:16:55 PM GMT
  • VIRAND Posts: 656
    edited 2008-01-03 20:28
    I just started wondering... The Spin interpreter has no terminal console, so why is it even there?
    Why doesn't PropIDE compile Spin into PASM instead of interpreted bytecodes, if the bytecodes do the same thing much more slowly?
  • deSilva Posts: 2,967
    edited 2008-01-03 20:35
    @VIRAND
    Can you put this into the relevant thread, please?
  • hippy Posts: 1,981
    edited 2008-01-03 21:09
    deSilva said...
    If you want to have a look: I used the computation of the 29th Fibonacci number as a reference for a very SPIN-friendly benchmark (http://forums.parallax.com/showthread.php?p=601870, near the end of page 2).
    It took some time to implement that in PASM, so it would not meet Hippy's constraints...

    I think that's a good starting point. "Simple / time to implement" was really a concern along the lines of "don't expect us 'math amateurs' to implement CORDIC functions from first principles". If it's too complex, most players walk away from the field. A simple-to-follow algorithm is, I think, what really counts, and that captures the essence precisely.

    It uses register-indirect addressing (self-modifying code), and I like the fact that it uses a software stack. Simply using iterative methods would be equally valid, but only if applied to both Spin and PASM; that's a "fairness" I think we would agree on: compare like with like.
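    To show what "uses a software stack" means independently of any language's call mechanism, here is a hedged Python sketch of the same recursive Fibonacci with an explicit stack (a list standing in for a stack area in cog or hub RAM), next to the iterative method. Neither is a transcription of the PASM in the linked thread, and the function names are hypothetical.

```python
def fibo_stack(n):
    """Recursive Fibonacci with an explicit software stack.

    Each stack entry is a pending call; leaves (k < 2) contribute k,
    so the sum of all leaf values equals fibo(n).
    """
    stack = [n]
    total = 0
    while stack:
        k = stack.pop()
        if k < 2:
            total += k
        else:
            stack.append(k - 1)    # "call" fibo(k - 1)
            stack.append(k - 2)    # "call" fibo(k - 2)
    return total


def fibo_iter(n):
    """Iterative Fibonacci: n additions and no stack at all."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


print(fibo_stack(20), fibo_iter(20))  # 6765 6765
```

    The two do exponentially different amounts of work, which is why the same method has to be used on both the Spin and the PASM side.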

    "Line-by-line" wasn't entirely what I meant in practice either; I meant more how a programmer would take an algorithm and implement it in the language used, without spending ages over optimisation. If a PASM programmer would implement a set of Spin statements as one instruction, that's fine by me as long as it's an obvious way of doing it. A line-by-line reference version was really an idea for removing any subconscious tendency of a good PASM coder to use complicated tricks that others would not automatically be familiar with.

    Perhaps another way of looking at it, more fun than cold benchmarking, is "here's a basic algorithm implemented as a reference, make it as fast as possible in Spin and/or PASM".
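    A small harness for the "reference plus faster versions" idea, sketched in Python: every candidate is first checked against the reference's output, then timed with `timeit`, taking the best of several repeats. The sum-of-squares functions are hypothetical stand-ins for a real benchmark; on a Propeller the timing role would be played by reading CNT before and after the run.

```python
import timeit


def reference_sum_of_squares(n):
    """Hypothetical reference: a plain, un-tuned loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def optimised_sum_of_squares(n):
    """Hypothetical optimised version: closed form for 0^2 + ... + (n-1)^2."""
    return (n - 1) * n * (2 * n - 1) // 6


def compare(reference, candidates, arg, repeats=5, number=10):
    """Check each candidate against the reference, then time them all.

    Returns {function name: best seconds per call}.
    """
    expected = reference(arg)
    results = {}
    for func in [reference] + candidates:
        if func(arg) != expected:
            raise ValueError(f"{func.__name__} disagrees with the reference")
        best = min(timeit.repeat(lambda: func(arg), repeat=repeats, number=number))
        results[func.__name__] = best / number
    return results


for name, t in compare(reference_sum_of_squares, [optimised_sum_of_squares], 10_000).items():
    print(f"{name}: {t * 1e6:.1f} us per call")
```

    Checking correctness before timing keeps an "optimisation" from winning by computing something different.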
  • deSilva Posts: 2,967
    edited 2008-01-03 21:14
    That's what the pattern matching is for: lots of options :)

    Edit: This will be the advantage of the LMM (Large Memory Model). As cog-resident PASM is limited to 300 instructions or so, it has to be algorithmically simple. Not so an assembly program with 3000 instructions...
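    LMM runs code held in hub RAM: a tiny kernel in the cog fetches one native instruction at a time, advances its program counter and executes it, so program size is bounded by hub RAM rather than by the cog's 496 longs. The toy Python model below captures only that fetch/advance/execute loop; Python callables stand in for native instructions, and the jump/halt convention is an assumption of this sketch, not the real kernel's.

```python
def lmm_run(hub, regs):
    """Toy model of an LMM kernel.

    `hub` is a list of "native instructions" (callables) held in large
    memory. An instruction returns None to fall through, a new pc to
    jump, or -1 to halt.
    """
    pc = 0
    while True:
        instr = hub[pc]            # fetch (a hub read on a real Propeller)
        pc += 1                    # advance the program counter
        target = instr(regs)       # execute
        if target == -1:
            return regs
        if target is not None:
            pc = target            # taken jump


# Hypothetical program: acc = 10 + 9 + ... + 1
sum_program = [
    lambda r: r.update(acc=0, i=10),
    lambda r: r.update(acc=r["acc"] + r["i"]),
    lambda r: r.update(i=r["i"] - 1),
    lambda r: 1 if r["i"] > 0 else None,   # loop back to instruction 1
    lambda r: -1,                          # halt
]
print(lmm_run(sum_program, {})["acc"])  # 55
```

    Every instruction pays for the fetch and pc update, which is the LMM's overhead compared with cog-resident PASM.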

    Post Edited (deSilva) : 1/3/2008 9:19:43 PM GMT
  • deSilva Posts: 2,967
    edited 2008-01-03 21:24
    Just a thought: what makes synthetic benchmarks mostly useless is the twilight zone of explicit or implicit "constraints".
    An example: SPIN has 32-bit arithmetic and under no circumstances will do less. Depending on the intelligence of the algorithm (and memory constraints leave no room for much), this will take 3 to 4 times longer than an unrolled 16x16 multiplication or 32/16 division. Given a problem solvable with 16-bit arithmetic, should there be a handicap?
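    To make the 16-bit versus 32-bit point concrete: a shift-and-add multiply needs one iteration per multiplier bit, so a routine that always works on 32 bits does twice the loop work of a 16x16 routine, even when the operands fit in 16 bits. This Python sketch counts iterations only; it says nothing about actual Spin or PASM cycle counts, and the 3 to 4 times figure above also reflects the loop overhead that unrolling removes.

```python
def shift_add_multiply(a, b, bits):
    """Unsigned shift-and-add multiply over a fixed number of multiplier bits.

    Returns (product, iterations): one iteration per bit of `b`, as in a
    simple software multiply loop. The product is kept to 2 * bits bits.
    """
    mask = (1 << (2 * bits)) - 1
    product = 0
    iterations = 0
    for _ in range(bits):
        if b & 1:
            product = (product + a) & mask   # add shifted multiplicand
        a = (a << 1) & mask
        b >>= 1
        iterations += 1
    return product, iterations


print(shift_add_multiply(1234, 567, bits=16))  # (699678, 16)
print(shift_add_multiply(1234, 567, bits=32))  # (699678, 32)
```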
  • Leon Posts: 7,620
    edited 2008-01-03 21:46
    Byte magazine used to use a Sieve of Eratosthenes prime number program as a benchmark, primarily because it could run on virtually anything. It seems quite suitable to me for comparing Spin and PASM.
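    For reference, the classic BYTE-style sieve stores flags only for odd numbers: index i stands for 2i + 3. A Python transcription of that structure follows (a sketch, not the magazine's listing); with SIZE = 8190 it covers 3 to 16383 and finds 1899 primes, the prime 2 not being represented. Each full pass is one benchmark iteration.

```python
SIZE = 8190  # flags[i] stands for the odd number 2*i + 3 (3 .. 16383)


def byte_sieve(size=SIZE, iterations=1):
    """BYTE-style Sieve of Eratosthenes over odd numbers only.

    Returns the number of primes found in the last pass.
    """
    count = 0
    for _ in range(iterations):
        flags = [True] * (size + 1)
        count = 0
        for i in range(size + 1):
            if flags[i]:
                prime = i + i + 3
                k = i + prime          # index of 3 * prime
                while k <= size:
                    flags[k] = False
                    k += prime         # next odd multiple of prime
                count += 1
    return count


print(byte_sieve())  # 1899
```

    The flag array is 8191 entries, so it fits in hub RAM as bytes but not in a cog, which brings hub access into the PASM version automatically.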

    Leon

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Amateur radio callsign: G1HSM
    Suzuki SV1000S motorcycle
  • mirror Posts: 322
    edited 2008-01-03 22:00
    SimpleSerial: claims operation up to 19200 baud and is half duplex.
    FullDuplexSerial: claims operation up to 230400 baud and is full duplex.

    So, for this example, a naive speedup is 2 * 230400 / 19200 = 24.
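    One way to turn those baud rates into a per-bit budget, sketched in Python: at an assumed system clock of 80 MHz (a common Propeller configuration, 5 MHz crystal with the x16 PLL), each bit must be handled within clock / baud cycles, and most PASM instructions take 4 clocks. These are simple arithmetic, not measurements of either object.

```python
CLOCK_HZ = 80_000_000   # assumed system clock: 5 MHz crystal x 16 PLL
CLOCKS_PER_INSTR = 4    # typical PASM instruction time (hub ops take longer)


def clocks_per_bit(baud, clock_hz=CLOCK_HZ):
    """System clock cycles available per serial bit (integer division)."""
    return clock_hz // baud


for name, baud in [("SimpleSerial", 19_200), ("FullDuplexSerial", 230_400)]:
    print(f"{name}: {baud} baud -> {clocks_per_bit(baud)} clocks per bit")

# Budget per bit at the faster rate, in typical PASM instructions
instructions_per_bit = clocks_per_bit(230_400) // CLOCKS_PER_INSTR

# Naive speedup from the post: full duplex doubles throughput, times the baud ratio
naive_speedup = 2 * 230_400 // 19_200
print(instructions_per_bit, naive_speedup)  # 86 24
```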
  • deSilva Posts: 2,967
    edited 2008-01-03 22:17
    @Hippy:
    I only just read your last post in the LMM thread about benchmarking, and the instability you noticed with even small changes. The theory can be put quite simply.

    We are interested in two elements
    * instruction execution time ('et')
    * instruction performance/usefulness ('p')

    You can visualize that easily in a 4-quadrant sketch (to be inserted some time...)

    The obvious optimization strategy is to avoid instructions with a low 'p'/'et' ratio... if you can :)
    The two successful strategies for machines (be they H/W or S/W) are called CISC (increasing 'p' more than 'et') and RISC (decreasing 'et' more than 'p').

    The benchmark results will shift considerably when you manage to include high-'p' instructions in the SPIN algorithm (e.g. LOOKDOWN, or a genuinely needed 32-bit multiplication) or when you force PASM to wait for the HUB. This is trivial, of course.

    But synthetic benchmarks tend to be quite susceptible to such effects; small benchmark suites can even become systematically biased...

    I shall take this to an extreme: some time ago you suggested "configuring" the VM according to the instructions needed. This is a step in the right direction. But you can go further! You GENERATE code for the most time-consuming computations of each specific program. In fact, that is exactly what we do by hand today: "Ah, I need serial communication, so I incorporate FullDuplexSerial." This can also be done at the micro level:
    Compiler: "Oh, I notice this user keeps adding 4 in many places, so I'd best generate a PLUS4 instruction for him this time."
    This is nothing more than "advanced optimization for dynamic VMs" (TM by deSilva).
    So what happens when the compiler detects that the user wants to compute Fibonacci numbers? Well, it can include the fastest algorithm. And when it detects that the user needs the 29th Fibonacci number? Well, it can include that too :)
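    A hedged sketch of the "PLUS4" idea: scan a bytecode stream for the most frequent opcode/operand pairs, which a compiler could then turn into dedicated instructions. The opcode names and the program are invented for illustration and are not real Spin bytecodes.

```python
from collections import Counter


def suggest_superinstructions(stream, top=3):
    """Return the `top` most frequent (opcode, operand) pairs in `stream`.

    `stream` is a list of (opcode, operand) tuples. A frequent pair such
    as ("ADD_CONST", 4) is a candidate for a specialised PLUS4 instruction.
    """
    return Counter(stream).most_common(top)


# Hypothetical compiled program
program = [
    ("PUSH_VAR", "x"), ("ADD_CONST", 4), ("POP_VAR", "x"),
    ("PUSH_VAR", "p"), ("ADD_CONST", 4), ("POP_VAR", "p"),
    ("PUSH_VAR", "q"), ("ADD_CONST", 1), ("POP_VAR", "q"),
    ("PUSH_VAR", "p"), ("ADD_CONST", 4), ("POP_VAR", "p"),
]
print(suggest_superinstructions(program, top=1))  # [(('ADD_CONST', 4), 3)]
```

    A real implementation would weigh frequency by execution count rather than static occurrence, which is exactly why a benchmark's results could shift under such a VM.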

    So it is very difficult to decide what a "fair" benchmark should be in such cases.

    Don't laugh! There was a time when machines were built to perform best on specific synthetic benchmark suites. And that time is by no means over!

    Post Edited (deSilva) : 1/3/2008 10:52:07 PM GMT
  • John Abshier Posts: 1,116
    edited 2008-01-03 22:20
    My recommended benchmark suite
    1. Serial coms (bits/second)
    2. Parse GPS strings (strings/second)
    3. Calculate bearing and distance from point A to point B.
    4. Number to string (numbers/second)
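    Items 2 to 4 can be sketched compactly in Python (a Propeller version would need fixed-point or a float library): NMEA checksum validation for the GPS strings, a bearing/distance calculation, and number-to-string by repeated division. The spherical Earth with a 6371 km mean radius is an assumption; a navigation-grade version would use an ellipsoid.

```python
import math


def nmea_checksum_ok(sentence):
    """Check an NMEA 0183 sentence of the form '$...*hh'.

    The checksum is the XOR of every character between '$' and '*',
    written as two hex digits after the '*'.
    """
    if not sentence.startswith("$") or "*" not in sentence:
        return False
    body, _, given = sentence[1:].partition("*")
    checksum = 0
    for ch in body:
        checksum ^= ord(ch)
    return given[:2].upper() == f"{checksum:02X}"


def bearing_and_distance(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Initial great-circle bearing (degrees) and haversine distance (km).

    Inputs are decimal degrees; the Earth is treated as a sphere.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    distance = 2 * radius_km * math.asin(math.sqrt(a))
    y = math.sin(dlmb) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    bearing = (math.degrees(math.atan2(y, x)) + 360.0) % 360.0
    return bearing, distance


def int_to_string(n):
    """Number to string by repeated division by 10."""
    if n == 0:
        return "0"
    negative = n < 0
    n = abs(n)
    digits = []
    while n:
        n, d = divmod(n, 10)
        digits.append(chr(ord("0") + d))
    if negative:
        digits.append("-")
    return "".join(reversed(digits))


print(nmea_checksum_ok("$AB*03"))             # True
print(bearing_and_distance(0.0, 0.0, 0.0, 1.0))
print(int_to_string(-1234))                   # -1234
```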

    The best benchmark is the program you are going to use, or at least significant parts of it. The above are related to robotics. Another problem, or at least an item to consider, is not only speed but also COG usage. Is the extra speed worth an extra cog?
  • Kaio Posts: 253
    edited 2008-01-03 23:11
    John,

    if you use PASM, you always need an extra cog to run the assembly code, plus a small Spin interface to communicate with your assembly routines. So I think that could be a disadvantage for PASM in some cases. But it should not be relevant in a comparison with Spin.
  • deSilva Posts: 2,967
    edited 2008-01-04 01:22
    Kaio said...
    ... uses a small Spin interface to communicate with your assembly routines.
    I think this is not a useful point of view. You communicate with your assembly routines through cells in HUB memory; you are "data flow coupled": there is no flow of control (except stopping the whole COG).

    So how you access the cells is your own business. Other assembly COGs will use machine code. When you are using a SPIN program, it uses SPIN; what else?
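    To illustrate the "data flow coupled" point: the caller and the assembly cog share nothing but a few cells in hub memory, and the only protocol is who writes which cell, and when. Below, a Python thread stands in for the cog and a list stands in for hub RAM; the three-cell layout and the command codes are hypothetical. The caller could equally be Spin or another PASM cog, since all it does is read and write the same cells.

```python
import threading
import time

# Hypothetical mailbox of three longs in shared (hub) memory:
# [command, argument, result]. 0 in the command cell means "idle".
CMD, ARG, RESULT = 0, 1, 2
CMD_SQUARE = 1
CMD_STOP = 2


def worker_cog(hub):
    """Stand-in for an assembly cog: it only ever polls the hub cells."""
    while True:
        cmd = hub[CMD]
        if cmd == CMD_SQUARE:
            hub[RESULT] = hub[ARG] * hub[ARG]
            hub[CMD] = 0           # clearing the command signals "done"
        elif cmd == CMD_STOP:
            hub[CMD] = 0
            return
        else:
            time.sleep(0.0001)     # idle; a real cog would just keep polling


def call(hub, cmd, arg=0):
    """Caller side: write the argument first, then the command, then wait."""
    hub[ARG] = arg
    hub[CMD] = cmd
    while hub[CMD] != 0:
        time.sleep(0.0001)
    return hub[RESULT]


hub = [0, 0, 0]
cog = threading.Thread(target=worker_cog, args=(hub,))
cog.start()
print(call(hub, CMD_SQUARE, 12))   # 144
call(hub, CMD_STOP)
cog.join()
```

    Writing the argument before the command, and the result before clearing the command, is what makes the handshake safe without any other synchronisation.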

    Don't be fooled by the fact that there is some SPIN code in the Object where the machine code is located. That has other reasons.

    Post Edited (deSilva) : 1/4/2008 8:24:16 PM GMT
  • Ym2413a Posts: 630
    edited 2008-01-04 19:57
    I think it would be cool to use a small set of standard tests and also compare with other processors and microcontrollers. :)
    It would be neat to see how a COG stands up next to, say... a Z80 or Z380, 68000, 386, SX, etc.
    Those would be interesting and fun numbers!

    Spin could also be compared to BASIC and other interpreted systems. :)

    It's apples and oranges, but they're all processors, so there has to be a way to compare them all on the same grounds. ;)
    I guess it would make for some fun.


    Post Edited (Ym2413a) : 1/4/2008 8:02:52 PM GMT
