C/Spin benchmarks?

RossH · 2012-06-24 16:36

Hi all,

I'm looking for programs suitable for use in comparing the performance and code size of C vs Spin - i.e. any instances where the same algorithm has (as far as practical) been implemented in both C and Spin. I've begun working on a new Catalina kernel that could potentially reduce the code size of C programs much closer to the sizes typical of Spin programs - but at a performance cost over existing C compilers. I'd like to find some more real examples to test it out on before I decide if it is worth releasing.

I know of Heater's fft_bench (which is a great starting point) but I need more examples. If anyone knows of any candidates please post them here (or link to them). It may be a whole program or just an algorithm that has been ported from C to Spin (or vice versa).

Thanks,

Ross.

P.S. This thread is not about benchmarking between the various C compilers - there are other threads for that, and the tradeoff between speed and code sizes of the various C compilers (ICC, Catalina, Zog, GCC) is reasonably well understood. I want this thread to be specifically about Spin vs C. And at this stage, I'm actually much more interested in looking at code sizes rather than performance.

Cluso99 · 2012-06-24 20:04

Great work if you can pull it off Ross. I do think that most C programs are going to suffer from large code sizes compared with what we are used to with spin. A trade-off should surely be welcome.
Are you thinking along the lines of some form of interpreter for this?

pedward · 2012-06-24 20:22

Ross, I have to ask the obvious question here: Now that Parallax has entered Beta with their officially supported port of GCC, why do we have effort on 1 language spread across 4 compilers?

It would make sense to me to focus the effort on PropGCC and the Parallax SPIN compiler, since they are both open source and official code of Parallax.

This is no slight against you, but now what hole does Catalina fill? More importantly, why weren't Parallax's efforts focused around Catalina, since it was already there? That isn't so much a question of Parallax, but of you, why didn't they consider it for the official compiler?

Sorry to take your thread off-topic, but I'm wary of confusing newcomers and duplicating effort, to what end?

jazzed · 2012-06-24 21:34

I'm interested in this because we've considered doing a code compression back-end for several months - a CMM mode for lack of a better term. Maybe you have thought about it too. If that's not your approach, that's Ok.

You can use Eric's spin2cpp converter if you don't mind re-purposing the resulting code. The C++ness of the output can be easily replaced with normal C in most cases with a simple sed script or regex in vim.

For example, the hello program is translated directly to the .cpp below. The FullDuplexSingleton.spin is translated function by function (not shown here). The one drawback so far with spin2cpp is that all files need to live in the same directory.

''
'' Hello World
''
con ' clock setup
  _clkmode = xtal1 + pll16x
  _clkfreq = 80_000_000

obj
  sx : "FullDuplexSingleton"

pub main | n
'' program main entry point

  waitcnt(CLKFREQ+cnt)
  sx.start(31,30,0,115200) ' start console at 115200 for debug output

  repeat
    sx.str(string("Hello world "))
    sx.dec(n++)
    sx.tx($d)
    waitcnt(CLKFREQ+cnt)

#include <propeller.h>
#include "hello.h"

int32_t helloSpin::Main(void)
{
  int32_t result = 0;
  int32_t       N;
  waitcnt((_CLKFREQ + _CNT));
  Sx.Start(31, 30, 0, 115200);
  while (1) {
    Sx.Str((int32_t)"Hello world ");
    Sx.Dec((N++));
    Sx.Tx(13);
    waitcnt((_CLKFREQ + _CNT));
  }
  return result;
}

Oldbitcollector (Jeff) · 2012-06-24 21:43

@jazzed,

request: an option to compress/optimize code without removing comments. (Like Phil's clean & BST)

OBC

Heater. · 2012-06-25 00:11

RossH,
Don't we always start these benchmark quests with the good old recursive FIBO? It's a crappy benchmark in many ways but has become something of a tradition.
A major challenge might be to get the C equivalent of one of the Spin FAT file systems down to Spin size.
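For anyone who hasn't seen it, the traditional recursive FIBO looks roughly like the sketch below. This is a minimal C version written for illustration (the name `fibo` is an assumption, not taken from any of the benchmark packages), and it shows why it is a narrow test: almost all the work is calls, adds and compares.

```c
#include <stdint.h>

/* Naive recursive Fibonacci: fibo(0) = 0, fibo(1) = 1.
 * Deliberately unoptimised; each call does one comparison,
 * and every non-trivial call makes two more calls and one add. */
uint32_t fibo(uint32_t n)
{
    if (n < 2) {
        return n;
    }
    return fibo(n - 1) + fibo(n - 2);
}
```

Benchmark runs typically time a call such as `fibo(20)` or higher; the result overflows 32 bits beyond `fibo(47)`.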

Sapieha · 2012-06-25 01:19

Hi pedward.

That was a "silly", "stupid", "foolish" thing for anyone to say.

It was that kind of talk that stopped Brad from working more on BST.

pedward wrote: »

Ross, I have to ask the obvious question here: Now that Parallax has entered Beta with their officially supported port of GCC, why do we have effort on 1 language spread across 4 compilers?

It would make sense to me to focus the effort on PropGCC and the Parallax SPIN compiler, since they are both open source and official code of Parallax.

This is no slight against you, but now what hole does Catalina fill? More importantly, why weren't Parallax's efforts focused around Catalina, since it was already there? That isn't so much a question of Parallax, but of you, why didn't they consider it for the official compiler?

Sorry to take your thread off-topic, but I'm wary of confusing newcomers and duplicating effort, to what end?

Javalin · 2012-06-25 03:56

Hiya Ross,

These were my numbers ages ago, comparing spin to Imagecraft's C.

http://javalins.wordpress.com/2009/09/04/propeller-languages-speed-comparison/

I thought it was an interesting comparison. I haven't used your compiler though.

James

RossH · 2012-06-25 03:58

Hi Jazzed,

The spin2cpp tool is a good idea - thanks. But it does have an inherent bias in that it is self-selecting for programs already hand-crafted to be efficient in Spin - converting them to C would not use much of the C language, since C is a considerably larger language than Spin. To balance it out, I would also need at least some examples of programs that were translated the other way. Hence this thread.

Hi Cluso (& also Jazzed)

You could call the approach I'm taking a "hybrid" kernel. I suppose all LMM kernels are hybrid in the sense that they are all at least partly interpreted - i.e. they implement "primitive" or "pseudo" instructions using various techniques that extend the basic PASM operations (e.g. to manage the stack). The important thing is to find the right balance. I'll be prepared to go into more detail when I have some actual results to report, but at the moment all I can say is that it looks promising. My current rule-of-thumb is that I probably won't bother persevering with it unless it can execute C more than twice as fast as Spin, using less than twice the code size (any better than that would be a bonus). As you are probably aware, this is a significantly different time/space tradeoff to what any of the current C compilers are designed to achieve.

Hi pedward,

You'd have to ask those questions of Parallax, not me. As it stands, I think Catalina offers a solution for C (not C++) that is still unmatched by GCC - although I am sure GCC will catch up eventually. As to what role Catalina fills - well, I think it is still the most practical option if you want to use a language other than Spin on a bare Propeller (i.e. no XMM), or program using a mix of Spin and C - and I have already said elsewhere that I intend to move it further in that direction (so the GCC team need not feel threatened!). I certainly won't be abandoning XMM (I also have a new XMM kernel in development, but I'm waiting to see if this idea pans out first, since I would probably also redesign that kernel if it does). But XMM on the Propeller is very much like "expanded memory" was on the old 8086 DOS systems - i.e. a temporary expedient we had to put up with until something better came along.

Hi Heater,

Yes, I also use FIBO - but in any language FIBO only needs about 3 operations (i.e. call, add, compare) - it just executes them over and over and over. It is not really a good benchmark for comparing diverse languages. But the file system idea is a good one - we have DOSFS (written in C) and several Spin alternatives. None of the Spin ones are exactly analogous to DOSFS, since they all tend to include large chunks of hand-coded PASM - but I seem to recall there is at least one that is implemented mostly in Spin to save space. Perhaps someone may remember which one that was?

Hi sapieha,

Don't worry - if I was thin-skinned I would have abandoned these forums long ago

Ross.

RossH · 2012-06-25 04:01

Javalin wrote: »

These were my numbers ages ago, comparing spin to Imagecraft's C.

Thanks James - I had a quick look, but couldn't find your actual Spin or C code. Can you point me to it?

Heater. · 2012-06-25 04:24

pedward,

...why do we have effort on 1 language spread across 4 compilers?

We do not have effort on one language spread across 4 compilers. And I presume you mean the list in first post. I only see two real contenders.

The ImageCraft ICC compiler is discontinued and unsupported.

Zog is not a compiler but rather a virtual machine/emulator for running ZPU byte codes on the Propeller. It was something I just had to do because the ZPU architecture is very small and easily fits in a COG's worth of PASM and because there was an existing GCC target for it. ZPU byte codes turn out not to produce executables as small as I would like and are about the speed of Spin. I still think there are use cases for it, though, like when running from serially connected FLASH/EEPROM. Anyway I have not had much time to work on Zog for a long while so it's not really in the running against Catalina and propgcc.

So your question is really "Now that propgcc is here why don't you abandon Catalina and work on propgcc instead?"

That is rather an impertinent question. Can you really expect someone who has spent thousands of hours over a number of years working on a project to just give up on it? As you see from this thread Ross has plans to push it ahead and give GCC a run for its money.
One killer for C at the moment is the large size of its binaries. You cannot fit as much functionality into a Prop with LMM code as you can with Spin byte codes. Looks like Ross has a plan for that.

why weren't Parallax's efforts focused around Catalina, since it was already there?

I can't speak for Ross or Parallax but I can only imagine that there is a feeling in many circles that GCC is the "professionals" choice. This is more a gut marketing thing than a solid engineering decision. One might also surmise that there are more GCC developer gurus around than there are for other compilers.

...but I'm wary of confusing newcomers and duplicating effort, to what end?

Welcome to the opensource world. Why do we have hundreds of different Linux distributions, or you can use BSD, why do we have Gnome and KDE and so on and so on.

potatohead · 2012-06-25 04:43

In open computing land, there is an implied trust in the idea that people will sort themselves out.

Reading this with interest. We have a lot of "run it from anywhere but the HUB" options now. Good for larger projects and likely good for higher complexity objects too, depending on the complexity and how it manifests in terms of code required.

SPIN + PASM is really great on a bare Prop, and code size is the primary reason why that is so. A bare Prop will continue to be an entry scenario for the foreseeable future. Simple cost will drive this, as will the variety of external storage options. Not everybody will select an SD card, EMM, XMM, etc...

This is a great discussion, and I think having it here perfectly answers the question posed earlier.

@Ross, would something like a line drawing routine be of use? Mostly shifts, adds, writes to HUB and decisions. I've got some VGA C graphics code that I ported selected bits to SPIN for a project. The SPIN port isn't optimal in any way, it just works.
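This is not potatohead's code, but a line routine of the kind described is usually some form of Bresenham's algorithm. The sketch below is a minimal integer-only C version; the framebuffer `fb`, its one-byte-per-pixel layout, the sizes `FB_W` and `FB_H`, and the helper `plot` are all assumptions made for illustration. The inner loop is nothing but adds, compares and a memory write, which is what makes it a reasonable candidate for a C vs Spin size comparison.

```c
#include <stdint.h>
#include <stdlib.h>

#define FB_W 32   /* hypothetical framebuffer width in pixels  */
#define FB_H 16   /* hypothetical framebuffer height in pixels */

/* One byte per pixel, purely for illustration; a real VGA driver
 * on the Propeller would normally pack several pixels per long. */
static uint8_t fb[FB_W * FB_H];

static void plot(int x, int y, uint8_t colour)
{
    /* Clip silently so callers cannot write outside the buffer. */
    if (x >= 0 && x < FB_W && y >= 0 && y < FB_H) {
        fb[y * FB_W + x] = colour;
    }
}

/* Integer-only Bresenham line from (x0,y0) to (x1,y1), inclusive. */
void draw_line(int x0, int y0, int x1, int y1, uint8_t colour)
{
    int dx = abs(x1 - x0);
    int dy = -abs(y1 - y0);
    int sx = (x0 < x1) ? 1 : -1;
    int sy = (y0 < y1) ? 1 : -1;
    int err = dx + dy;            /* combined error term */

    for (;;) {
        plot(x0, y0, colour);
        if (x0 == x1 && y0 == y1) {
            break;
        }
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }   /* step in x */
        if (e2 <= dx) { err += dx; y0 += sy; }   /* step in y */
    }
}
```

A call such as `draw_line(0, 0, 5, 5, 1)` sets the six pixels on the diagonal from (0,0) to (5,5) and leaves the rest of `fb` untouched.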

RossH · 2012-06-25 04:57

potatohead wrote: »

@Ross, would something like a line drawing routine be of use? Mostly shifts, adds, writes to HUB and decisions. I've got some VGA C graphics code that I ported selected bits to SPIN for a project. The SPIN port isn't optimal in any way, it just works.

Sure - just post it (or point me to it) and I'll have a look. At the very least, I can extract the C and Spin implementations of the algorithm and use them to compare code sizes.

Ross.

Heater. · 2012-06-25 05:27

Another small offering might be my random number generators JKISS32 and MWC256. Lots of nice typical mcu operations in small functions. Packages contain C and Spin versions.

JKISS32: http://forums.parallax.com/showthread.php?136975-JKISS32-High-quality-PRNG-with-RealRandom-seed-save-a-COG.

MWC256: http://forums.parallax.com/showthread.php?137118-MWC256-Very-high-quality-PRNG-with-RealRandom-seed-save-a-COG.

The latter contains both routines.

RossH · 2012-06-25 05:48

Heater. wrote: »

Another small offering might be my random number generators JKISS32 and MWC256. Lots of nice typical mcu operations in small functions. Packages contain C and Spin versions.

JKISS32: http://forums.parallax.com/showthread.php?136975-JKISS32-High-quality-PRNG-with-RealRandom-seed-save-a-COG.

MWC256: http://forums.parallax.com/showthread.php?137118-MWC256-Very-high-quality-PRNG-with-RealRandom-seed-save-a-COG.

The latter contains both routines.

Thanks Heater. You don't by any chance have any C algorithms converted to Spin that exercise (at least in the C version) things like arrays, pointers, structures, unions or bit fields? Anything with switch statements or conditional expressions? I wonder if Dave's C to Spin converter handles these things?

Ross.
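To make the request concrete, a test routine of the kind being asked for might look something like the sketch below. It is purely hypothetical (the types `flags`, `word`, `packet` and the function `checksum` are invented for illustration and come from no existing benchmark), but it touches arrays, pointers, a structure, a union, bit fields, a switch and a conditional expression in a few lines.

```c
#include <stdint.h>

/* Bit fields: the widths are honoured, but the layout is
 * implementation-defined, so the code never relies on it. */
struct flags {
    unsigned ready : 1;
    unsigned mode  : 3;
};

/* Union: the same storage viewed as one word or as four bytes. */
union word {
    uint32_t u32;
    uint8_t  bytes[4];
};

struct packet {
    struct flags flags;
    union word   payload;
};

/* Combine the payload bytes of each packet according to its mode. */
uint32_t checksum(const struct packet *p, int count)
{
    uint32_t sum = 0;

    for (const struct packet *end = p + count; p != end; ++p) {
        uint32_t bytes = 0;
        for (int i = 0; i < 4; ++i) {
            bytes += p->payload.bytes[i];      /* array inside a union */
        }
        switch (p->flags.mode) {
        case 0:  sum += bytes;      break;
        case 1:  sum += bytes << 1; break;
        default: sum ^= bytes;      break;
        }
        sum += p->flags.ready ? 1u : 0u;       /* conditional expression */
    }
    return sum;
}
```

Because the payload bytes are only summed, the result does not depend on byte order, so the same input should give the same answer on any target.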

Dave Hein · 2012-06-25 06:02

RossH wrote: »

Thanks Heater. You don't by any chance have any C algorithms converted to Spin that exercise (at least in the C version) things like arrays, pointers, structures, unions or bit fields? Anything with switch statements or conditional expressions? I wonder if Dave's C to Spin converter handles these things?

CSPIN handles one-dimensional arrays, pointers with up to 2 levels of indirection (i.e., char ** argv), and structures. It doesn't handle unions or bit fields. It can convert switch statements, where each case is terminated with a break statement. It handles conditional expressions to a certain extent. && and || are converted to Spin's AND and OR operators, which aren't exactly the same as the C operators. In C, the expression evaluation stops when the rest of the expression doesn't need to be computed. Spin always evaluates the entire expression.

Ross, let me know if you want me to try converting some C programs to Spin.
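The short-circuit difference mentioned above is easy to show in C. In this minimal sketch (the counter `calls` and both function names are made up for illustration), the right-hand operand of `&&` has a visible side effect, so you can see whether it ran. A translation that always evaluates both operands, as described for Spin's AND, would run `is_positive` even when the pointer check fails.

```c
#include <stddef.h>

static int calls;   /* counts how often the right-hand operand runs */

static int is_positive(const int *p)
{
    ++calls;
    return *p > 0;
}

/* In C, && stops as soon as the left operand is false, so
 * is_positive() is never called with a NULL pointer here. */
int valid_and_positive(const int *p)
{
    return p != NULL && is_positive(p);
}
```

After `valid_and_positive(NULL)` the counter `calls` is still zero; code that depends on that behaviour needs restructuring (for example into nested ifs) before it can be converted safely.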

Heater. · 2012-06-25 06:33

Not much else to offer at this time. MWC256 does tweaks with arrays at least.

jazzed · 2012-06-25 08:46

RossH wrote: »

The spin2cpp tool is a good idea - thanks. But it does have an inherent bias in that it is self-selecting for programs already hand-crafted to be efficient in Spin - converting them to C would not use much of the C language, since C is a considerably larger language than Spin. To balance it out, I would also need at least some examples of programs that were translated the other way. Hence this thread.

I thought you were only looking for a one-to-one correspondence. Ya, C is more complicated, and there will never exist a Spin program that would use all of C's features. Still, I don't follow the logic completely because theoretically at least whatever is written for spin can be translated to C++ (and C with simple manipulations). I accept that it doesn't fit your needs - it's your project after all.

Heater. · 2012-06-25 08:52

At some point in time I had a prototype ZPU interpreter written in Spin. I also have one written in C. If I could find it, that would more closely fit your test requirements for similar code using lots of operators, switch, if, etc.
But I suspect running GCC compiled code for the ZPU under Catalina might cause some kind of time space inversion and Ross would disappear in a singularity.

ersmith · 2012-06-25 09:17

Jazzed has already mentioned spin2cpp as an easy way to convert .spin to C++. One of these days I'd like to add an ANSI C output option to it (so the resulting code can be compiled with Catalina) but for simple classes it's pretty straightforward to convert by hand or with simple scripts. The spin2cpp project is hosted on Google code now at:

http://code.google.com/p/spin2cpp

In my experience the converted C++ code tends to be about 4x bigger than Spin and about 4-8x faster when compiled with gcc.

As you've already mentioned Dave Hein has a C to Spin converter to go the other way.

Dave Hein · 2012-06-25 09:41

There was a thread a while ago where we compared speed and size for various programs. The thread is at http://forums.parallax.com/showthread.php?124168-Compiler-Benchmarks&highlight=benchmark . It included programs such as xxtea, dhrystone 1.1, euler and fibo.

RossH · 2012-06-25 15:39

ersmith wrote: »

Jazzed has already mentioned spin2cpp as an easy way to convert .spin to C++. One of these days I'd like to add an ANSI C output option to it (so the resulting code can be compiled with Catalina) but for simple classes it's pretty straightforward to convert by hand or with simple scripts. The spin2cpp project is hosted on Google code now at:

http://code.google.com/p/spin2cpp

In my experience the converted C++ code tends to be about 4x bigger than Spin and about 4-8x faster when compiled with gcc.

Yes, those numbers are about what I see with Catalina as well - they're kind of "inherent" in typical LMM type solutions. Originally I thought those numbers would be "good enough" to encourage more C use, but I'm now beginning to wonder. Like most people, I keep finding that when I try and take advantage of the faster execution speeds C offers, I keep running out of RAM. That's why I'm wondering whether other tradeoffs might be more useful.

Ross.

RossH · 2012-06-25 16:02

jazzed wrote: »

I thought you were only looking for a one-to-one correspondence. Ya, C is more complicated, and there will never exist a Spin program that would use all of C's features. Still, I don't follow the logic completely because theoretically at least whatever is written for spin can be translated to C++ (and C with simple manipulations). I accept that it doesn't fit your needs - it's your project after all.

If you limit yourself to programs that exhibit a 1-1 correspondence then the answer (at least on code sizes) is always going to be Spin.

Ross.

RossH · 2012-06-25 16:09

Dave Hein wrote: »

There was a thread a while ago where we compared speed and size for various programs. The thread is at http://forums.parallax.com/showthread.php?124168-Compiler-Benchmarks&highlight=benchmark . It included programs such as xxtea, dhrystone 1.1, euler and fibo.

Hi Dave,

Thanks! I'd forgotten you posted a version of euler in Spin. I don't think there was a Spin version of xtea or dhrystone though.

Ross.

Kye · 2012-06-25 16:20

GCC was chosen because it is considered a professional free C compiler.

Many people who ask about the propeller ask if it supports GCC. Not to say there's anything wrong with Catalina, but, the name does not get as much respect as GCC. Rather than trying to fight an uphill battle, Parallax chose to go with the flow for the C compiler.

Not to say the propeller chip's architecture goes with the flow...

Anyway,

pedward · 2012-06-25 16:43

The code density of SPIN is obvious, because it's a byte code and PASM is long, so right away the simple operations are 4X bigger in PASM.

According to Parallax, SPIN can execute 500,000 SPIN instructions per second, so 4x is 2 million and 8x is 4 million. Obviously PASM is 20 MIPS; all numbers are dependent on use though.

That means compiled C is 5x to 10x slower than PASM, assuming LMM vs COG as the issue.

If you could implement a faster interpreter, you could probably find a happy middle ground between LMM and code density for speed. You might end up with 4x code density and 1/20th or 1/10th the speed of COG PASM.
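The arithmetic above can be checked with a few lines of C. This is only a sketch using the round figures quoted in the thread (500,000 Spin operations per second, 20 MIPS for COG PASM at 80 MHz); the constant and function names are made up for illustration, and nothing here is a measurement.

```c
#include <assert.h>

/* Figures quoted in the thread (assumed, not measured here). */
#define SPIN_OPS_PER_SEC  500000UL     /* Spin bytecodes per second   */
#define PASM_OPS_PER_SEC  20000000UL   /* COG PASM at 80 MHz, 20 MIPS */

/* Estimated LMM C throughput for a given speed-up over Spin. */
unsigned long lmm_ops_per_sec(unsigned long speedup_over_spin)
{
    return SPIN_OPS_PER_SEC * speedup_over_spin;
}

/* How many times slower than COG PASM that estimate is. */
unsigned long slowdown_vs_pasm(unsigned long speedup_over_spin)
{
    assert(speedup_over_spin > 0);   /* avoid dividing by zero */
    return PASM_OPS_PER_SEC / lmm_ops_per_sec(speedup_over_spin);
}
```

With these inputs, a 4x speed-up over Spin gives 2,000,000 operations per second (10x slower than PASM) and 8x gives 4,000,000 (5x slower), which matches the 5x to 10x range. Spin itself comes out at 1/40th of PASM, so an interpreter at 1/20th or 1/10th would sit between Spin and LMM.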

RossH · 2012-06-25 17:43

pedward wrote: »

The code density of SPIN is obvious, because it's a byte code and PASM is long, so right away the simple operations are 4X bigger in PASM.

According to Parallax, SPIN can execute 500,000 SPIN instructions per second, so 4x is 2 million and 8x is 4 million. Obviously PASM is 20 MIPS; all numbers are dependent on use though.

That means compiled C is 5x to 10x slower than PASM, assuming LMM vs COG as the issue.

If you could implement a faster interpreter, you could probably find a happy middle ground between LMM and code density for speed. You might end up with 4x code density and 1/20th or 1/10th the speed of COG PASM.

Exactly. My current thinking is that C on the Propeller is not getting the traction it might because it is currently "falling between two stools" - it is neither fast enough to be attractive for those applications that require speed, nor small enough to be attractive for those applications that require complex algorithms.

Perhaps there is some happy "middle ground". In this thread I'm just exploring possible alternatives.

Ross.

pedward · 2012-06-25 18:07

Basically sounds like you are talking about retargeting to a middle-ground bytecode and implementing an interpreter.
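To illustrate the general idea (and only the general idea: this is not Catalina's design, and the opcodes are hypothetical, not Spin's or any real kernel's encoding), a bytecode interpreter in C can be as small as the sketch below. Each instruction is one byte plus an optional one-byte operand, which is where the code-size saving comes from; the fetch and `switch` dispatch on every instruction is where the speed goes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical one-byte opcodes, invented for this example. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

#define STACK_MAX 16

/* Run a bytecode program and return the value left on top of the stack. */
int32_t run(const uint8_t *code)
{
    int32_t stack[STACK_MAX];
    int sp = 0;                       /* index of the next free slot */

    for (;;) {
        switch (*code++) {            /* fetch and dispatch */
        case OP_PUSH:                 /* operand: one unsigned byte */
            assert(sp < STACK_MAX);
            stack[sp++] = *code++;
            break;
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            assert(0 && "unknown opcode");
            return 0;
        }
    }
}
```

For example, the 9-byte program `{OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT}` evaluates (2 + 3) * 4 and returns 20.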

jmg · 2012-06-25 18:20

RossH wrote: »

Exactly. My current thinking is that C on the Propeller is not getting the traction it might because it is currently "falling between two stools" - it is neither fast enough to be attractive for those applications that require speed, nor small enough to be attractive for those applications that require complex algorithms.

Perhaps there is some happy "middle ground". In this thread I'm just exploring possible alternatives.

Ross.

Isn't the best approach there one that allows small sections of C to compile to PASM, rather than byte-code equivalents?
(I think the GCC port is following this already?)

- or at least, this should be part of any solution.

If there is another switch choice that allows memory to 'go further', at the cost of some speed, that is also useful.

Is the memory you run short of, Code or Data memory ?

If it is code, then QuadSPI memory (even DDR) could remove that barrier ?

Dave Hein · 2012-06-25 19:57

RossH wrote: »

Thanks! I'd forgotten you posted a version of euler in Spin. I don't think there was a Spin version of xtea or dhrystone though.

Here's the spin version of the dhrystone 1.1 program. I'll look for the xtea code tomorrow.

Ariba · 2012-06-25 22:50

RossH wrote: »

... But the file system idea is a good one - we have DOSFS (written in C) and several Spin alternatives. None of the Spin ones are exactly analogous to DOSFS, since they all tend to include large chunks of hand-coded PASM - but I seem to recall there is at least one that is implemented mostly in Spin to save space. Perhaps someone may remember which one that was? ....

The FSRW file driver was originally written in C and then translated to Spin. The C code is also in the object's ZIP. So this may be good for comparing the code size. But the performance is a bit harder to measure, because it's mainly the SPI driver which defines the SD access speed (and with fast SPI code, also the SD card) and not the file driver.

Andy
