Compiler Benchmarks
Dave Hein
Compiler benchmarks are being discussed in another thread, so I thought I would start a new thread about them. A simple "Hello World" benchmark was proposed to determine the executable size of a small program. The source for this benchmark is:
#include <stdio.h>

void main(void)
{
    printf("Hello World\n");
}
I compiled this with CSPIN 0.060 and CLIB 1.0.1 using BST. The resulting size was 4,336 bytes. Anybody care to try the other C compilers? It would also be interesting to get results for Spin and PropBASIC versions of this benchmark.
Note: Very small versions of this program could be hand-crafted to produce 100 or 200 bytes. The goal is to determine the size of the program with the normal development environment.
Post Edited (Dave Hein) : 7/25/2010 1:33:55 AM GMT
Comments
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Propeller Pages: Propeller JVM
I presume you're talking about final binary size, and not just code segment size? Out of the box, with no optimization, Catalina's closest equivalent to what your CSPIN program does produces a binary file of 7576 bytes.
However, while this appears to support my contention that LMM programs could end up around twice the size (or less) of the equivalent byte-code program, I should say I don't think this is a true comparison. The problem is that neither program is 'genuine', since neither one uses a real version of stdio. If we were to insist on only comparing programs that do, then Catalina can do so (so can ICC) but CSPIN cannot.
If I compile this program to use a true ANSI C standard 'stdio' instead of a reduced-functionality version, it generates a binary of about 13k, but only about 6k of this is actual program and library code (the rest is initialization of the kernel and the various drivers). But before you say 'of course this must be included', I should also add that if I use Catalina's EMM loader, this program really uses only 6k of the Propeller's 32k hub RAM at run time, leaving 26k for more program code.
My point is that it's not easy to come up with a simple comparison that actually means anything.
Ross.
I think the main point of the "Hello World" benchmark is just to get an idea of the size of a minimal program using printf.
which I assume does the same as your C program. It is 33 LONGs according to BST, and it generates the following code:
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Use BASIC on the Propeller with the speed of assembly language.
PropBASIC thread http://forums.parallax.com/showthread.php?p=867134
March 2010 Nuts and Volts article: http://www.parallax.com/Portals/0/Downloads/docs/cols/nv/prop/col/nvp5.pdf
NEW PropBasic Blog: http://propbasic.blogspot.com
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
There are two rules in life:
1) Never divulge all information
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
If you choose not to decide, you still have made a choice. [RUSH - Freewill]
Post Edited (Bean) : 7/25/2010 3:34:29 AM GMT
Very nice! But you are also not comparing apples with apples, since your program does not use 'printf' (or anything even vaguely equivalent). Also, you are using a serial device, whereas Dave and I are including a general-purpose local display driver (at least I think Dave's example does; my example certainly does). I can reduce my binary size by a couple of kilobytes just by using a simpler serial I/O driver instead.
This just highlights the problem. All we are doing in this thread is comparing the 'minimum' initial program footprint of various approaches, not the actual program size. However, I'm not complaining: this is exactly what simple 'hello world' type benchmarks are typically used for. On this basis, I would expect Catalina's minimum footprint to be larger than most others. Does it matter? Not in the slightest!
Ross.
I just thought I'd throw it out there because Dave brought up PropBasic.
Just for the record, the same program compiled as LMM uses 75 instructions in total. (I'm pretty sure I heard that all the C compilers generate some kind of LMM. Yes?)
P.S. Oh yeah, I forgot the CR.
Bean
It's probably more meaningful to find the smallest solution that does nothing except print Hello World.
No FDS, no floating point, no filesystem. It would be easier to understand that way and not too off topic.
I value the convenience and power of C printf, especially in larger programs where it makes a difference.
Congrats to Bean
It matters about as much as it ever did when people used to do this kind of comparison of various languages or compilers on a PC. Someone would always jump in with a Perl or Python script (or a Lisp, Forth or BASIC program) and claim that their 'program size' was only 20 bytes, whereas a compiled C program was more like 20 kilobytes.
It didn't mean anything then, and it doesn't mean anything now. I'm happy to have these results posted, but most people will realize they don't measure anything meaningful.
Ross.
Either you are simply intent on misconstruing what I am saying, or you really don't understand. Assuming the latter, I will try and explain it one last time.
All I have said (several times) is that I now expect LMM programs to be around twice the size of byte code programs, perhaps even slightly less with a good optimizing compiler. I used to say 'between two and four times', but my opinion on this has recently changed, as I have now had the chance to compare some actual examples of byte codes generated by Dave's CSPIN compiler with LMM code generated by Catalina. This is discussed in the Catalina thread, with examples. I'm not claiming all the evidence is 'in' yet, but no-one has yet offered much evidence to the contrary, so I think my claim is perfectly reasonable, at least until we have the opportunity to do a true comparison by compiling exactly the same C code with both an LMM C compiler and a byte-code C compiler.
However, for some reason you continually try to interpret this as me claiming that 'size does not matter'. I'm not sure why you insist on this, but I think I may have at least managed to track down the comment to which you may be referring: I think that comment is perfectly reasonable. Some people insist on worrying about exactly how much C code you can fit onto a Propeller, when the people we need to be attracting to the Propeller probably couldn't care less. If the Propeller offers C, they will use it where it is cost-effective to do so. If the Propeller doesn't offer C (or offers only a non-standard version of it), it will not be used at all. This is the sense in which size does not really matter.
Can you see the difference?
Getting back to the subject of this thread, I don't think anyone would seriously claim 'hello world' as a decent method of comparing code size unless both programs use the same C library functions and were both written in C. Otherwise you are effectively trying to compare relative code sizes by looking at a single line of code that makes a single function call with a single parameter. How sensible is that?
However, 'hello world' can be a decent method of assessing 'minimal program footprint', but only if you also insist that all programs use at least 'functionally equivalent' libraries and drivers. Otherwise it is not even useful for that, which is why it's more commonly regarded with amusement than as a serious benchmark (see stackoverflow.com/questions/284797/hello-world-in-less-than-20-bytes).
Ross.
Minor change to the syntax: "void main()" instead of "void main(void)".
384 bytes in SBASIC.
39 bytes (hex=27) in Z80 Assembly on the Propeller
Of course, size does not really matter. I'm off to buy my red sports car now...
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.smarthome.viviti.com/propeller
Post Edited (Dr_Acula) : 7/25/2010 12:26:21 PM GMT
11944 bytes using "printf".
16952 bytes using "iprintf" (integer-only printf).
16944 bytes using "siprintf". Not sure what's different about this one.
3228 bytes using "print" (unformatted printing of strings).
That last result is nice. All of them could be shrunk by 1K by removing the ZPU EMULATE instructions jump table, which Zog does not use.
Not sure why Zog's integer-only printf is actually less compact than the standard printf.
I'm surprised that Zog actually beats BDS C. It's not really a useful comparison: BDS C uses 16-bit ints on an 8-bit machine, while Zog is all 32-bit. Doing 32-bit maths in BDS C might be expected to produce bigger code than Zog.
P.S. Compiling a statically linked and stripped hello world binary for my Linux box results in an executable size of 511384 bytes!!!
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
Post Edited (heater) : 7/25/2010 11:55:27 AM GMT
It helps when the OS does all the work.
Dave
Post Edited (Dave Hein) : 7/25/2010 12:21:53 PM GMT
That '511384' number can't be right. I just tried it using gcc under Windows - 11,264 bytes.
Did you use 'strip' on the executable? Without that, the Windows executable comes in at a whopping 1,116,958 bytes!
Ross.
There's always one smart alec ...
Ross.
Darn it, Dave sets a high standard here with 22 bytes.
OK, a few little cheats. CP/M accepts CR as a CRLF, so I only need to send the 13, not the 10.
Don't CALL the FDOS function and then RET to CP/M. Jump to the FDOS function instead, and the RET at its end will return to CP/M.
Tested on a real Propeller board:
3 bytes for LD DE, Test_text
2 bytes for LD C,9
3 bytes for JP FDOS
11 bytes for Hello World
1 byte for the CR
1 byte for the $ symbol (CP/M uses this instead of ascii 0 for end of string)
The END is just a marker, not part of the program.
Total 21 bytes.
All this on my Debian testing installation. On Debian stable it is much the same. What on earth is going on?
If I compile as you did, with dynamic libs, I get 4511 bytes. After stripping the symbols out, it is 2868.
Post Edited (heater) : 7/25/2010 2:32:54 PM GMT
This explains a lot of that 0.5MB.
Post Edited (jazzed) : 7/25/2010 3:34:43 PM GMT
As we see here, hello world is not much good for this task. It has hardly any code in it to speak of, and all the work is done by the OS or the libraries, so the result depends heavily on those libraries. It also shows us the one-time overhead of all the printf stuff, including malloc etc. If you print two strings, the program does not get twice as big.
Further, I would say that for a lot of embedded work, things like printf are not even required.
Dhrystone has been mentioned, but as it is heavily biased toward string operations, I don't see it as being representative of a lot of embedded code either.
As a start, I'd like to propose the xxtea encryption algorithm.
It's a nice, small, self-contained function with a good selection of if, while and for statements. It uses 32-bit arithmetic, arrays and a handful of nice operators.
As a bonus it may even be useful to have around. We could wrap it in a loop and do some timing measurements as well.
Attached is a version of xxtea as found on Wikipedia. It's only called once from main, to decrypt a message. There is no output from this program unless PRINTING is defined when compiling it, in which case it uses printf.
With GCC, the Zog/ZPU binary is 3592 bytes without printing and 12392 bytes with it.
The actual btea function itself is 369, which I think is the important number to look at.
The xxtea code is fine for a benchmark. Could you also measure the number of cycles it takes to execute the function? I'll port it to CSPIN and determine the size and speed as well.
I would also like to do a comparison using Dhrystones. I was able to get it to compile under CSPIN with some modifications. CSPIN doesn't support enums or 2-dimensional arrays yet, so I had to make some changes. Ross, I know you will object to changing the source code, so the numbers I measure with CSPIN will need to be treated as unofficial preliminary values until I get enums and 2D arrays implemented.
Dave
Here are the raw numbers for xxtea for Catalina:
No printing: 4808 bytes
Printing: 12684 bytes (includes stdio printf and drivers)
btea: 1092 bytes (273 long instructions)
cycles: 268720
UPDATED: cycles added.
In this case Zog is only about 20% smaller. Also, I had a look at the byte codes and found that GCC is sneaking in calls to some library code, to handle unsigned divides I think, so I should really include that function as well. The 20% advantage then disappears :(
I have to find a new benchmark that only uses signed ints :)
Don't panic! I was calculating the size of btea in hex, not decimal. Now corrected.
So Zog is comfortably smaller.
Ross.
Wow, that may just cheer me up enough to continue with Zog :)
You forgot to update the last line of that post:
s/Seems quite comparable with Zog./Zog beats Catalina hands down./
I knew something wasn't right there. Serves me right for doing it in a hurry.
Update your stats when you figure out how big the library routines Zog uses actually are.
Ross.