Pure Virtual Functions Cause Instant Code Bloat
photomankc
Posts: 943
I was curious what it is about pure virtual declarations that causes hub overflow in all but contrived examples. I have several projects with class hierarchies that look like this:
        InterfaceClass   (Defines the minimum set of methods you can expect from all derived objects)
              |
      +-------+-------+
      |               |
   Derived1        Derived2
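In code, the shape is roughly this (a sketch with placeholder names, not my actual project code):

class InterfaceClass {
public:
    // Ideally pure virtual: there is no sensible default implementation.
    virtual int doFoo() = 0;
    // What I'm forced to write in PropGCC instead:
    // virtual int doFoo() { return -1; }
};

class Derived1 : public InterfaceClass {
public:
    virtual int doFoo() { return 1; }   // real implementation
};

class Derived2 : public InterfaceClass {
public:
    virtual int doFoo() { return 2; }   // a different real implementation
};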
Now in most cases the functions in InterfaceClass are just stubs. There can be no implementation because we have no idea yet how that will be done, so they ought to be declared pure virtual and only defined in Derived1 and Derived2. However, in every real project so far I have not been able to do that. Instead I have to put in either an empty definition { } or return a dummy value { return -1; }. If I put the '=0' at the end of the declaration, my project gets the ".text will not fit in hub" error at link time; remove the '=0' and it compiles and links fine. What drives me nuts is that I can make a very contrived example with a couple of classes and no real code behind it and have no problems at all: tiny code size and no errors. It only seems to nail me when I try a larger, real-world application. I do this on other platforms without issue, so I don't think it's a syntax problem. In fact, some of this code I have ported over to a Linux system, and it works fine there when I make the methods pure virtual instead of giving them the dummy bodies I have to use in PropGCC.
It's not that big a thing; I get along without them in PropGCC. The issue is in porting. There are some things that I share between a BeagleBone Black and a Propeller, like interface classes for I2C and certain types of sensors. I would prefer the code to port over without editing, and I would prefer an unimplemented-but-required function to produce a compile error rather than incorrect run-time operation. I can post up my current project if needed.
Comments
This will fail and produce the following:
In your example somePtr->doFoo() could mean two different things depending on the type of somePtr.
In order to do that, the compiler has to add some lookup tables to your code containing pointers to the various virtual functions (vtables). It also needs to add some code to figure out which function in the vtable to call for a particular type. I'm not familiar with how it actually does that, but that is the general idea.
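Conceptually the compiler generates something like this hand-written equivalent (illustrative only; GCC's actual layout differs):

// One hidden table per class, holding pointers to its virtual functions.
struct VTable {
    void (*doFoo)(void *self);
};

void derived1_doFoo(void *self) { /* Derived1's version */ }
void derived2_doFoo(void *self) { /* Derived2's version */ }

const VTable derived1_vtable = { derived1_doFoo };
const VTable derived2_vtable = { derived2_doFoo };

// Every object carries a hidden pointer to its class's table.
struct Object {
    const VTable *vptr;
    // ... the object's actual data members ...
};

// somePtr->doFoo() becomes an indirect call through the table:
void callDoFoo(Object *somePtr) {
    somePtr->vptr->doFoo(somePtr);
}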
Having said that, I was surprised that this simple example blows all of HUB RAM so easily. I compiled it on my PC with:
gcc -static -o virtual -std=c99 -Wall -Os virtual.cpp -lstdc++
and it came to 750KB !!
I compiled that code on my PC with gcc, instead of c++ or g++, leaving out the -lstdc++, like so:
gcc -o virtual -static -Wall -Os main.cpp derived.cpp
Notice the undefined references to "vtable". I can add "-lstdc++" to that command and it will build.
So we see that using virtual classes means we have to pull in the standard C++ library (libstdc++), and that is huge.
Quite why it has to drag in so much stuff to do this is beyond me.
Note: I split your example into separate files for main, virtual and derived, with appropriate header files, as I have found it impossible to compile virtual classes sometimes unless they are in their own files.
Yep, removing all that "virtual" and "=0" stuff only gets me down to 700KB!!
There are those who warn that C++ is not suitable for small embedded systems as it causes severe code bloat and performance issues.
My experiments had shown that it is possible to use C++ classes and other features and have the code come out almost byte for byte the same as if you had coded the thing in C with pointers and structs. C++ is quite OK for small systems if used in a style like the objects in Spin.
I had always suspected that inheritance and polymorphism would cause size problems. I never realized it was this bad.
I'm sure 99% of that 700K included in my static Linux binary is not required to build such a simple program.
Try using this option:
c++ -static -fno-exceptions -fno-rtti -o virtual -Os -Wall virtual.cpp
Daniel
still comes in at 750KB!
Thing is, it needs __cxa_pure_virtual and operator delete() from libstdc++, as shown when you compile with gcc and don't add "-lstdc++".
That seems to pull in a ton of junk.
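For what it's worth, a common bare-metal workaround (my assumption that it applies here; it's not anything the propgcc docs promise) is to provide trivial versions of those two symbols yourself, so the linker never reaches into libstdc++ for them:

// Stub replacements so the linker doesn't drag these in from libstdc++.
// __cxa_pure_virtual is only called if a pure virtual somehow gets
// invoked at run time; there is no sane way to continue, so just hang.
extern "C" void __cxa_pure_virtual() {
    while (1) {}
}

// The deleting destructor in the vtable references operator delete even
// if you never use delete; an empty stub satisfies that reference.
void operator delete(void *) {}

No idea yet whether that is what keeps the small builds small.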
There has to be more to that story. I have several projects on the BeagleBone Black that make use of these same classes with pure virtuals, and they are nowhere near that 750K size. I just finished writing an interface to a TMP102 sensor using my I_I2C ---- FS_I2C hierarchy, which uses both virtual and pure virtual functions. The completed executable is 24.3K, and that includes all the Linux file-handling nonsense to work with "/dev/i2c-1".
In fact, in the above example, if you just take away the "=0;" and replace it with "{ };", it will compile to 2,724 bytes. All the v-tables are still there, as is all the polymorphic ability. It's just that telling the compiler doFoo() MUST be implemented in children, versus CAN be implemented in children, seems to send the code size into orbit. That isn't making sense to me.
This will compile to a perfectly acceptable size of 2,732 bytes:
This contains only 3152 bytes of code/data if you add up the memsz fields above.
On my Raspberry Pi this command:
gcc -s -static -fno-exceptions -fno-rtti -o virtual -Os -Wall virtual.cpp -lstdc++
gets me an executable of 508,592 bytes.
Of course if you forget the "-static" it is:
only 3495 bytes
I have an ARM cross compiler here that will get us down to a 40KB static build:
/opt/gcc-arm-none-eabi-4_7-2013q3/bin/arm-none-eabi-gcc -s -static -fno-exceptions -fno-rtti -o virtual -Os -Wall virtual.cpp -lstdc++
Or there is propgcc which gets that to 3752 bytes!
/opt/parallax/bin/propeller-elf-gcc -s -static -fno-exceptions -fno-rtti -o virtual -Os -Wall virtual.cpp -lstdc++
Make that pure virtual and propgcc is back to 69K, and it complains about not fitting in HUB.
This is propgcc version 4.6.1 (propellergcc-alpha_v1_9_0_2157).
OK, looks like you got that one. I didn't realize I was dynamically linking. So, without a pure virtual function in the example:
g++ -static -fno-exceptions -fno-rtti -Wall -lstdc++ -std=c++0x -c -I./include -I./sensors -I./bus_protocol -c virtual.cpp -o build/virtual.o
g++ -static build/virtual.o -o ./bin/virtual-test
-rwxr-xr-x 1 robot users 583K Oct 17 09:05 virtual-test
Now with a pure-virtual added back in:
g++ -static -fno-exceptions -fno-rtti -Wall -lstdc++ -std=c++0x -c -I./include -I./sensors -I./bus_protocol -c virtual.cpp -o build/virtual.o
g++ -static build/virtual.o -o ./bin/virtual-test
-rwxr-xr-x 1 robot users 640K Oct 17 09:08 virtual-test
So almost 57K of bloat to specify a virtual as pure-virtual. I'm a little surprised by that.
Like I said, virtuals without the pure specifier are workable for sure, and they give me the benefit of code whose support objects can change without refactoring things everywhere, but at the risk of run-time bugs if all the required interface functions are not overridden. If we can get to where pure virtuals don't crush the hub into the ground, that would be great.
Project Directory: C:/Users/kcrane2/Documents/SimpleIDE/My Projects/PureVirtual/
propeller-elf-c++ -I . -L . -o cmm/PureVirtual.elf -Os -mcmm -fno-exceptions -fno-rtti PureVirtual.c
propeller-elf-objdump -h cmm/PureVirtual.elf
Done. Build Succeeded!
Code Size is 2736 bytes (2908 total).
It's the same with either:
virtual void doFoo() {}; or
virtual void doFoo()=0;
Awesome!
EDIT: I see it also throws exceptions and thus crams exceptions back in as well. Quite the party it throws.
$ /opt/parallax/bin/propeller-elf-gcc -s -static -fno-exceptions -fno-rtti -o virtual -Os -Wall virtual.cpp -lstdc++
Gives 3868 bytes for the non-pure virtual and 3880 with the pure virtual.
Looks like I made a mistake originally, sorry.
That same setup with pure virtual gives 40K with my ARM cross compiler, 700K on my PC, and 500K on my Raspi.
"You can't use classes and objects, they waste a huge lot of space": demonstrably that is not true.
"You can't use templates... same... same...": demonstrably not true.
And so on.
All the time ignoring the fact that C++ is what is used to program Arduinos, which are tiny.
One argument that always stumped me was "You can't use inheritance and virtual functions, they waste a lot of space."
Happily, thanks to your question and David's answer, we now have a counter to that argument as well.
Hopefully I can eventually find out how to get a similar result on the PC and Raspi.
In this case I think the main argument comes down to performance. There is more work involved in resolving a virtual call for polymorphic types, and if the function does very little work and is called in a tight loop, the overhead can be excessive. So virtual getters/setters on a variety of different object types, called in a tightly executed loop, are a degenerate case where a call could cost many times what a static function call would. On big-boy CPUs, screwing up the cache in a performance-intensive app is a disaster. As with all things: use it wisely. In my mind, the benefits of a single library of code for sensors/buses/motors/etc. outweigh some loss of performance. If I have really intense code that needs to do something very fast, I put that in C and use another COG to get-r-done.
On the PC, or even the Pi, I am not usually so worried about the huge size of these binaries, or even the performance. But it is going to bug me now. What has been done here is:
a) Shown that using inheritance and pure virtual methods does not require half a megabyte of binary, as GCC normally produces. It's only a few KB.
b) Found a "trick" with propgcc that gets the huge binaries down to a few KB.
So now I want to know why the same source with the same trick, compiled with the same options, does not work on the Pi and PC. It still produces half-megabyte binaries. What other tricks do I need with these compilers?
As for the overheads of C++: what I have found so far is that if you actually need something in your code, you can do it in C or C++, and you'll find that C++ does it with no overhead, and perhaps even allows better optimization to produce smaller, faster binaries. Much coding in C is not amenable to optimization because type information and such is lost.
Examples:
1) Classes:
If you need many similar "objects" in your program that can all be manipulated by the same functions, then you can do that in C. You will probably put the objects' data into structs, and you will probably pass pointers to those structs as parameters to the functions.
Or you could do that in C++. Have your objects defined by classes and have those functions as methods.
All that happens is that the "object pointer" is still there, but now it is hidden from you and passed around to methods as "this". That is not C++ overhead; that is something you needed anyway and had to do manually in C.
If you compile C and C++ versions of this you will find the code can come out byte for byte the same!
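For instance, these two (a hypothetical Counter, purely for illustration) typically compile to identical instructions under -Os:

/* The C way: explicit struct, explicit object pointer. */
struct CounterC { int count; };
void counterc_increment(struct CounterC *self) { self->count++; }

// The C++ way: the same pointer, now hidden and passed as "this".
class Counter {
public:
    int count;
    void increment() { count++; }   // really increment(Counter *this)
};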
2) Templates:
If you need to apply similar processing to different types (ints, longs, floats, "fish", etc.) you can do that in C. Perhaps you write out the code N times, once for each type. Or perhaps you pass void pointers around together with some helper functions for each type. Yuk.
Or in C++ you write your processing once and template the types. The compiler takes care of it. There is no C++ overhead here. You needed to do that stuff anyway.
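A sketch of the idea (maxOf is a made-up name):

// One templated function instead of N hand-written copies.
template <typename T>
T maxOf(T a, T b) { return (a > b) ? a : b; }

// The compiler stamps out only the versions you actually use:
int   i = maxOf(1, 2);         // instantiates maxOf<int>
float f = maxOf(1.0f, 2.0f);   // instantiates maxOf<float>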
3) Inheritance, polymorphism
If you need your code to dispatch to different functions depending on the type of data it has then you can do that in C. You will probably put your types into different structs together with some field indicating its "type". You will probably need to write code to check that "type" field and look up which actual function to call. Yuk.
Or you can do it in C++ and let the compiler insert all that dispatching business. There is no overhead here (hopefully); you needed to do that in your code anyway.
I said "hopefully" here as the experiment/comparison has not been done yet.
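A sketch of the two approaches side by side (hypothetical Shape types):

// The C way: a type tag you maintain and a dispatch you write.
enum ShapeType { CIRCLE, SQUARE };
struct ShapeC {
    enum ShapeType type;   // you keep this field up to date yourself
    float dim;
};
float areaC(const struct ShapeC *s) {
    switch (s->type) {     // hand-rolled dispatch
        case CIRCLE: return 3.14159f * s->dim * s->dim;
        case SQUARE: return s->dim * s->dim;
    }
    return 0.0f;
}

// The C++ way: the compiler maintains the tag (the vptr) and the dispatch.
class Shape {
public:
    virtual float area() const = 0;
};
class Circle : public Shape {
    float r;
public:
    Circle(float radius) : r(radius) {}
    virtual float area() const { return 3.14159f * r * r; }
};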
None of these things is a C++ overhead if you actually need them in your code. You can write it by hand or let C++ do it for you. I suspect the C++ generated code can often be better than hand written C to do these things.
If you don't need classes in your code, you only have one of everything, don't use classes. That little "this" pointer goes away and you get standard C code. And so on for the other features.
There is no reason to go back to .c files:)
Sorry for rambling on here...
Otherwise, you're preaching to the choir here. I have seen all kinds of arguments about how abstraction is overhead, C++ is too bloated (I think you can see how that could be thought, though), and C produces better, tighter code. Most of them turn out to be only marginally true when real results are compared, so I agree there is little reason to write C++ off for all embedded development, as many claim you must. So long as these issues are addressed, as we've seen, the code is perfectly well suited to the task.
On polymorphic classes though there are some real hits that can matter:
- The compiler may not know what type is going to be used in a given instance and thus cannot inline any of the function/method calls, even the trivial ones. That in itself can be a performance hit.
- The method calls may have to be resolved via the v-table. In cases where the compiler can tell 100% what object is going to be called, it can place the call statically; but if there is ambiguity about what the actual type will be at run-time, it's going to have to be a v-table lookup.
- Resolving that virtual method may take two times longer or more, since there are a couple more steps involved in getting the address of the right method to call. That may or may not be significant.
Now if the function does some math and a bunch of processing, then all that may be a fart in the wind. Who cares if it took 10ns longer to call the virtual area() method on the Circle descendant of Shape when the call took 100us to complete anyway? Probably nobody. However, if you call the virtual getXLen() of the Square descendant of the Rectangle type of Shape in some tight loop that needs to get millions of those widths, you could see a rather significant impact versus a standard, inlined assignment. If you need to do that in the minimum period of time, then this could be a place where you have to reconsider the approach.
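One mitigation I lean on (my own habit, not something measured above): when the object's type cannot change inside the loop, resolve the virtual call once outside it. The names here (getXLen, fillWidths) are made up for illustration:

class Shape {
public:
    virtual int getXLen() const = 0;
};

class Rectangle : public Shape {
    int xlen;
public:
    Rectangle(int x) : xlen(x) {}
    virtual int getXLen() const { return xlen; }
};

// One virtual dispatch instead of millions.
void fillWidths(const Shape &shape, int *widths, int n) {
    const int xlen = shape.getXLen();   // hoisted out of the hot loop
    for (int i = 0; i < n; i++)
        widths[i] = xlen;
}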
I've always been prone to write things hybrid anyway. I tend to have a procedural-style main loop with OOP subsystems the loop interacts with. My OOP college professors would be very offended by that, but it has always worked well for me. No need to go back to .c files. Global functions work just as well in C++ if you need one.