CLEAN.exe (Case-Level Examine And Nullify), a program to weed out unused object
Phil Pilgrim (PhiPi)
Posts: 23,514
In another thread, ideas for how to write a BIOS (Basic Input/Output System) for the Propeller are being floated and discussed. The discussion leans a little towards post-compilation manipulation of object files, which is fine in many situations. However, I am more interested in seeing what can be done at the source level. The Spin language and Propeller Tool were developed with an eye towards openness, so as not to create a market for binary modules whose source code is a mystery. It's with this spirit in mind that I came up with CLEAN.
As it stands, the Spin compiler includes every object that's ever mentioned in the OBJ sections of the top object and its referents, along with all of its methods, on down the chain. This is true, even if you never use the object for anything or call its methods. What I wanted to do was examine all the objects referred to in a program, weed out completely the ones that aren't used, and eliminate unreferenced methods — even from the objects that are used. This would allow a BIOS object, say, to reference driver objects for all possible devices attached to a particular system, without the burden of including them all in the compiled application code if they aren't being used. This can be accomplished by parsing the source files (not very deeply, it turns out), starting with the first public method in the top object file, and keeping track of which objects and methods are called. Then, alternative source files can be created which contain only the stuff that's actually used. These, in turn, can be compiled in place of the originals, resulting in more compact object code.
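The pruning step just described is essentially a reachability walk over the call graph. The Python sketch below illustrates the idea. It assumes the sources have already been parsed into a hypothetical `calls` table that maps each `(object, method)` pair to the pairs it calls. It is not CLEAN's actual Perl code, and the parsing itself is left out.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Method = Tuple[str, str]  # (object file, method name); a hypothetical representation

def reachable_methods(calls: Dict[Method, List[Method]], entry: Method) -> Set[Method]:
    """Breadth-first walk from the entry method; returns every method it can reach."""
    seen: Set[Method] = {entry}
    queue = deque([entry])
    while queue:
        current = queue.popleft()
        for callee in calls.get(current, []):
            if callee not in seen:  # also keeps recursion from looping forever
                seen.add(callee)
                queue.append(callee)
    return seen

def used_objects(methods: Set[Method]) -> Set[str]:
    """Objects with at least one reachable method; all others can be dropped."""
    return {obj for obj, _ in methods}

# Example: "close" is never called, so the keyboard object is never reached.
calls = {
    ("top", "start"): [("mybios", "open"), ("mybios", "put")],
    ("mybios", "open"): [("tv_wtext", "start")],
    ("mybios", "put"): [("tv_wtext", "out")],
    ("mybios", "close"): [("keyboard", "stop")],
}
live = reachable_methods(calls, ("top", "start"))
print(sorted(used_objects(live)))  # ['mybios', 'top', 'tv_wtext']
```

In the real BIOS, though, `open` calls `keybd.start`, so a walk like this alone would keep the keyboard object alive. Pruning individual case branches is what the +case rule described further down takes care of.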
An additional requirement of a BIOS is a set of universal I/O routines — one for input and one for output — that will read or write a character to a device chosen by a file handle. A file handle can be nothing more than a numerical parameter. What throws a wrench into the works for object file reduction, though, is that one method has to be able to access all possible device drivers on a parametric basis. Attempts have been made to use conditional compilation to solve this problem, wherein the user can decide which drivers to include and which to exclude, based on the setting of a constant. My goal was to make things a bit more automatic. That's why I've extended source code examination down to the CASE statement level.
In a typical BIOS, each device can be represented by a defined or enumerated constant:
```spin
CON
  VID = 1
  TV3 = 2
  SIO = 3
  KBD = 4
```
In a program using the BIOS, these constants can be referred to by name, viz:
```spin
OBJ
  io : "mybios"

PUB Start | video, serial
  video := io.open(io#VID)
  serial := io.open(io#SIO)
  ...
```
If open is a method in the BIOS that creates file handles and opens devices, it might be written thus:
```spin
PUB open(device)
  case device
    +VID: tv.start(12)
    +TV3: tv.start(12 | tv#CHANNEL3 | tv#MUTE)
    +SIO: ser.start(27, 27, 9600)
    +KBD: keybd.start(26)
  return device
```
Notice two things: 1) calls to start routines assume a fixed pin assignment. This is because the BIOS for a particular board knows what's connected where; and 2) there's a unary plus sign ahead of each case. This is a signal to CLEAN that these are cases needing special treatment. CLEAN keeps track of all constant references it encounters of the form objref#const. If it then encounters a case with a leading unary plus, it will test each of the conditions in the comma list to see if it's a constant that's been referred to elsewhere. If it is, the code after that case is emitted when the compressed source file is written; if not, it's omitted. What this means is that a reference to an external object can also be omitted, based on whether its associated constant is referenced anywhere.
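Here is a minimal Python sketch of the +case rule as stated above. It assumes the simplest layout, where each case match and its statement sit on one line, as in the examples. The names `referenced_constants`, `keep_plus_match`, and `prune_case_line` are hypothetical, and CLEAN's own implementation may differ (for instance, in how it handles multi-line bodies). Names are compared in lowercase because Spin identifiers are case-insensitive.

```python
import re
from typing import Iterable, Set

# objref#const, e.g. io#VID; captures the constant name
CONST_REF = re.compile(r"\b\w+#(\w+)")

def referenced_constants(sources: Iterable[str]) -> Set[str]:
    """Every constant referenced as objref#const, lowercased (Spin ignores case)."""
    found: Set[str] = set()
    for text in sources:
        found.update(name.lower() for name in CONST_REF.findall(text))
    return found

def keep_plus_match(label: str, referenced: Set[str]) -> bool:
    """label is the text before the colon, e.g. "+VID, TV3".

    Matches without a leading '+' are always kept; a '+' match is kept if
    any constant in its comma list has been referenced somewhere.
    """
    label = label.strip()
    if not label.startswith("+"):
        return True
    return any(c.strip().lower() in referenced for c in label[1:].split(","))

def prune_case_line(line: str, referenced: Set[str]) -> str:
    """Drop the statement after an unreferenced '+' match, keeping the label."""
    label, colon, _statement = line.partition(":")
    if colon and not keep_plus_match(label, referenced):
        return label + colon
    return line

user = "echo := io.open(io#VID)\nserio := io.open(io#SIO)"
refs = referenced_constants([user])
print(repr(prune_case_line("    +KBD: keybd.start(26)", refs)))   # '    +KBD:'
print(repr(prune_case_line("    +VID, TV3: tv.out(data)", refs)))  # '    +VID, TV3: tv.out(data)'
```

As in CLEAN's sample output further down, an unreferenced match keeps its label and loses only its statement.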
Here's a sample BIOS to illustrate the point more completely:
```spin
CON
  _clkmode = xtal1 + pll16x
  _xinfreq = 5_000_000

  #1, VID, TV3, SIO, KBD

OBJ
  tv   : "tv_wtext"
  ser  : "serial_io"
  keybd: "keyboard"

PUB open(device)
  case device
    +VID: tv.start(12)
    +TV3: tv.start(12 | tv#CHANNEL3 | tv#MUTE)
    +SIO: ser.start(27, 27, 9600)
    +KBD: keybd.start(26)
  return device

PUB close(device)
  case device
    +VID, TV3: tv.stop
    +SIO: ser.stop
    +KBD: keybd.stop

PUB put(device, data) | char
  case device
    +VID, TV3: tv.out(data)
    +SIO: ser.out(data)

PUB get(device)
  case device
    +SIO: return ser.in
    +KBD: return keybd.in
```
Once the user's program establishes a file handle using open, the get and put routines can be used with it to retrieve and transmit character data without any regard for how it's done. These routines also use the +case notation to eliminate calls that never get referenced due to their associated devices not being used. Here's a simple program that inputs data via a bidirectional serial port and outputs it to a TV monitor via tv_wtext:
```spin
CON
  _clkmode = io#_clkmode
  _xinfreq = io#_xinfreq

OBJ
  io : "mybios"

PUB start | echo, serio
  echo := io.open(io#VID)
  serio := io.open(io#SIO)
  repeat
    io.put(echo, io.get(serio))
```
When this program is run through CLEAN, it produces source files equivalent to the objects referenced, but with "CLEAN_" prepended to the file names. In the above example, mybios.spin would get converted to CLEAN_mybios.spin and read as follows:
```spin
CON
  _clkmode = xtal1 + pll16x
  _xinfreq = 5_000_000

  #1, VID, TV3, SIO, KBD

OBJ
  tv : "tv_wtext"
  ser : "serial_io"

PUB open(device) | rx_pin, tx_pin, baudrate
  case device
    +VID: tv.start(12)
    +TV3: tv.start(12 | tv#CHANNEL3 | tv#MUTE)
    +SIO: ser.start(27, 27, 9600)
    +KBD:
  return device

PUB close(device)
  case device
    +VID, TV3: tv.stop
    +SIO: ser.stop
    +KBD:

PUB put(device, data) | char
  case device
    +VID, TV3: tv.out(data)
    +SIO: ser.out(data)

PUB get(device)
  case device
    +SIO: return ser.in
    +KBD:
```
Notice that all references to the keyboard object have been deleted. This is because that object was never used.
CLEAN requires Propellent.exe and should be installed in the same directory as Propellent. CLEAN is a command-line program without any Windows baggage. It is invoked as follows (edited per v0.14):
```
CLEAN.exe <action> <option>s <sourcefile>

<action> = -h, --help: Print this help info.
<action> = -v, --version: Print version info for CLEAN.
<action> = -b <directory>, --bin <directory>: Save binary file and preprocessed and compacted Spin files to <directory>. DO NOT delete preprocessed or compacted Spin files.
<action> = -r, --ram: Upload object code to Propeller's RAM. Delete preprocessed and compacted Spin files.
<action> = -e, --eeprom: Upload object code to Propeller's EEPROM. Delete preprocessed and compacted Spin files.
<action> omitted: Just compile. Don't create a binary file. Delete preprocessed and compacted Spin files.

<option> = -l <directory>, --lib <directory>: Use <directory> as the library directory. If this option is missing, use the library in the latest-version Propeller Tool directory.
<option> = -p <preprocessor>, --pre <preprocessor>: Preprocess each source file through <preprocessor> before compacting. Preprocessor must accept two arguments: <inputfile> <outputfile>.

<sourcefile> = Full path and filename of top object file. (May omit ".spin".)

Any parameter that contains one or more spaces must be enclosed in "quotes".
```
Here are a couple things that CLEAN will not do:
- Examine or mess with CON, VAR, or DAT sections. These are kept as-is (but stripped of comments). DAT sections, in particular, are too messy even to contemplate screwing with.
- Recurse on +case conditions. In any method, only the topmost levels involving unary-plus conditions are examined.
-Phil
Update: Version 0.11 fixes case-sensitivity bug.
Update 2: Version 0.12 fixes nested {{super comment}} bug.
Update 3: Version 0.13 upgrades comment handling and fixes further case sensitivities.
Update 4: Version 0.14 makes ".spin" on source file truly optional; adds option for test compile only by omitting <action> from command line arguments.
Update 5: Version 0.2 converts cleaned file to all lowercase; tracks individually-named instances as well as objects; deals with subscripted object references; fixes a method-marking bug.
Update 6: Version 0.3 includes hooks for a preprocessor (see above); converts cleaned file to all lowercase except stuff in quotes.
Update 7: Version 0.31 fixes another upper/lower case issue in file names.
Update 8: Version 0.32 separates guts internally from command line processing in prep for Windows version; deletes ".schema" from filenames in OBJ section.
Update 9: CLEAN v0.33 includes minor internal changes to common code for compatibility with wCLEAN. wCLEAN v0.1 is the Windows version of CLEAN.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
'Still some PropSTICK Kit bare PCBs left!
Post Edited (Phil Pilgrim (PhiPi)) : 7/13/2008 9:36:52 PM GMT
Comments
When removing +KBD, say, you also want to lose the empty "+KBD:" case option from the cleaned source, or you get a 5-byte overhead for each.
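Hippy's suggestion could be handled by a small post-pass over the cleaned lines. This Python sketch is a hypothetical helper, not something the thread says CLEAN does: it drops a + match whose statement was removed, treating a match as empty when no more deeply indented line follows it. It assumes the enclosing CASE has no OTHER clause, since with one, deleting an empty match would send that value to OTHER and change behavior.

```python
import re
from typing import List, Optional

# A '+' case match with nothing after the colon, e.g. "    +KBD:"
EMPTY_PLUS_MATCH = re.compile(r"^\s*\+[\w\s,]+:\s*$")

def indent(line: str) -> int:
    """Column of the first non-space character."""
    return len(line) - len(line.lstrip(" "))

def drop_empty_plus_matches(lines: List[str]) -> List[str]:
    """Remove '+' matches that have no statement on the same line or nested below.

    Assumes the enclosing CASE has no OTHER clause: removing a match would
    otherwise send that value to OTHER instead of doing nothing.
    """
    kept: List[str] = []
    for i, line in enumerate(lines):
        if EMPTY_PLUS_MATCH.match(line):
            nxt: Optional[str] = next((cand for cand in lines[i + 1:] if cand.strip()), None)
            if nxt is None or indent(nxt) <= indent(line):
                continue  # nothing is nested under this match, so drop it
        kept.append(line)
    return kept

cleaned = ["PUB get(device)", "  case device", "    +SIO: return ser.in", "    +KBD:"]
print(drop_empty_plus_matches(cleaned))
# ['PUB get(device)', '  case device', '    +SIO: return ser.in']
```

A CASE left with no matches at all would need its own handling, which this sketch does not attempt.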
CLEAN doesn't make use of pre-existing PATH information, although I could add it I suppose. That's why it's necessary to include the full path in the file spec for the source. Also, if your Propeller Tool is in a non-standard location, CLEAN won't be able to find its associated library, and you'll get that warning. The remedy is to use the --lib option on the command line. The source directory and library location, in that order, are used internally by CLEAN as the search path. My thinking was that most command line programs are either scripted or included in batch files, so the additional typing wouldn't be an issue. But that scenario will become realistic only after Jeff excises the Windows stuff from Propellent and sends its messages to STDOUT.
As to the "missing method name" error in your example, I'll have to go out to the shop to check it out on the dev machine. (I just got up and haven't had my coffee yet.) Who knows? I may have to send boards to England!
-Phil
On the subject of obscure errors, I've not tried, but have you considered bizarre commenting ...
Good point about "others:" clause in the case. Perhaps buffer up a list of names and include them if needed, leave them out if not.
The problem you uncovered was a remnant case-sensitivity that didn't recognize capital letters in method names. (I can't believe that none of the programs I tested this on used caps!) Anyway, that qualifies for the boards, and I'd be happy to send a couple to England. Just email me your particulars, and I'll get them off. So as not to discourage further testing so soon, I'll make the same offer for the next bug identified.
As to the commenting, yes, that was a nettlesome issue until I bit the bullet and just parsed out all the comments before further processing. (To keep things simple, {{s are treated as two {s.) Quoted strings were another issue. These are emptied temporarily so the parser isn't fooled by things like string("myobj.mymethod"). Unlike comments, the strings have to be restored when the compressed code is emitted. But that's still easier than doing a full, deep parse of the source.
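A rough Python sketch of the hide-and-restore idea follows. One deliberate difference from simply emptying the strings: each string is replaced by a numbered placeholder such as `"0"`, so every saved string still lines up with its slot even after the compactor deletes methods that contained other strings. It assumes comments have already been removed and that string literals don't span lines; the function names are hypothetical.

```python
import re
from typing import List, Tuple

QUOTED = re.compile(r'"[^"\n]*"')     # a string literal on a single line
PLACEHOLDER = re.compile(r'"(\d+)"')  # the numbered stand-in left in its place

def hide_strings(src: str) -> Tuple[str, List[str]]:
    """Swap each string literal for "0", "1", ... and return the originals in order."""
    saved: List[str] = []

    def stash(m: "re.Match[str]") -> str:
        saved.append(m.group(0))
        return '"%d"' % (len(saved) - 1)

    return QUOTED.sub(stash, src), saved

def restore_strings(src: str, saved: List[str]) -> str:
    """Put the original literals back wherever a placeholder survived."""
    return PLACEHOLDER.sub(lambda m: saved[int(m.group(1))], src)

src = 'name := string("myobj.mymethod")\nobj.go(string("x"))'
hidden, saved = hide_strings(src)
print(hidden)  # the parser now sees string("0") and string("1")
```

Because every literal in the original is swapped out before parsing, any quoted number left in the text afterwards must be a placeholder, which is what makes the restore step safe.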
The program is written in Perl, BTW. Chip (or was it Jeff?) mentioned method weeding as a future objective for the compiler. Based on my experience with CLEAN, though, I'd hate to have to do it in x86 assembly!
-Phil
I'm sure you know this, but your comment parsing code can easily be fooled. For instance, {{ } } }} is just one comment and no loose braces.
It's easy to fix, though: {{ matches until }}, and {{ }} comments do not nest. Other than that, { and } nest (and yes, }} will terminate { { if it is not used to terminate {{.)
-tom
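Tom's rules, together with two points that come up later in the thread (Spin has three kinds of comments, and comments are better blanked than deleted), can be captured by a small state machine. This is a Python sketch, not the Perl code Phil refers to. It replaces each comment, delimiters included, with spaces and keeps every newline, so the column of the first significant character on each line, which governs Spin's indentation, is unchanged. It does not attempt the swallowed-newline case raised further on.

```python
def blank_comments(src: str) -> str:
    """Replace every Spin comment, delimiters included, with spaces.

    Newlines are kept, so line numbers and the column where code starts
    on each line stay the same.
    """
    out = []
    i, n = 0, len(src)
    depth = 0          # nesting depth inside { } comments
    in_doc = False     # inside a {{ }} comment (these do not nest)
    in_line = False    # inside a ' or '' comment, until end of line
    in_str = False     # inside a "..." string literal
    while i < n:
        c, pair = src[i], src[i:i + 2]
        if in_line:
            in_line = c != "\n"
            out.append(c if c == "\n" else " ")
        elif in_doc:
            if pair == "}}":
                in_doc = False
                out.append("  ")
                i += 2
                continue
            out.append(c if c == "\n" else " ")
        elif depth:
            # inside { }, a {{ or }} simply counts as two single braces
            if c == "{":
                depth += 1
            elif c == "}":
                depth -= 1
            out.append(c if c == "\n" else " ")
        elif in_str:
            in_str = c not in '"\n'  # a closing quote (or end of line) ends it
            out.append(c)
        elif pair == "{{":
            in_doc = True
            out.append("  ")
            i += 2
            continue
        elif c == "{":
            depth = 1
            out.append(" ")
        elif c == "'":
            in_line = True
            out.append(" ")
        else:
            in_str = c == '"'
            out.append(c)
        i += 1
    return "".join(out)

print(repr(blank_comments("{c} PUB x")))  # '    PUB x'
```

The last line shows why blanking matters: the code stays in the same column, just as if each brace and its contents had been replaced by spaces.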
Want some boards?
-Phil
By the way, I do like this. It's far better than doing it on the binary (which is unworkable because the "full" binary might be too large to even compile).
It's too bad the compiler doesn't do this automatically and we have to "work around" all these issues.
I may make very good use of CLEAN with my fsrw stuff when I (eventually) add support for multiple files, directories, etc., that not everyone needs.
Rolling this into a complete preprocessor would obviously be a nice goal, but I'm not saying it has to be the goal. I have no preference between a chain of separate tools and one monolithic tool.
Post-compilation processing of the image, if done at all, could just be dead-code removal, handling the stuff that cannot be done at the source level. It would probably only eke out a few extra bytes, so it might not even be worth it. A tool that disassembled the image, optimised it, and reassembled it would make more sense.
The biggest gains should come from pre-compilation passes, if the compiler isn't doing it itself.
Hippy, I've avoided IF statements up 'til now, since their syntax allows more complicated expressions. Can you think of an example where such treatment of IFs would be helpful but couldn't be handled by a CASE construct?
-Phil
I think I've got the nested comment thing fixed. Spin comments do have their subtleties. Inside a {comment}, for example, {{ and }} are each treated as pairs of single braces, rather than double braces. What it boils down to is this: Spin has three kinds of comments, and you can't be inside more than one kind at a time. Here's the Perl code I use to remove them:
... in case anyone really cares.
-Phil
Very nice. I just tested it on Beau's 3D Demo: http://forums.parallax.com/showthread.php?p=605385
It cut over 100 longs off graphics.spin.
Convenient way to avoid thinning out drivers by hand.
Cheers, Clemens
the three different comment formats and two different string comments and just worked out all the cases. DFletch probably needs to see this code too, and anyone else writing anything that processes Spin.
If I want to really be nasty, I can mention that this will probably not parse something like
but I don't think that's really relevant or even that helpful. In particular, it has no effect (I believe) on the actual operation of CLEAN.
Hmm, you're right again. I hadn't considered the possibility of comments being treated as whitespace by the compiler. Seriously, that should be considered a compiler bug, but I guess I'll have to work around it. It's even worse than that, though. How do you imagine this code compiles?
Answer: It's as if each brace were replaced by a space! What this means is that indentation in Spin isn't governed by the number of leading spaces in a line, but by the column position of the first significant character of code. Yikes! It's back to the Perl editor. I think I'll just leave the comment delimiters in place and replace the comment guts with spaces. That'll make for some weird-looking compressed source, but I guess that doesn't matter.
-Phil
Post Edited (Phil Pilgrim (PhiPi)) : 7/8/2008 10:42:03 PM GMT
Also, the help text says .spin may be omitted. I have found it must be omitted: if I put .spin, it says p1.spin.spin: no such file.
It's something about the config.spin file. It works as long as that file is not included in test.spin. It's related to the CON section of config.spin and its use in test.spin.
Post Edited (Timmoore) : 7/9/2008 12:44:50 AM GMT
You also need to deal with swallowed newlines, perhaps:
but that's not too bad (just don't "end" lines that end with {} or {{}} comments).
I think we've discussed {}-is-whitespace being a bug or not before, and we are on different sides of that coin. But it is what it is.
Try the latest version. There was a lingering case sensitivity that hadn't been corrected. Also, send me your shipping particulars, and I'll mail you a couple boards.
Tom,
Yup, I commented out the very newline inhibitor that would've spared this error. I've also decided to convert the comment delimiters to blanks. This makes it easier to parse those special CASE constructs when they begin with { } indents. Here's the latest comment eradicator.
-Phil
Post Edited (Phil Pilgrim (PhiPi)) : 7/9/2008 2:22:28 AM GMT
It hasn't fixed putting the .spin extension on the file, e.g., clean -r t.spin will fail.
Also, a feature request: Propellent allows compile only; can CLEAN have that option?
Fixed the .spin issue with v0.14. Also, if you omit the <action> from the command line, CLEAN will compact and test compile your code, but nothing will be saved. To save the compacted Spin files and write a .binary file, you still need to use the --bin action with a target directory. Note also that --bin will not try to communicate with, or upload anything to, your Propeller.
-Phil
Many thanks (again) to Tim Moore who has provided a really big app to test CLEAN on.
-Phil
-Phil
clean -b b pcFullDuplexSerial.spin
fails with Can't use an undefined value as an ARRAY reference at C:\sync\rb\CLEAN.exe line 233.
Try it now (v0.31). I had deleted one too many lcs from the source code.
-Phil
In case anyone is interested, here are the files I used with CLEAN.
bc.bat, etc., run CLEAN in various modes. They just make using CLEAN simpler and remove the need for full pathnames.
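Tim's batch files aren't shown here, but the same convenience can be sketched in Python: build CLEAN's argument list from the flags in the help text above and expand the paths so CLEAN always gets the full path it needs. `clean_command` and its `action` names are hypothetical, and the sketch assumes CLEAN.exe can be found under that name (for example, from the working directory or PATH).

```python
import os
from typing import List, Optional

def clean_command(source: str, action: Optional[str] = None,
                  bin_dir: Optional[str] = None, lib_dir: Optional[str] = None,
                  exe: str = "CLEAN.exe") -> List[str]:
    """Build CLEAN's argument list in the order <action> <option>s <sourcefile>.

    action is one of None (just compile), "ram" (-r), "eeprom" (-e) or
    "bin" (-b <bin_dir>). Directories and the source become absolute paths.
    """
    args = [exe]
    if action == "bin":
        if bin_dir is None:
            raise ValueError("the bin action needs a target directory")
        args += ["-b", os.path.abspath(bin_dir)]
    elif action == "ram":
        args.append("-r")
    elif action == "eeprom":
        args.append("-e")
    elif action is not None:
        raise ValueError("unknown action: %r" % action)
    if lib_dir is not None:
        args += ["-l", os.path.abspath(lib_dir)]
    args.append(os.path.abspath(source))
    return args

print(clean_command("test.spin", "ram"))
```

Passing the resulting list to `subprocess.run(..., check=True)` lets Python quote any argument containing spaces, which takes care of the help text's rule about enclosing such parameters in quotes.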
pre.bat runs the Visual Studio C++ preprocessor on all Spin files before compiling. This lets me use #if and #include. It's set up for Visual Studio 2008; change the path to 8 if you want to use Visual Studio 2005.
There is only one thing you have to live with when using the cpp preprocessor. This syntax causes problems, because cpp takes any line that begins with # as a directive:
```spin
CON
#0
enum
```
so I do this instead:
```spin
CON
temp1,#0
enum
```
See config.spin for an example of this.
Apart from that, the C preprocessor works great, and with Phil's CLEAN I can now run it on all files rather than just the top Spin file.
The latest version of CLEAN solves this by allowing you to produce a "schema" object that stands in place of known good code. Here's an example of how to do it. Suppose I have a BIOS that looks like this:
If I click "Summary" on the Propeller Tool's toolbar, I get this:
This is almost everything needed for a schema. All that's lacking are the CONstants. So start a new file, and copy and paste the summary to it. Next, copy the CON section from the original file and paste it in place of the dummy CON section in the new file. You'll end up with something like this:
Now save the new file with the same name as the old file, but with the suffix ".schema.spin".
Finally, when you refer to the BIOS in your program's OBJ section, add a ".schema" suffix. The Propeller Tool will tack on the ".spin" suffix for you, as usual. Here's an example:
When you compile this program using the Propeller Tool, the abbreviated schema file that doesn't use any external objects is substituted for the real thing, and you can work out all the syntax errors in your own code using the Propeller Tool. But when you run the file through CLEAN (v0.32 and above), the ".schema" is stripped from all OBJ references, and the original objects are compressed and compiled into your object code.
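The substitution CLEAN performs on OBJ references (v0.32 and later, per the update notes) amounts to something like the following Python sketch. The regular expression and function name are assumptions for illustration, not CLEAN's code. The sketch also tolerates an explicit `.spin` suffix, and it is meant to be applied to the OBJ section only, since elsewhere a string could legitimately contain `.schema`.

```python
import re

# "name.schema" or "name.schema.spin" inside a quoted OBJ file name
SCHEMA_REF = re.compile(r'("[^"\n]*?)\.schema((?:\.spin)?")', re.IGNORECASE)

def strip_schema(obj_section: str) -> str:
    """Point OBJ entries back at the real objects by removing ".schema"."""
    return SCHEMA_REF.sub(r"\1\2", obj_section)

obj = 'OBJ\n  io : "mybios.schema"\n  tv : "tv_wtext"'
print(strip_schema(obj))
```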
One caveat when creating schema files: If the CON section makes reference to constants in other objects, you'll have to substitute dummy values for them by hand. Otherwise, the compiler will choke, since the OBJ section is null.
-Phil
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Cheers,
Simon
www.norfolkhelicopterclub.co.uk
You'll always have as many take-offs as landings, the trick is to be sure you can take-off again ;-)
BTW: I type as I'm thinking, so please don't take any offense at my writing style
Caveat: wCLEAN.exe has only been tested on the development machine, so it may be missing some Perl modules that my system graciously, but discreetly, fills in. If you can't get it to start by clicking on it, run it from the command line, and you should see a message indicating a missing module. Let me know, and I'll recompile. (I use Perl2exe, which doesn't always import everything it needs automatically — especially where Tk modules are concerned.)
-Phil
Post Edited (Phil Pilgrim (PhiPi)) : 7/13/2008 9:50:42 PM GMT
I have a question.
Does Clean work with PASM?
James L
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
James L
Partner/Designer
Lil Brother SMT Assembly Services
Are you addicted to technology or Micro-controllers..... then checkout the forums at Savage Circuits. Learn to build your own Gizmos!