Object duplicate detection
BradC
Posts: 2,601
G'day all,
Question.
It appears the Parallax SPIN compiler detects duplicate objects by looking at the generated object code.
As we have seen previously, this can lead to problems when two objects compile to identical bytecode but refer to differing objects.
I thought about comparing the source code or file names, but then we have cases where people have duplicated the source files:
OBJ
  a : "object-1"
  b : "copy_of_object-1"
  c : "copy1_of_object-1"
While this is unnecessary and bletcherous, it's been done more than once.
The Parallax compiler will include the object only once, as its object code is identical.
What do we think might be a good way of redundant object removal?
I was leaning towards doing it based on the file name/path of the object (in fact at the moment I do), but then that leads to unnecessary duplicates like the above example.
On the other hand, if I do it by comparing the bytecode, it falls down on the convoluted example we've seen posted to the forum relatively recently (as does the Parallax compiler, but we want to duplicate the functionality, not the undocumented features).
If I checksum the source code, then I catch both these cases provided none of the copies of the object sources have been modified internally, but that relies on the objects being straight copies (or symlinks).
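To make the checksum idea concrete, here is a minimal sketch in Python. It is not taken from any of the compilers discussed here; the function names `source_digest` and `group_by_source` are made up for illustration, and SHA-256 is just one reasonable choice of checksum. Files whose raw bytes match end up in the same group, so straight copies and symlinks collapse to one entry, while a copy edited internally does not.

```python
import hashlib
from pathlib import Path

def source_digest(path):
    """Return a SHA-256 hex digest of the raw bytes of one source file."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def group_by_source(paths):
    """Map each digest to the list of file paths whose contents share it."""
    groups = {}
    for p in paths:
        groups.setdefault(source_digest(p), []).append(str(p))
    return groups
```

Any group holding more than one path is a candidate for being included only once. Note that even a whitespace change in one copy puts it in a different group.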
Ideas?
Better way to divest the feline of its pelt?
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Pull my finger!
Comments
That's closest to PropTool behaviour and certainly no worse; arguably it's better, because one can force multiple copies of an object to be included by using different filenames. One issue is how to deal with filenames that do and don't include a full path but ultimately reference the same file in the current directory. I'd flag that up as an error and force the user to use one convention or the other.
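As a rough sketch of how that check might look (Python, purely illustrative; `object_key` and `mixed_spellings` are hypothetical names, and the assumption is that bare names resolve against a single base directory): each reference is reduced to a canonical absolute path, and any file reached through more than one spelling is reported so the compiler can raise the error.

```python
import os

def object_key(name, base_dir):
    """Canonical key for an object reference; bare names resolve against base_dir."""
    path = name if os.path.isabs(name) else os.path.join(base_dir, name)
    # realpath resolves symlinks and "..", normcase folds case on Windows
    return os.path.normcase(os.path.realpath(path))

def mixed_spellings(names, base_dir):
    """Return {canonical path: spellings} for files referenced in more than one way."""
    seen = {}
    for n in names:
        seen.setdefault(object_key(n, base_dir), set()).add(n)
    return {k: sorted(v) for k, v in seen.items() if len(v) > 1}
```

An empty result means every file was referenced with one consistent spelling.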
Another issue is how to deal with an object filename that appears in a sub-object and doesn't have a path: should it inherit the path of the object it's included in? Does that make for a preferred path, falling back to the usual default paths if the file isn't found there?
I've simply made facility for an effectively unlimited searchpath (and it's case insensitive on Linux and MacOS to match behaviour on Windows).
You can have as many library search directories as you like, wherever you like, and the search order is deterministic. It will always search the same directory as the top object is in first, then in the order the paths are specified. In addition, if the directory goes away, it won't complain or error out. This means you can add search paths for removable drives and it will only look at them when they are present.
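The search order described above could be sketched roughly like this in Python. This is an illustration under stated assumptions, not the actual compiler code: `find_object` is a made-up name, and the caller is assumed to pass the complete file name including its extension.

```python
import os

def find_object(name, top_dir, search_dirs):
    """Return the first file matching name (ignoring case), or None if not found."""
    wanted = name.lower()
    # The top object's directory is always tried first, then the paths in order.
    for d in [top_dir, *search_dirs]:
        try:
            entries = sorted(os.listdir(d))  # sorted so ties resolve the same way every run
        except OSError:
            continue  # directory is absent (e.g. removable drive unplugged): skip quietly
        for entry in entries:
            full = os.path.join(d, entry)
            if entry.lower() == wanted and os.path.isfile(full):
                return full
    return None
```

Listing the directory and comparing lowercased names is what gives case-insensitive matching on a case-sensitive filesystem.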
I'll keep the distillation code the way it is then: filename based. It does kinda make sense, and while some of the stuff in the object exchange won't compile bytecode-identical (as it uses 2 or 3 copies of the same file, they'll all be included), it will not suffer from that problem we saw earlier.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Pull my finger!
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Michael Park
PS, BTW, and FYI:
To search the forum, use search.parallax.com (do not use the Search button).
Check out the Propeller Wiki: propeller.wikispaces.com/
Added : which you reported
Post Edited (hippy) : 9/14/2008 4:04:16 PM GMT
Edit: I should add, though, that homespun does check for a little more than just identical bytecode. I have to run now, but I'll post details later. I'd like you guys' take on what I'm doing.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Michael Park
Post Edited (mpark) : 9/14/2008 4:30:15 PM GMT
In most of these discussions there's an assumption that "identical bytecode" refers only to executable opcodes, but that is incorrect. Chip has said that you can make otherwise identical objects different by giving them different DAT data, so that means that more than just executable bytecode is compared.
My theory: To be considered identical, objects have to be identical in all their bytes:
* object size (or pointer to next object), number of methods, number of objects
* pointer to method 0
* pointer to method 1
* ...
* pointer to object 0, VAR offset
* pointer to object 1, VAR offset
* ...
* DAT bytes
* executable bytecode
If you compare all those bytes, you automatically take care of the case where "2 bytecode identical objects refer to differing objects" -- if they refer to differing objects, they have different pointers to objects, so they are not identical after all!
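Here is a small Python model of that theory, for illustration only. It is not Homespun's code, and the byte layout in `image()` (field widths, ordering, little-endian packing) is an assumption chosen to mirror the list above rather than the real Propeller object format. `CompiledObject` and `distill` are hypothetical names.

```python
import struct
from dataclasses import dataclass

@dataclass(frozen=True)
class CompiledObject:
    method_ptrs: tuple   # one pointer per method
    sub_objects: tuple   # (pointer to object, VAR offset) pairs
    dat: bytes           # DAT bytes
    code: bytes          # executable bytecode

    def image(self):
        """Flatten every field into one byte string (assumed layout, see above)."""
        body = b"".join(struct.pack("<H", p) for p in self.method_ptrs)
        body += b"".join(struct.pack("<HH", ptr, var) for ptr, var in self.sub_objects)
        body += self.dat + self.code
        # header: object size, number of methods, number of objects
        header = struct.pack("<HBB", 4 + len(body),
                             len(self.method_ptrs), len(self.sub_objects))
        return header + body

def distill(objects):
    """Keep one object per distinct image, preserving first-seen order."""
    unique = {}
    for obj in objects:
        unique.setdefault(obj.image(), obj)
    return list(unique.values())
```

Two objects with the same bytecode but different object pointers produce different images, so both survive `distill`; the same goes for objects that differ only in their DAT bytes.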
That's the approach Homespun takes, and it's matched Proptool in all my tests except where Proptool generates erroneous code; in those cases, Homespun generates correct code (well, it looks correct to me). Hmmm, in writing all this, I've just realized that there's a latent bug in Homespun!
/scurries off to fix it
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Michael Park
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Chip Gracey
Parallax, Inc.
Too bad I don't speak x86. Anyone looking at this? Brad?
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Michael Park
Yeah. Nothing really obvious pokes out at me, but when I get the chance I'll run your pathological example past it and single step the compiler to see what it's really doing.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Pull my finger!