Shop OBEX P1 Docs P2 Docs Learn Events
Object duplicate detection — Parallax Forums

Object duplicate detection

BradCBradC Posts: 2,601
edited 2008-09-17 15:19 in Propeller 1
G'day all,

Question.

It appears the Parallax SPIN compiler detects duplicate objects by looking at the generated object code.
As we have seen previously, this can lead to issues where you have 2 bytecode identical objects, but those refer to differing objects.
I thought about comparing the source code or file names, but then we have objects where people have duplicated the source files..

OBJ
a : "object-1"
b : "copy_of_object-1"
c : "copy1_of_object-1"

While this is unnecessary and bletcherous, it's been done more than once.

The Parallax compiler will include the object only once, as its object code is identical.

What do we think might be a good way of redundant object removal?
I was leaning towards doing it based on the file name/path of the object (in fact at the moment I do), but then that leads to unnecessary duplicates like the above example.
On the other hand, if I do it by comparing the bytecode, it falls down in the convoluted example we've seen posted to the forum relatively recently.. (as does the Parallax compiler, but then we want to duplicate the functionality not the undocumented features)

If I checksum the source code, then I catch both these cases provided none of the copies of the object sources have been modified internally, but that relies on the objects being straight copies (or symlinks).

Ideas?
Better way to divest the feline of its pelt?

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Pull my finger!

Comments

  • hippyhippy Posts: 1,981
    edited 2008-09-14 15:17
    To me there seems no logical reason to want to use two or more objects which are the same but with different filenames so I'd say if the bytecode matches and the filenames are the same then reduce those to a single object otherwise include as separate objects.

    That's closest to PropTool behaviour and certainly no worse, better because one can force multiple objects to be included by different filename. An issue there is how one deals with filenames which include and don't include a full path but are ultimately referencing the same file in the current directory. I'd flag that up as an error; force the user to use one convention or the other.

    Another issue is how to deal with an object filename when in a sub-object and doesn't have a path; should it inherit the path of the object it's included in ? Does it make for a preferred path and fall back to the usual default paths if not found ?
  • BradCBradC Posts: 2,601
    edited 2008-09-14 15:30
    Oh, I've decided against implementing that "full pathname" incompatibility. All objects must be referenced in the same way as the Parallax compiler does it. "Name", no path, no extension. This keeps it both simple, easy and fully compatible.

    I've simply made facility for an effectively unlimited searchpath (and it's case insensitive on Linux and MacOS to match behaviour on Windows).

    You can have as many library search directories as you like, wherever you like, and the search order is deterministic. It will always search the same directory as the top object is in first, then in the order the paths are specified. In addition, if the directory goes away, it won't complain or error out. This means you can add search paths for removable drives and it will only look at them when they are present.

    I'll keep the distillation code the way it is then.. filename based. It does kinda make sense, and while some of the stuff in the object exchange won't compile bytecode identical as it uses 2 or 3 copies of the same file they'll all be included), it will not suffer from that problem we saw earlier.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Pull my finger!
  • mparkmpark Posts: 1,305
    edited 2008-09-14 15:34
    Wait, which problem?

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Michael Park

    PS, BTW, and FYI:
    To search the forum, use search.parallax.com (do not use the Search button).
    Check out the Propeller Wiki: propeller.wikispaces.com/
  • hippyhippy Posts: 1,981
    edited 2008-09-14 15:58
    @ Michael : http://forums.parallax.com/showthread.php?p=741739

    Added : which you reported smile.gif

    Post Edited (hippy) : 9/14/2008 4:04:16 PM GMT
  • mparkmpark Posts: 1,305
    edited 2008-09-14 16:23
    I thought the take-home from that thread was that proptool was buggy, not that there's anything wrong with duplicate object elimination itself. Last time I looked, homespun generates "correct" code (e.g., for the first example, I get "1234 666").

    Edit: I should add, though, that homespun does check for a little more than just identical bytecode. I have to run now, but I'll post details later. I'd like you guys' take on what I'm doing.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Michael Park

    PS, BTW, and FYI:
    To search the forum, use search.parallax.com (do not use the Search button).
    Check out the Propeller Wiki: propeller.wikispaces.com/

    Post Edited (mpark) : 9/14/2008 4:30:15 PM GMT
  • BradCBradC Posts: 2,601
    edited 2008-09-14 16:39
    That was my point.. duplicate object elimination reduction as proptool does it has that inherent problem that you first highlighted. I was trying to stimulate some discussion on the best ways to mitigate the problem while maintaining the best level of compatibility.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Pull my finger!
  • mparkmpark Posts: 1,305
    edited 2008-09-15 07:57
    Well, my point is that duplicate object elimination as Proptool is supposed to do it does not have that inherent problem; that is, I'm pretty sure that the problems found in that other thread are caused by bugs in Proptool's implementation of an otherwise sound algorithm. The fact that hippy had so much trouble finding other problem cases indicates that Proptool does do the right thing most of the time.

    In most of these discussions there's an assumption that "identical bytecode" refers only to executable opcodes, but that is incorrect. Chip has said that you can make otherwise identical objects different by giving them different DAT data, so that means that more than just executable bytecode is compared.

    My theory: To be considered identical, objects have to be identical in all their bytes:

    * object size (or pointer to next object), number of methods, number of objects
    * pointer to method 0
    * pointer to method 1
    * ...
    * pointer to object 0, VAR offset
    * pointer to object 1, VAR offset
    * ...
    * DAT bytes
    * executable bytcode

    If you compare all those bytes, you automatically take care of the case where "2 bytecode identical objects refer to differing objects" -- if they refer to differing objects, they have different pointers to objects, so they are not identical after all!

    That's the approach Homespun takes, and it's matched Proptool in all my tests except where Proptool generates erroneous code; in those cases, Homespun generates correct code (well, it looks correct to me). Hmmm, in writing all this, I've just realized that there's a latent bug in Homespun!

    /scurries off to fix it

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Michael Park

    PS, BTW, and FYI:
    To search the forum, use search.parallax.com (do not use the Search button).
    Check out the Propeller Wiki: propeller.wikispaces.com/
  • cgraceycgracey Posts: 14,140
    edited 2008-09-15 09:02
    I'm sorry I've been late in addressing this compiler problem. I have not made time to get into this yet, but I thought I'd put the code here for anyone that wants to take a look at it. I'm quite sure that a few of you out there could figure out what's wrong. The distiller routine is·entirely self-contained and written in 80386 code. Here it is, all 735 bytes' worth:

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔


    Chip Gracey
    Parallax, Inc.
  • mparkmpark Posts: 1,305
    edited 2008-09-17 14:41
    Thanks, Chip! We're going to cajole the compiler source out of you one piece at a time.

    Too bad I don't speak x86. Anyone looking at this? Brad?

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Michael Park

    PS, BTW, and FYI:
    To search the forum, use search.parallax.com (do not use the Search button).
    Check out the Propeller Wiki: propeller.wikispaces.com/
  • BradCBradC Posts: 2,601
    edited 2008-09-17 15:19
    mpark said...
    Thanks, Chip! We're going to cajole the compiler source out of you one piece at a time.

    Too bad I don't speak x86. Anyone looking at this? Brad?

    Yeah. Nothing really obvious pokes out at me, but when I get the chance I'll run your pathological example past it and single step the compiler to see what it's really doing.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Pull my finger!
Sign In or Register to comment.