increases sets efficiency by an order of infinity

Hunter Peress hu.peress at mail.mcgill.ca
Fri Feb 7 07:05:31 EST 2003


Heh, alex. All of my code was completely written as if the proposals were
in existance, so no wonder u were bombing the code from the first half of
my post.


> However you might have perhaps-surprising effects:
> 
> x = [1, 2, 3]
> theset = set.Sets([x])
> x.append(4)
> print x in theset
> 
> would print False -- even though mutable object x was used when building
> theset, what really happened was that a "snapshot" of x's value at that
> time was taken (via the relevant special method) so what was REALLY put
> in the set was (1, 2, 3) instead.  You can already do that manually, the
> proposal being discussed would make it automatic for convenience and
> speed (speed appearing when you test for membership, if a type is able
> to give a *TEMPORARILY* immutable version of its instances -- study
> 2.3's sets.py for an example).

Yes, i did realize this. At the bottom where i rally off some of the
technical points that my abstract reasoning required. I said:

"-at runtime [of a set operation], its probably best to simply run setkey
to obtain the key that Sets will use to evaluate every time any set method
is called. ; however, for ImmutableSet, a more efficient technique used
such that only on an object's entry to the Set is the setkey function
needed."



>> Cool eh? This lets u infinitely extend the set algorithms that have
>> been developing over the last year. This is very pythonic ; its similar
>> to the fact that u can set your own sort function for a list.
> 
> No you can't.  You can PASS a comparison function to one given call to
> somelist.sort, but there is no "setting" involved.  AND more often than
> not passing a comparison function is a bad idea, with the D-S-U idiom
> better.
> 
> One terrible problem with your "cool" proposal is that in the instant
> you call your proposed .setkey method, the membership of the set should
> change -- dropping some duplicates in the general case.  Which ones?
> Eek.


This is an excellent point that I didnt even see. Im very gracious for
this input :-). The quick and unthought out reaction to this, is to supply
yet another function that deals with collisions. I think that makes things
airtight.

Let me explain my impetus behind this whole thread:

The set operations are such a basic thing. And i'd like them to be more
easily extendable. After some discussion to expose most of the main
problems (like we are doing now), maybe a clean solution will present
itself. I think my latest reply (with a 2nd collision function) is not too
bad.

Anyway for entertainment, heres the original problem that led me on to
this thought experiment:


I have 2 file hierarchies. the source for libpcap, and Wpcap (for
windows).
As a rough estimate, I was curious on comparing how many files of same
name existed between the two. Then, with this intersection of names, I
wanted run difflib.SequenceMatcher.ratio on these files.

as this is a naturally occuring problem complexities are inevitable: some
files of same names in the 2 hierarchies exist at different levels and
hence, they have different pathnames. So, you need the pathname to be able
to actually access the file (and run the diff operation), but u need the
basename for the purposes of the intersection operation. No matter how u
try to hack it up, using references or whatever, supplying extra
information to the current implementation of Set defeats my purposes. This
extra information would be the full path name. To rant just bit, I think
that this is just so useless. In many different scenarios, theres always
plenty of data that I want to get tagged along on the set operations. Hmm,
would the ZODB support this? (and yes, i realize the more comprehensive estimate
of this would be to make a 2d array, and run ratio for every grid...., but
I dont even think that is a good way, as Im valuing the filenames here)....


An addendum to the collision idea: a dict gives u a key, and a value to
work with. A dict is *somewhat* similar to a union operation of a set,
except that a set is 1-1 (where 1 is the amount of inputs) and a dict is
1-many. Yes...also the notion of time is discreet for the dict (as we're
trying to compare it to a set), whereas it isnt for the set.intersect set.


anyway. im just babbling. im also not trying to compel, or coerce. Thanks
for the input :-)




More information about the Python-list mailing list