increases sets efficiency by an order of infinity

Alex Martelli aleax at aleax.it
Fri Feb 7 17:37:33 CET 2003


Hunter Peress wrote:
   ...
> is called. ; however, for ImmutableSet, a more efficient technique used
> such that only on an object's entry to the Set is the setkey function
> needed."

You're proposing to make ImmutableSet's MUTABLE by giving them
that setkey method too?  That would defeat the very purpose of
ImmutableSet's existence.  More and more this convinces me that
what you want is very different from sets.

>> One terrible problem with your "cool" proposal is that in the instant
>> you call your proposed .setkey method, the membership of the set should
>> change -- dropping some duplicates in the general case.  Which ones?
>> Eek.
> 
> This is an excellent point that I didnt even see. Im very gracious for
> this input :-). The quick and unthought out reaction to this, is to supply
> yet another function that deals with collisions. I think that makes things
> airtight.

I think it makes things even more complicated.  I think the
standard library should not supply such complicated ways to
duplicate functionality that is already present and well
supported in the language.


> Let me explain my impetus behind this whole thread:
> 
> The set operations are such a basic thing. 

That's what GOOD about them.  They give a good match of a very
basic methematical concept.

> And i'd like them to be more
> easily extendable. 

That is what inheritance is for.


> I have 2 file hierarchies. the source for libpcap, and Wpcap (for
> windows).
> As a rough estimate, I was curious on comparing how many files of same
> name existed between the two. Then, with this intersection of names, I
> wanted run difflib.SequenceMatcher.ratio on these files.
> 
> as this is a naturally occuring problem complexities are inevitable: some
> files of same names in the 2 hierarchies exist at different levels and
> hence, they have different pathnames. So, you need the pathname to be able
> to actually access the file (and run the diff operation), but u need the
> basename for the purposes of the intersection operation. No matter how u

Therefore you conceptually need two simple data structures (which
in practice you might choose to merge into one)

1. a set of base names,
2. a mapping from base names to path names

Since sets are actually implemented by dictionaries (mappings), in
practice you may well choose to use [2] to also stand for [1], of
course.  This way you lose the intersection "primitive", but, given
two dictionaries d1 and d2, enumerating the keys that are in both is
really so trivial in Python -- [x for x in d1 if x in d2] is all
it takes -- so, who cares, pragmatically speaking.

> try to hack it up, using references or whatever, supplying extra
> information to the current implementation of Set defeats my purposes. This
> extra information would be the full path name. To rant just bit, I think
> that this is just so useless. In many different scenarios, theres always
> plenty of data that I want to get tagged along on the set operations. Hmm,

Then use dictionaries, in addition to or instead of sets.

Sets are sets.  A key, fundamental, elementary concept.  I do not
like the idea of making them more complicated, and taking them
farther away from their mathematical models than is necessary,
when Python already offers mappings and similar ideas.

> An addendum to the collision idea: a dict gives u a key, and a value to
> work with. A dict is *somewhat* similar to a union operation of a set,
> except that a set is 1-1 (where 1 is the amount of inputs) and a dict is
> 1-many. Yes...also the notion of time is discreet for the dict (as we're
> trying to compare it to a set), whereas it isnt for the set.intersect set.

???  A set is implemented as "all keys in this here dict", except for
the slight trick with immutability needed to allow sets of sets.  Time,
unions, and 1-1 vs 1-many seem rather different/unrelated concepts.

You can think of a mapping (e.g. a dict) as a set with arbitrary
extra info associated to each member of the set.  This not only
reflects the implementation but is conceptually sound too.


Alex





More information about the Python-list mailing list