increases sets efficiency by an order of infinity

Alex Martelli aleax at aleax.it
Fri Feb 7 05:57:05 EST 2003


Hunter Peress wrote:
   ...
> I think theres a very lacking feature in the 2.3 implementation of sets.
> 
> You are locked in to value that the set does its "magic" on.
> 
> sets are meant to operate on lists. eg: [1,2,3,4] becomes: S([1,2,3,4])
> 
> next imagine we take a list of less granular objects eg:
> a=sets.Set([[1,'/home/'],[2,'/home/']][3,'/lala']) and

Can't do it -- wrong syntax, but even with the right one:

>>> a=sets.Set([[1,'/home/'],[2,'/home/'],[3,'/lala']])
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/lib/python2.3/sets.py", line 383, in __init__
    self._update(iterable)
  File "/usr/local/lib/python2.3/sets.py", line 328, in _update
    data[element] = value
TypeError: list objects are unhashable
>>>

Just like dictionary keys, set items are normally constrained
to be immutables -- with specialcasing to allow set of sets.

So what you have to use are tuples instead of lists:

>>> a=sets.Set([(1,'/home/'),(2,'/home/'),(3,'/lala')])

and instead of:

> b=sets.Set([[1,'/bin/'],[2,'/home/']])

>>> b=sets.Set([(1,'/bin/'),(2,'/home/')])

> suppose we make an intersection on these two to find common items.
> Well...this will result in the nullset.

Nope:

>>> a.intersection(b)
Set([(2, '/home/')])
>>>

Once you CAN build the sets (which you can't for general
mutable items), intersection works just fine.

> This example suggests the limits of the sets.Set

Yes: no support for general mutable items.  Which is
inherited from dicts -- sets only specialcase other sets
because sets of sets are a crucial construct.

The Python-Development mailing list IS discussing the
possibility of generalizing this, these very days --
ALLOW (but not require) a mutable type to expose an
"immutable-type" version of its instances, just like
sets do, in order to allow set membership (and use as
dictionary keys more generally, perhaps).  Assuming
for the sake of argument that this became adopted and
a list returned tuple(self) from the relevant special 
method, then you WOULD be able to build the sets you
mention (once you correct your syntax) and then again
the intersection would work just fine.

However you might have perhaps-surprising effects:

x = [1, 2, 3]
theset = set.Sets([x])
x.append(4)
print x in theset

would print False -- even though mutable object x was
used when building theset, what really happened was
that a "snapshot" of x's value at that time was taken
(via the relevant special method) so what was REALLY
put in the set was (1, 2, 3) instead.  You can already
do that manually, the proposal being discussed would
make it automatic for convenience and speed (speed
appearing when you test for membership, if a type is
able to give a *TEMPORARILY* immutable version of its
instances -- study 2.3's sets.py for an example).


> Cool eh? This lets u infinitely extend the set algorithms that have been
> developing over the last year. This is very pythonic ; its similar to the
> fact that u can set your own sort function for a list.

No you can't.  You can PASS a comparison function to one
given call to somelist.sort, but there is no "setting"
involved.  AND more often than not passing a comparison
function is a bad idea, with the D-S-U idiom better.

One terrible problem with your "cool" proposal is that in
the instant you call your proposed .setkey method, the
membership of the set should change -- dropping some
duplicates in the general case.  Which ones?  Eek.


If you have these weird requirements you're FAR better off
building your own "HunterSet" class (maybe subclassing Set)
and leaving real sets alone -- they don't need either the
complication or the resulting slow-down -- IMHO.


> Maybe this is even enough to make a PEP?

Making a PEP is always a good idea.  Make sure to include
a reference implementation and to measure its performance
in typical cases against that of current sets; and to
include objections (such as mine in this post) as well
as arguments why the new functionality is necessary.


Alex





More information about the Python-list mailing list