[Python-ideas] Uniquify attribute for lists
Joshua Landau
joshua.landau.ws at gmail.com
Sat Nov 17 20:37:38 CET 2012
Surely the best choice is two have *two* caches; one for hashables and
another for the rest.
This might be improvable with a *third* chache if some non-hashables had
total ordering, but that could also introduce bugs I think. It'd also be a
lot harder and likely be slower anyway.
Timings (approximate and rough):
"""
Using 16 elements
% Hashable: 100 90 66 33 0
Original: 1.47 1.00 1.06 1.09 1.01
w/ key=str: 3.73 1.91 2.12 3.15 3.00
New: 1.20 1.46 1.81 2.13 3.38
w/ key=str: 1.72 2.00 2.48 2.76 3.01
Using 64 elements
% Hashable: 100 90 66 33 0
Original: 1.15 1.29 1.61 1.64 1.43
w/ key=str: 1.98 2.50 3.09 3.55 3.99
New: 1.00 1.47 2.18 3.01 3.60
w/ key=str: 1.87 2.30 2.79 3.41 3.84
Using 256 elements
% Hashable: 100 90 66 33 0
Original: 2.70 3.66 5.19 5.34 4.41
w/ key=str: 4.06 5.07 6.26 6.93 6.98
New: 1.00 1.65 2.92 5.28 7.62
w/ key=str: 2.28 2.71 3.76 4.36 4.93
Using 1024 elements
% Hashable: 100 90 66 33 0
Original: 9.30 12.4 18.8 21.4 16.9
w/ key=str: 11.1 13.1 16.3 17.5 13.9
New: 1.00 1.84 6.20 13.1 19.8
w/ key=str: 2.31 2.79 3.59 4.50 5.16
Using 4096 elements
% Hashable: 100 90 66 33 0
Original: 33.7 44.3 69.1 79.4 60.5
w/ key=str: 36.7 44.2 59.3 60.1 40.4
New: 1.00 3.73 18.1 42.2 63.7
w/ key=str: 2.23 2.56 3.33 4.19 4.93
Using 16384 elements
% Hashable: 100 90 66 33 0
Original: 126. 173. 265. 313. 243.
w/ key=str: 136. 164. 215. 213. 147.
New: 1.00 12.5 68.6 173. 263.
w/ key=str: 2.24 2.60 3.28 4.14 4.80
"""
--------------
Code attached, unless I forget ;).
No guarantees that it still works the same way, or works at all, or is the
right file.
Every item is repeated 5 times on average for any length of the list being
unique-ified. I'll try it with this changed later.
Basically, the new version is faster on all but ~100% non-hashable lists
when there are more than ~1000 elements, and on more-hashable lists it's
quadratically faster. When slower, it's by only about 10% to 20%. Of
course, specifying whether or not your list is fully-hashable would be more
efficient (probably 10%) but that's not as nice to use.
Additionally, if you use key="" to uniquify by hashable values you are able
to get a good speed up with the new version.
Example of use:
iteruniq(<list of set>, key=frozenset)
With really small non-hashable lists, the original is significantly better
(3x).
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20121117/569601d9/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: iteruniq.py
Type: application/octet-stream
Size: 6268 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20121117/569601d9/attachment.obj>
More information about the Python-ideas
mailing list