[python-win32] Identify unique data from sequence array
otrov
dejan.org at gmail.com
Wed Dec 22 18:21:27 CET 2010
> I'm a unix guy. That's what we call a sort-uniq operation, after the
> pipeline we'd use: sort datafile | uniq > uniq-lines.txt. So I google that
> with python and ....
> As Jason Petrone wrote when he withdrew PEP 270 in
> http://www.python.org/dev/peps/pep-0270/:
> "creating a sequence without duplicates is just a matter of
> choosing a different data structure: a set instead of a list."
> At the time, sets.py was a nifty new thing. Since then, the set datatype
> has
> been added to python's base.
> set() can consume a list of tuples, but not a list of lists, like the X you
> showed us. You're job will be getting your massive list of lists into a
> list of tuples.
> This works, but for your very large arrays, may take large time:
> X = [[1,2], [1,2], [3,4], [3,4]]
> Y = set( [tuple(x) for x in X] )
> There may be faster methods. The map() function might help, but I really
> don't know. Here's something to try:
> Y = set( map(tuple, X )
> Or you can go old school route, from before the days of set(), that is:
> http://code.activestate.com/recipes/52560-remove-duplicates-from-a-sequence/
> Best,
> Mike
Thanks for your reply, but perhaps there is misunderstanding:
I don't want unique values, but unique sequence (block) of data that is repeated in array:
A B C D D D A B C D D D A B C D D D
|_________| |_________| |_________|
| | |
unique unique unique
sequence sequence sequence
data data data
I tested your approach and won't say it's slow. It works great but that's not what I'm after. Thanks anyway
Cheers
More information about the python-win32
mailing list