List of Numbers

Aahz aahz at pythoncraft.com
Wed Apr 16 01:42:00 EDT 2003


In article <lhvmm-7c7.ln1 at grendel.myth>,
Jim Richardson  <warlock at eskimo.com> wrote:
>On Sat, 12 Apr 2003 21:11:39 GMT,
> Alex Martelli <aleax at aleax.it> wrote:
>> Jim Richardson wrote:
>>> On Sat, 05 Apr 2003 20:13:45 +0100,
>>>  Simon Faulkner <news at titanic.co.uk> wrote:
>>>> 
>>>> I have a list of about 5000 numbers in a text file - up to 14 digits
>>>> each.
>>>> 
>>>> I need to check for duplicates.
>>>> 
>>>> What would people suggest as a good method?
>>> 
>>> In Python, just stuff them all in a dictionary; any repeats will be
>>> eliminated. But this is rather crude and probably slow. But it would
>>> work.
>> 
>> Anything but slow!  Python dictionaries are quite fast.  But removing
>> duplicates is not the same as 'checking for duplicates' -- Simon
>> might rather want (e.g.) a list of all numbers that WERE in fact
>> duplicate.  A script that plays with a Python dict is still no doubt
>> the right solution, but it's hard to write one without more precise
>> specifications regarding what is desired.
>
>Yeah, I didn't look at the "check for" part, I just parsed it as "get rid
>of"... <sigh> must need a brain upgrade.
>
>I don't know how fast/slow the dict would be, to tell the truth; it just
>doesn't seem that "elegant", and elegance is often (wrongly, I know)
>associated with speed.

Python dicts are canonically elegant *and* fast.  They're even
moderately efficient with memory usage (a dict holds references to its
keys and values rather than copies of them).  Don't forget that Python
uses dicts internally for much of its own management; Python dicts are
one of the most highly optimized data structures around.
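
For the original question, a minimal sketch of the dict approach might
look like the following.  It assumes the numbers have already been read
from the text file as strings, one per line; the function name
find_duplicates and the sample data are made up for illustration, and
the code is written for a current Python 3 rather than the Python of
this thread.

```python
def find_duplicates(numbers):
    """Return the values that occur more than once, in first-seen order."""
    counts = {}
    for n in numbers:
        # dict.get supplies 0 the first time a value is seen.
        counts[n] = counts.get(n, 0) + 1
    # Dicts keep insertion order in Python 3.7+, so the result follows
    # the order in which each value first appeared.
    return [n for n, c in counts.items() if c > 1]


# Hypothetical sample data standing in for the lines of the text file.
sample = ["12345678901234", "42", "12345678901234", "7", "42", "42"]
duplicates = find_duplicates(sample)
print(duplicates)
```

Each number costs one dict lookup and one store, so 5000 entries is a
trivial amount of work.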

Note that in Python 2.3 an answer using the new sets module might be
better, but those sets are just a thin layer over dicts.
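
A set-based version could be sketched roughly as below.  This is an
assumption-laden illustration, not code from the thread: it uses the
built-in set type that later Python versions provide instead of the
2.3 sets module, and the names has_duplicates and duplicated_values
are invented for the example.

```python
def has_duplicates(numbers):
    """Return True as soon as any value is seen a second time."""
    seen = set()
    for n in numbers:
        if n in seen:
            return True
        seen.add(n)
    return False


def duplicated_values(numbers):
    """Return the set of values that appear more than once."""
    seen = set()
    dupes = set()
    for n in numbers:
        if n in seen:
            dupes.add(n)
        else:
            seen.add(n)
    return dupes


# Hypothetical sample data; real input would come from the text file.
sample = ["100", "200", "100", "300", "200", "100"]
print(has_duplicates(sample))
print(sorted(duplicated_values(sample)))
```

The first function answers "are there any duplicates?" and stops
early; the second answers "which numbers were duplicated?", which is
the distinction Alex draws above.
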
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

This is Python.  We don't care much about theory, except where it intersects 
with useful practice.  --Aahz, c.l.py, 2/4/2002



