List of Numbers
Aahz
aahz at pythoncraft.com
Wed Apr 16 01:42:00 EDT 2003
In article <lhvmm-7c7.ln1 at grendel.myth>,
Jim Richardson <warlock at eskimo.com> wrote:
>On Sat, 12 Apr 2003 21:11:39 GMT,
> Alex Martelli <aleax at aleax.it> wrote:
>> Jim Richardson wrote:
>>> On Sat, 05 Apr 2003 20:13:45 +0100,
>>> Simon Faulkner <news at titanic.co.uk> wrote:
>>>>
>>>> I have a list of about 5000 numbers in a text file - up to 14 digits
>>>> each.
>>>>
>>>> I need to check for duplicates.
>>>>
>>>> What would people suggest as a good method?
>>>
>>> In Python, just stuff them all in a dictionary; any repeats will be
>>> eliminated. But this is rather crude and probably slow. But it would
>>> work.
>>
>> Anything but slow! Python dictionaries are quite fast. But removing
>> duplicates is not the same as 'checking for duplicates' -- Simon
>> might rather want (e.g.) a list of all numbers that WERE in fact
>> duplicate. A script that plays with a Python dict is still no doubt
>> the right solution, but it's hard to write one without more precise
>> specifications regarding what is desired.
>
>yeah, I didn't look at the check for part, I just parsed it as get rid
>of... <sigh> must need a brain upgrade.
>
>I don't know how fast/slow the dict would be, to tell the truth; it just
>doesn't seem that "elegant", and elegance is often (wrongly, I know)
>associated with speed.

Python dicts are canonically elegant *and* fast. They're even
moderately efficient with memory usage (thanks to the way Python does
objects and bindings). Don't forget that Python uses dicts internally
for much of its own management; Python dicts are one of the most highly
optimized data structures around.

Note that in Python 2.3 an answer using sets might be better, but sets
are just a thin layer over dicts.
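
For illustration, here is a minimal sketch of both ideas from this thread:
a dict that counts occurrences (so you get the list of numbers that WERE
duplicated), and a set for a plain yes/no check. It assumes one number per
line, which the original question does not actually specify, and it is
written for present-day Python 3 rather than 2.2/2.3. The function names
find_duplicates and has_duplicates are made up for this example. Numbers
are compared as strings, so "042" and "42" count as different; convert
with int() first if that matters.

```python
def find_duplicates(lines):
    """Return the numbers that occur more than once, in first-seen order."""
    counts = {}  # number (kept as a string) -> how many times it was seen
    for line in lines:
        number = line.strip()
        if not number:
            continue  # skip blank lines
        counts[number] = counts.get(number, 0) + 1
    # Dicts keep insertion order in Python 3.7+, so this is first-seen order.
    return [number for number, seen in counts.items() if seen > 1]


def has_duplicates(lines):
    """Set-based yes/no check: a set drops repeats, so compare sizes."""
    numbers = [line.strip() for line in lines if line.strip()]
    return len(set(numbers)) != len(numbers)


# Made-up sample data standing in for the lines of the text file.
sample = ["12345678901234\n", "42\n", "12345678901234\n", "7\n", "42\n"]
print(find_duplicates(sample))  # ['12345678901234', '42']
print(has_duplicates(sample))   # True
```

Either function also accepts an open file object directly, since iterating
over a file yields its lines. For 5000 numbers both run in a blink.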
--
Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/
This is Python. We don't care much about theory, except where it intersects
with useful practice. --Aahz, c.l.py, 2/4/2002