Newbie: list question, speed

Remco Gerlich scarblac at pino.selwerd.nl
Sun May 6 06:45:41 EDT 2001


Werner Hoch <werner.ho at gmx.de> wrote in comp.lang.python:
> > not sure if this is what you want but use a dictionary instead
> > 
> > earlier:
> > ergDict = {}
> > 
> > if not ergDict.has_key(ergLine):
> > 	ergDict[ergLine] = 1
> > else:
> > 	ergDict[ergLine] += 1
> > 
> > 
> > later ergDict.keys() is the same as your ergfield
> > and ergDict.values() (or ergDict.items() with key and value) 
> > contains the number of duplicates
> 
> Looks great, I will keep it in mind if I need the numbers of duplicates.

Even if you ignore the counting and make it

ergDict = {}
for ergLine in lines:
   ergDict[ergLine] = 1

uniq = ergDict.keys()

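It will be *much* faster than the list-based solutions: a dictionary
finds a key by hashing it, which takes roughly constant time, while a
membership test on a list has to scan the list entry by entry.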
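
To make the comparison concrete, here is a minimal sketch of both
approaches side by side. The function names are made up for
illustration, and the list version is only my guess at what a typical
list-based solution looks like, not necessarily the code from the
original post.

```python
def unique_lines_dict(lines):
    """Deduplicate by using the lines as dictionary keys."""
    seen = {}
    for line in lines:
        seen[line] = 1  # the value is irrelevant; only the key matters
    return list(seen.keys())


def unique_lines_list(lines):
    """List-based version: every `in` test scans the result list."""
    result = []
    for line in lines:
        if line not in result:
            result.append(line)
    return result


sample = ["a", "b", "a", "c", "b", "a"]
print(sorted(unique_lines_dict(sample)))  # ['a', 'b', 'c']
print(unique_lines_list(sample))          # ['a', 'b', 'c']
```

The output of the dictionary version is sorted here because the order
of the keys should not be relied on. In current Python versions,
set(lines) gives the same unique entries in one call.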

> > this is only faster if your files contain many different lines
> 
> 1st file has 53000 lines
> 2nd file has 44000 lines
> and the result ergfield has 4500 entries

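It'll be way faster. With about 97000 input lines and 4500 unique
entries, a list-based membership test compares each line against up to
4500 entries, while the dictionary does one hash lookup per line.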
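
If you want to measure it yourself, something along these lines should
do. This is only a sketch: the data is synthetic, the sizes are scaled
down to a tenth of yours so the list version finishes quickly, and
make_lines and the other names are hypothetical helpers, not anything
from your program.

```python
import random
import timeit


def unique_dict(lines):
    """Deduplicate by using the lines as dictionary keys."""
    seen = {}
    for line in lines:
        seen[line] = 1
    return list(seen.keys())


def unique_list(lines):
    """Deduplicate with a list; each `in` test is a linear scan."""
    result = []
    for line in lines:
        if line not in result:
            result.append(line)
    return result


def make_lines(total, distinct, seed=0):
    """Build `total` synthetic lines drawn from `distinct` possible values."""
    rng = random.Random(seed)
    return ["line %d" % rng.randrange(distinct) for _ in range(total)]


# Assumed sizes: roughly a tenth of the 97000 lines / 4500 entries above.
lines = make_lines(total=9700, distinct=450)

t_dict = timeit.timeit(lambda: unique_dict(lines), number=5)
t_list = timeit.timeit(lambda: unique_list(lines), number=5)
print("dict: %.4f s, list: %.4f s" % (t_dict, t_list))
```

The actual numbers depend on the machine and the Python version, so
none are quoted here. Raising total and distinct towards the real sizes
should widen the gap, since the list version's cost grows with both.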

-- 
Remco Gerlich


