Newbie: list question, speed
Uwe Hoffmann
nospam at nospam.de
Sun May 6 07:18:04 EDT 2001
Werner Hoch wrote:
>
> Uwe Hoffmann wrote:
> > Werner Hoch wrote:
> > > I wrote a little program that parses two text files together.
> > > The result is a list of unique entries:
> > >
> > > So I wrote this to check whether the entry already exists:
> > > --------
> > > if ergfield.count(ergline) == 0:
> > >     ergfield.append(ergline)
> > > ---------
> > > execution time is about 45 seconds
> > >
> > > and then I tried a second statement, which is twice as fast as the first one:
> > > ------------
> > > try:
> > >     ergfield.index(ergline)
> > > except ValueError:
> > >     ergfield.append(ergline)
> > > ------------
> > > execution time is about 21 seconds
> > >
> > > I don't like the second solution because it uses exception handling like
> > > an if statement!
> > > Are there better ways to do this?
> >
> > not sure if this is what you want but use a dictionary instead
> >
> > earlier:
> > ergDict = {}
> >
> > if ergLine not in ergDict:
> >     ergDict[ergLine] = 1
> > else:
> >     ergDict[ergLine] += 1
> >
> >
> > later ergDict.keys() is the same as your ergfield
> > and ergDict.values() (or ergDict.items() with key and value)
> > contains the number of duplicates
>
> Looks great, I will keep it in mind if I need the numbers of duplicates.
> >
> > this is only faster if your files contain many different lines
>
> 1st file has 53000 lines
> 2nd file has 44000 lines
> and the result ergfield has 4500 entries
then the dictionary version is much faster
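The reason is that ergfield.count() and ergfield.index() scan the whole
list for every input line, while a dictionary lookup takes roughly
constant time. A minimal sketch of the dictionary approach follows; the
function name merge_unique and the sample data are made up for
illustration, and your real input would be the lines of your two files.

```python
def merge_unique(*line_sources):
    """Collect unique lines (in first-seen order) and count occurrences.

    merge_unique is a hypothetical helper, not part of any library.
    """
    counts = {}   # line -> number of times it was seen
    unique = []   # lines in the order they first appeared
    for source in line_sources:
        for line in source:
            if line in counts:
                counts[line] += 1
            else:
                counts[line] = 1
                unique.append(line)
    return unique, counts


# Sample data standing in for the contents of the two files.
first = ["a", "b", "a"]
second = ["b", "c"]

ergfield, ergDict = merge_unique(first, second)
print(ergfield)
print(ergDict)
```

The separate list is only needed if the order of first appearance
matters; otherwise the dictionary keys alone are enough.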
>
> > > BTW: how can I convert an integer to a string?
> >
> > str(number)
> > or
> > "%i" % (number,)
>
> It's a shame that this is not in my Python book.
see
http://www.python.org/doc/current/lib/typesseq-strings.html
and
http://www.python.org/doc/current/lib/built-in-funcs.html
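A short sketch of both conversions; the variable names and the value 42
are only for illustration.

```python
number = 42

# Built-in conversion to a string.
as_text = str(number)

# String formatting with the % operator; the one-element tuple
# makes it explicit that exactly one value is being formatted.
formatted = "%i" % (number,)

# The same operator can also pad the number, here to five digits.
padded = "%05d" % (number,)

# Typical use: building a message from text and a number.
label = "line " + str(number)

print(as_text, formatted, padded, label)
```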
>
> Thanks
> Werner
> --
> werner.ho at gmx.de