prefix search on a large file

js ebgssth at gmail.com
Fri Oct 13 15:06:45 CEST 2006


I ran the test the way you suggested. It did not take long to spot the
stupid mistakes I had made. Thank you.

The following are the results of the test with timeit(number=10000),
using a fresh copy of the list for every iteration:

0.331462860107
0.19401717186
0.186257839203
0.0762069225311
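
For reference, a minimal sketch of what "a fresh copy for every iteration" can look like with timeit in current Python versions, where timeit.timeit accepts a callable. The function drop_prefixed and the DATA list are made-up stand-ins, not the functions or data timed above; the only property that matters is that the function consumes the list it is given.

```python
import timeit

def drop_prefixed(lines):
    # Hypothetical stand-in for the functions in this thread:
    # it sorts and empties (mutates) the list it receives.
    kept = []
    lines.sort()
    while lines:
        line = lines.pop(0)
        # After sorting, lines sharing a prefix sit right after that prefix.
        if not kept or not line.startswith(kept[-1]):
            kept.append(line)
    return kept

DATA = ["abc", "ab", "abd", "xyz", "x", "q"]  # made-up sample data

def time_with_fresh_copy(func, data, number=1000):
    # list(data) runs inside the timed callable, so every iteration
    # starts from the full list. The copy cost is part of the measurement.
    return timeit.timeit(lambda: func(list(data)), number=number)

def time_without_copy(func, data, number=1000):
    work = list(data)
    # Only the first iteration does real work here; every later one
    # receives the list already emptied by the previous call.
    return timeit.timeit(lambda: func(work), number=number)

if __name__ == "__main__":
    print("fresh copy:", time_with_fresh_copy(drop_prefixed, DATA))
    print("no copy   :", time_without_copy(drop_prefixed, DATA))
```

Because the copy is timed along with the function, every candidate pays the same overhead, so the relative ordering stays meaningful.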

I tried my recursive function to fix up my big, messy list.
It stopped immediately with 'RuntimeError: maximum recursion depth exceeded'.
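
The recursion in the quoted function goes one level deeper for every line it keeps, so a large file can exceed Python's recursion limit. A rough sketch of the same idea written as a loop follows; the name prefixdel_iteratively is invented for illustration, and unlike the quoted version it leaves its argument unmodified.

```python
def prefixdel_iteratively(alist):
    # Loop version of the quoted prefixdel_recursively2 (hypothetical name).
    # Each pass keeps the first remaining line and drops every later
    # line that starts with it, exactly like one level of the recursion.
    result = []
    remaining = list(alist)  # work on a copy; the caller's list is untouched
    while remaining:
        first = remaining[0]
        result.append(first)
        remaining = [line for line in remaining[1:]
                     if not line.startswith(first)]
    return result

if __name__ == "__main__":
    print(prefixdel_iteratively(["a", "ab", "b", "bc", "c"]))
```

This still compares each kept line against all the lines after it, so it is no faster than the recursive version; it only removes the dependence on the recursion limit.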

I hope this trial and error is getting me to a good place...

Anyway, thank you.

On 10/13/06, Peter Otten <__peter__ at web.de> wrote:
> js  wrote:
>
> > By eliminating list cloning, my function got much faster than before.
> > I really appreciate your help, John.
> >
> > def prefixdel_recursively2(alist):
> >     if len(alist) < 2:
> >         return alist
> >
> >     first = alist.pop(0)
> >     unneeded = [no for no, line in enumerate(alist)
> >                 if line.startswith(first)]
> >     adjust = 0
> >     for i in unneeded:
> >         del alist[i+adjust]
> >         adjust -= 1
> >
> >     return [first] + prefixdel_recursively(alist)
> >
> >
> > process                  time
> > prefixdel_stupidly     : 11.9247150421
> > prefixdel_recursively  : 14.6975700855
> > prefixdel_recursively2 : 0.408113956451
> > prefixdel_by_john      : 7.60227012634
>
> Those are suspicious results. Time it again with number=1, or a fresh copy
> of the data for every iteration.
>
> I also have my doubts whether sorting by length is a good idea. To take it
> to the extreme: what if your data file contains an empty line?
>
> Peter
> --
> http://mail.python.org/mailman/listinfo/python-list
>
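
Peter's question about an empty line in the quoted message can be checked directly: every string starts with the empty string, so if the shortest line is empty and is used as the first prefix, it eliminates everything else. The sample lines below are made up for illustration.

```python
# Made-up sample data containing one empty line.
lines = ["spam", "", "eggs"]

# Sorting by length puts the empty string first.
shortest_first = sorted(lines, key=len)
first = shortest_first[0]

# str.startswith("") is True for every string, so nothing survives.
survivors = [line for line in shortest_first[1:]
             if not line.startswith(first)]

if __name__ == "__main__":
    print(repr(first), survivors)
```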


