My Big Dict.

Christian Tismer tismer at tismer.com
Sat Jul 5 03:09:39 CEST 2003


Paul Simmonds wrote:

[some alternative implementations]

> I've done some timings on the functions above, here are the results:
> 
> Python2.2.1, 200000 line file(all data lines)
> try/except with split:   3.08s
>     if     with slicing: 2.32s
> try/except with slicing: 2.34s
> 
> So slicing seems quicker than split, and using if instead of
> try/except appears to speed it up a little more. I don't know how much
> faster the current version of the interpreter would be, but I doubt
> the ranking would change much.

Interesting. I doubt that split() itself is slow; rather,
I believe the mere fact that you are calling a method
instead of using a syntactic construct makes things slower,
since method lookup is not so cheap. Unfortunately, the bound
split() cannot be cached in a local variable, since it is
obtained as a new method of each line. On the other hand,
the same holds for the find method...
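
To illustrate the lookup cost: line.split has to be looked up on
every line, whereas the type-level function str.split can be looked up
once and kept in a local. This is only a minimal sketch; the function
names and the ":" separator are assumptions for illustration and are
not taken from the test program.

```python
def keys_bound(lines):
    # line.split is looked up on the string object for every single line
    return [line.split(":", 1)[0] for line in lines]


def keys_unbound(lines):
    split = str.split  # one attribute lookup, cached in a local variable
    # the line is passed explicitly as the first argument
    return [split(line, ":", 1)[0] for line in lines]


sample = ["a:1", "b:2", "c:3"]  # hypothetical "key:value" data lines
print(keys_bound(sample) == keys_unbound(sample))
```

Both variants return the same keys; only the way the method is
obtained differs.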

Well, I wrote a test program and found that the test
results depend heavily on the order in which the functions
are called! This means the results are not independent,
probably due to memory usage.
Here are some results on Win32, testing repeatedly...

D:\slpdev\src\2.2\src\PCbuild>python -i \python22\py\testlines.py
 >>> test()
function test_index for 200000 lines took 1.064 seconds.
function test_find  for 200000 lines took 1.402 seconds.
function test_split for 200000 lines took 1.560 seconds.
 >>> test()
function test_index for 200000 lines took 1.395 seconds.
function test_find  for 200000 lines took 1.502 seconds.
function test_split for 200000 lines took 1.888 seconds.
 >>> test()
function test_index for 200000 lines took 1.416 seconds.
function test_find  for 200000 lines took 1.655 seconds.
function test_split for 200000 lines took 1.755 seconds.
 >>>

For that reason, I added a command line mode for testing
single functions, with these results:

D:\slpdev\src\2.2\src\PCbuild>python \python22\py\testlines.py index
function test_index for 200000 lines took 1.056 seconds.

D:\slpdev\src\2.2\src\PCbuild>python \python22\py\testlines.py find
function test_find  for 200000 lines took 1.092 seconds.

D:\slpdev\src\2.2\src\PCbuild>python \python22\py\testlines.py split
function test_split for 200000 lines took 1.255 seconds.

The results look much more reasonable; the index variant still
seems to be the fastest.

Then I added another test, using the unbound str.index function,
which was again a bit faster.
Finally, I moved the try..except clause out of the game by
using an explicit, restartable iterator; see the attached program.

D:\slpdev\src\2.2\src\PCbuild>python \python22\py\testlines.py index3
function test_index3 for 200000 lines took 0.997 seconds.
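
One way such a restartable iterator can work is sketched below. It is
a guess at the technique, not a copy of the attachment: the function
name, the ":" separator and the decision to skip bad lines are
assumptions. The try block is entered once per failure rather than
once per line, and the loop resumes on the same iterator.

```python
def build_dict_restartable(lines):
    d = {}
    index = str.index      # unbound function, looked up only once
    it = iter(lines)       # explicit iterator that survives an exception
    while True:
        try:
            for line in it:
                pos = index(line, ":")   # raises ValueError if ":" is missing
                d[line[:pos]] = line[pos + 1:]
            break          # iterator exhausted: done
        except ValueError:
            continue       # bad line skipped; resume with the next one
    return d


print(build_dict_restartable(["a:1", "no separator", "b:2"]))
```

Because the for loop consumes the iterator rather than the list, the
line that raised is already used up when the loop is re-entered.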

As a side result, split seems to be unnecessarily slow.

cheers - chris
-- 
Christian Tismer             :^)   <mailto:tismer at tismer.com>
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9a     :    *Starship* http://starship.python.net/
14109 Berlin                 :     PGP key -> http://wwwkeys.pgp.net/
work +49 30 89 09 53 34  home +49 30 802 86 56  pager +49 173 24 18 776
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
      whom do you want to sponsor today?   http://www.stackless.com/

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: testlines.py
URL: <http://mail.python.org/pipermail/python-list/attachments/20030705/56e834ce/attachment.ksh>

