Speed problems with Python vs. Perl

Christopher A. Craig ccraig at ccraig.org
Wed Mar 28 18:08:25 CEST 2001

"Fredrik Lundh" <fredrik at pythonware.com> writes:

> assuming you're using Python 2.1, the following version is
> about 16 times faster on my box:
> import sys
> def main():
>     icount = 0
>     for line in sys.stdin.xreadlines():
>         icount += 1
>         f = line.split()
>     print "Total lines read", icount
> if __name__ == '__main__':
>     main()

You can't necessarily blame this on the line I/O performance, because
you used line.split() where he used the split method on a regular
expression object.  Testing just the difference between those two, I
see a tenfold improvement with line.split().

In this case that is valid since you are splitting on whitespace in
both Perl and Python, but it would be interesting to see if Python
still holds up under more complex regular expressions.  
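To isolate that difference, a quick timeit sketch (not from the original thread; the sample line and iteration count are made up) could compare plain str.split() against a compiled whitespace regex:

```python
import re
import timeit

# Hypothetical sample line; not from the original benchmark input.
line = "one two  three\tfour five six seven eight nine ten\n"
rx = re.compile(r"\s+")

# With no argument, str.split() also splits on runs of whitespace,
# so the two produce the same tokens once the trailing newline is stripped.
assert line.split() == rx.split(line.strip())

n = 100000
t_str = timeit.timeit(lambda: line.split(), number=n)
t_re = timeit.timeit(lambda: rx.split(line), number=n)
print("str.split: %.4fs  re.split: %.4fs" % (t_str, t_re))
```

On most builds the plain-method version comes out well ahead, which is consistent with the tenfold figure above.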

I went ahead and used

--- Python Script ---

import re
import sys

r = re.compile(":+")

def main():
    icount = 0
    while 1:
        lines = sys.stdin.readlines(50000)
        if not lines:
            break
        for line in lines:
            icount += 1
            f = r.split(line)

if __name__=="__main__": main()

--- End Python Script ---
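The chunked readlines(sizehint) pattern in the script can be exercised without stdin; this sketch (count_lines and the StringIO input are mine, not from the post) shows the same loop on an in-memory stream:

```python
import io

def count_lines(stream, hint=50000):
    # Read in chunks of roughly `hint` bytes via readlines(hint),
    # mirroring the loop in the script above.
    icount = 0
    while True:
        lines = stream.readlines(hint)
        if not lines:
            break
        icount += len(lines)
    return icount

data = io.StringIO("a:b\n" * 1000)
print(count_lines(data))  # prints 1000
```

The sizehint keeps memory bounded on huge inputs while still amortizing the per-line read overhead.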


--- Perl Script ---
$icount = 0;

while(<>) {
  $icount++;
  @f = split(/:+/);
}
--- End Perl Script ---

with Python 2.0 and Perl 5.6.0 on a Celeron 300A using a 242000 line
input file and I got the following results:

python: 61.10s user 0.42s system 87% cpu 1:10.37 total
perl:   11.31s user 0.10s system 76% cpu 14.960 total

Admittedly Python is being hurt by the slower 2.0 line I/O, but I
would guess that regular expressions are still hurting performance.

Christopher A. Craig <ccraig at ccraig.org>
"Parity is for farmers." Seymour Cray on his machines lack of parity
"I guess farmers buy a lot of computers." Seymour Cray on including parity
