Python too slow for real world

Arne Mueller a.mueller at
Fri Apr 23 08:34:46 EDT 1999

Hi All,

First of all: sorry for that slightly provocative subject ;-) ...

I just switched from Perl to Python because I think Python makes life
easier in bigger software projects. However, I found out that Perl is
more than 10 times faster than Python at solving the following problem:

I've got a file (130 MB) with ~ 300000 datasets of the form:

>px0034 hypothetical protein or whatever description

The word following the '>' is an identifier; the uppercase letters in the
lines following the identifier are the data. Now I want to read the file
and write out its contents, excluding some entries (given by a
dictionary of identifiers, e.g. 'px0034').

The following Python code does the job:

import re
import sys

def read_write(i, o, exclude):
    name = re.compile(r'^>(\S+)')  # regex to fetch the identifier
    l = i.readline()
    while l:
        if l[0] == '>':  # are we in a new dataset?
            m = name.match(l)
            if m and exclude.has_key(m.group(1)):  # excluding current dataset?
                l = i.readline()
                while l and l[0] != '>':  # skip this dataset
                    l = i.readline()
                continue  # re-check the next header, like 'next OUTER' in the Perl code
        o.write(l)
        l = i.readline()

f = open('my_very_big_data_file', 'r')  # datafile with ~300000 records
read_write(f, sys.stdout, {})  # for a simple test I don't exclude anything!

It took 503.90 sec on an SGI Power Challenge (R10000 CPU). An equivalent
Perl script does the same job in 32 sec (same method, same loop).

Since I have to call this routine about 1500 times, that is a very big
difference in time and not really acceptable.

I'd really like to know why Python is so slow (or Perl is so fast?) and
what I can do to improve the speed of that routine.

I don't want to switch back to Perl, but honestly, is Python the right
language to process such a huge amount of data?

If you want to generate a test set you could use the following lines to
print 10000 datasets to stdout:

for i in xrange(1, 10001):
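
A minimal sketch of such a generator, assuming placeholder data: the
residue alphabet, the line width and the number of lines per dataset are
illustrative assumptions rather than the real records, and the names
make_record and write_test_set are hypothetical helpers.

```python
import random

# Assumed placeholders; the real datasets are not reproduced here.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # assumed: one-letter amino-acid codes
LINE_WIDTH = 60                     # assumed characters per data line
LINES_PER_RECORD = 3                # assumed data lines per dataset


def make_record(i, rng):
    """Return one dataset: a '>px<i>' header line followed by data lines."""
    header = ">px%04d hypothetical protein\n" % i
    body = "".join(
        "".join(rng.choice(ALPHABET) for _ in range(LINE_WIDTH)) + "\n"
        for _ in range(LINES_PER_RECORD)
    )
    return header + body


def write_test_set(out, count=10000, seed=0):
    """Write `count` datasets, numbered from 1, to the file object `out`."""
    rng = random.Random(seed)  # fixed seed keeps the test set reproducible
    for i in range(1, count + 1):
        out.write(make_record(i, rng))
```

Calling write_test_set(sys.stdout) after an "import sys" prints the
10000 datasets to stdout; a smaller count is handy for a quick check.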

And if you don't believe me that Perl does the job quicker, you can try
the Perl code below:

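#!/usr/local/bin/perl -w
open(IN, "my_very_big_data_file") or die "cannot open data file: $!";
my %ex = ();  # identifiers to exclude (empty for the test)

sub read_write{
  $l = <IN>;
 OUTER: while( defined $l ){
    if( (($x) = $l =~ /^>(\S+)/) ){
      if( exists $ex{$x} ){
	$l = <IN>;
	while( defined $l && !($l =~ /^>(\S+)/) ){
	  $l = <IN>;
	}
	next OUTER;
      }
    }
    print $l;
    $l = <IN>;
  }
}

read_write();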

Please do convince me that being a Python programmer does not mean being
slow.

	Thanks very much for any help,

