speed problems

Thu Jun 3 11:04:19 EDT 2004

>     First off you're using external programs here for decompression.  This is a
> trade-off of making a system call vs internal implementation.  Maybe Python's
> implementation is slower?  I don't know, just pointing out that it is a
> difference.  Personally when programming tools like this I try to keep
> everything internal because I've had endless system calls kill the run-time.
> However with the few files you're iterating over the cost might be the other
> way 'round.  :)
>

I'll only be looping over these files, but I thought using Python's gzip
module would be faster than spawning gzip itself, the way I did in the Perl
script.
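One way to settle the external-vs-internal question is to time both approaches on the same file. A minimal sketch, assuming a gzipped log at a hypothetical path and a gzip binary on the PATH: it counts lines containing "INFECTED" once with Python's gzip module and once by reading the output of a spawned "gzip -dc" process (roughly what the Perl script did). The function names are hypothetical, not taken from either original script.

```python
import gzip
import subprocess
import time


def count_matches_gzip_module(path, needle=b"INFECTED"):
    """Decompress in-process with Python's gzip module."""
    count = 0
    with gzip.open(path, "rb") as f:
        for line in f:  # iterates lazily, one line at a time
            if needle in line:
                count += 1
    return count


def count_matches_gzip_process(path, needle=b"INFECTED"):
    """Spawn an external 'gzip -dc' (assumed installed) and read its stdout."""
    count = 0
    with subprocess.Popen(["gzip", "-dc", path], stdout=subprocess.PIPE) as proc:
        for line in proc.stdout:
            if needle in line:
                count += 1
    # Leaving the with-block waits for the child, so returncode is set here.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, proc.args)
    return count


def time_it(func, path, repeat=3):
    """Return the best wall-clock time (seconds) over `repeat` runs."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(path)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file names): python compare_gzip.py mail.log.gz ...
    for path in sys.argv[1:]:
        for func in (count_matches_gzip_module, count_matches_gzip_process):
            print(f"{path}: {func.__name__}: {time_it(func, path):.3f}s")
```

Taking the best of several runs reduces noise from disk caching; with only a few files, process start-up cost may or may not dominate, which is exactly what the timing would show.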

> Python:
> >     for line in lf.readlines():
> >       if string.count( line, "INFECTED" ):
> >         vname = re.compile( "INFECTED \((.*)\)" ).search( line ).group(1)
>
>     If I read this correctly you're compiling this regex every time you're
> going through the for loop.  So every line the regex is compiled again.  You
> might want to compile the regex outside the loop and only use the compiled
> version inside the loop.
>

Well, only for lines containing 'INFECTED', then. Good point. (I suddenly
remember some C code where it made a huge difference.) I've moved it
outside the loop now, but the times are still the same.
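Python's re module keeps a cache of recently compiled patterns, so calling re.compile() with the same pattern string inside the loop mostly hits that cache rather than recompiling; that is probably why moving the compile out of the loop did not change the timings. A minimal sketch of the hoisted version, assuming the lines are already decoded strings; the names INFECTED_RE and virus_names are hypothetical, and the "in" operator is used here in place of string.count.

```python
import re

# Compiled once, outside the loop. A raw string keeps the backslashes literal.
INFECTED_RE = re.compile(r"INFECTED \((.*)\)")


def virus_names(lines):
    """Yield the name captured inside 'INFECTED (...)' for each matching line."""
    for line in lines:
        if "INFECTED" not in line:  # cheap substring test before the regex
            continue
        m = INFECTED_RE.search(line)
        if m:  # guard: the word can appear without a parenthesised name
            yield m.group(1)
```

The substring pre-check keeps the regex off most lines, mirroring the original string.count test.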

Another difference might be while (<filehandle>) versus for line in
lf.readlines(). If I'm not mistaken, the latter reads the whole file into
memory, whereas the former reads the file line by line. Why that would make
such a difference, I don't know.
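In Python, lf.readlines() does build a list of every line before the loop starts, while iterating over the file object itself reads lazily, which is the closer analogue of Perl's while (<filehandle>). A minimal sketch of the two forms, assuming a plain-text log file; the function names are hypothetical.

```python
def count_eager(path, needle="INFECTED"):
    """Read every line into a list first (what lf.readlines() does)."""
    with open(path) as lf:
        lines = lf.readlines()  # whole file held in memory at once
    return sum(1 for line in lines if needle in line)


def count_lazy(path, needle="INFECTED"):
    """Iterate the file object directly: one buffered line at a time."""
    with open(path) as lf:
        return sum(1 for line in lf if needle in line)
```

Both give the same count; the lazy form keeps memory flat on large logs. Whether it accounts for the speed gap here is something only timing on the real files would show.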

Thanks for your quick reply,
Kind regards,

Axel