readline()

Andrew M. Kuchling akuchlin at mems-exchange.org
Wed Mar 22 16:13:51 EST 2000


Dana Booth <dana at mmi.oz.net> writes:
> Anyway, I'm pretty new to Python, is there a better way to analyze
> textfiles? Or is the re.search slowing it down?

Perl gains some of its file-reading speed by breaking C's FILE *
abstraction and looking at the internals of that type; that's a
non-starter for Python, because no one wants to do something that ugly.  

The re.search() might also be a factor, depending on the pattern
you're using.  If your task can be accomplished by using the string
module, or by using normal string operations, you're probably better
off using them.  For example, if your pattern is just '^==separator',
you're not using any features of regular expressions at all, and it
would be faster to do something like "if line[:<whatever>] ==
'==separator'".
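
As a rough sketch of that comparison in present-day Python (the marker
string, the function names, and the sample lines are all made up for
illustration), the three checks below give the same answer for an
anchored literal pattern.  Only the first one goes through the regex
engine; how much time the others save is something to measure, for
instance with the timeit module, rather than assume.

```python
import re

# Hypothetical separator marker, taken from the example pattern above.
SEPARATOR = "==separator"
pattern = re.compile(r"^==separator")

def is_separator_regex(line):
    # Anchored literal search through the regex engine.
    return pattern.search(line) is not None

def is_separator_slice(line):
    # Plain slice comparison: compare the first len(SEPARATOR) characters.
    return line[:len(SEPARATOR)] == SEPARATOR

def is_separator_startswith(line):
    # String method equivalent of the slice comparison.
    return line.startswith(SEPARATOR)

# Made-up sample lines: a separator, ordinary text, an indented
# separator (not at the start of the line), and an empty string.
lines = ["==separator one\n", "body text\n", " ==separator indented\n", ""]
results = [
    (is_separator_regex(l), is_separator_slice(l), is_separator_startswith(l))
    for l in lines
]
for line, outcome in zip(lines, results):
    print(repr(line), outcome)
```

The slice form needs the length of the marker, which is why it is
computed with len() here instead of being typed in by hand.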
	
Also, nested regex repetitions can take exponential time to fail,
which may slow things down noticeably in line-by-line mode, but not so
much that you realize that something's wrong.  (I once did that; the
program took several hours to run on 400 MB of data, and I just swore
at it.  I missed the cause until I tried doing searches over 1 KB
chunks of data, at which point a single search took minutes to
complete, and the problem became glaringly obvious.)
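
A small illustration of the effect, assuming a backtracking engine
such as Python's re module.  The pattern '(a+)+$' is a made-up example
of nested repetition, not the pattern from the program described
above; 'a+$' accepts the same strings without the nesting.  Both fail
on a run of 'a's followed by a 'b', but the nested one tries every way
of splitting the run before giving up.  The sizes are kept small so
that this finishes quickly; exact timings depend on the machine and
the Python version.

```python
import re
import time

# Hypothetical patterns for illustration only.
nested = re.compile(r"(a+)+$")   # repetition inside repetition
flat = re.compile(r"a+$")        # same language, no nesting

def time_failed_match(pattern, n):
    # n 'a' characters followed by 'b', so the trailing '$' can never match.
    text = "a" * n + "b"
    start = time.perf_counter()
    result = pattern.match(text)
    return result, time.perf_counter() - start

timings = []
for n in (10, 14, 18):
    nested_result, nested_secs = time_failed_match(nested, n)
    flat_result, flat_secs = time_failed_match(flat, n)
    timings.append((n, nested_result, flat_result, nested_secs, flat_secs))
    print(f"n={n:2d}  nested={nested_secs:.6f}s  flat={flat_secs:.6f}s")
```

On short lines the extra cost is easy to overlook, which fits the
line-by-line experience described above; it only becomes obvious once
the input to a single search gets longer.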

-- 
A.M. Kuchling			http://starship.python.net/crew/amk/
Most hearts of any quality are broken on two or three occasions in a lifetime.
They mend, of course, and are often stronger than before, but something of the
essence of life is lost at every break.
    -- Robertson Davies, _Leaven of Malice_




