Regular expressions in Python

Andreas Jung andreas at
Sun Sep 3 15:54:11 CEST 2000

johnvert at wrote:
> I have a few questions regarding the usage of regular expressions in
> 1)  In Perl, I can do something like
>     if (/START(.+?)END/) {
>       # use $1 here (value caught in (.+?))
>     }
>     What is the equivalent of Perl's $1, $2, ... in Python?

In general, take a look at the documentation of the module "re".
In Python you could try something like:

import re
text = "START one END and START two END"   # sample input to search
lst = re.findall(r'START(.*?)END', text)
for l in lst:
    print(l)
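The closest direct counterpart of Perl's $1, $2, ... is the match object returned by re.search(): m.group(1), m.group(2), and so on return the captured substrings, and re.search() returns None when nothing matches, which takes the place of Perl's if test. A minimal sketch for Python 3; the helper name first_capture and the sample strings are illustrative only:

```python
import re

def first_capture(text):
    """Return the text captured between START and END, or None (illustrative helper)."""
    m = re.search(r'START(.+?)END', text)
    if m:                   # plays the role of Perl's if (/START(.+?)END/)
        return m.group(1)   # plays the role of Perl's $1
    return None

# With several groups, m.group(1) and m.group(2) correspond to $1 and $2.
m = re.search(r'(\w+)=(\d+)', 'width=80')
if m:
    key, value = m.group(1), m.group(2)
    print(key, value)       # prints: width 80

print(repr(first_capture('xx START hello END yy')))
```

Unlike findall(), which returns every match at once, search() stops at the first match and gives access to each group separately, which is closer to how $1 is used inside a Perl if block.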

> 2)  This question is not directly related to regular expressions,
>     but more to parsing text in Python in general:
>     I want to capture stuff between START and END, like in the
>     above regular expression, but START, the stuff in the middle,
>     and END are not necessarily on the same line.  The only way I can
>     think of is to read the whole file into memory as a string and
>     operate on that string, or read it with readlines() and join()
>     those into a string.  Both of these approaches would be slow because
>     the file would be read in one slurp.  Is there a way to handle
>     this 'multiple-line' parsing in a way that I can read the file
>     line by line, as in:
>     while 1:
>       line = file.readline()
>       # parse

Why would it be too slow? We read large files (up to 50 MB) either
line by line with readline() or with a single read() call. This
is not necessarily slower than a program in C.
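Whichever way the file is read, note that "." does not match a newline by default, so findall() with the pattern above will not find a START ... END block that spans lines unless the re.DOTALL flag is passed. The sketch below (Python 3) shows both the whole-string version with re.DOTALL and a line-by-line variant that scans each line for the markers and keeps only the current block in memory. The function names blocks_from_text and blocks_from_lines are hypothetical, and the line-by-line version assumes the literal markers START and END never themselves contain a newline:

```python
import io
import re

def blocks_from_text(text):
    # re.DOTALL lets '.' match newlines, so a block may span several lines
    return re.findall(r'START(.*?)END', text, re.DOTALL)

def blocks_from_lines(lines):
    """Yield the text between START and END, consuming one line at a time."""
    buf = None                      # None means we are outside a block
    for line in lines:
        while line:
            if buf is None:
                i = line.find('START')
                if i < 0:
                    break           # no block starts in the rest of this line
                buf = []
                line = line[i + len('START'):]
            else:
                j = line.find('END')
                if j < 0:
                    buf.append(line)  # block continues on the next line
                    break
                buf.append(line[:j])
                yield ''.join(buf)
                buf = None
                line = line[j + len('END'):]  # keep scanning after END

# Demo on an in-memory "file"; both approaches give the same blocks.
sample = "junk START first\nsecond END junk START third END\n"
print(blocks_from_text(sample))
print(list(blocks_from_lines(io.StringIO(sample))))
```

Since iterating over a file object yields one line at a time, blocks_from_lines(f) can be given an open file f directly. The trade-off is a little more bookkeeping in exchange for memory use bounded by the largest block rather than by the whole file.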


More information about the Python-list mailing list