Regular expressions

Alex Martelli aleaxit at yahoo.com
Thu Jan 18 04:23:01 EST 2001


"Realware Systems" <derek at realware.com.au> wrote in message
news:MPG.14cf88e8d0c87a6d989681 at news.syd.connect.com.au...
> Hi,
>
> I have been looking at the docs at python.org but I must be missing
> something.
>
> I cannot find an example of how to set up a regular expression.

A regular expression object is returned by applying the function
re.compile to a string that represents the regular expression.
(You may also call certain functions in the same standard module
re, with the string representing the regular expression as
their first parameter, and get the same results AS IF you
were building the RE object and then calling its methods; this
is sometimes convenient but, like all cases in which there
are two ways to achieve a result, may be slightly confusing.)
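To make the "two ways" concrete, here is a minimal sketch that runs the same search once through a compiled object and once through the module-level function. The names 'pattern', 'text', 'compiled', 'via_object' and 'via_module' are just illustrative choices of mine, not anything the re module requires.

```python
import re

pattern = r'\d+'
text = 'pop348ze4933plap'

# Way 1: compile once, then call a method on the RE object.
compiled = re.compile(pattern)
via_object = compiled.search(text)

# Way 2: hand the pattern string straight to the module-level function.
via_module = re.search(pattern, text)

# Both calls find the same text at the same position.
print(via_object.group(0), via_object.span())
print(via_module.group(0), via_module.span())
```

Compiling first tends to be the tidier choice when the same pattern is applied many times; the module-level call is handy for a one-off.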


> I would like to know how to
>
> 1. set up an RE

import re
myre = re.compile(r'\d+')

This creates a regular expression object, to which variable
'myre' refers, and which will match any non-empty sequence of
digits.  OK so far?

> 2. apply it to a string (to just perform a match)

mymo = myre.match('39393')

This creates a match-object, to which variable 'mymo' refers,
and which represents the match of the given regular expression
with the given string.  Note:

mymo = myre.match('pop348ze4933plap')

this will now make variable 'mymo' refer to ***None***, as
the .match method implies the match has to start *from the
start of the string*; if you want to _search_ for a match
wherever it may start, use instead the .search method:

mymo = myre.search('pop348ze4933plap')

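this will now make variable 'mymo' refer to a match-object
for the substring '348' of the given string (the leftmost
match wins, and matching is also 'greedy', i.e., it takes the
longest run of digits available from that starting point).
The matched text itself is available as mymo.group(0).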
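A small sketch of the difference between .match and .search, and of how to get the matched text back out of the match-object. The variable names 'anchored' and 'found' are mine, chosen for illustration; .group and .span are ordinary match-object methods.

```python
import re

myre = re.compile(r'\d+')
text = 'pop348ze4933plap'

# .match only succeeds if the pattern matches at the very start.
anchored = myre.match(text)
print(anchored)            # None: the string starts with 'pop'

# .search scans forward and stops at the leftmost match.
found = myre.search(text)
if found is not None:      # always test before using a match-object
    print(found.group(0))  # '348'
    print(found.span())    # (3, 6)
```

Since both methods return None on failure, testing the result before calling .group on it avoids an AttributeError.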

> 3. apply it to a string to perform a replacement

newstring = myre.sub('DIGITS','z23skidoo42',1)

this will now make variable 'newstring' refer to a string
worth 'zDIGITSskidoo42'.  Note that string objects are NOT
mutable, so the OLD string is not changed by the .sub --
a NEW string is built with the required value.  Note also
that .sub searches like .search -- no implied 'must start
at the beginning'.  Finally, we had to explicitly specify
we wanted ONE replacement (that third parameter worth 1
to the .sub call); by default, i.e. without a 3rd parameter
or with a 3rd parameter worth 0, .sub would substitute
ALL non-overlapping occurrences of the pattern in the
string (which is often what we want -- but you asked for
'a' replacement, which I took as meaning 'one' occurrence).
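Here is a short sketch putting the one-replacement and all-replacements forms side by side. It passes the limit by keyword (count=1), which I am assuming reads more clearly than the bare third argument; the names 'original', 'first_only' and 'every' are only illustrative.

```python
import re

myre = re.compile(r'\d+')
original = 'z23skidoo42'

# Replace only the first run of digits.
first_only = myre.sub('DIGITS', original, count=1)

# No count given: replace every non-overlapping run of digits.
every = myre.sub('DIGITS', original)

print(first_only)  # zDIGITSskidoo42
print(every)       # zDIGITSskidooDIGITS
print(original)    # z23skidoo42 (the old string is untouched)
```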

> 4. how to join RE's

I don't know of a way to join compiled RE objects *directly*,
but you can get back the string representation using the
attribute .pattern of the RE object, join those strings
(e.g. with operator +, if you only want to juxtapose; with
'|'.join, if you want to perform an 'or'; etc), and
compile the result.

Suppose that, as above:

myre = re.compile(r'\d+')

(matches non-empty sequence of digits) and that variable
hisre refers to some other RE object that matches something
else, for example:

hisre = re.compile('[abc]+')

which matches a non-empty sequence of the letters a, b, c.
Now, to get an RE meaning: "whatever myre matches immediately
followed by whatever hisre matches" (here: "non-empty sequence
of digits followed by non-empty sequence of letters a,b,c"),
we do:

jointre = re.compile(myre.pattern+hisre.pattern)
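As a sketch of both kinds of join mentioned above (juxtaposition with + and alternation with '|'.join), the following can be run as is. 'eitherre' is a name I made up for the 'or' version, and the sample strings are arbitrary.

```python
import re

myre = re.compile(r'\d+')
hisre = re.compile('[abc]+')

# Juxtapose: digits immediately followed by letters a, b, c.
jointre = re.compile(myre.pattern + hisre.pattern)

# 'Or': either a run of digits or a run of letters a, b, c.
eitherre = re.compile('|'.join([myre.pattern, hisre.pattern]))

print(jointre.search('xx42abz').group(0))  # 42ab
print(eitherre.findall('12ab34'))          # ['12', 'ab', '34']
```

One caveat: if a pattern being joined already contains a '|' of its own, wrapping each piece in a non-capturing group, '(?:' + pattern + ')', before joining keeps the pieces from bleeding into each other. The two simple patterns here do not need it.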


Alex





