[Tutor] Regexes -- \b ; re.sub

Magnus Lycka magnus@thinkware.se
Sat Nov 16 13:22:02 2002


At 15:09 2002-11-16 +0630, John Abbe wrote:
>matches = re.search (re.compile ("(.)\b(.)"), "spam and eggs are yummy")

I've never seen anyone send a regular expression object to re.search
like that before. I didn't even know it was possible. The manual
doesn't say that it's permitted as far as I understand. re.search
takes a *pattern* as its first parameter. A pattern is a string.
The result of compiling a pattern with re.compile is called a
regular expression object. Calling functions with undocumented types
like this might break in a future release. (It's happened before.)

You can either write:

pattern = re.compile(r"(.)\b(.)")
matches = pattern.search("spam and eggs are yummy")

or, more briefly:

matches = re.compile(r"(.)\b(.)").search("spam and eggs are yummy")

or use the re.search function with an (uncompiled) pattern:

matches = re.search(r"(.)\b(.)", "spam and eggs are yummy")

(Note the r"" prefix for a raw string. In an ordinary string literal,
\b means backspace. The alternative to a raw string is to write
"(.)\\b(.)".)
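
To make the difference concrete, here is a small sketch. It is written
for Python 3 (an assumption on my part; the sessions in this mail use
the older print statement), and the names plain and raw are only
illustrative.

```python
import re

text = "spam and eggs are yummy"

# In an ordinary string literal, "\b" is one backspace character, chr(8).
plain = "(.)\b(.)"
# In a raw string the backslash survives, so the regex engine sees \b,
# which it reads as a word boundary.
raw = r"(.)\b(.)"

print(len(plain), len(raw))   # the raw string is one character longer
print(raw == "(.)\\b(.)")     # doubling the backslash gives the same pattern

# The non-raw pattern looks for a literal backspace, which the text lacks.
print(re.search(plain, text))
# The raw pattern finds the first word boundary with a character on each side.
print(re.search(raw, text).groups())
```

The non-raw search comes back as None, since the text contains no
backspace character, while the raw one finds the end of "spam".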

Then you get

 >>> print matches.groups()
('m', ' ')

which is correct.

I'm not sure why you called your match object "matches". If you want
to find all non-overlapping <something><word boundary><something>
matches, you could use re.findall(pattern, text) or
re.compile(pattern).findall(text).
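
As a rough sketch of what findall gives you here (Python 3 syntax
assumed; the name pairs is just for illustration):

```python
import re

text = "spam and eggs are yummy"
pattern = r"(.)\b(.)"

# With two groups in the pattern, findall returns a list of
# (group1, group2) tuples, one per non-overlapping match.
pairs = re.findall(pattern, text)
print(pairs)

# Calling findall on the compiled object gives the same list.
assert re.compile(pattern).findall(text) == pairs
```

Only the word endings show up in the result. Each match consumes the
space after the word, so the boundary between that space and the next
word can't start a new match. That is what "non-overlapping" means in
practice.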

>print re.sub (re.compile ("(i)"), "o\1o", "i like eggs and spam")
>
>...prints out "oo looke eggs and spam" -- shouldn't it be "oio loioke eggs 
>and spam", after all this...

Actually, that's not quite what it prints... But you need raw strings again.

 >>> print re.sub (re.compile ("(i)"), "o\1o", "i like eggs and spam")
oo looke eggs and spam

There is actually a chr(1) character (ASCII SOH) between the "o"s,
because "\1" in an ordinary string is an octal escape. Maybe it
doesn't show up in your environment.

 >>> print re.sub(re.compile ("(i)"), r"o\1o", "i like eggs and spam")
oio loioke eggs and spam
 >>> print re.sub("(i)", r"o\1o", "i like eggs and spam")
oio loioke eggs and spam
 >>> iPat = re.compile("(i)")
 >>> print iPat.sub(r"o\1o", "i like eggs and spam")
oio loioke eggs and spam

And please don't put a space between the function name and the
opening parenthesis in your function calls. It's a good thing
to follow the official Python style guide:
http://www.python.org/peps/pep-0008.html


-- 
Magnus Lycka, Thinkware AB
Alvans vag 99, SE-907 50 UMEA, SWEDEN
phone: int+46 70 582 80 65, fax: int+46 70 612 80 65
http://www.thinkware.se/  mailto:magnus@thinkware.se