[Tutor] parse text for paragraphs/sections

spir denis.spir at free.fr
Mon Apr 20 19:46:31 CEST 2009

On Mon, 20 Apr 2009 09:07:01 -0700,
"Dinesh B Vadhia" <dineshbvadhia at hotmail.com> wrote:

> t = """abc <pre> DEF ghi jkl </pre> MNO pqr"""
> ... pickup all text between the tags <pre> and </pre> and replace with
> another piece of text.
> I tried 
> t = re.sub(r"\<pre>[A-Za-z0-9]\</pre>", "DBV", t)
> ... but it doesn't work.

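You need to:
-1- Add ' ' (a space) to the character class, since the text between the tags contains spaces.
-2- Repeat the class with '*' or '+'; without a quantifier it matches exactly one character.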

from re import compile as Pattern
p = Pattern(r"<pre>[A-Za-z0-9 ]*</pre>")
t = """abc <pre> DEF ghi jkl </pre> MNO pqr"""
print(p.sub('@', t))
abc @ MNO pqr
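
The character class above only covers letters, digits and spaces, so punctuation or a newline between the tags would stop the match. One possible generalization, offered only as a sketch: a non-greedy ".*?" together with the re.DOTALL flag matches any content up to the first closing tag. The helper name replace_pre and the second sample string in the usage note are made up for illustration; they are not from the original question.

```python
import re

# Non-greedy ".*?" stops at the first </pre>; DOTALL lets "." match newlines too.
PRE_BLOCK = re.compile(r"<pre>.*?</pre>", re.DOTALL)

def replace_pre(text, replacement):
    """Replace every <pre>...</pre> block in text (hypothetical helper)."""
    return PRE_BLOCK.sub(replacement, text)

t = """abc <pre> DEF ghi jkl </pre> MNO pqr"""
print(replace_pre(t, "DBV"))  # abc DBV MNO pqr
```

With this pattern, a text such as "a <pre>x, y!\nz</pre> b <pre>q</pre> c" should have both blocks replaced separately, because the non-greedy quantifier does not run on to the last closing tag.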


PS: I really wonder why sub() takes (1) the replacement string and (2) the source text as parameters, in that order. I always get caught by this (IMO weird) parameter order. And you get no error, only a wrong result.
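
To make that pitfall concrete, here is a small sketch. With the arguments swapped, sub() searches the short replacement string for the pattern, finds nothing, and hands that string back unchanged, so nothing is raised. Passing the arguments by keyword (repl and string are the parameter names given in the re documentation) is one way to guard against the mix-up; the variable names right, swapped and safe are only illustrative.

```python
import re

p = re.compile(r"<pre>[A-Za-z0-9 ]*</pre>")
t = "abc <pre> DEF ghi jkl </pre> MNO pqr"

right = p.sub("@", t)    # replacement first, then source text
swapped = p.sub(t, "@")  # arguments reversed: "@" is searched for the pattern

print(right)    # abc @ MNO pqr
print(swapped)  # @  (no match in "@", so it comes back unchanged, no error)

# Keyword arguments make the role of each string explicit.
safe = p.sub(repl="@", string=t)
print(safe)     # abc @ MNO pqr
```
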
la vita e estrany
