[Tutor] parse text for paragraphs/sections

Mon Apr 20 19:46:31 CEST 2009

On Mon, 20 Apr 2009 09:07:01 -0700,
"Dinesh B Vadhia" <dineshbvadhia at hotmail.com> wrote:

> t = """abc <pre> DEF ghi jkl </pre> MNO pqr"""
> 
> ... pickup all text between the tags <pre> and </pre> and replace with
> another piece of text.
> 
> I tried 
> 
> t = re.sub(r"\<pre>[A-Za-z0-9]\</pre>", "DBV", t)
> 
> ... but it doesn't work.

You need to:
-1- add ' ' (a space) to the character class;
-2- repeat it with '*' or '+', because a character class on its own matches exactly one character.

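import re

# The class now includes a space, and '*' repeats it, so the pattern matches
# any run of letters, digits and spaces between the two tags.
p = re.compile(r"<pre>[A-Za-z0-9 ]*</pre>")
t = """abc <pre> DEF ghi jkl </pre> MNO pqr"""
print(p.sub('@', t))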
==>
abc @ MNO pqr
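The class-based pattern above only matches when the text between the tags consists of letters, digits and spaces; a comma, an exclamation mark or a line break inside the block makes it silently match nothing. A minimal sketch of a more general approach, assuming the tags are written exactly as `<pre>` and `</pre>` (no attributes, no nesting), uses a non-greedy `.*?` together with `re.DOTALL`. The helper names `replace_pre_blocks` and `replace_pre_contents` are hypothetical, chosen for illustration only:

```python
import re

# Non-greedy ".*?" stops at the first "</pre>" after each "<pre>", so two
# separate blocks are replaced separately instead of being merged into one.
# re.DOTALL lets "." also match newlines, for blocks spanning several lines.
PRE_BLOCK = re.compile(r"<pre>.*?</pre>", re.DOTALL)

def replace_pre_blocks(text, replacement):
    """Replace every <pre>...</pre> block, tags included, with replacement."""
    return PRE_BLOCK.sub(replacement, text)

# Variant that keeps the tags and replaces only the enclosed text:
# group 1 captures "<pre>" and group 2 captures "</pre>".
PRE_INNER = re.compile(r"(<pre>).*?(</pre>)", re.DOTALL)

def replace_pre_contents(text, replacement):
    """Replace only the text inside each <pre>...</pre> block."""
    # With a function as repl, its return value is inserted literally,
    # so backslashes in `replacement` are not treated as escapes.
    return PRE_INNER.sub(lambda m: m.group(1) + replacement + m.group(2), text)

t = "abc <pre> DEF, ghi! </pre> MNO <pre>x\ny</pre> pqr"
print(replace_pre_blocks(t, "DBV"))    # abc DBV MNO DBV pqr
print(replace_pre_contents(t, "DBV"))  # abc <pre>DBV</pre> MNO <pre>DBV</pre> pqr
```

The '?' after '.*' matters: a greedy '.*' would run from the first <pre> to the last </pre> and swallow the " MNO " in between. For real HTML with attributes or nested tags, an HTML parser is a safer choice than a regex.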

Denis

PS: I really wonder why sub() takes (1) the replacement string and (2) the source text as parameters, in that order. I always get caught by this (IMO weird) parameter order, and you get no error... only a wrong result.
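The order is replacement first, then source, both for the module-level re.sub(pattern, repl, string) and for a compiled pattern's sub(repl, string). A small sketch of why a swap goes unnoticed, and of one habit that may help (passing repl and string by keyword, which re.sub accepts):

```python
import re

text = "abc <pre> DEF ghi jkl </pre> MNO pqr"
pattern = r"<pre>[A-Za-z0-9 ]*</pre>"

# Correct order: re.sub(pattern, repl, string).
right = re.sub(pattern, "@", text)

# Swapped repl and string: no exception is raised. The pattern finds no match
# in the "source" "@", so "@" is returned unchanged and the real text is lost.
wrong = re.sub(pattern, text, "@")

# Naming the arguments makes the intent explicit at the call site.
explicit = re.sub(pattern, repl="@", string=text)

print(right)     # abc @ MNO pqr
print(wrong)     # @
print(explicit)  # abc @ MNO pqr
```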