[Tutor] Regex question

"Andrés Chandía" andres at chandia.net
Wed Mar 30 18:49:39 CEST 2011



Thanks Steve, you are, from now on, my guru....

This is the final version, the good one!

contents = re.sub(r'(<u>|<span style="text-decoration: underline;">)(l|L|n|N|t|T)(</span>|</u>)', r"\2'", contents)
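)(l|L|n|N|t|T)">
For anyone reading the archive later, here is a small sketch of that final version run on some input. The sample text is made up for illustration (it is not from my real data), and it assumes there is exactly one space after "text-decoration:" in the span tag.

```python
import re

# Hypothetical sample input, invented for this illustration.
sample = ('de <u>l</u>a casa y '
          '<span style="text-decoration: underline;">N</span>ada')

# Group 1 is the opening tag, group 2 the single letter, group 3 the closing tag.
pattern = (r'(<u>|<span style="text-decoration: underline;">)'
           r'(l|L|n|N|t|T)'
           r'(</span>|</u>)')

# The raw string r"\2'" hands re.sub() a backslash, a 2 and an apostrophe,
# so each match is replaced by the letter from group 2 plus an apostrophe.
fixed = re.sub(pattern, r"\2'", sample)
print(fixed)
```

Note that the pattern as written also accepts a mismatched pair such as `<u>l</span>`, since the opening and closing alternatives are independent of each other.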


On Wed, March 30, 2011 17:27, Steve Willoughby wrote:
On 30-Mar-11 08:21, "Andrés Chandía" wrote:
>
> Thanks Kushal and Steve.
> I think it works. I say "I think" because in the results I got a
> strange character instead of the letter that should appear.
>
> This is my regexp:
>
> contents = re.sub(r'(<u>|<span style="text-decoration: underline;">)(l|L|n|N|t|T)(</span>|</u>)', '\2\'', contents)

Remember that \2 in an ordinary (non-raw) string literal is an octal
escape: it means the ASCII character with the code 002, which is the
strange character you are seeing.  You need to escape this with an
extra backslash:
	'\\2\''
Although it would be more convenient to switch to double quotes to make
the inclusion of the literal single quote easier:
	"\\2'"

How does that work?  As the string is being "built", the \\ is
interpreted as a literal backslash, so the actual characters in the
string's value end up being:
	\2'
THAT is what is then passed into the sub() function, where \2 means to
substitute the text matched by the second group.
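To make the two stages concrete, here is a minimal sketch. The one-tag pattern and the sample string are simplified stand-ins chosen for illustration, not the regexp from the thread.

```python
import re

# Hypothetical, simplified input and pattern for illustration only.
sample = "<u>t</u>"
pattern = r"(<u>)(t)(</u>)"

# Stage 1: Python itself interprets the backslashes in the literal.
broken = '\2\''   # \2 becomes ONE character with code 2, then an apostrophe
fixed = '\\2\''   # \\ becomes one literal backslash, then 2, then an apostrophe

print([ord(c) for c in broken])  # code points of the characters in broken
print(fixed)                     # the characters sub() actually receives

# Stage 2: sub() interprets whatever characters it was handed.
print(repr(re.sub(pattern, broken, sample)))  # control character survives
print(repr(re.sub(pattern, fixed, sample)))   # group 2 plus an apostrophe
```

In the broken case sub() never sees a backslash at all, so it has no group reference to expand and simply inserts the control character.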

This can be yet simpler by using raw strings:
	r"\2'"

In raw strings, backslashes do almost nothing special at all, so
you don't need to double them.
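A quick check of that claim, as a sketch: the three spellings above are the same string, and the "almost" refers to the one job a backslash keeps in a raw literal, which is stopping a following quote from ending the string.

```python
# All three literals spell the same three characters:
# a backslash, the digit 2, and an apostrophe.
assert r"\2'" == "\\2'" == '\\2\''
assert len(r"\2'") == 3

# The "almost": inside a raw literal, a backslash before a quote still keeps
# that quote from closing the string, but the backslash stays in the value.
assert r'\'' == "\\'"
assert len(r'\'') == 2

print("raw and doubled-backslash spellings are equivalent")
```

A related consequence is that a raw string literal cannot end in a single backslash, because that backslash would protect the closing quote.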

I should have thought of that when sending my original answer to your
question.  Sorry I overlooked it.

--steve





_______________________
andrés chandía

Do not print unnecessarily. Take care of the environment!



