[Tutor] Regex question

Steve Willoughby steve at alchemy.com
Wed Mar 30 17:27:25 CEST 2011


On 30-Mar-11 08:21, "Andrés Chandía" wrote:
>
>
> Thanks Kushal and Steve.
> I think it works. I say "I think" because in the
> results I got a strange character instead of the letter that should appear.
>
> this is my regexp:
>
> contents = re.sub(r'(<u>|<span style="text-decoration: underline;">)(l|L|n|N|t|T)(</span>|</u>)', '\2\'' ,contents)

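Remember that \2 in an ordinary (non-raw) string literal is an octal
escape: it means the single character with code 2, which is most likely
the strange character you are seeing.  You need to escape the backslash
with an extra backslash: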
	'\\2\''
Although it would be more convenient to switch to double quotes to make 
the inclusion of the literal single quote easier:
	"\\2'"

How does that work?  As the string is being "built", the \\ is 
interpreted as a literal backslash, so the actual characters in the 
string's value end up being:
	\2'
THAT is what is then passed into the sub() function, where \2 is a
backreference: it means "insert whatever the second group in the
pattern matched", which here is the (l|L|n|N|t|T) group.
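
To see the difference for yourself, here is a small sketch.  The sample
text and the shortened pattern are made up for illustration (they are
not from your message); only re.sub() and the two replacement strings
come from the discussion above.

```python
import re

# Non-raw literal: \2 is an octal escape, so this string holds
# two characters: chr(2) and an apostrophe.
broken = '\2\''

# Escaped literal: \\ becomes one backslash, so this string holds
# three characters: a backslash, the digit 2, and an apostrophe.
fixed = '\\2\''

print(len(broken), repr(broken))
print(len(fixed), repr(fixed))

# Hypothetical sample input and a cut-down pattern, for illustration only.
sample = '<u>l</u>ala'
pattern = r'(<u>)(l|L)(</u>)'

# With the broken replacement, the control character ends up in the output.
broken_result = re.sub(pattern, broken, sample)

# With the fixed replacement, sub() sees \2 and inserts the second group.
fixed_result = re.sub(pattern, fixed, sample)

print(repr(broken_result))
print(repr(fixed_result))
```

Printing with repr() is the useful trick here: it shows the control
character as an escape instead of whatever odd glyph your terminal
picks for it.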

This can be made simpler still by using a raw string:
	r"\2'"

In a raw string, backslashes are (almost) never treated as escape
characters, so you don't need to double them.
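
As a rough check, the sketch below confirms that the three spellings
give the same string, and then runs the pattern from your message with
the raw-string replacement.  I am assuming the line break inside your
pattern was just mail wrapping, so it is written on one line here; the
sample text is invented for the example.

```python
import re

# All three literals produce the same three characters:
# backslash, the digit 2, and an apostrophe.
assert r"\2'" == "\\2'" == '\\2\''

# The pattern from the question, joined onto one line (assumption:
# the break after "text-decoration:" was introduced by the mailer).
pattern = (r'(<u>|<span style="text-decoration: underline;">)'
           r'(l|L|n|N|t|T)'
           r'(</span>|</u>)')

# Hypothetical input covering both kinds of underline markup.
sample = 'a<u>l</u>a and <span style="text-decoration: underline;">N</span>o'

# Each underlined letter is replaced by the letter plus an apostrophe.
result = re.sub(pattern, r"\2'", sample)
print(result)
```

The "almost" matters in one case worth knowing about: a raw string
still cannot end in a single backslash, so r"\" is a syntax error.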

I should have thought of that when sending my original answer to your 
question.  Sorry I overlooked it.

--steve


-- 
Steve Willoughby / steve at alchemy.com
"A ship in harbor is safe, but that is not what ships are built for."
PGP Fingerprint 48A3 2621 E72C 31D9 2928 2E8F 6506 DB29 54F7 0F53

