[Tutor] Regex question

"Andrés Chandía" andres at chandia.net
Wed Mar 30 17:21:02 CEST 2011

Thanks Kushal and Steve.
I think it works. I say "I think" because in the results I got a strange
character instead of the letter that should appear.
This is my regexp:

contents = re.sub(r'(<u>|<span style="text-decoration: underline;">)(l|L|n|N|t|T)(</span>|</u>)', '\2\'' ,contents)

This is my input file content:
<span style="text-decoration:

This is my output file content:


At the head of the file I have:
#!/usr/bin/env python
# -*- coding: utf-8 -*-

I tried changing the coding to iso-8859-15, but nothing changed. I'm sure
you know the reason for this. Can you share it with this poor newbie?

Thanks a lot!!

On Wed, March 30, 2011 09:46, Kushal Kumaran wrote:
2011/3/30 "Andrés Chandía" <andres at chandia.net>:
> I'm new to this list, so hello everybody!

Hello Andrés

> The stuff:
> I'm working with regexps and this is my line:
> contents = re.sub("<u>l<\/u>", "le"
> In Perl there is a way to reference previous registers, i.e.
> $text =~ s/<u>(l|L|n|N)<\/u>/$1e/g;
> So I'm looking for the way to do it in Python, obviously this does not work:
> contents = re.sub("<u>(l|L|n|N)<\/u>", "$1e", contents)

You will use \1 for the backreference. The documentation of the re module
(http://docs.python.org/library/re.html#re.sub) has an example. Also note
the use of raw strings (r'...') to avoid having to escape the backslash
with another backslash.
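
A minimal sketch of that advice, assuming a made-up sample string (the
value of contents below is hypothetical, not the poster's file). It
compares a raw replacement string with a non-raw one. In an ordinary
string literal, '\1' is an octal escape for a single control character,
which may be one way a strange character ends up in the output.

```python
import re

# Hypothetical sample input, not taken from the original message.
contents = "<u>l</u> and <u>N</u>"

# Raw replacement string: re.sub sees a backslash followed by 1 and
# substitutes the text captured by the first group.
fixed = re.sub(r"<u>(l|L|n|N)</u>", r"\1e", contents)
print(fixed)  # le and Ne

# Non-raw replacement string: Python turns '\1' into chr(1) before
# re.sub ever sees it, so a control character is inserted instead.
broken = re.sub(r"<u>(l|L|n|N)</u>", "\1e", contents)
print(repr(broken))  # '\x01e and \x01e'
```

Writing the replacement as "\\1e" (doubled backslash) behaves the same as
r"\1e". The raw form is just easier to read.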

