[Tutor] Regex question
"Andrés Chandía"
andres at chandia.net
Wed Mar 30 17:21:02 CEST 2011
Thanks Kushal and Steve.
I think it works. I say "I think" because in the
results I get a strange character instead of the letter that should appear.
This is my regexp:
contents = re.sub(r'(<u>|<span style="text-decoration: underline;">)(l|L|n|N|t|T)(</span>|</u>)', '\2\'', contents)
This is my input file content:
<u>l</u>omo
<u>n</u>omo
<u>t</u>omo
<u>L</u>omo
<u>N</u>omo
<u>T</u>omo
<span style="text-decoration: underline;">n</span>omo
<u>t</u>omo
This is my output file content:
'omo
'omo
'omo
'omo
'omo
'omo
'omo
'omo
At the head of the file I have:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
I tried changing the coding to iso-8859-15, but nothing changed. Surely you
know the reason for this; can you share it with this poor newbie?
Thanks a lot!!
On Wed, March 30, 2011 09:46, Kushal Kumaran wrote:
2011/3/30 "Andrés Chandía" <andres at chandia.net>:
>
>
> I'm new to this list, so hello everybody!
>
Hello Andrés
> The stuff:
>
> I'm working with regexps and this is my line:
>
> contents = re.sub("<u>l<\/u>", "le", contents)
>
> in perl there is a way to reference previous registers,
> i.e.
>
> $text =~ s/<u>(l|L|n|N)<\/u>/$1e/g;
>
> So I'm looking for the way to do it in python, obviously this does not
> work:
>
> contents = re.sub("<u>(l|L|n|N)<\/u>", "$1e", contents)
>
You will use \1 for the backreference. The documentation of the re module
(http://docs.python.org/library/re.html#re.sub) has an example.
Also note the use of raw strings (r'...') to avoid having to escape
the backslash with another backslash.
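
To illustrate the point about \1 and raw strings, here is a minimal
sketch. The sample input and the variable names (with_raw, with_escaped,
without_raw) are made up for illustration; the pattern is the one quoted
above, minus the unneeded backslash before the slash.

```python
import re

# Hypothetical sample input, modeled on the lines discussed in this thread.
contents = "<u>l</u>omo\n<u>N</u>omo\n"

# The pattern from the thread, written as a raw string.
pattern = r"<u>(l|L|n|N)</u>"

# Raw string: re.sub receives a backslash followed by "1",
# which it treats as a backreference to group 1.
with_raw = re.sub(pattern, r"\1e", contents)

# Without a raw string, the backslash has to be doubled to get the same result.
with_escaped = re.sub(pattern, "\\1e", contents)

# With neither, Python itself turns "\1" into the control character U+0001
# before re.sub ever sees it, so no backreference takes place.
without_raw = re.sub(pattern, "\1e", contents)

print(repr(with_raw))
print(repr(with_escaped))
print(repr(without_raw))
```

In a plain (non-raw) string literal, a backslash followed by a digit is an
octal escape, so the replacement text ends up containing a control character
rather than the captured letter.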
_______________________
andrés chandía
Do not print unnecessarily. Take care of the environment!