[Tutor] Regex question

Wed Mar 30 17:21:02 CEST 2011

Thanks Kushal and Steve.
I think it works. I say "I think" because in the results I get a strange
character instead of the letter that should appear.

This is my regexp:

contents = re.sub(r'(<u>|<span style="text-decoration: underline;">)(l|L|n|N|t|T)(</span>|</u>)', '\2\'', contents)

This is my input file content:
<u>l</u>omo  
<u>n</u>omo  
<u>t</u>omo  
<u>L</u>omo  
<u>N</u>omo  
<u>T</u>omo  
<span style="text-decoration: underline;">n</span>omo  
<u>t</u>omo 

This is my output file content:
'omo  
'omo  
'omo  
'omo 

'omo  
'omo  
'omo  
'omo  

At the head of the file I have:
#!/usr/bin/env python
# -*- coding: utf-8 -*-

I tried changing the coding to iso-8859-15, but it made no difference.
I'm sure you know the reason for this; can you share it with this poor
newbie?

Thanks a lot!!

On Wed, March 30, 2011 09:46, Kushal Kumaran wrote:
2011/3/30 "Andrés Chandía" <andres at chandia.net>:
>
> I'm new to this list, so hello everybody!
>

Hello Andrés

> The stuff:
>
> I'm working with regexps and this is my line:
>
> contents = re.sub("<u>l<\/u>", "le", contents)
>
> In Perl there is a way to reference previous registers, i.e.
>
> $text =~ s/<u>(l|L|n|N)<\/u>/$1e/g;
>
> So I'm looking for the way to do it in Python; obviously this does
> not work:
>
> contents = re.sub("<u>(l|L|n|N)<\/u>", "$1e", contents)
>

You will use \1 for the backreference.  The documentation of the re
module (http://docs.python.org/library/re.html#re.sub) has an example.
Also note the use of raw strings (r'...') to avoid having to escape
the backslash with another backslash.
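
A minimal sketch of that advice, assuming a short sample string modelled
on the lines quoted in this thread (the names good and bad are only for
illustration). It contrasts a raw replacement string with a non-raw one:
in a normal Python string literal, "\1" is an octal escape that becomes
the control character chr(1) before re.sub ever sees it, which may well
be where a "strange character" in the output comes from.

```python
import re

# Sample input modelled on the lines quoted in this thread.
contents = "<u>l</u>omo <u>N</u>omo"

# Raw replacement string: the backslash survives, so re.sub reads \1
# as "whatever group 1 matched".
good = re.sub(r"<u>(l|L|n|N)</u>", r"\1e", contents)
print(good)        # leomo Neomo

# Non-raw replacement string: Python converts "\1" to chr(1) first,
# so re.sub inserts a control character instead of the matched letter.
bad = re.sub(r"<u>(l|L|n|N)</u>", "\1e", contents)
print(repr(bad))   # '\x01eomo \x01eomo'
```

The same reasoning applies to any group number: writing the replacement
as r"\2'" (or "\\2'") keeps the backreference, while '\2\'' does not.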

_______________________
            andrés chandía
