How to write replace string for object which will be substituted? [regexp]

Anthra Norell anthra.norell at bluewin.ch
Wed Aug 5 07:28:18 EDT 2009


MRAB wrote:
> ryniek90 wrote:
>> Hi.
>> I started learning regexps, and some things go well, but most of 
>> them still don't.
>>
>> I've got a problem with some regexps. Better to post the code here:
>>
>> "
>>  >>> import re
>>  >>> mail = '\nname@mail.com\nname1 [at] mail [dot] com\nname2 [$at$] 
>> mail [$dot$] com\n'
>>  >>> mail
>> '\nname@mail.com\nname1 [at] mail [dot] com\nname2 [$at$] mail 
>> [$dot$] com\n'
>>  >>> print mail
>>
>> name@mail.com
>> name1 [at] mail [dot] com
>> name2 [$at$] mail [$dot$] com
>>
>>  >>> maail = re.sub('^\n|$\n', '', mail)
>>  >>> print maail
>> name@mail.com
>> name1 [at] mail [dot] com
>> name2 [$at$] mail [$dot$] com
>>  >>> maail = re.sub(' ', '', maail)
>>  >>> print maail
>> name@mail.com
>> name1[at]mail[dot]com
>> name2[$at$]mail[$dot$]com
>>  >>> maail = re.sub('\[at\]|\[\$at\$\]', '@', maail)
>>  >>> print maail
>> name@mail.com
>> name1@mail[dot]com
>> name2@mail[$dot$]com
>>  >>> maail = re.sub('\[dot\]|\[\$dot\$\]', '.', maail)
>>  >>> print maail
>> name@mail.com
>> name1@mail.com
>> name2@mail.com
>>  >>> # How must I write the replacement string to apply all these 
>> regexps with just ONE command to the string 'mail'?
>>  >>> maail = re.sub('^\n|$\n| 
>> |\[at\]|\[\$at\$\]|\[dot\]|\[\$dot\$\]', *?*, mail)
>> "
>>
>> How must I write that replacement pattern (see the question mark) to 
>> make that substitution work? I didn't see anything helpful while 
>> reading the re docs and HOWTO (from the Python docs). I tried 
>> 'MatchObject.group()' but something went wrong; I didn't write it right.
>> Is there a more user-friendly HOWTO for Python re than this?
>>
>> I'm new to programming and regexps, sorry for the inconvenience.
>>
> I don't think you can do it in one regex, nor would I want to. Just use
> the string's replace() method.
>
> >>> mail = '\nname@mail.com\nname1 [at] mail [dot] com\nname2 [$at$] 
> mail [$dot$] com\n'
> >>> mail
> '\nname@mail.com\nname1 [at] mail [dot] com\nname2 [$at$] mail [$dot$] 
> com\n'
> >>> print mail
>
> name@mail.com
> name1 [at] mail [dot] com
> name2 [$at$] mail [$dot$] com
>
> >>> maail = mail.strip()
> >>> print maail
> name@mail.com
> name1 [at] mail [dot] com
> name2 [$at$] mail [$dot$] com
>
> >>> maail = maail.replace(' ', '')
> >>> print maail
> name@mail.com
> name1[at]mail[dot]com
> name2[$at$]mail[$dot$]com
> >>> maail = maail.replace('[at]', '@').replace('[$at$]', '@')
> >>> print maail
> name@mail.com
> name1@mail[dot]com
> name2@mail[$dot$]com
> >>> maail = maail.replace('[dot]', '.').replace('[$dot$]', '.')
> >>> print maail
> name@mail.com
> name1@mail.com
> name2@mail.com
This is a good learning exercise: it demonstrates how impractical 
regular expressions can be in a given situation. Given the fascination 
regular expressions seem to exert in general, one might conclude that 
knowing regular expressions is, in essence, knowing when not to use 
them.

There is nothing wrong with cascading substitutions through multiple 
expressions. The OP's solution, wrapped up in a function and stripped 
of needless regex overkill, might look something like this:

import re

def translate (s):
   s1 = s.strip ()     # Instead of: s1 = re.sub ('^\n|$\n', '', s)
   s2 = s1.replace (' ', '')    # Instead of: s2 = re.sub (' ', '', s1)
   s3 = re.sub (r'\[at\]|\[\$at\$\]', '@', s2)    # Both 'at' spellings -> '@'
   s4 = re.sub (r'\[dot\]|\[\$dot\$\]', '.', s3)  # Both 'dot' spellings -> '.'
   return s4

print translate (mail)   # Tested
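
If the cascade grows, the numbered intermediate variables can give way 
to a table of (target, replacement) pairs applied in a loop. The 
following is only a sketch: the names SUBSTITUTIONS and translate_table 
are made up for illustration, and the sample string is a shortened 
version of the OP's. Because each replace () works on the output of the 
previous one, the order of the table matters and overlapping targets 
are not sorted out automatically.

```python
# Hypothetical table of (target, replacement) pairs. Order matters:
# each replace() runs on the output of the previous one.
SUBSTITUTIONS = [
    (' ', ''),
    ('[at]', '@'),
    ('[$at$]', '@'),
    ('[dot]', '.'),
    ('[$dot$]', '.'),
]

def translate_table(s, substitutions=SUBSTITUTIONS):
    """Strip surrounding whitespace, then apply every pair in order."""
    s = s.strip()
    for target, replacement in substitutions:
        s = s.replace(target, replacement)
    return s

# Shortened version of the OP's test string (assumption: two lines only).
sample = '\nname1 [at] mail [dot] com\nname2 [$at$] mail [$dot$] com\n'
result = translate_table(sample)
# result == 'name1@mail.com\nname2@mail.com'
```

Adding a substitution then means adding one line to the table rather 
than another chained call.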

MRAB's solution using replace () avoids needless regex complexity, but 
the coding still becomes tedious when the number of substitutions is 
large. Some time ago I proposed a little module I made to alleviate 
the tedium. It would handle this case like this:

import SE
Translator = SE.SE ( ' (32)= [at]=@ [$at$]=@ [dot]=. [$dot$]=. ' )
print Translator (mail.strip ())   # Tested

SE.SE compiles a string composed of any number of substitution 
definitions into an object that translates anything given to it. In a 
contest of running speed it would surely come in last, although in most 
cases the disadvantage would be imperceptible. Coding speed is another 
matter. Here the advantage is obvious, even with a set of substitutions 
as small as this one, let alone with sets in the tens or even hundreds. 
One inconspicuous but significant feature of SE is that it handles 
precedence correctly if targets overlap (upstream over downstream and 
long over short). As far as I know, nothing in Python itself handles 
substitution precedence. It always needs to be hand-coded from one case 
to the next, and that isn't exactly trivial.
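
To illustrate what hand-coding precedence involves, here is a minimal 
sketch using only the standard re module. This is not SE's 
implementation, just one way to get the two precedence rules for 
literal targets; make_translator, translate_once and overlap are 
hypothetical names. All targets are escaped and joined into a single 
alternation, longest first, and a function passed to sub () looks up 
the replacement for whatever matched.

```python
import re

def make_translator(table):
    """Build a one-pass translator from a dict of literal target -> replacement."""
    # Long over short: try longer targets first at any given position.
    targets = sorted(table, key=len, reverse=True)
    pattern = re.compile('|'.join(re.escape(t) for t in targets))
    # Upstream over downstream: re scans left to right, so the leftmost
    # match is replaced first and its text is never rescanned.
    return lambda s: pattern.sub(lambda m: table[m.group(0)], s)

translate_once = make_translator({
    ' ': '',
    '[at]': '@',
    '[$at$]': '@',
    '[dot]': '.',
    '[$dot$]': '.',
})

# Shortened version of the OP's test string (assumption: two lines only).
sample = '\nname1 [at] mail [dot] com\nname2 [$at$] mail [$dot$] com\n'
result = translate_once(sample.strip())
# result == 'name1@mail.com\nname2@mail.com'

# Overlapping targets: 'abc' must win over its prefix 'ab'.
overlap = make_translator({'ab': 'X', 'abc': 'Y'})
# overlap('abcd') == 'Yd'
```

Unlike a cascade, this makes a single pass, so the output of one 
substitution is never fed into another. It only covers literal targets; 
anything beyond that would again have to be worked out case by case.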

SE can be downloaded from http://pypi.python.org/pypi/SE/2.3.

Frederic