How to write replace string for object which will be substituted? [regexp]
Anthra Norell
anthra.norell at bluewin.ch
Wed Aug 5 07:28:18 EDT 2009
MRAB wrote:
> ryniek90 wrote:
>> Hi.
>> I started learning regexp, and some things go well, but most of
>> them still don't.
>>
>> I've got a problem with a regexp. Better to post the code here:
>>
>> "
>> >>> import re
>> >>> mail = '\nname@mail.com\nname1 [at] mail [dot] com\nname2 [$at$] mail [$dot$] com\n'
>> >>> mail
>> '\nname@mail.com\nname1 [at] mail [dot] com\nname2 [$at$] mail [$dot$] com\n'
>> >>> print mail
>>
>> name@mail.com
>> name1 [at] mail [dot] com
>> name2 [$at$] mail [$dot$] com
>>
>> >>> maail = re.sub('^\n|$\n', '', mail)
>> >>> print maail
>> name@mail.com
>> name1 [at] mail [dot] com
>> name2 [$at$] mail [$dot$] com
>> >>> maail = re.sub(' ', '', maail)
>> >>> print maail
>> name@mail.com
>> name1[at]mail[dot]com
>> name2[$at$]mail[$dot$]com
>> >>> maail = re.sub('\[at\]|\[\$at\$\]', '@', maail)
>> >>> print maail
>> name@mail.com
>> name1@mail[dot]com
>> name2@mail[$dot$]com
>> >>> maail = re.sub('\[dot\]|\[\$dot\$\]', '.', maail)
>> >>> print maail
>> name@mail.com
>> name1@mail.com
>> name2@mail.com
>> >>> # How must I write the replace string to replace all these
>> >>> # regexps with just ONE command, in string 'mail'?
>> >>> maail = re.sub('^\n|$\n| |\[at\]|\[\$at\$\]|\[dot\]|\[\$dot\$\]', *?*, mail)
>> "
>>
>> How must I write that replacement (see the question mark) to make
>> the substitution work? I didn't find anything helpful while reading
>> the re docs and the HowTo (from the Python docs). I tried
>> 'MatchObject.group()' but something went wrong; I didn't write it right.
>> Is there a more user-friendly HowTo for Python's re than this?
>>
>> I'm new to programming and regexp, sorry for the inconvenience.
>>
> I don't think you can do it in one regex, nor would I want to. Just use
> the string's replace() method.
>
> >>> mail = '\nname@mail.com\nname1 [at] mail [dot] com\nname2 [$at$] mail [$dot$] com\n'
> >>> mail
> '\nname@mail.com\nname1 [at] mail [dot] com\nname2 [$at$] mail [$dot$] com\n'
> >>> print mail
>
> name@mail.com
> name1 [at] mail [dot] com
> name2 [$at$] mail [$dot$] com
>
> >>> maail = mail.strip()
> >>> print maail
> name@mail.com
> name1 [at] mail [dot] com
> name2 [$at$] mail [$dot$] com
> >>> maail = maail.replace(' ', '')
> >>> print maail
> name@mail.com
> name1[at]mail[dot]com
> name2[$at$]mail[$dot$]com
> >>> maail = maail.replace('[at]', '@').replace('[$at$]', '@')
> >>> print maail
> name@mail.com
> name1@mail[dot]com
> name2@mail[$dot$]com
> >>> maail = maail.replace('[dot]', '.').replace('[$dot$]', '.')
> >>> print maail
> name@mail.com
> name1@mail.com
> name2@mail.com

This is a good learning exercise demonstrating the impracticality of
regular expressions in a given situation. In light of the fascination
regular expressions seem to exert in general, one might conclude that
knowing regular expressions is, in essence, knowing when not to use
them.

There is nothing wrong with cascading substitutions through multiple
expressions. The OP's solution, wrapped up in a function and stripped
of needless regex overkill, might look something like this:

    import re

    def translate (s):
        s1 = s.strip ()             # Instead of: s1 = re.sub ('^\n|$\n', '', s)
        s2 = s1.replace (' ', '')   # Instead of: s2 = re.sub (' ', '', s1)
        s3 = re.sub (r'\[at\]|\[\$at\$\]', '@', s2)
        s4 = re.sub (r'\[dot\]|\[\$dot\$\]', '.', s3)
        return s4

    print translate (mail)   # Tested

MRAB's solution using replace () avoids needless regex complexity, but
doesn't simplify tedious coding if the number of substitutions is
significant. Some time ago I proposed a little module I made to
alleviate the tedium. It would handle this case like this:

    import SE
    Translator = SE.SE ( ' (32)= [at]=@ [$at$]=@ [dot]=. [$dot$]=. ' )
    print Translator (mail.strip ())   # Tested

So SE.SE compiles a string composed of any number of substitution
definitions into an object that translates anything given to it. In a
running-speed contest it would surely come in last, although in most
cases the disadvantage would be imperceptible. Coding speed is another
matter. Here the advantage is obvious, even with a set of substitutions
as small as this one, let alone with sets in the tens or even hundreds.
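
For comparison, the replace() cascade can also be driven from a table,
so that adding a substitution means adding one pair rather than one
more chained call. This is only a minimal sketch in plain Python and
says nothing about how SE works internally; the names SUBSTITUTIONS and
translate_table are made up for the illustration.

```python
# Hypothetical table of (target, replacement) pairs, applied in order.
SUBSTITUTIONS = [
    (' ', ''),
    ('[at]', '@'),
    ('[$at$]', '@'),
    ('[dot]', '.'),
    ('[$dot$]', '.'),
]

def translate_table(s, table=SUBSTITUTIONS):
    """Strip surrounding whitespace, then apply each pair with str.replace()."""
    s = s.strip()
    for target, replacement in table:
        s = s.replace(target, replacement)
    return s

mail = '\nname@mail.com\nname1 [at] mail [dot] com\nname2 [$at$] mail [$dot$] com\n'
print(translate_table(mail))
```

Because the pairs are applied one after another, the order of the table
matters as soon as one target is contained in another or a replacement
produces text that a later pair would match.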

One inconspicuous but significant feature of SE is that it handles
precedence correctly if targets overlap (upstream over downstream and
long over short). As far as I know, there's nothing in the Python system
handling substitution precedence. It always needs to be hand-coded from
one case to the next, and that isn't exactly trivial.
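
To illustrate what hand-coding the precedence involves, here is a
minimal sketch using only the standard re module. It is not SE's
implementation, and the names TABLE, PATTERN and translate_once are
invented for the example. Sorting the targets longest first and joining
them into one alternation lets the longer target win at any given
position, while the left-to-right scan of re.sub() gives upstream
matches priority over downstream ones. Passing a function as the
replacement argument is also one way to do all of the OP's
substitutions in a single re.sub() call.

```python
import re

# Hypothetical substitution table: target -> replacement.
TABLE = {
    ' ': '',
    '[at]': '@',
    '[$at$]': '@',
    '[dot]': '.',
    '[$dot$]': '.',
}

# Longest targets first, so that a long target beats a shorter one
# starting at the same position; re.escape() makes '[', ']' and '$' literal.
PATTERN = re.compile(
    '|'.join(re.escape(t) for t in sorted(TABLE, key=len, reverse=True))
)

def translate_once(s):
    """Replace every target in one left-to-right pass over the string."""
    return PATTERN.sub(lambda m: TABLE[m.group(0)], s.strip())

mail = '\nname@mail.com\nname1 [at] mail [dot] com\nname2 [$at$] mail [$dot$] com\n'
print(translate_once(mail))
```

Unlike a cascade of separate substitutions, a single pass never
re-examines text it has already replaced, so one replacement cannot
accidentally feed another.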

SE can be downloaded from http://pypi.python.org/pypi/SE/2.3.

Frederic