How to write replace string for object which will be substituted? [regexp]

Wed Aug 5 08:21:43 EDT 2009

On 5 Aug, 13:28, Anthra Norell <anthra.nor... at bluewin.ch> wrote:
> MRAB wrote:
> > ryniek90 wrote:
> >> Hi.
> >> I've started learning regexps; some things go well, but most of
> >> them still don't.
>
> >> I've got a problem with one regexp. I'd better post the code here:
>
> >> "
> >>  >>> import re
> >>  >>> mail = '\nn... at mail.com\nname1 [at] mail [dot] com\nname2 [$at$]
> >> mail [$dot$] com\n'
> >>  >>> mail
> >> '\nn... at mail.com\nname1 [at] mail [dot] com\nname2 [$at$] mail
> >> [$dot$] com\n'
> >>  >>> print mail
>
> >> n... at mail.com
> >> name1 [at] mail [dot] com
> >> name2 [$at$] mail [$dot$] com
>
> >>  >>> maail = re.sub('^\n|$\n', '', mail)
> >>  >>> print maail
> >> n... at mail.com
> >> name1 [at] mail [dot] com
> >> name2 [$at$] mail [$dot$] com
> >>  >>> maail = re.sub(' ', '', maail)
> >>  >>> print maail
> >> n... at mail.com
> >> name1[at]mail[dot]com
> >> name2[$at$]mail[$dot$]com
> >>  >>> maail = re.sub('\[at\]|\[\$at\$\]', '@', maail)
> >>  >>> print maail
> >> n... at mail.com
> >> name1@mail[dot]com
> >> name2@mail[$dot$]com
> >>  >>> maail = re.sub('\[dot\]|\[\$dot\$\]', '.', maail)
> >>  >>> print maail
> >> n... at mail.com
> >> na... at mail.com
> >> na... at mail.com
> >>  >>> # How must I write the replacement string to do all of these
> >> regexp substitutions with just ONE command on the string 'mail'?
> >>  >>> maail = re.sub('^\n|$\n| |\[at\]|\[\$at\$\]|\[dot\]|\[\$dot\$\]', *?*, mail)
> >> "
>
> >> How must I write that replacement (see the question mark) to
> >> make that substitution work? I didn't see anything helpful while
> >> reading the re docs and the HOWTO (from the Python docs). I tried
> >> 'MatchObject.group()' but something went wrong; I didn't write it right.
> >> Is there a more user-friendly HOWTO for Python's re than this one?
>
> >> I'm new to programming and regexps, sorry for the inconvenience.
>
> > I don't think you can do it in one regex, nor would I want to. Just use
> > the string's replace() method.
>
> > >>> mail = '\nn... at mail.com\nname1 [at] mail [dot] com\nname2 [$at$]
> > mail [$dot$] com\n'
> > >>> mail
> > '\nn... at mail.com\nname1 [at] mail [dot] com\nname2 [$at$] mail [$dot$]
> > com\n'
> > >>> print mail
>
> > n... at mail.com
> > name1 [at] mail [dot] com
> > name2 [$at$] mail [$dot$] com
>
> > >>> maail = mail.strip()
> > >>> print maail
> > n... at mail.com
> > name1 [at] mail [dot] com
> > name2 [$at$] mail [$dot$] com
>
> > >>> maail = maail.replace(' ', '')
> > >>> print maail
> > n... at mail.com
> > name1[at]mail[dot]com
> > name2[$at$]mail[$dot$]com
> > >>> maail = maail.replace('[at]', '@').replace('[$at$]', '@')
> > >>> print maail
> > n... at mail.com
> > name1@mail[dot]com
> > name2@mail[$dot$]com
> > >>> maail = maail.replace('[dot]', '.').replace('[$dot$]', '.')
> > >>> print maail
> > n... at mail.com
> > na... at mail.com
> > na... at mail.com
>
> This is a good learning exercise demonstrating the impracticality of
> regular expressions in a given situation. In the light of the
> fascination regular expressions seem to exert in general, one might
> conclude that knowing regular expressions in essence is knowing when not
> to use them.
>
> There is nothing wrong with cascading substitutions through multiple
> expressions. The OP's solution, wrapped up in a function and stripped
> of needless regex overkill, might look something like this:
>
> def translate (s):
>    s1 = s.strip ()     # Instead of: s1 = re.sub ('^\n|$\n', '', s)
>    s2 = s1.replace (' ', '')    # Instead of: s2 = re.sub (' ', '', s1)
>    s3 = re.sub ('\[at\]|\[\$at\$\]', '@', s2)
>    s4 = re.sub ('\[dot\]|\[\$dot\$\]', '.', s3)
>    return s4
>
> print translate (mail)   # Tested
>
> MRAB's solution using replace () avoids needless regex complexity, but
> doesn't simplify tedious coding if the number of substitutions is
> significant. Some time ago I proposed a little module I made to
> alleviate the tedium. It would handle this case like this:
>
> import SE
> Translator = SE.SE ( ' (32)= [at]=@ [$at$]=@ [dot]=. [$dot$]=. ' )
> print Translator (mail.strip ())   # Tested
>
> So SE.SE compiles a string composed of any number of substitution
> definitions into an object that translates anything given it. In a
> running speed contest it would surely come in last, although in most
> cases the disadvantage would be imperceptible. Another matter is coding
> speed. Here the advantage is obvious, even with a set of substitutions
> as small as this one, let alone with sets in the tens or even hundreds.
> One inconspicuous but significant feature of SE is that it handles
> precedence correctly if targets overlap (upstream over downstream and
> long over short). As far as I know there's nothing in the Python system
> handling substitution precedence. It always needs to be hand-coded from
> one case to the next and that isn't exactly trivial.
>
> SE can be downloaded from http://pypi.python.org/pypi/SE/2.3.
>
> Frederic
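
For reference, the "one command" asked about at the top of the thread can be sketched with the standard re module alone: re.sub accepts a function in place of the replacement string and calls it once per match. The names REPLACEMENTS, PATTERN and translate below are made up for this illustration, and the sample string leaves out the first (elided) address. This is only a sketch of the technique, not how SE works internally.

```python
import re

# Hypothetical lookup table: each obfuscated token and what replaces it.
REPLACEMENTS = {
    ' ': '',
    '[at]': '@',
    '[$at$]': '@',
    '[dot]': '.',
    '[$dot$]': '.',
}

# One alternation built from the escaped keys, so '[' and '$' are literal.
PATTERN = re.compile('|'.join(re.escape(key) for key in REPLACEMENTS))

def translate(s):
    # re.sub calls the function for every match; group(0) is the matched text.
    return PATTERN.sub(lambda m: REPLACEMENTS[m.group(0)], s.strip())

mail = '\nname1 [at] mail [dot] com\nname2 [$at$] mail [$dot$] com\n'
print(translate(mail))
# name1@mail.com
# name2@mail.com
```

The leading and trailing newlines are still removed with strip(), since that step is simpler without a regex. Adding another substitution only means adding another entry to the table.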

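The precedence point (upstream over downstream, long over short) is easy to see with a small experiment. The sketch below compares chained str.replace calls with a single regex pass whose alternatives are sorted longest first. The helper names chained and single_pass and the sample table are invented for this example; it illustrates the idea only and makes no claim about SE's actual implementation.

```python
import re

def chained(s, table):
    # Apply each replacement in turn; later steps see the output of earlier ones.
    for old, new in table:
        s = s.replace(old, new)
    return s

def single_pass(s, table):
    # Longest targets first, so "long over short" holds at any given position.
    # The regex engine scans left to right, which gives "upstream over downstream".
    ordered = sorted(table, key=lambda pair: len(pair[0]), reverse=True)
    lookup = dict(ordered)
    pattern = re.compile('|'.join(re.escape(old) for old, _ in ordered))
    return pattern.sub(lambda m: lookup[m.group(0)], s)

table = [('at', '@'), ('attn', 'ATTN'), ('x', 'y'), ('y', 'x')]
text = 'attn x at y'

print(chained(text, table))      # @tn x @ x
print(single_pass(text, table))  # ATTN y @ x
```

In the chained version 'at' clobbers the start of 'attn', and the x/y swap undoes itself. Reordering the table fixes the first problem but not the second, which is why a single pass over the original text is the safer approach when targets overlap.
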
Thanks again.  :)

I saw that MRAB is actively developing a new implementation of the re
module.
MRAB: Do you think it would be a good idea to add some of the best
features of the SE module to your project?
I haven't looked at the features of your re module yet, but I'll try to
find time, even today, to see what's going on.

Greets