find and replace with regular expressions

Mensanator mensanator at aol.com
Thu Jul 31 23:23:13 CEST 2008


On Jul 31, 3:56 pm, Mensanator <mensana... at aol.com> wrote:
> On Jul 31, 3:07 pm, chrispoliq... at gmail.com wrote:
>
> > I am using regular expressions to search a string (always full
> > sentences, maybe more than one sentence) for common abbreviations and
> > remove the periods.  I need to break the string into different
> > sentences but split('.') doesn't solve the whole problem because of
> > possible periods in the middle of a sentence.
>
> > So I have...
>
> > ----------------
>
> > import re
>
> > middle_abbr = re.compile('[A-Za-z0-9]\.[A-Za-z0-9]\.')
>
> > # this will find abbreviations like e.g. or i.e. in the middle of a
> > # sentence.
> > # then I want to remove the periods.
>
> > ----------------
>
> > I want to keep the ie or eg but just take out the periods.  Any
> > ideas?  Of course newString = middle_abbr.sub('',txt) where txt is the
> > string will take out the entire abbreviation with the alphanumeric
> > characters included.
>
> >>> middle_abbr = re.compile('[A-Za-z0-9]\.[A-Za-z0-9]\.')
> >>> s = 'A test, i.e., an example.'
> >>> a = middle_abbr.search(s)      # find the abbreviation
> >>> b = re.compile('\.')           # period pattern
> >>> c = b.sub('',a.group(0))       # remove periods from abbreviation
> >>> d = middle_abbr.sub(c,s)       # substitute new abbr for old
> >>> d
> 'A test, ie, an example.'

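To see where the version above falls short, here is a minimal sketch with a made-up test string containing two different abbreviations. Only the first match is stripped of its periods, and that one result is then substituted for every match, so the second abbreviation is overwritten. The names first, stripped and result are just for illustration.

```python
import re

middle_abbr = re.compile(r'[A-Za-z0-9]\.[A-Za-z0-9]\.')
s = 'A test, i.e., an example, e.g., this one.'   # hypothetical input

first = middle_abbr.search(s)                 # finds only the first match, 'i.e.'
stripped = first.group(0).replace('.', '')    # 'ie'
result = middle_abbr.sub(stripped, s)         # EVERY match is replaced by 'ie'

print(result)
# A test, ie, an example, ie, this one.
```

The 'e.g.' has turned into 'ie', which is why each match needs to be handled on its own.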

A more versatile version, which also handles several different
abbreviations in the same string:

import re

# raw strings keep the backslashes intact for the regex engine
middle_abbr = re.compile(r'[A-Za-z0-9]\.[A-Za-z0-9]\.')
s = 'A test, i.e., an example.'
a = middle_abbr.search(s)      # find the abbreviation
b = re.compile(r'\.')          # period pattern
c = b.sub('',a.group(0))       # remove periods from abbreviation
d = middle_abbr.sub(c,s)       # substitute new abbr for old

print(d)
print()
print()

s = """A test, i.e., an example.
Yet another test, i.e., example with 2 abbr."""

a = middle_abbr.search(s)      # find the abbreviation
c = b.sub('',a.group(0))       # remove periods from abbreviation
d = middle_abbr.sub(c,s)       # substitute new abbr for old (all matches)

print(d)
print()
print()

s = """A test, i.e., an example.
Yet another test, i.e., example with 2 abbr.
A multi-test, e.g., one with different abbr."""

done = False

while not done:
  a = middle_abbr.search(s)        # find the next abbreviation
  if a:
    c = b.sub('',a.group(0))       # remove periods from abbreviation
    s = middle_abbr.sub(c,s,1)     # substitute new abbr for old ONCE
  else:                            # repeat until all removed
    done = True

print(s)

##  A test, ie, an example.
##
##
##  A test, ie, an example.
##  Yet another test, ie, example with 2 abbr.
##
##
##  A test, ie, an example.
##  Yet another test, ie, example with 2 abbr.
##  A multi-test, eg, one with different abbr.
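
The search-and-substitute loop can probably be collapsed into a single pass, since re.sub also accepts a function as the replacement and calls it once per match. A minimal sketch, assuming the same pattern as above; the helper name strip_abbr_periods is made up for this example:

```python
import re

middle_abbr = re.compile(r'[A-Za-z0-9]\.[A-Za-z0-9]\.')

def strip_abbr_periods(text):
    """Return text with the periods removed from each matched abbreviation."""
    # The lambda receives each match object in turn, so every
    # abbreviation is replaced by its own letters, not the first one's.
    return middle_abbr.sub(lambda m: m.group(0).replace('.', ''), text)

s = """A test, i.e., an example.
Yet another test, i.e., example with 2 abbr.
A multi-test, e.g., one with different abbr."""

result = strip_abbr_periods(s)
print(result)
```

This should print the same three lines as the loop version, without the done flag or repeated searching.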


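Since the original goal was to break the text into sentences, here is a rough sketch of that last step: strip the abbreviation periods first, then split after each remaining period that is followed by whitespace. The function name split_sentences and the sample text are assumptions for illustration, and the pattern still only covers two-part abbreviations such as i.e. or e.g., so something like "Dr." would still end a sentence early.

```python
import re

middle_abbr = re.compile(r'[A-Za-z0-9]\.[A-Za-z0-9]\.')

def split_sentences(text):
    """Strip periods from two-part abbreviations, then split on sentence ends."""
    cleaned = middle_abbr.sub(lambda m: m.group(0).replace('.', ''), text)
    # Split on whitespace that follows a period; the lookbehind keeps
    # the period attached to the sentence it ends.
    parts = re.split(r'(?<=\.)\s+', cleaned)
    return [p for p in parts if p]

sentences = split_sentences(
    'A test, i.e., an example. Another one, e.g., this. Done.')
for sentence in sentences:
    print(sentence)
```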
