re
David C. Ullrich
dullrich at sprynet.com
Wed Jun 4 17:39:38 EDT 2008
In article <mailman.79.1212598994.1044.python-list at python.org>,
"Russell Blau" <russblau at hotmail.com> wrote:
> "Diez B. Roggisch" <deets at nospam.web.de> wrote in message
> news:6anvi4F38ei08U1 at mid.uni-berlin.de...
> > David C. Ullrich schrieb:
> >> Say I want to replace 'disc' with 'disk', but only
> >> when 'disc' is a complete word (don't want to change
> >> 'discuss' to 'diskuss'.) The following seems almost
> >> right:
> >>
> >> [^a-zA-Z]disc[^a-zA-Z]
> >>
> >> The problem is that that doesn't match if 'disc' is at
> >> the start or end of the string. Of course I could just
> >> combine a few re's with |, but it seems like there should
> >> (or might?) be a way to simply append a \A to the first
> >> [^a-zA-Z] and a \Z to the second.
> >
> > Why not
> >
> > ($|[\w])disc(^|[^\w])
> >
> > I hope \w is really the literal for whitespace - might be something
> > different, see the docs.
>
> No, \s is the literal for whitespace.
> http://www.python.org/doc/current/lib/re-syntax.html
>
> But how about:
>
> text = re.sub(r"\bdisc\b", "disk", text_to_be_changed)
>
> \b is the "word break" character,
Lovely - that's exactly right, thanks. I swear I looked at the
docs... I'm just blind or stupid. No wait, I'm blind _and_
stupid. No, blind and stupid and slow...
Doesn't precisely fit the _spec_ because of digits and underscores,
but it's close enough to solve the problem exactly. Thanks.
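
For the record, a pattern that does fit the letters-only spec can probably
be built from lookaround assertions instead of \b. This is only a sketch:
the names LETTERS_ONLY and replace_disc are made up for illustration, and
it assumes "letter" means ASCII a-z/A-Z as in the original character class.

```python
import re

# 'disc' must not have an ASCII letter immediately before or after it.
# The lookarounds consume no characters, so they also succeed at the
# start and end of the string, and digits/underscores count as separators.
LETTERS_ONLY = re.compile(r"(?<![a-zA-Z])disc(?![a-zA-Z])")

def replace_disc(text):
    """Replace 'disc' with 'disk' when it is not part of a longer run of letters."""
    return LETTERS_ONLY.sub("disk", text)

print(replace_disc("disc"))                  # whole string is the word
print(replace_disc("discuss the disc_2"))    # 'discuss' untouched, 'disc_2' changed
print(re.sub(r"\bdisc\b", "disk", "disc_2")) # \b version leaves 'disc_2' alone
```

The difference from \b shows up only next to digits and underscores, which
\w treats as word characters.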
> it matches at the beginning or end of any
> "word" (where a word is any sequence of \w characters, and \w is any
> alphanumeric character or _).
>
> Note that this solution still doesn't catch "Disc" if it is capitalized.
Thanks. I didn't mention I wanted to catch both cases because I
already knew how to take care of that:
r"\b[dD]isc\b"
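
One caveat with that pattern: substituting a fixed "disk" would turn "Disc"
into lowercase "disk". A possible way around it, sketched here with the
hypothetical names WORD and fix_spelling, is to capture the first letter
and put it back in the replacement.

```python
import re

# Capture the leading 'd' or 'D' so the replacement can reuse it.
WORD = re.compile(r"\b([dD])isc\b")

def fix_spelling(text):
    """Replace 'disc'/'Disc' with 'disk'/'Disk', keeping the first letter's case."""
    # \1 in the replacement re-inserts whichever letter group 1 matched.
    return WORD.sub(r"\1isk", text)

print(fix_spelling("Disc and disc, but not discuss"))
```

This still ignores all-caps "DISC"; handling that would need re.IGNORECASE
plus a replacement function, which is beyond what was asked here.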
> Russ
--
David C. Ullrich