re
David C. Ullrich
dullrich at sprynet.com
Wed Jun 4 13:24:33 EDT 2008
In article <6anvi4F38ei08U1 at mid.uni-berlin.de>,
"Diez B. Roggisch" <deets at nospam.web.de> wrote:
> David C. Ullrich schrieb:
> > Actually using regular expressions for the first
> > time. Is there something that allows you to take the
> > union of two character sets, or append a character to
> > a character set?
> >
> > Say I want to replace 'disc' with 'disk', but only
> > when 'disc' is a complete word (don't want to change
> > 'discuss' to 'diskuss'.) The following seems almost
> > right:
> >
> > [^a-zA-Z]disc[^a-zA-Z]
> >
> > The problem is that that doesn't match if 'disc' is at
> > the start or end of the string. Of course I could just
> > combine a few re's with |, but it seems like there should
> > (or might?) be a way to simply append a \A to the first
> > [^a-zA-Z] and a \Z to the second.
>
> Why not
>
> ($|[\w])disc(^|[^\w])
>
> I hope \w is really the literal for whitespace - might be something
> different, see the docs.
Thanks, but I don't follow that at all.
Whitespace is actually \s. But [\s]disc[whatever]
doesn't do the job - then it won't match "(disc)",
which counts as "disc" appearing as a full word.
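
A small sketch of that point, for illustration only. The names ws and
nonletter are made up here, and the \s pattern is just one way to write
the whitespace-delimited version:

```python
import re

# Hypothetical whitespace-delimited pattern: requires whitespace on both sides.
ws = re.compile(r'\sdisc\s')

# The non-letter version from the original question.
nonletter = re.compile(r'[^a-zA-Z]disc[^a-zA-Z]')

ws_spaces = ws.findall(' disc ')             # whitespace on both sides: matches
ws_parens = ws.findall('(disc)')             # parentheses are not whitespace: no match
nl_parens = nonletter.findall('(disc)')      # parentheses are non-letters: matches
```

So the \s version misses the parenthesized case, while the [^a-zA-Z]
version picks it up.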
Also I think you have ^ and $ backwards, and there's
a ^ I don't understand. I _think_ that a correct version
of what you're suggesting would be
(^|[^a-zA-Z])disc($|[^a-zA-Z])
But as far as I can see that simply doesn't work.
I haven't been able to use | that way, combining
_parts_ of a re. That was the first thing I tried.
The original works right except for not matching
at the start or end of a string, the thing with
the | doesn't work at all:
>>> from re import compile
>>> test = compile(r'(^|[^a-zA-Z])disc($|[^a-zA-Z])')
>>> test.findall('')
[]
>>> test.findall('disc')
[('', '')]
>>> test.findall(' disc ')
[(' ', ' ')]
>>> disc = compile(r'[^a-zA-Z]disc[^a-zA-Z]')
>>> disc.findall(' disc disc disc')
[' disc ']
>>> disc.findall(' disc disc disc')
[' disc ', ' disc ']
>>> test.findall(' disc disc disc')
[(' ', ' '), (' ', ' ')]
>>> disc.findall(' disc disc disc')
[' disc ', ' disc ']
>>> disc.findall(' disc disc disc ')
[' disc ', ' disc ', ' disc ']
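
One possible reading of the output above, offered as a sketch rather
than a definitive diagnosis: when a pattern contains groups, re.findall
returns the contents of the groups instead of the whole match, so
[('', '')] for 'disc' may actually indicate a successful match whose
two groups are empty. The names whole, word and fixed below are
illustrative, and the lookaround pattern is a swapped-in alternative,
not something proposed earlier in the thread:

```python
import re

# Pattern from the session above: alternation inside capturing groups.
test = re.compile(r'(^|[^a-zA-Z])disc($|[^a-zA-Z])')

# With groups in the pattern, findall returns the groups, not the full match.
groups_only = test.findall('disc')

# finditer gives match objects; group(0) is the full matched text.
whole = [m.group(0) for m in test.finditer(' disc (disc)')]

# Alternative sketch: lookarounds check the neighbours without consuming
# them, so adjacent words separated by one space can all match.
word = re.compile(r'(?<![a-zA-Z])disc(?![a-zA-Z])')
count = len(word.findall(' disc disc disc'))
fixed = word.sub('disk', 'disc disc discuss (disc)')
```

Here groups_only is [('', '')], whole is [' disc ', '(disc)'], count is
3, and fixed is 'disk disk discuss (disk)'. The built-in \b word
boundary (r'\bdisc\b') is a shorter option, with the caveat that \b
also treats digits and underscores as word characters, which differs
from [a-zA-Z].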
> Diez
--
David C. Ullrich