Using Groups inside Braces with Regular Expressions

Chris chrisspen at gmail.com
Sun Jul 13 21:01:01 EDT 2008


On Jul 13, 8:14 pm, MRAB <goo... at mrabarnett.plus.com> wrote:
> On Jul 14, 12:05 am, Chris <chriss... at gmail.com> wrote:
> > I'm trying to delimit sentences in a block of text by defining the
> > end-of-sentence marker as a period followed by a space followed by an
> > uppercase letter or end-of-string.
>
> > I'd imagine the regex for that would look something like:
> > [^(?:[A-Z]|$)]\.\s+(?=[A-Z]|$)
>
> > However, Python keeps giving me an "unbalanced parenthesis" error for
> > the [^] part. If this isn't valid regex syntax, how else would I match
> > a block of text that doesn't match the delimiter pattern?
>
> What is the [^(?:[A-Z]|$)] part meant to be doing? Is it meant to be
> matching everything up to the end of the sentence?
>
> [...] is a character class, so Python is parsing the character class
> as:
>
> [^(?:[A-Z]|$)]
> ^^^^^^^^^^

It was meant to match everything except the end-of-sentence pattern.
However, I just realized that I can simply replace it with the
non-greedy ".*?".
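
For reference, here is a minimal sketch of both points, assuming
Python's standard re module. The sample text and the variable names are
made up for illustration. It first shows that the original pattern fails
to compile, because the character class ends at the first "]" and leaves
a stray ")" behind. It then shows the ".*?" version picking out
sentences.

```python
import re

# Original attempt: the character class is just "[^(?:[A-Z]", so the
# later ")" has no matching "(" and the pattern fails to compile.
broken = r"[^(?:[A-Z]|$)]\.\s+(?=[A-Z]|$)"
try:
    re.compile(broken)
    compiled_ok = True
except re.error:
    compiled_ok = False

# Replacement: a non-greedy ".*?" consumes text up to the first period
# that is followed by whitespace and then an uppercase letter or
# end-of-string.
sentence_re = re.compile(r".*?\.\s+(?=[A-Z]|$)")

# Hypothetical sample text; note the trailing space after the last period.
text = "It cost approx. five dollars. We paid. "
sentences = [s.strip() for s in sentence_re.findall(text)]

print(compiled_ok)
print(sentences)
```

The period in "approx." is skipped because the next non-space character
is lowercase. Since "\s+" needs at least one whitespace character, a
final sentence with nothing after its period is not matched by this
pattern as written, and "." does not cross newlines unless re.DOTALL is
passed.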


