regexp compilation error

Ovidiu Deac ovidiudeac at gmail.com
Tue Oct 11 04:26:03 EDT 2011


Thanks for the answer. I will give the regex module from PyPI a try.
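
A minimal sketch of what that switch might look like, assuming the
third-party regex package is installed. The fallback to the builtin re
module and the names NESTED and compile_nested are only illustrative and
are not taken from the original application:

```python
import re as builtin_re

try:
    import regex as re_engine  # third-party package from PyPI (assumed installed)
except ImportError:
    re_engine = builtin_re     # otherwise fall back to the standard library

# The reduced pattern from this thread: a quantifier applied to [^y]*.
NESTED = r"""^(?: [^y]* )*"""


def compile_nested(engine=re_engine):
    """Return the compiled pattern, or None if the engine rejects it."""
    try:
        return engine.compile(NESTED, engine.X)
    except engine.error:
        return None


pattern = compile_nested()
if pattern is None:
    print("engine rejected the nested quantifier")
else:
    print(repr(pattern.match("a bcd e").group(0)))
```

Whether the builtin engine accepts the pattern depends on the Python
version, so the sketch reports a rejection instead of assuming one.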

On Fri, Sep 30, 2011 at 4:56 PM, Vlastimil Brom
<vlastimil.brom at gmail.com> wrote:
> 2011/9/30 Ovidiu Deac <ovidiudeac at gmail.com>:
>> This is only part of a regex taken from an old Perl application which
>> we are trying to understand/port to our new Python implementation.
>>
>> The original regex was considerably more complex and it didn't compile
>> in Python, so I removed all the parts I could in order to isolate the
>> problem so that I can ask for help here.
>>
>> So the problem is that this regex doesn't compile. On the other hand
>> I'm not really sure it should. It's an anchor on which you apply *.
>> I'm not sure if this is legal.
>>
>> On the other hand if I remove one of the * it compiles.
>>
>> >>> re.compile(r"""^(?: [^y]* )*""", re.X)
>> Traceback (most recent call last):
>>  File "<stdin>", line 1, in <module>
>>  File "/usr/lib/python2.6/re.py", line 190, in compile
>>    return _compile(pattern, flags)
>>  File "/usr/lib/python2.6/re.py", line 245, in _compile
>>    raise error, v # invalid expression
>> sre_constants.error: nothing to repeat
>> >>> re.compile(r"""^(?: [^y] )*""", re.X)
>> <_sre.SRE_Pattern object at 0x7f4069cc36b0>
>> >>> re.compile(r"""^(?: [^y]* )""", re.X)
>> <_sre.SRE_Pattern object at 0x7f4069cc3730>
>>
>> Is this a bug in the Python regex engine? Or maybe some incompatibility with Perl?
>>
>> On Fri, Sep 30, 2011 at 12:29 PM, Chris Angelico <rosuav at gmail.com> wrote:
>>> On Fri, Sep 30, 2011 at 7:26 PM, Ovidiu Deac <ovidiudeac at gmail.com> wrote:
>>>> $ python --version
>>>> Python 2.6.6
>>>
>>> Ah, I think I was misinterpreting the traceback. You do actually have
>>> a useful message there; it's the same error that my Py3.2 produced:
>>>
>>> sre_constants.error: nothing to repeat
>>>
>>> I'm not sure what your regex is trying to do, but the problem seems to
>>> be connected with the * at the end of the pattern.
>>>
>>> ChrisA
>>> --
>
> I believe this is a limitation of the builtin re engine concerning
> nested infinite quantifiers - (...*)* - in your pattern.
> You can try a more powerful, recent regex implementation, which appears
> to handle it:
>
> http://pypi.python.org/pypi/regex
>
> With the VERBOSE flag (re.X), all unescaped whitespace outside of
> character classes is ignored
> (http://docs.python.org/library/re.html#re.VERBOSE),
> so the pattern should be equivalent to:
> r"^(?:[^y]*)*"
> i.e. you are not actually gaining anything from the double quantifier, as
> there isn't anything "real" in the pattern outside [^y]*
>
> It appears that you have oversimplified the pattern (if it worked
> in the original app);
> however, you may simply try
> import regex as re
> and see if it helps.
>
> Cf:
> >>>
> >>> regex.findall(r"""^(?: [^y]* )*""", "a bcd e", re.X)
> ['a bcd e']
> >>> re.findall(r"""^(?: [^y]* )*""", "a bcd e", re.X)
> Traceback (most recent call last):
>  File "<input>", line 1, in <module>
>  File "re.pyc", line 177, in findall
>  File "re.pyc", line 244, in _compile
> error: nothing to repeat
> >>>
> >>> re.findall(r"^(?:[^y]*)*", "a bcd e")
> Traceback (most recent call last):
>  File "<input>", line 1, in <module>
>  File "re.pyc", line 177, in findall
>  File "re.pyc", line 244, in _compile
> error: nothing to repeat
> >>> regex.findall(r"^(?:[^y]*)*", "a bcd e")
> ['a bcd e']
> >>> regex.findall(r"^[^y]*", "a bcd e")
> ['a bcd e']
> >>>
>
>
> hth,
>  vbr
> --
> http://mail.python.org/mailman/listinfo/python-list
>
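
The point above, that the outer quantifier adds nothing to [^y]*, can be
checked with the builtin re module alone. This is only a sketch for the
reduced pattern from this thread, not for the original Perl regex, and
the names SIMPLE and leading_non_y are made up for illustration:

```python
import re

# Single-quantifier form of the reduced pattern; compiles in the builtin engine.
SIMPLE = re.compile(r"^[^y]*")


def leading_non_y(text):
    """Return the longest prefix of text that contains no 'y'."""
    return SIMPLE.match(text).group(0)


for sample in ["a bcd e", "abc yes", "yes", ""]:
    print(repr(sample), "->", repr(leading_non_y(sample)))
```

Because [^y]* can match the empty string, the match never fails, so
calling .group(0) on the result directly is safe here.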


