[Python-Dev] New regex module for 3.2?

Georg Brandl g.brandl at gmx.net
Fri Jul 16 19:08:04 CEST 2010


On 16.07.2010 17:08, Vlastimil Brom wrote:
> 2010/7/9 Georg Brandl <g.brandl at gmx.net>:
>> On 09.07.2010 02:35, MRAB wrote:
>>
>>>
>>> 1. Some of the inline flags are scoped; for example, putting "(?i)" at
>>> the end of a regex will now have no effect because it's no longer a
>>> global, all-or-nothing, flag.
>>
>> That is problematic.  I've often seen people put these flags at the end
>> of a regex, probably for readability purposes.  IMHO it would be better
>> to limit flag scoping to the explicit (?flags-flags: ) groups.
>>
> 
> I just noticed the wording on the reference page
> regular-expressions.info for this kind of flag:
> "(?i)	Turn on case insensitivity for the remainder of the regular
> expression. (Older regex flavors may turn it on for the entire
> regex.)" and likewise for other flags.
> 
> http://www.regular-expressions.info/refadv.html
> 
> I am not sure how "authoritative" this page by Jan Goyvaerts is for
> various implementations, but it looks like a very comprehensive
> reference.

Nevertheless, the authoritative reference for our regex engine is its
docs, i.e. http://docs.python.org/library/re.html -- and that states
clearly that inline flags apply to the whole regex.
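
To make the placements under discussion concrete, here is a minimal sketch
of a leading flag, a scoped group, and a trailing flag. It assumes a CPython
whose re module supports the scoped "(?i:...)" form (3.6 or later). The helper
name trailing_flag_behaviour is hypothetical. How a trailing flag is treated
depends on the interpreter version (recent releases reject it outright), so
the helper only reports what it observes instead of asserting one outcome.

```python
import re
import warnings


def trailing_flag_behaviour(pattern, text):
    """Report how this interpreter treats a pattern with a trailing inline flag.

    Returns "rejected" if the pattern does not compile, "global" if the flag
    affected the whole pattern, and "scoped" if it had no effect on the match.
    (Hypothetical helper, for illustration only.)
    """
    try:
        with warnings.catch_warnings():
            # Some versions only warn about flags that are not at the start.
            warnings.simplefilter("ignore", DeprecationWarning)
            compiled = re.compile(pattern)
    except re.error:
        return "rejected"
    return "global" if compiled.match(text) else "scoped"


# Leading flag: applies to the whole pattern.
leading = re.match(r"(?i)abc", "ABC") is not None

# Scoped group: only the "b" is matched case-insensitively.
scoped_hit = re.match(r"a(?i:b)c", "aBc") is not None
scoped_miss = re.match(r"a(?i:b)c", "ABc") is not None

# Trailing flag: the case this thread is about.
trailing = trailing_flag_behaviour(r"abc(?i)", "ABC")

print(leading, scoped_hit, scoped_miss, trailing)
```

A result of "global" corresponds to the whole-regex semantics that the re
docs describe; "scoped" would correspond to the proposed new behaviour.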

> I think a new regex implementation need not copy all of these
> "historical" semantics, unless there are major real use cases that
> would be affected by the change.

As I already said, I *have* seen this in real code.  As MRAB indicated,
this was the only silent change in semantics compared to the old
regex engine.  If we replace re by regex, which I think is the only
way to get the new features into the stdlib, changing this one aspect is
a) not backwards compatible and b) incompatible in a subtle way that
forces everyone to review their regular expressions.  That's definitely
not acceptable.
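
To give a sense of what such a review would involve, here is a rough sketch
of a checker that lists inline flag groups appearing anywhere other than the
very start of a pattern. The names INLINE_FLAGS and non_leading_flags are
hypothetical, and the check is a heuristic under stated assumptions: it does
not understand escaped parentheses or character classes, so it can report
false positives.

```python
import re

# Heuristic: a global inline-flag group such as "(?i)" or "(?ms)".
# Scoped groups like "(?i:...)" are deliberately not matched.
# Assumption: no escaped "\(" or "[(?i)]" character class in the pattern.
INLINE_FLAGS = re.compile(r"\(\?[aiLmsux]+\)")


def non_leading_flags(pattern):
    """Return (offset, text) for each inline flag group not at offset 0."""
    return [
        (m.start(), m.group())
        for m in INLINE_FLAGS.finditer(pattern)
        if m.start() > 0
    ]


for candidate in [r"(?i)abc", r"abc(?i)", r"a(?i:b)c"]:
    print(candidate, non_leading_flags(candidate))
```

Only the second pattern is reported, at offset 3; every pattern flagged
this way would still need to be read by a human to decide what was intended.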


Georg


-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


