[Python-ideas] Custom string prefixes

Stephen J. Turnbull stephen at xemacs.org
Fri May 31 09:02:17 CEST 2013


>>>>> Haoyi Li writes:

 > Maybe? I could imagine the regex module using it right away for a
 > very nice syntax for:

 > regex"<my-big-regex>".match(...)

I don't consider that particularly nice compared to the function call
it greatly resembles.  And there's no way I would copy-paste a
<my-big-regex> to multiple places, so I'd need an appropriately-
scoped variable for it anyway -- which might as well be initialized to
the compiled regexp.  That leaves the possibility of regexp compilation
at source compile time as the unique benefit.  I don't think that's
worth new syntax.
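
A minimal sketch of the alternative described above, assuming nothing
beyond the standard re module: the regexp is compiled once and bound to
an appropriately-scoped name, so no copy-pasting of the pattern is
needed.  The pattern, DATE_RE, and parse_date are hypothetical names
made up for illustration.

```python
import re

# Hypothetical pattern, compiled once at import time and bound to a
# module-level name so every caller shares the same compiled object.
DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def parse_date(text):
    """Return (year, month, day) as ints, or None if text is not a date."""
    m = DATE_RE.match(text)
    if m is None:
        return None
    return tuple(int(part) for part in m.groups())
```

Every call site then reads parse_date(...) or DATE_RE.match(...), which
is about as short as the proposed prefix syntax would be.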

I think those observations about *this* particular use case probably
generalize to similar use cases.  That leaves the "let's make it as
easy as possible to use safe literals in places where variable
interpolation is dangerous" rationale, which I think is a lot more
plausible.  My intuition says that is better addressed in templating
languages for user interfaces, though.  At least, that's where I
regularly stick my nose (and occasionally my neck) into hanging nooses.

 > While simultaneously getting rid of the behaviourally unspecified
 > global-compiled-regex-cache (what does "a few regular expressions
 > at a time"
 > <http://docs.python.org/3.3/library/re.html?highlight=re.sub#re.compile>
 > mean anyway?)

It means it's an optimization that throwaway scripts can take
advantage of, while leaving the choice to explicitly "intern" the regex
(by compiling it and keeping the result) to the user.

 > in favor of per-regex-literal interning of compiled regexes, which
 > is what the global-compiled-regex-cache is trying to approximate

I'm not the author, so I'm not authoritative on Python, but in XEmacs
we do the same thing because it dramatically speeds up loops where the
regexp is inlined.  (Emacs Lisp doesn't provide a compiled regexp
type.  Of course they'd be even faster if we could save the compiled
regexp in a variable, but very strict compatibility with Emacs is
required.)  Somebody once tried attaching the compiled regexp to the
strings as properties, but that meant interning the strings so string
content storage alone increased by about 20% in XEmacs itself, and the
compiled regexps added another 25% of string storage.  So XEmacs
itself grew by about 5%, and uncollectable strings plus compiled
regexp baggage accumulated rapidly.[1]  That was unacceptable given a
lack of visible performance improvement over the cache.

The global regexp cache is a reasonable compromise, that's all: an
inexpensive, easily tuned optimization that provides a substantial
saving in a common case.
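
To illustrate the kind of compromise meant here, the following is a
rough sketch of a bounded cache of compiled regexps.  It is not how the
re module or XEmacs actually implements its cache; it simply uses
functools.lru_cache, and CACHE_SIZE, cached_compile, and count_matches
are hypothetical names chosen for the example.

```python
import functools
import re

# Assumed size; the point is that it is a single, easily tuned knob.
CACHE_SIZE = 100


@functools.lru_cache(maxsize=CACHE_SIZE)
def cached_compile(pattern, flags=0):
    """Compile pattern, reusing an earlier result if it is still cached."""
    return re.compile(pattern, flags)


def count_matches(lines, pattern):
    """Count the lines that contain a match for pattern."""
    # The pattern is "inlined" at the call site; only the first
    # iteration pays for compilation, later ones hit the cache.
    return sum(1 for line in lines if cached_compile(pattern).search(line))
```

Because the cache is bounded, rarely used patterns are eventually
evicted, so memory use stays capped, unlike the approach of attaching a
compiled regexp to every string.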



Footnotes: 
[1]  Granted, this was before we implemented weakrefs, which probably
would mitigate this problem.


