[Python-Dev] Prospective Peephole Transformation
Raymond Hettinger
python at rcn.com
Sat Feb 19 02:41:24 CET 2005
> >>Wouldn't it help a lot more if the compiler would detect that
> >>(1,2,3) is immutable and convert it into a constant at
> >>compile time ?!
> >
> >
> > Yes. We've already gotten it to that point:
. . .
>
> Cool. Does that work for all tuples in the program ?
It is limited to just tuples of constants (strings, ints, floats,
complex, None, and other tuples). Also, it is limited in its ability to
detect a nesting like: a=((1,2),(3,4)). One other limitation is that
floats like -0.23 are not recognized as constants because the initial
compilation still produces a UNARY_NEGATIVE operation:
>>> dis.dis(compile('-0.23', '', 'eval'))
0 0 LOAD_CONST 0 (0.23000000000000001)
3 UNARY_NEGATIVE
4 RETURN_VALUE
> I did a search on our code and Python's std lib. It turns
> out that by far most such usages use either 2 or 3
> values in the tuple. If you look at the types of the
> values, the most common usages are strings and integers.
Right, those are the most common cases. The linear searches are
ubiquitous. Here's a small selection:
if comptype not in ('NONE', 'ULAW', 'ALAW', 'G722')
return tail.lower() in (".py", ".pyw")
assert n in (2, 3, 4, 5)
if value[2] in ('F','n','N')
if sectName in ("temp", "cdata", "ignore", "include", "rcdata")
if not decode or encoding in ('', '7bit', '8bit', 'binary'):
if (code in (301, 302, 303, 307) and m in ("GET", "HEAD")
Unfortunately, there are several common patterns that are skipped
because rarely changed globals/builtins cannot be treated as constants:
if isinstance(x, (int, float, complex)): # types are not constants
if op in (ROT_TWO, POP_TOP, LOAD_FAST): # global consts from
opcode.py
except (TypeError, KeyError, IndexError): # builtins are not
constant
> I'd assume that you'll get somewhat different results
> from your benchmark if you had integers in the tuple.
Nope, the results are substantially the same give or take 2usec.
> Hmm, what if you'd teach tuples to do faster contains lookups for
> string or integer only content, e.g. by introducing sub-types for
> string-only and integer-only tuples ?!
For a linear search, tuples are already pretty darned good and leave
room for only microscopic O(n) improvements. The bigger win comes from
using a better algorithm and data structure -- hashing beats linear
search hands-down. The constant search time is faster for all n>1,
resulting in much improved scalability. No tweaking of
tuple.__contains__() can match it.
Sets are the right data structure for fast membership testing. I would
love for sets to be used internally while letting users continue to
write the clean looking code shown above.
Raymond
More information about the Python-Dev
mailing list