My thesis (which, for those who don't know, was to come up with a way to do
type inferencing in the compiler without requiring any semantic changes;
basically, inferring the atomic types assigned to local variables) is now far
enough along that I have the algorithm done and can generate statistics on
which opcodes are most commonly called with types that I can specifically
infer. (I can also do static type checking on occasion; the only things that
triggered it were unit tests making sure TypeError was raised for things like
``~4.2``.) Thought some of you might get a kick out of this since the numbers
are rather blatant for certain opcodes and methods.
To read the stats, the number to the left is the number of times the opcode was
compiled (not executed!) with the specific type(s) known for the opcode (if it
took two args, then both types are listed; order was considered irrelevant).
The types are listed as integers, so here is the conversion::
Entries named "meth_<something>" are methods called directly on a value of
that type.
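As a rough illustration of how a listing in this format could be tallied, here is a minimal sketch in present-day Python. It is not the code in the modified compile.c; ``stat_key`` and ``tally`` are hypothetical names, the type codes are treated as opaque integers, and sorting the codes is just one possible way to make operand order irrelevant.

```python
from collections import Counter


def stat_key(opcode, *type_codes):
    """Build a stats key for one compiled opcode.

    Hypothetical helper: a single type code is stored bare, several are
    stored as a sorted tuple so that (a, b) and (b, a) count as the same key.
    """
    if len(type_codes) == 1:
        return (opcode, type_codes[0])
    return (opcode, tuple(sorted(type_codes)))


def tally(observations):
    """Count how often each (opcode, types) key was seen at compile time.

    `observations` is an iterable of (opcode_name, type_codes) pairs.
    Returns (count, key) pairs in ascending order of count.
    """
    counts = Counter(stat_key(op, *types) for op, types in observations)
    pairs = [(n, key) for key, n in counts.items()]
    # Sort on the count only; the keys themselves are not all comparable.
    return sorted(pairs, key=lambda pair: pair[0])


if __name__ == "__main__":
    sample = [
        ("BINARY_ADD", (8, 4)),
        ("BINARY_ADD", (4, 8)),   # same key as the line above
        ("GET_ITER", (128,)),
    ]
    for entry in tally(sample):
        print(entry)
```

Each occurrence is counted once per compilation site, which matches the "compiled, not executed" reading of the numbers.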
Now realize these numbers are only for opcodes where I could definitely infer
the type; where it could be more than one type, regardless of how specific
those possibilities were, I just ignored the opcode and did not include it in
the stats.
I also tweaked some opcodes based on how they are most often used. So, for
instance, BINARY_MODULO checks specifically for the case where the left side
is a basestring and then just doesn't worry about the other arg. For other
ones I just didn't bother with all the args since it was not interesting to me
in terms of deciding what type-specific opcodes I want to come up with.
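The BINARY_MODULO special case could be sketched roughly like this. This is an illustration under assumptions, not the actual logic in the modified compile.c: ``modulo_key`` is a hypothetical name, the string label stands in for whatever integer code the real stats use for basestring, and lumping every other case under ``None`` is a guess at one reasonable behavior.

```python
# Hypothetical label; the real stats use integer type codes instead.
BASESTRING = "basestring"


def modulo_key(left_type, right_type):
    """Return the stats key for a BINARY_MODULO site.

    Only the left operand is inspected: if it is known to be a string,
    this is string interpolation and the right operand is ignored.
    Everything else is recorded with no type (assumption of this sketch).
    """
    if left_type == BASESTRING:
        return ("BINARY_MODULO", BASESTRING)
    return ("BINARY_MODULO", None)


if __name__ == "__main__":
    print(modulo_key(BASESTRING, "tuple"))  # interpolation, right side ignored
    print(modulo_key("int", "int"))         # arithmetic modulo
```

The point of keying on the left operand alone is that string interpolation is the interesting case for a type-specific opcode, whatever is being interpolated.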
Anyway, here are the numbers on Lib sans Lib/test (129,814 lines according to
SLOCCount) for the ones above 100::

    [(101, ('BINARY_MULTIPLY', (8, 4))),
     (106, ('BINARY_SUBSCR', 128)),
     (118, ('GET_ITER', 128)),
     (124, ('BINARY_MODULO', None)),
     (195, ('meth_join', 4)),
     (204, ('BINARY_ADD', (8, 8))),
     (331, ('BINARY_ADD', (4, 4))),
     (513, ('BINARY_LSHIFT', (8, 8))),
     (840, ('meth_append', 128)),
     (1270, ('PRINT_ITEM', 4)),
     (1916, ('BINARY_MODULO', 4)),
     (12302, ('STORE_SUBSCR', 64))]

We sure like our dictionaries (for those that don't know, dictionary literals
are compiled by making an empty dict and then basically doing an individual
assignment for each value). We also seem to love string interpolation and
printing stuff. Using list.append is also popular. Now the BINARY_LSHIFT is
rather interesting, and that ties into the whole issue of how much I can
actually infer; since binary work tends to be done entirely with constants, I
can infer it really easily and so its frequency is rather high. Its actual
frequency of use compared to other things is probably not high, though. Plus I
doubt Plone, for instance, uses ``<<`` very often, so I suspect the opcode will
get weeded out when I incorporate stats from the other apps I am measuring.
As for the stuff I cut out, the surprising thing from those numbers was how few
mathematical expressions could be inferred. I checked my numbers with grep and
there really are only 3 places where a float constant is divided by a float
constant (and they are all in colorsys). I was not expecting that at all.
I guess global variables or object attributes tend to hold the operands, or I
just can't infer the values.
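For anyone who wants to repeat that kind of cross-check without grep, here is a minimal sketch using the ``ast`` module from present-day Python (3.8 or later). It is a different technique from both the grep check and the inference done in the modified compile.c, and ``count_float_const_divisions`` is a hypothetical name.

```python
import ast


def count_float_const_divisions(source):
    """Count `a / b` expressions where both operands are float literals."""
    count = 0
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.BinOp)
            and isinstance(node.op, ast.Div)
            and isinstance(node.left, ast.Constant)
            and isinstance(node.right, ast.Constant)
            and isinstance(node.left.value, float)
            and isinstance(node.right.value, float)
        ):
            count += 1
    return count


if __name__ == "__main__":
    sample = (
        "third = 1.0 / 3.0\n"   # float literal / float literal: counted
        "half = total / 2.0\n"  # left side is a name: not counted
        "ratio = 1 / 2\n"       # int literals: not counted
    )
    print(count_float_const_divisions(sample))
```

Like the inference described here, this only sees literals that appear directly in the expression; an operand that comes from a global or an attribute is invisible to it.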
Anyway, as I said, I just thought some people might find this interesting.
Don't read too much into this, since I am just using these numbers as
guidelines for which type-specific opcodes to write; those opcodes will serve
as a quantifiable measurement of the usefulness of type inferencing like this.
P.S.: if anyone is *really* interested, I can send you the full stats for the
apps I have run my modified version of compile.c against.