New subject: [OT] stats on type-inferencing atomic types in local variables in the stdlib

30 Sep 2004

      My thesis (which, for those who don't know, was to come up with a way to do 
type inferencing in the compiler without requiring any semantic changes; 
basically type inferencing atomic types assigned to local variables) is now far 
enough long that I have the algorithm done and I can generate statistics on 
what opcodes are called with the most common types that I can specifically 
infer (can also do static type checking on occasion; only triggers were actual 
unit tests making sure TypeError was raised for certain things like ``~4.2`` 
and such).  Thought some of you might get a kick out of this since the numbers 
are rather blatent for certain opcodes and methods.

To read the stats, the number to the left is the number of times the opcode was 
compiled (not executed!) with the specific type(s) known for the opcode (if it 
took two args, then both types are listed; order was considered irrelevant). 
Now they are listed as integers, so here is the conversion::

   Basestring 4
   IntegralType 8
   FloatType 16
   ImagType 32
   DictType 64
   ListType 128
   TupleType 256

For the things named "meth_<something>" that is the method being called 
immediately on the type.

Now realize these numbers are only for opcodes where I could definitely infer 
the type; ones where it could be more than one type, regardless if those 
possibilities were very specific, I just ignored it and did not include in the 
stats.

I also tweaked some opcodes knowing how they are more often used.  So, for 
instance, BINARY_MODULO checks specifically for the case of when the left side 
is a basestring and then just doesn't worry about the other args.  Other ones I 
just didn't bother with all the args since it was not interesting to me in 
terms of deciding what type-specific opcodes I want to come up with.

Anyway, here are the numbers on Lib sans Lib/test (129,814 lines according to 
SLOCCount) for the ones above 100::

  (101, ('BINARY_MULTIPLY', (8, 4))),
  (106, ('BINARY_SUBSCR', 128)),
  (118, ('GET_ITER', 128)),
  (124, ('BINARY_MODULO', None)),
  (195, ('meth_join', 4)),
  (204, ('BINARY_ADD', (8, 8))),
  (331, ('BINARY_ADD', (4, 4))),
  (513, ('BINARY_LSHIFT', (8, 8))),
  (840, ('meth_append', 128)),
  (1270, ('PRINT_ITEM', 4)),
  (1916, ('BINARY_MODULO', 4)),
  (12302, ('STORE_SUBSCR', 64))]

We sure like our dictionaries (for those that don't know, dictionaries are 
created by making an empty dict and then basically doing an indivual assignment 
for each value).  We also seem to love to use string interpolation, and 
printing stuff.  Using list.append is also popular.  Now the BINARY_LSHIFT is 
rather interesting, and that ties into the whole issue of how much I can 
actually infer; since binary work tends to be with all constants I can infer it 
really easily and so its frequency is rather high.  Its actual frequency of 
use, though, compared to other things probably is not high, though.  Plus I 
doubt Plone, for instance, uses ``<<`` very often so I suspect the opcode will 
get weeded out when I incorporate stats from the other apps I am taking stats from.

As for the stuff I cut out, the surprising thing from those numbers was how few 
mathematical expressions could be inferred.  I checked my numbers with grep and 
there really is only 3 times where a float constant is divided by a float 
constant (and they are all in colorsys).  I was not expecting that at all. 
Guess global variables or object attributes tend to have them or I just can't 
infer the values.  Either way I just wasn't expecting that.

Anyway, as I said I just thought some people might find this interesting. 
Don't read into this too much since I am just using these numbers as guidelines 
for type-specific opcodes to write for use as a quantifiable measurement of the 
usefulness of type inferencing like this.

-Brett

P.S.: anyone who is *really* interested I can send you the full stats for the 
apps I have run my modified version of compile.c against.

[OT] stats on type-inferencing atomic types in local variables in the stdlib

Brett C.

Armin Rigo

Brett C.

tags

participants (2)