Python Type-Inference based LINT.. (pylint.py)

David Jeske jeske at egroups.net
Sun Dec 5 16:15:49 EST 1999


> Jeremy Hylton wrote:
> 
> > Here's a variation that I have been thinking about; not sure if it
> > makes sense.  Instead of having types that correspond to sequence,
> > mapping, etc. use a set of micro-types that apply to specific API
> > calls.  So a type might have __getitem__ and __len__.  It's not
> > unusual to have a user-defined class that only implements a few of the
> > methods you'd expect a sequence to have.

This is a very interesting issue. On the one hand, it definitely makes
sense to let the code implicitly define the "micro-types" because that
is in fact what is actually required for the code to run correctly.

However, I'm not sure that it's all that useful to let code that
relies on such fine-grained types pass static checking. Here is a
(somewhat long) example:

There is another goal of pylint.py, and that is to output a
ctags-esque list of what functions and variables are of what types.

(see "Theory of Operation: Stage 2" in
http://www.chat.net/~jeske/Projects/PyLint/download/pylint-19991121.py)

In this case, it seems far more relevant to upgrade to the proper type
than to provide the micro-type. For example, for the function below,
"A" seems more human readable than "B":

 def sumListElements(a_list):
   total = 0
   for an_element in a_list:
     total = total + an_element
   return total
 
-- We then extract and output one of the following type signatures:

 A) sumListElements( SequenceType of NumberType ) -> NumberType
 B) sumListElements( MicroType{
                               __getitem__(NumberType) -> MicroType{__add__},
                               __len__() -> NumberType 
                               } ) -> MicroType{__add__}


NOTE: I've left out the __coerce__, and the functional type signature
for __add__ just to make the type signature smaller. However, I hope
you get the idea.
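
As a rough illustration of how a checker might arrive at something like
B, here is a minimal sketch in present-day Python using the standard
"ast" module. The helper name required_methods is hypothetical and this
is not how pylint.py does it. It only looks at direct uses of one
parameter, and it follows the mapping used above (a "for" loop needs
__getitem__ and __len__, a slice needs __getslice__) rather than what a
modern interpreter actually calls.

```python
import ast


def required_methods(source, param):
    """Return the special methods the code in `source` appears to need
    from the variable named `param`.

    Hypothetical helper for illustration only; not pylint.py's algorithm.
    """
    def is_param(node):
        return isinstance(node, ast.Name) and node.id == param

    needed = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.For) and is_param(node.iter):
            # Assumption taken from the post: iterating needs both of these.
            needed.update(["__getitem__", "__len__"])
        elif isinstance(node, ast.Subscript) and is_param(node.value):
            if isinstance(node.slice, ast.Slice):
                needed.add("__getslice__")   # e.g. a_list[1:]
            else:
                needed.add("__getitem__")    # e.g. a_list[0]
    return needed


SOURCE = '''
def sumListElements(a_list):
    total = 0
    for an_element in a_list:
        total = total + an_element
    return total
'''

print(sorted(required_methods(SOURCE, "a_list")))
```

A real inference pass would also have to track the element types (the
"-> MicroType{__add__}" part), which this sketch ignores entirely.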

We now have a picture in our heads of the "micro-type" signature
vs. the "proper type" signature of the same function.

-- Here is the starter question:

How relevant is it to do the static checking based on B instead of A?

Could you pass an object to sumListElements which only implemented
"__getitem__" and "__len__"? Yes, you could. However, it seems
terribly fragile to allow this kind of behavior to go unchecked. When
someone is coding inside of sumListElements in the future, it seems
reasonable for them to make the assumption that they are working with
a whole type such as a Sequence (e.g. a list). They might take a look at
the function and decide that they want to fix an existing bug:

The old version of the function assumes that the elements which are
added are going to be numbers. This version allows there to be
strings, or other addable elements:

 def sumListElements(a_list):
   total = a_list[0]
   for an_element in a_list[1:]:
     total = total + an_element
   return total

This changes the "proper type" signature to:

  sumListElements( SequenceType of AddableType ) -> AddableType

However, in doing so, it also requires that the "a_list" argument
include __getslice__, another part of SequenceType, which makes the
implicit type signature closer to:

 sumListElements( MicroType{
                            __getitem__(NumberType) -> MicroType{__add__},
                            __len__() -> NumberType,
                            __getslice__(NumberType,NumberType) -> 
                                     MicroType{__getitem__(NumberType) -> MicroType{__add__}}
                            } ) -> MicroType{__add__}

If you were coding the above function, I would wager that you would
consider this a reasonable change to have made. In the "proper type"
signature, it only became more general. No existing code would
break. However, if code were allowed to rely on the "micro-type"
signature and pass in objects which only implemented __getitem__ and
__len__, that code would now be broken.
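
To make the breakage concrete, here is a minimal sketch in present-day
Python. MinimalSeq, sum_v1 and sum_v2 are hypothetical names introduced
for illustration. One assumption to flag: a modern interpreter routes
a_list[1:] through __getitem__ with a slice object instead of calling
__getslice__, so the failure surfaces as a TypeError raised by
__getitem__. The effect is the same: a caller that relied on the
micro-type breaks when the function body changes.

```python
class MinimalSeq:
    """Hypothetical class: integer indexing and len() only, no slicing."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, index):
        if not isinstance(index, int):
            raise TypeError("MinimalSeq supports integer indexes only")
        return self._items[index]   # raises IndexError past the end

    def __len__(self):
        return len(self._items)


def sum_v1(a_list):
    # Old version: only iterates, so integer indexing is enough.
    total = 0
    for an_element in a_list:
        total = total + an_element
    return total


def sum_v2(a_list):
    # New version: more general element type, but it now slices.
    total = a_list[0]
    for an_element in a_list[1:]:
        total = total + an_element
    return total


seq = MinimalSeq([1, 2, 3])
v1_result = sum_v1(seq)            # works with the micro-type

try:
    sum_v2(seq)
    v2_broke = False
except TypeError:
    v2_broke = True                # the slice is what breaks it

print(v1_result, v2_broke)
```

A real list keeps working with both versions, which is the sense in
which the "proper type" signature only became more general.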

The good news is that the static checker would catch the problem
anyhow. The bad news is that IMHO it seems fragile.

-- Here is the final question:

Does it seem more proper to:

 I. Do all static checking based on the micro-types, and require
    the function writer to define the type constraints if they want
    larger granularity?

    def sumListElements(a_list):
      'pylint a_list (SequenceType,)'  ## <- this is the syntax pylint.py 
                                       ## supports currently for type
                                       ## declarations
      total = a_list[0]
      for an_element in a_list[1:]:
        total = total + an_element
      return total

 II. Always "upgrade" types to proper-types during static checking.
     If someone wants to have a fine-grained type which just includes
     __getitem__ and __len__, then force them to at least declare
     this as a proper type, like "SimpleSequenceType" and then
     show the signature as:

     sumListElements( SimpleSequenceType of AddableType ) -> AddableType
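
For comparison, option II can be approximated in present-day Python by
declaring the fine-grained type once, by name, with typing.Protocol.
This is only a sketch under that assumption. It is not the declaration
syntax pylint.py supports, and sum_elements and OnlyLen are hypothetical
names. Note that the isinstance() check below only tests that the
methods exist; it does not check their signatures.

```python
from typing import Protocol, TypeVar, runtime_checkable

T_co = TypeVar("T_co", covariant=True)


@runtime_checkable
class SimpleSequenceType(Protocol[T_co]):
    """Named 'proper type' covering just __getitem__ and __len__."""

    def __getitem__(self, index: int) -> T_co: ...

    def __len__(self) -> int: ...


def sum_elements(a_list: SimpleSequenceType[int]) -> int:
    # Uses only what SimpleSequenceType promises: len() and integer indexing.
    total = 0
    for i in range(len(a_list)):
        total = total + a_list[i]
    return total


class OnlyLen:
    """Hypothetical class that is missing __getitem__."""

    def __len__(self):
        return 0


print(isinstance([1, 2, 3], SimpleSequenceType))
print(isinstance(OnlyLen(), SimpleSequenceType))
print(sum_elements([1, 2, 3]))
```

With the type declared this way, a later change that starts slicing
a_list steps outside the declared type, so the function author is the
one who gets flagged rather than every caller.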


-- 
David Jeske (N9LCA) + http://www.chat.net/~jeske/ + jeske at egroups.net
                    eGroups Core Server Engineering



