[Types-sig] Proposed Goals PEP

Krishnaswami, Neel neelk@cswcasa.com
Tue, 13 Mar 2001 14:09:59 -0500


Paul Prescod [mailto:paulp@ActiveState.com] wrote:
> "Krishnaswami, Neel" wrote:
> > 
> > ...
> > This isn't going to help unless you have parametrized types. The
> > most common case of type error in my code is when I pass in a list
> > or dictionary with bogus element types. This is because usually a
> > general list or dictionary of a particular shape is being used as
> > a custom "type." Errors with incorrectly handling primitive types
> > just doesn't seem to happen to me -- even the string/sequence
> > thing basically never bites me.
> 
> How would you propose to implement type checking of potentially
> heterogenous lists against a potentially complex type (i.e. one with
> many subtypes) in an efficient manner? There are limits to the
> performance price I am willing to ask users to incur for better type
> checking!

Off the top of my head: 

First, we could under the covers add fields in the list and 
dictionary objects noting what key and element types they have. 
When a value is created with a type declaration, then the key
and element type fields are filled in, and whenever __setitem__ 
is called, a typecheck can be done to make sure that the 
modification is ok. In addition to signalling an error at the 
point the invariant breaks, this also ensures that the precise 
type is never wrong.

Eg (with bogus example syntax):

  >>> d :: {Int: String} = {3: "foo", 4: "bar"}
  >>> d[17] = 42
  TypeError: d is of type {Int: String}, assigned an integer. 

If a dictionary d is created without a type declaration, then the
key and element fields are not set. (More likely, we'd want a 
new type like Object or Top or Any to represent arbitrary values.)

When a function with type declarations receives a value, it 
can first check to see if the value has a precise type. If it
does, then the typecheck is fast. It's only if the value's type
is less precise than the declaration that you need to walk all 
the elements of the structure. Dynamically-checked code doesn't 
need to do this at all, leading to the following speed hierarchy:

1: untyped code fed values with any types
2: typed code fed values with precise types
3: typed code fed values with imprecise types

Like I said, this is off the top of my head. There are obvious
refinements, that might or might not be useful: keeping track 
of the precise type even of untyped lists and dictionaries. I 
think it's probably okay to special-case lists and dictionaries,
since they are the most common source of such errors. 

Here's an implemenation to give you the idea:

import UserDict

class TypedDict(UserDict.UserDict):
    def __init__(self, key_type, val_type, data=None):
        self.key_type = key_type
        self.val_type = val_type
        self.data = {}
        if data != None:
            for (k, v) in data.items():
                self[k] = v

    def __setitem__(self, k, v):
        if isinstance(k, self.key_type) and isinstance(v, self.val_type):
            self.data[k] = v
        else:
            raise TypeError, ("key %s, val %s" % (str(k), str(v)))

class TypedDict(UserDict.UserDict):
    def __init__(self, key_type, val_type, data=None):
        self.key_type = key_type
        self.val_type = val_type
        self.data = {}
        if data != None:
            for (k, v) in data.items():
                self[k] = v

    def __setitem__(self, k, v):
        if isinstance(k, self.key_type) and isinstance(v, self.val_type):
            self.data[k] = v
        else:
            raise TypeError, ("key %s, val %s wrong type" % 
                              (str(k), str(v))) 
        
    def check(self, key_type, val_type):
        if self.key_type == key_type and self.val_type == val_type:
            return 1
        else:
            ans = 1
            for k, v in self.data.items():
                if not (isinstance(k, key_type) and 
                        isinstance(v, val_type)):
                    ans = 0
                    break
            return ans

> Also note that even without parameterized type checks, you would often
> catch your error closer to where it occurred than you would 
> today. Yes, you might not catch it at the point you pass in the list 
> but you would catch it as soon as you passed one of the arguments to 
> a sub-function.

I'm unconvinced -- collections are where typechecking is most 
needed, and where it is least provided. (Excepting ML and Haskell.)
 
--
Neel Krishnaswami
neelk@cswcasa.com