[Types-sig] challenge response (was: A challenge)

Greg Stein gstein@lyra.org
Wed, 15 Dec 1999 07:37:52 -0800 (PST)


On Wed, 15 Dec 1999, Guido van Rossum wrote:
> There seem to be several proposals for type declaration syntaxes out
> there, with (mostly implied) suggestions on how to spell various types
> etc.
> 
> I personally am losing track of all the various proposals.
> 
> I would encourage the proponents of each approach to sit down with
> some sample code and mark it up using your proposed syntax.  Or write
> the corresponding interface file, if that's your fancy.
> 
> I recommend using the sample code that I posted as a case study,
> including some of the imported modules -- this should be a
> reasonable but not extreme challenge.

#----------------------------------------------------------------------
import sys, find			# 1
					# 2
def main():				# 3
    dir = "."				# 4
    if sys.argv[1:]:			# 5
        dir = sys.argv[1]		# 6
    list = find.find("*.py", dir)	# 7
    list.sort()				# 8
    for name in list:			# 9
        print name			# 10
					# 11
if __name__ == "__main__":		# 12
    main()				# 13
#----------------------------------------------------------------------

Presume that find.find is declared as:
  def find(pattern!StringType, dir = os.curdir!StringType)!ListType:

At the moment, I'm going to use '!' for the type declarators and the
type-assert operator. A placeholder. Also, the ListType in the above
declaration could be updated at some point, if we presume that more
complex type declarator syntax is designed (allowing List<String>).

In my view, the type system does not prove correctness, but only type
safety. In that sense, we don't need to worry about things like
termination or whether main() actually gets called, or whatever.
[ Guido's case study was concerned with algorithmic correctness ]

At the beginning of the compilation/inference process, the compiler knows
that __name__ is a string (technically: it knows the type for each name in
the module's namespace (the others are '__builtins__' and '__doc__')). The
import states that "sys" and "find" are ModuleType (which means sys.argv
and find.find will be okay from an operational point; it would still need
to check if they exist).

Line 3: the compiler defines main() to take no arguments and to have a
PyObject return value.

Line 4: "dir" now has a String value.

Line 5: The compiler knows sys is a module so the attribute access is
fine. It now must verify that argv exists.
  Problem #1: I'm not sure how it would do this without loading the
              module.
  Caveat #2: I do not have a proposal for stating sys.argv's type. This
             would be part of the interface stuff (which I would defer).
  Caveat #3: alternative to #2: we could hardcode knowledge of "sys"

For this line, the compiler cannot ensure that the [1:] would not raise an
error. This can be solved by introducing type info (interfaces), or we can
alter the line to:

    if (sys.argv!ListType)[1:]:			# 5

At this point, the compiler will assume it is a List for the rest of the
function.
  Caveat #4: we would need to determine how strict we want to be about the
             possibility of external changes to objects. Another thread
             could change the type before the next usage (or the
             find.find() could, but we don't use sys.argv after that)

If the compiler knows it is a list, then it also knows the [1:] would
succeed.

Line 6: in the absence of List<String> type information, the "dir" would
now become a (PyObject, String) which is simplified to (PyObject,).
  Caveat #5: adding List<String> concepts would keep "dir" as a String, as
             the inferencer would understand the indexing operation.

Line 7: per caveat #1, assume the compiler can access the find.find()
function. From that, it knows the signature. The first parameter has a
matching type, but the second (PyObject) does not match the required type
(String), so an error is raised. If caveat #5 is resolved, then the second
parameter matches. It is also possible to avoid the error by rewriting:

    list = find.find("*.py", dir!StringType)	# 7

"list" is now a ListType, based on the find.find() return value. (see
caveat #5 -- it could be possible to refine this knowledge).

Line 8: this is fine -- the inferencer knows List has a sort() method and
what the sort method's signature is.

Line 9: again, this is okay, based on the the inferencer's knowledge of
"for" statements and Lists. "name" is assigned a PyObject type (unless we
resolve caveat #5).

Line 10: the print succeeds, as any object can be printed.

Line 12: any comparison is valid, so this is fine. The compiler does
happen to know that __name__ is a string, though.

Line 13: the invocation matches the definition. No problem.


-----------------------------

IMO, the best thing to improve the system here is to introduce
parameterized lists (and dicts, tuples, etc). In fact, this would be
necessary to avoid the error at line 7 (without rewriting).

The following problem needs to be resolved:

1) how to fetch type information from modules without necessarily loading
   them

[ if this is unsolved then all attribute accesses become PyObject values.
  The type system would be pretty useless since you wouldn't even get
  function information. ]

The caveats listed are desired to be resolved:

2) how to specify types for module/class attributes
3) hardcode type information in the absence of a solution for #2
4) what sort of notions of "const" do we provide -- can the types of
   things change? (this may be moot with an interface present)
5) provide a syntax for composite/complex types

Summary of changes to the case study code:

1) find.find() definition altered.

[ it is certainly possible that fnmatch.* could be altered, but that is
  not necessary from the standpoint of the example code. The inferencer
  goes no further than the find.find() definition. ]

Optional changes, if the caveats are not resolved:

1) add !ListType to line 5
2) add !StringType to line 7

Miscellaneous notes:

* note the absence of declarations for "dir", "list", and "name".
* only one change was made in the "find" library to support type safety
  for the example code. The example code itself had no alterations
  (subject to the noted caveats)

Underlying proposal:

* add type declarator syntax
* add declarators to function args and return value (example provided for
  discussion purposes; I do believe a ':' is not possible and that a "name
  syntactic-marker typedecl" form is the proper form)
* add type-assert operator ('!' for discussion purposes)
* add type inferencing for associating types with names in the global and
  local scope. all other type information is imported rather than globally
  computed. inferencing does not occur over multiple, local scopes (in
  other words, we can process one function at a time, independent of the
  other functions)

TODO:

* I just realized the presence of the "global" statement throws off a lot
  of stuff. Type inferencing and/or checking will be harder and/or require
  a second pass if a global is used and new type is added to the possible
  set of types for the global.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/