Adding static typing to Python

Tue Feb 19 18:12:47 EST 2002

ajeru at vknn.org (Alexander Jerusalem) wrote in message news:<24c39b2c.0202190059.41874b92 at posting.google.com>...
> The argument that exhaustive testing should be done anyway is
> certainly true. I nevertheless tend to think that it's better to let
> the compiler do as much as possible in terms of error checking and
> then use testing to find the true runtime errors. It just takes much
> less time than starting a test run every time just to find all my
> typos. I don't know if there are test suites that let you run all the
> branches of all your methods and show you all typos at a time. I
> suspect that I'd have to start the test suite for each and every typo
> but I'm not knowledgable enough when it comes to Python test suites to
> be sure of that claim.

I'm trying to adopt the XP (Extreme Programming) habit of writing
tests first, and then only writing enough code to make the tests
pass. It's sometimes difficult. It requires some discipline not to
get carried away while writing the code and do more than I've written
tests for. When I do get carried away, I end up having to write a
whole bunch of boring tests afterwards, and that feels like a drag,
because then writing the tests is no longer part of the creative
process. It's also difficult to be sure that I actually have all the
tests I need in that case.

Normally, the process goes like this:
1. "Hm, what do I want this method to do?" - write a test.
2. "How do I accomplish that?" - write the code.
3. Run all unit tests for the module.

These are small cycles, in which I probably don't write more
than ten lines of application code. It's not a new test method
for each ten lines of application code; the test methods just
grow by a few lines per cycle.
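
As a rough illustration of one such cycle (the function `average` is
hypothetical, chosen only for the example): step 1 adds a few lines to
a test method, step 2 adds just enough application code to make them
pass, and step 3 is running the module's tests again.

```python
import unittest


# Step 2: just enough application code to satisfy the test below.
# `average` is a hypothetical example function.
def average(numbers):
    if not numbers:
        raise ValueError("average() of an empty sequence")
    return sum(numbers) / len(numbers)


# Step 1: the test, written first. Each cycle grows it by a few lines.
class AverageTest(unittest.TestCase):

    def test_average(self):
        self.assertEqual(average([1, 2, 3]), 2)
        # Added in a later cycle, once empty input came up:
        self.assertRaises(ValueError, average, [])
```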

In my current project, I have roughly as many lines of
test code as I have application code, but I don't think that
I spend as much time writing tests as writing the running code.
The test methods are mainly written once, while the
application code obviously changes a bit now and then...
Perhaps 20% of my programming is coding tests. I haven't
measured it; it might be a bit more. It's probably no more
time than I'd spend typing declarations, braces etc. in
statically typed languages. ;-)

I can only say that I feel this works better than any
C, C++ etc. project I have worked on before.

Having run my entire test suite successfully, I feel fairly confident
that I haven't messed anything up with my last bug fix or new feature.
Having compiled and linked C++ code only made me feel that it was
time to do all the tedious tests...

Running my entire test suite is certainly much faster than any
C++ (or, I suspect, Ada) build process would be for the same amount
of program content. Running the tests for one module is no slower
than a simple compile would be in C++.

> There's another thing that bugs me with the missing static typing and
> that is that it makes code less readable. I don't know what I can do
> with a function parameter when I see it. For example, when I see a
> function like def computeSalary(person)... I don't know at first sight
> if person is the id of person, a person object and if it's an object

Well, I don't know any programming language that forces you to write
good variable names. If it's an ID, it should be "personId", not 
"person" in my opinion. You might even claim that such a misnaming
is a bug!

Imagine that the Ada programmer had written computeSalary(Id : in Oid).
Then you would have known that it was an object id, but not of what!
Both Ada and Python give fairly good support for writing clear
code, although in very different ways. But you _can_ obfuscate code
in any language...

Unfortunately, I've seen a lot of C++ code where methods have
maybe a dozen parameters, most of them strings and the rest
integers. How much wiser does that make you? Then someone
manages to swap the order of two string parameters...
{And when you look at sequence diagrams from Rational Rose, you
don't see variable names, just "(string, string, string..." :-( }

If you wish, you can naturally have a coding standard saying
that there should be assertions on all method parameters. Then
you get "documentation" which won't go stale.
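
A minimal sketch of such a convention, reusing the computeSalary name
from the quoted example. The Person class and its monthly_pay
attribute are hypothetical, made up only so the example runs:

```python
class Person:
    """Hypothetical minimal Person class for the example."""

    def __init__(self, name, monthly_pay):
        self.name = name
        self.monthly_pay = monthly_pay


def computeSalary(person):
    """Return the yearly salary of a Person instance."""
    # The assertion states the expected parameter type and is checked
    # on every call, so this "documentation" cannot silently go stale.
    assert isinstance(person, Person), "person must be a Person, got %r" % (person,)
    return person.monthly_pay * 12
```

Note that assert statements are skipped when Python runs with the -O
option, so these checks act during development and testing rather
than in an optimized run.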

You can also make assertions that are more qualified than simple
type tests. I see a lot of "string" variables in C++ code where
a maximum length in a database table should really be checked,
and "short" where only 0 to 100 are valid values, etc. In this
regard, though, Ada is much more supportive than C++.
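
For instance, checks like those might look as follows. The function
set_score is hypothetical, and the 40-character limit and the 0 to
100 range are made-up stand-ins for whatever the database column or
the domain actually requires:

```python
MAX_NAME_LENGTH = 40  # assumed width of a hypothetical database column


def set_score(name, percent):
    """Validate and return a (name, percent) pair."""
    # More specific than a type test: check the values the data can hold.
    assert isinstance(name, str) and len(name) <= MAX_NAME_LENGTH, \
        "name must be a string of at most %d characters" % MAX_NAME_LENGTH
    assert isinstance(percent, int) and 0 <= percent <= 100, \
        "percent must be an integer from 0 to 100"
    return (name, percent)
```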

OK, asserts aren't checked at compile time, but surely you need to run
enough tests to call each method in every class anyway!

> what its exact class in a Person class hierarchy is. I have to find
> the place where the constructor is called to find that out. That's not
> a big problem for small programs but if I have many classes that use
> inheritance, it's really not easy to figure out what interface a
> parameter actually supports. This is, however, a minor problem if
> everyone on the team agrees to document parameter types in the doc
> string. The language doesn't enforce it though.

Perhaps the doctest module is something for you? Documentation
that goes out of date is always a tricky problem, but asserts, and
doc strings that are run by doctest, really are verified. That's
certainly a good thing.
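
A small sketch of a doctest-verified doc string; the function
yearly_salary is hypothetical. The interactive examples in the doc
string are executed by doctest, so they fail loudly if the code and
the documentation drift apart:

```python
import doctest


def yearly_salary(monthly_pay):
    """Return the yearly salary for a monthly pay figure.

    >>> yearly_salary(1000)
    12000
    >>> yearly_salary(-1)
    Traceback (most recent call last):
        ...
    ValueError: monthly_pay must not be negative
    """
    if monthly_pay < 0:
        raise ValueError("monthly_pay must not be negative")
    return monthly_pay * 12


if __name__ == "__main__":
    # Runs every doc string example in this module; silent on success.
    doctest.testmod()
```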

I see my unit tests as important documentation and examples of how my
classes and methods are to be used. In my code I don't have many more
comments than the doc strings that describe the purpose of each
module/class/method, plus minor notes about quirks and temporary notes
about things that should be changed etc.

> And the final argument I have for static type checking is that it
> enables method dispatching based on parameter types. In a statically
> typed language you can create two methods that have the same name but
> differ on the parameter types. The correct method will be called for
> you depending on the type of the argument you pass in your call. That
> makes for a quite flexible way of extending a program. You can just
> add another method with the same name and another type to handle a
> special case without touching the existing methods.

I don't think that's a big deal. In Python you can very often use
the same method, without a single special-case check, for all those
different types that you would have needed to write separate methods
for in Ada, Java or C++.
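
To illustrate the point with a hypothetical function: the body below
contains no type checks at all, yet the same lines serve lists,
tuples and strings, where a statically typed language would typically
call for overloads, a common interface or a generic version.

```python
def first_and_last(seq):
    """Return the first and last items of any indexable sequence."""
    # No declarations and no special cases: lists, tuples and strings
    # all support indexing with 0 and -1, so one body serves them all.
    return seq[0], seq[-1]
```

For example, first_and_last([1, 2, 3]) returns (1, 3) and
first_and_last("abc") returns ("a", "c").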

If you really want to do completely different things based on the
parameter types, even though you want the called method to have the
same name, you can do something like:

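class A:

    def myMethod(self, a, b):
        # Dispatch by hand on the exact types of the arguments.
        if (type(a), type(b)) == (int, str):
            return self.__myMethod_int_string(a, b)
        elif type(a) == type(b) == int:
            return self.__myMethod_int_int(a, b)
        else:
            raise TypeError("unsupported argument types: %s, %s"
                            % (type(a).__name__, type(b).__name__))

    def __myMethod_int_int(self, a, b):
        # Code for the (int, int) case goes here.
        pass

    def __myMethod_int_string(self, a, b):
        # Code for the (int, string) case goes here.
        pass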
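
If the number of type combinations grows, one variation on the same
hand-written dispatch is to look the handler up in a dictionary keyed
on the argument types. This is a sketch of that alternative, not
something the technique above requires; the class name DispatchingA,
the handler names and their return values are made up for
illustration:

```python
class DispatchingA:
    # Hypothetical handlers; the return values only show which one ran.
    def _int_int(self, a, b):
        return "int,int: %d" % (a + b)

    def _int_string(self, a, b):
        return "int,str: %s" % (b * a)

    # Maps the exact (type(a), type(b)) pair to a handler function.
    # Supporting a new combination means adding one method and one
    # entry here, without touching the existing handlers.
    _handlers = {
        (int, int): _int_int,
        (int, str): _int_string,
    }

    def myMethod(self, a, b):
        handler = self._handlers.get((type(a), type(b)))
        if handler is None:
            raise TypeError("unsupported argument types: %s, %s"
                            % (type(a).__name__, type(b).__name__))
        # The stored entries are plain functions, so pass self explicitly.
        return handler(self, a, b)
```

Like class A, this matches exact types, so a subclass of int such as
bool is rejected rather than silently accepted.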

But I have never run across the need to do that. After all, most
of the time, if you want the method calls to have the same name,
you basically want to do the same thing, and in a dynamically
typed language you will probably be able to share the same lines
of code for most of your method.

> I think type checking could be added as an optional feature, like in
> Dylan, without hurting the character of Python.

Sure. At least for the interface of a class. There have been a
number of proposals for that. In typical small Python methods,
where local variables rarely live for more than a screenful,
I think there is little point in declaring locals, and we don't
use many globals, do we...

> I'm currently using Python to write generators that produce Java code
> from an XML representation. So the runtime system is type checked by
> the Java compiler anyway. I tried to do the generator in Java first
> but the text processing capabilities of Java are just terrible, so I
> came to use Python and I like it for many reasons. It's just that I'm
> using much more time on debugging than I used to in Java where I
> didn't have to care too much about typos because the compiler
> complained anyway...

If we're talking about getting regular expressions to do what you
want etc., I don't think static typing would help you...

But I can recommend the method of writing tests first and code
afterwards. The unittest module works very well. If I have a Python
module named something.py, I will have a unit test module called
something_ut.py.

This will start with "import unittest, something" and end with:

if __name__ == "__main__":
    unittest.main()

In between there are subclasses of unittest.TestCase. They can
have methods setUp and tearDown that are run before and after
each test, and the actual tests are methods whose names start
with "test". In these tests you use methods such as
self.failUnlessEqual or self.failUnlessRaises etc. It's as simple
as that. You just run something_ut.py every time you change
something.py.
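
A sketch of such a test module. To keep it self-contained, the
hypothetical Stack class under test is defined inline; in the real
layout it would come from "import something". The example uses
assertEqual and assertRaises, the current names for failUnlessEqual
and failUnlessRaises; the old aliases were removed in Python 3.12.

```python
import unittest


# Stand-in for "import something": a hypothetical class under test.
class Stack:
    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if not self._items:
            raise IndexError("pop from empty Stack")
        return self._items.pop()


class StackTest(unittest.TestCase):

    def setUp(self):
        # Runs before each test method: every test gets a fresh Stack.
        self.stack = Stack()

    def tearDown(self):
        # Runs after each test method; nothing to release here.
        self.stack = None

    def test_push_then_pop(self):
        self.stack.push(1)
        self.stack.push(2)
        self.assertEqual(self.stack.pop(), 2)
        self.assertEqual(self.stack.pop(), 1)

    def test_pop_empty_raises(self):
        self.assertRaises(IndexError, self.stack.pop)
```

In something_ut.py the file would end with the
'if __name__ == "__main__": unittest.main()' lines, so running the
file executes every method whose name starts with "test".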

Unless your methods are too big, these trivial little tests will
nail down more bugs than you would imagine. For every feature you
want, you write a test. For every bug you find, you write a test.
This means that you lose a bit of speed in writing Python code,
but it will still be a lot faster than in other languages, and for
larger projects, I think this investment is well worth the effort.

If you feel that the tests get too complicated, you probably have too
complicated code, and should consider making more, smaller methods.

I've come to a point where it feels like when I learned how to prove
things in mathematics by induction etc. You have this big problem,
but you don't need to solve it all at once. You just look at one
little piece at a time, and if a piece isn't small enough to be
(fairly) trivial, you subdivide it and solve the parts. When all the
pieces are solved, your problem is solved too. With the unit tests,
you always stay on track!

Then there are some tricks to keep in mind. Most typos will, after
all, give syntax errors or name errors that you find at once. The
problem is when you assign to the wrong name and create a new
variable or attribute where you intended to update an existing one.
For attributes in classes, you can avoid this problem by using
__setattr__. Always have a __setattr__ method, and don't allow any
"extra" attributes to be set. You can check types there as well.
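
A minimal sketch of that trick; the Account class, its attribute
names and the accepted types are hypothetical. Assigning to a name
that is not in the whitelist, such as a misspelled attribute, raises
AttributeError instead of silently creating a new attribute, and a
value of the wrong type raises TypeError:

```python
class Account:
    # Hypothetical whitelist: attribute name -> accepted type(s).
    _allowed = {"owner": str, "balance": (int, float)}

    def __init__(self, owner, balance=0):
        self.owner = owner        # both assignments go through __setattr__
        self.balance = balance

    def __setattr__(self, name, value):
        if name not in self._allowed:
            # Catches typos such as "self.balence = 10".
            raise AttributeError("%s has no attribute %r"
                                 % (type(self).__name__, name))
        if not isinstance(value, self._allowed[name]):
            raise TypeError("%s: unexpected type %s"
                            % (name, type(value).__name__))
        # Delegate the actual storage to the default implementation.
        object.__setattr__(self, name, value)
```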

OK, it's not found until run time, but you will clearly spot where
the problem occurs. Tracebacks are rather good...