Python Sanity Proposal: Type Hinting Solution

Sat Jan 24 11:05:33 EST 2015

Mario Figueiredo wrote:

> In article <54c39e48$0$12996$c3e8da3$5496439d at news.astraweb.com>,
> steve+comp.lang.python at pearwood.info says...
>> 
>> I'm not sure if you're making a general observation or one which is
>> specific to Python. Plenty of languages have static analysis as a
>> language feature. Are you arguing they are wrong to do so?
>> 
> 
> No. I'm arguing static analysis should not consume a programming
> language keyword space. Static analysis is just a development tool.

Hmmm. I don't see how that could possibly work in practice. Let's take C
syntax for example:

int i;

declares i to be an int. How is this supposed to work without making int a
reserved word? Likewise for Pascal:

i: Integer;

In Python, built-in functions aren't reserved words, but we surely don't
expect a lexical type-checker to evaluate arbitrary code in order to check
whether int has been shadowed:

import random
if random.random() < 0.5:
    int = str

def spam(n:int):
    """Or if you prefer docstrings:
    @param n: int
    """

Even though int is not actually reserved, the type-checker is permitted to
behave as if it were, and it is difficult to see how it could possibly work
in practice if it did not behave that way.

At least it is difficult for me.
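
As a rough illustration of what "behaving as if int were reserved" means,
here is a minimal sketch using the standard library ast module. It parses
the source text without running it, so it never learns which branch of the
if statement was taken; all it can see is the name written in the
annotation. The helper name annotation_names is made up for this example
and is not part of any real type-checker.

```python
import ast

SOURCE = '''
import random
if random.random() < 0.5:
    int = str

def spam(n: int):
    pass
'''

def annotation_names(source):
    """Map each function to its {parameter: annotation} pairs.

    The source is parsed, never executed, so the possible shadowing of
    int at runtime is invisible here.
    """
    result = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            hints = {}
            for a in node.args.args:
                if a.annotation is None:
                    continue
                if isinstance(a.annotation, ast.Name):
                    hints[a.arg] = a.annotation.id
                else:
                    # Anything fancier than a bare name: keep the raw dump.
                    hints[a.arg] = ast.dump(a.annotation)
            result[node.name] = hints
    return result

print(annotation_names(SOURCE))
```

The checker reports the annotation as the name "int" regardless of what the
random branch would have done at runtime.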

>> def myfunction(arg1, arg2):
>>     """
>>     Normal docstring.
>>     @typehint: (str, int) -> bool"""
>>     return True
>> 
>> One of the problems with this is that it puts the information about
>> parameters far away from the parameter list itself.
> 
> Then move it to the first line of the docstring...

Which is close to the current convention used by document generators.

>> > I removed the argument names on purpose. They are only necessary in
>> > the PEP because type hinting is part of the function header there.
>> > However, when using a documentation-like pattern as above (or as in
>> > your own example), they can be safely removed, with the added benefit
>> > of making the syntax simpler.
>> 
>> Only at the cost of making it hard to read.
> 
> If you want the redundancy and a potential source of new bugs by having
> a type hint in a docstring include argument names... You see, there's
> always a downside to everything.

True, true.

But you can't really avoid duplicating the name in the documentation, just
as you can't avoid duplicating the name in the body of the function. How
can you talk about the parameter if you don't refer to it by name? Such
unavoidable duplication of names is not what most people call "redundancy".

> Meanwhile, object names have nothing to do with type analysis, which
> makes argument names rather irrelevant in the context of static
> analysis. I'd rather promote types than names. Remove names and you
> will.

How can type analysis avoid dealing with names?

def function(arg: int):
    x = arg + 1
    arg.append(None)

How does the type checker know that "x = arg + 1" is legal, but arg.append
is not? It has to know that since arg has not been rebound yet, arg is an
int, not a list. It cannot do that unless it knows the name arg.

Type analysis has to deal with more than *just* names. Ideally, it should be
able to infer the type of a whole expression. But it *must* be able to deal
with names -- it would be a pretty poor type-checker that couldn't warn
about this:

    n = 1
    n.sort()
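
To make the point concrete, here is a deliberately naive sketch of a
checker that tracks names. It is an assumption-laden toy, not how any real
type-checker works: it ignores scopes and rebinding entirely, only
understands a handful of built-in types, and the names BUILTIN_TYPES and
check_attributes are invented for the example. Even so, it cannot do its
job without associating a type with each name.

```python
import ast

# Hypothetical table of the only annotation names this toy understands.
BUILTIN_TYPES = {"int": int, "str": str, "list": list}

def check_attributes(source):
    """Return sorted (line, expression, type name) for suspicious lookups."""
    tree = ast.parse(source)
    types = {}  # name -> type, with no notion of scope or rebinding

    # Pass 1: record a type for every name we can see one for.
    for node in ast.walk(tree):
        if isinstance(node, ast.arg) and isinstance(node.annotation, ast.Name):
            if node.annotation.id in BUILTIN_TYPES:
                types[node.arg] = BUILTIN_TYPES[node.annotation.id]
        elif isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    types[target.id] = type(node.value.value)

    # Pass 2: flag attribute lookups the recorded type cannot satisfy.
    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
            known = types.get(node.value.id)
            if known is not None and not hasattr(known, node.attr):
                expr = "%s.%s" % (node.value.id, node.attr)
                problems.append((node.lineno, expr, known.__name__))
    return sorted(problems)

SOURCE = '''
def function(arg: int):
    x = arg + 1
    arg.append(None)

n = 1
n.sort()
'''

for line, expr, type_name in check_attributes(SOURCE):
    print("line %d: %s, but %s has no such attribute"
          % (line, expr, type_name))
```

Both arg.append and n.sort are flagged, and in both cases the only way to
get there is by looking up what is known about the name.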

>> >      "@typehint: (str, int) -> bool"
>> >      def myfunction(arg1, arg2):
>> 
>> That's somewhat better, in that at least the hint is close to the
>> function signature. But it has a lot of disadvantages: it is compile-time
>> only, the type hints aren't available at runtime.
> 
> It's static analysis. You don't need runtime execution.

You don't "need" it, but Python has it, so that's what any alternative has
to target. If an alternative fails to meet that target, that doesn't
necessarily rule the alternative out, but it is a point against it.

If you want to talk about programming languages in their full generality,
please say so, otherwise I'll assume you're referring to Python. Python
offers rich and powerful introspection abilities, and annotations are
available at runtime, which enables runtime access to the type hints. Any
alternative which does not offer that same runtime introspection is missing
an important piece of functionality.
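
For anyone who hasn't played with this, a short sketch of the runtime
introspection in question. The function body is a placeholder; the
__annotations__ attribute and inspect.signature are standard Python 3.

```python
import inspect

def myfunction(arg1: str, arg2: int) -> bool:
    return True

# The hints are ordinary objects attached to the function at runtime.
print(myfunction.__annotations__)

# inspect.signature presents the same information in signature form.
print(inspect.signature(myfunction))
```

A hint hidden inside a bare string before the def leaves nothing behind for
code like this to find.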

>> It requires extra complexity to the
>> parser, so that decorators may be separated from the function by a hint:
>> 
>> @decorate
>> "@typehint: (str, int) -> bool"
>> def myfunction(arg1, arg2):
>> 
>> No doubt some people will get them the wrong way around, and the type
>> checker may silently ignore their hints:
>> 
>> "@typehint: (str, int) -> bool"
>> @decorate
>> def myfunction(arg1, arg2):
>> 
>> And others will write:
>> 
>> @decorate
>> @typehint(str, int) -> bool
>> def myfunction(arg1, arg2):
>> 
> 
> That seems like you are fishing. What exactly is your point? That people
> will not be able to understand the rules of type hinting?

No. My point was in the next sentence of my post:

"Some syntax will be a bug magnet. This is one."

I believe that your suggestion of a magic string that looks like a decorator
but isn't would be a bug magnet. You have no help from the compiler. If you
get the syntax of an annotation wrong, you get a syntax error:

py> def spam(x int):
  File "<stdin>", line 1
    def spam(x int):
                 ^
SyntaxError: invalid syntax

but if you get one of your magic strings wrong, nothing happens because it's
just a string and the compiler ignores it. The type-checker, on the other
hand, is caught between two bad alternatives:

- complain about every string it doesn't know how to parse, which 
  will annoy anyone who uses bare strings for any other purpose; or

- silently ignore any such string that contains an error.

Both alternatives are bad. There is no good way to deal with this: either
annoy the user with false positives, or fail silently.
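
The difference is easy to demonstrate with the built-in compile()
function. In this sketch the magic string has a deliberate typo (a missing
closing parenthesis), and the helper name compiles is invented for the
example.

```python
# A magic-string hint with a typo: the closing parenthesis is missing.
MAGIC_STRING_TYPO = '''
"@typehint: (str, int -> bool"
def myfunction(arg1, arg2):
    return True
'''

# An annotation with a typo: the colon is missing.
BAD_ANNOTATION = '''
def spam(x int):
    pass
'''

def compiles(source):
    """Return True if the compiler accepts the source, False otherwise."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True

print(compiles(MAGIC_STRING_TYPO))  # True: it's just a string
print(compiles(BAD_ANNOTATION))     # False: SyntaxError
```

The compiler accepts the broken magic string without a murmur, so catching
the mistake is left entirely to the external tool.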

Not all alternatives are equally good. There are at least five choices for
where to put type hints/declarations:

- in the function signature (annotations);
- in the docstring;
- applied using a decorator;
- using a specially formatted string (other than the docstring);
- using a stub file.

These are not equally good, and the choice between them is not merely a
matter of personal preference of what looks nicer. There is about 60 years'
worth of prior art on type declarations, going back to Fortran in the
1950s, and we can be certain that putting the type declaration in the
signature is the least likely to go wrong. We may need (e.g.) stub files as
a fallback for when we can't use the function signature, but that doesn't
mean we should forgo the best solution when it is available.
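
For comparison, here is roughly what the decorator option from the list
above could look like. This is only a sketch: the decorator name typehint
and its returns keyword are hypothetical, not an existing API. It keeps
the hints available at runtime, but the types are matched to parameters by
position, and a static checker would need special knowledge of the
decorator to make any use of it.

```python
def typehint(*arg_types, returns=None):
    """Hypothetical decorator that stores positional type hints."""
    def decorator(func):
        code = func.__code__
        # Parameter names, in order, taken from the function itself.
        names = code.co_varnames[:code.co_argcount]
        hints = dict(zip(names, arg_types))
        if returns is not None:
            hints["return"] = returns
        func.__annotations__ = hints
        return func
    return decorator

@typehint(str, int, returns=bool)
def myfunction(arg1, arg2):
    return True

print(myfunction.__annotations__)
```

Swap the order of str and int in the decorator call and nothing complains,
which is one reason to prefer hints written next to the parameter names in
the signature.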

> And that
> somehow is going to be a universal problem?
> 
>> Some syntax will be a bug magnet. This is one.
> 
> You think?

Yes I do.

-- 
Steven