python philosophical question - strong vs duck typing

Devin Jeanpierre jeanpierreda at gmail.com
Wed Jan 4 07:22:09 EST 2012


> Since Python does not 'silently convert types' as I understand those 3
> words, you lose me here. Can you give a code example of what you mean?

I mean the reasoning behind the arguments like
'X isn't strongly typed because 2 + "3" = 5 but "3" + 2 = "32"'.
OCaml considers this a problem and bans all implicit conversions whatsoever.
(Or maybe that's more to do with making their type inference better?)

I actually stole the classification of continuums from a Python wiki
page, and my terminology was influenced by the reading. See:
http://wiki.python.org/moin/StrongVsWeakTyping

> I am aware that typed names lead to typed function signatures and that some
> languages in the ML family even type operators, so that different operators
> are required for 1+2 and 1.0+2.0 and mixed operations like 1+2.0 are
> prohibited. But we are talking about the typing of objects here.

I should state up-front that I'm not sure we can only talk about the
typing of objects.

There's a bit of a problem in that "dynamically typed" and "statically
typed" aren't exactly things that can be compared -- they're almost
orthogonal. Almost. Dynamically typed languages like Python keep, by
necessity, information about what they are and what they came from (at
run-time). (The alternative is to be untyped, which is bad).
Statically typed languages do not always have to keep them around. If
you do enough at compile-time, you can forget everything at run-time.
Run-time and compile-time are completely different, and a language can
do any combination of things at both. It's weird that we've decided
that there's only two options rather than four, and that we really
want to compare these two options.

I'm not sure how that fits into what I've said. Maybe I've already
contradicted myself.

Anyway, in answer, OCaml forbids both at run-time _and_ compile-time
the coercion of floats to ints via the + operator, and in doing so
enforces its notion of strong typing. You can't add a float to an int.
So in this notion, it allows less implicit type conversion -- and
run-time object conversion -- than Python, which freely converts
floats to ints. This philosophy extends elsewhere in OCaml.

> Using induction, I can prove, for instance, that these two functions
>
> def f_tr(n, val=base): # tail recursive
>    if n:
>        return f_tr(n-1, rec(n, val))
>    else:
>        return val
>
> def f_wi(n, val = base):  # while iterative
>    while n:
>        n, val = n-1, rec(n, val)
>    return val
>
> are equivalent, assuming enough stack and normal procedural Python
> semantics. (And assuming no typos ;-).
>
> f_tr(0) == base == f_wi(0)
> f_tr(n+1) == f_tr(n+1, base) == f_tr(n, rec(n+1, base))
> == by inductive hypothsis
> f_wi(n, rec(n+1, base)) == f_wi(n+1, base) == f_wi(n+1)
>
> So it is not clear to me what such proofs have to do with types as I
> understand the term.

Ah, I messed up and didn't explain that. It doesn't help that I've
been too lazy to actually work with ATS proofs directly, so I might
say some things that are wrong/vague.

In some languages -- particularly dependently-typed languages -- a
type can represent a computation. That is, the return type of a
function might essentially be the computation that we wish to do, and
therefore if the function typechecks then we know it does that
computation. (This has all the caveats of formal methods as a whole).
Many (all?) of the automated theorem provers are dependently typed
languages in this fashion.

In particular, if you pick some well-defined decidable computational
language, you can try to make your type system powerful enough to
encode it. (For example, the primitive recursive functions are a nice
subset.)

For an example, in ATS the following is a compiler-type which can be
used as part of the type declaration for a function:

dataprop FIB (int, int) =
  | FIB0 (0, 0) | FIB1 (1, 1)
  | {n:nat} {r0,r1:int} FIB2 (n+2, r0+r1) of (FIB (n, r0), FIB (n+1, r1))

The predicate FIB(X, Y) is true when Fibonacci(X)  = Y, so this is
essentially the same as defining fibonacci recursively:

Fibonacci(0) = 0
Fibonacci(1) = 1
Fibonacci(A+2) = Fibonacci(A+1) + Fibonacci(A)

It encodes the recursive definition of fibonacci, and if we declare a
function as "taking a value m, returning a value n, such that FIB(m,
n)", then a successful typecheck is equivalent to a successful proof
that the implementation is correct.

How does that typecheck work? I don't really know, I haven't at all
investigated the proof structure for ATS. Sorry. :(

Some cursory reading says that a function can return a proof that is
typechecked against dataprop, as well as the value. Maybe that's it.
:)

> I can imagine that if one overloads 'type' with enough extra information,
> the procedural reasoning involved in such proofs might reduce to a more
> calculational demonstration. The question is whether the months and years of
> intellectual investment required on the front end pay off in reduced
> intellectual effort across several applications.

Yes. Well, the compiler writers think so. I'm not really sure. Often
the incentives are misaligned -- it might not be worth the type in the
global economy, but professors need to research _something_. There's
also the question of whether it's worth spending all that effort to
satisfy a computer: does it really reduce error rates? Even if it
does, is it cost-effective at it?  Some of the things I've read says
that this might not be the right place to apply formal methods (but
rather that the best place is when writing the specification), but
it's a little bit of an under-researched field. [see
http://lambda-the-ultimate.org/node/4425 ]

> I happen to be working on a book on inductive algorithms that will include
> several such equivalences like that above. I will stick with the sort of
> proof that I can write and that most can understand.

The way I see it, mathematical english is a pseudocode for a
particular kind of declarative programming language. There isn't much
point leaving the pseudocode, because it seems to be unambiguous
enough. Unless you want an objective observer to verify your work and
actually execute the proof/program.

I guess if my proof was responsible for decisions made in machinery
that could cost lives I'd verify the proof, but if it's just a
textbook... :)

There is also a relatively common argument against dependently typed
languages that, while they're kind of popular in a certain group of
people, some think that they are losing the focus of how we should be
proving program correctness. I'm not really sure of the merits of
this.

> I see the object model of Python as being very different from the data model
> of C. The class of an object is an internal attribute of the object while
> the type of a block of bits is an external convention enforced by the
> compiler. I see Python's objects as more similar to the objects of many
> functional languages, except that Python has mutable objects.

Sorry, yes, the model is very different. I was trying to compare the
benefits. I asserted that the primary benefit of Python over C, in
terms of errors, is not that the errors no longer exist, it's that
they are in a much more manageable form (e.g. tracebacks instead of
corrupted data). Whereas in a strongly-statically typed language, the
benefit might be that they no longer exist at run-time. That's a
massive change.

> Cython with superset features used.

Yes. One of my pipe dream future projects is to extend Cython to
compile to ATS. But that'd require understanding Cython and ATS, and I
understand neither. (I was writing some Cython code today and I just
keep blowing up the compiler...)

> I believe Cython disables the run-times checks that it can tell are
> unnecessary. For instance, "for i in range(len(array)): array[i]" does not
> need a runtime check that i is in range(len(array)) and hence can convert
> array[i] to its unchecked C equivalent.

This is exactly the sort of thing! ATS, being a systems language,
makes you insert the check yourself :(



> To really eliminate errors, we need a multiplicity of types. We need not
> just floats, but non-negative floats, positive floats, and 'complex' floats
> that get promoted to complex as necessary. But the checks have to be
> dynamic, since in general, only constants can be statically checked at
> compile time. Or you need relative 'x is bigger than y' pair types so that
> x-y can be determined to be positive.
>
> Hmmm. I guess that one could say that such a statement in the course of a
> proof amounts to a type statement. And the same for loop invariants.

This excites me. I think you get the appeal of value-dependent types.
Although I think I expressed myself poorly, as I haven't brought
across that languages like ATS do this at compile time (or maybe your
emphasis is on general case?)

In ATS, I can define all of these like so[*]:

typedef NonNegativeFloat = [a:float | a >= 0.0] float(a)
typedef PositiveFloat = [a:float | a > 0.0] float(a)
typedef SortedStrictPair = [a,b:float | a > b] @(a, b)

And furthermore, it agrees with you on loop invariants: such
constraints can form as the type of a function, and all loops are
written as tail-recursive functions... :)

And of course ATS uses all this type information to prove things.

It's absolutely exciting stuff.

[*] Actually this isn't true, because the compiler doesn't understand
floats. I haven't asked why, but I suspect that it's because float
arithmetic works in ways that could confuse the compiler -- e.g. there
exists an a > b > 0, such that a - b = a, which violates the
mathematical laws that ATS uses to infer things about numbers. I would
probably would need to use an infinitely-precise rational type to make
it work (quote notation? :D). There's a run-time float type, of
course. It might also be possible to define your own compile-time type
that only defines <, ==, and >, without dealing with issues like
arithmetic.

> Yes, only objects are (strongly ;-) typed. Names are untyped. Function
> parameters (signatures) are only directly typed as to whether the
> corresponding arguments are required or not and whether the matching of
> arguments and parameters is by position, name, or both. The 'type' of Python
> is something like "object-class based, procedural syntax, certain functional
> features". Strong/weak has nothing to do with Python itself.

This is a viewpoint I hadn't considered. Yes, you could also object to
what is typed and what isn't. I am not really sure how to do it
without static typing though. How does one duck-type a name?

> Just got it after a minute delay.

Hum, perhaps my paste is making the rounds on reddit? ;)

Sorry about that. I forgot that this happens with that pastebin. What
would be the recommended way of sending such a large and useless file
on a mailing list?

-- Devin

On Wed, Jan 4, 2012 at 1:37 AM, Terry Reedy <tjreedy at udel.edu> wrote:
> On 1/3/2012 8:04 PM, Devin Jeanpierre wrote:
>
>>> Can you give an actual example of such a definition of 'strongly typed
>>> object' that excludes Python objects?
>>
>>
>> I'd love to, but I neglected to above because I don't really know
>> where we share ideas, and I wouldn't want to jump ahead and say
>> something that seems outlandish. I hope I've done a decent enough job
>> below. Stop me if I say something you don't agree with.
>>
>> Here's a short version: The term often refers to a continuum, where
>> some languages are "stronger" than others, but there is no objective
>> definition of "strong" -- it's like asking how big is "big", or how
>> smart is "smart".
>
>
> Fine so far. cold/hot, good/bad, etc are false dichotomies. Better to use
> there as comparatives -- better/worse that average, or than the middle 1/2.
> I don't use 'strong type' unless someone else does, and maybe I should stop
> being suckered into using it even then ;-).
>
>
>> The actual choice of what continuum strength refers
>>
>> to varies between people -- some specify it as "reinterpretation of
>> bits in memory", in which case every dynamic language in the world is
>> strongly typed (so I'll move on from that),
>
>
> Fine.
>
>
>> or some as "implicit
>> conversions between types", which is iffy because any function or
>> overloaded operator can silently convert types (at least in Python),
>
>
> Since Python does not 'silently convert types' as I understand those 3
> words, you lose me here. Can you give a code example of what you mean?
>
>
>> meaning the language is as weak as the user (?). But they're both kind
>> of reasonable and held by a few people, and it's easy to see how maybe
>> somebody used to significantly more strict rules (OCaml, for example)
>> might accuse Python of not being all that strong.  So you see it isn't
>> really a change of definition, it's a redrawing of the lines on a
>> continuum. The OCaml people are reasonable for saying what they say,
>> because their point of reference is their favorite language, just as
>> ours is ours. To us, maybe Haskell and OCaml are insanely
>> strong/"strict", to them, we're insanely weak.
>
>
> I am aware that typed names lead to typed function signatures and that some
> languages in the ML family even type operators, so that different operators
> are required for 1+2 and 1.0+2.0 and mixed operations like 1+2.0 are
> prohibited. But we are talking about the typing of objects here.
>
>
>> Here's another take, more in line with what I really meant: the
>> continuums expressed above aren't universal either. IME they're chosen
>> by dynamic typing advocates, which are worried about code readability
>> and certain styles of immensely common errors that come from
>> lower-level languages like C. Static typing advocates (at least in the
>> functional programming language family) are instead concerned about
>> correctness. For them, type systems are a way to reduce the number of
>> errors in your code _in general_, not just the specific cases of type
>> coercion or memory-safety. In this particular sense, Python is not
>> very far from C
>
>
> I see the object model of Python as being very different from the data model
> of C. The class of an object is an internal attribute of the object while
> the type of a block of bits is an external convention enforced by the
> compiler. I see Python's objects as more similar to the objects of many
> functional languages, except that Python has mutable objects.
>
> ...
>
>
>> Type systems exist which are much "stronger" in what they can protect
>> you against. As an extreme example, one might take a dependently-typed
>> language. A statically dependently typed language can protect you
>> against out-of-bounds array access with a proof at compile time. In
>
>
> I am familiar with the length-parameterized array types of Pascal (which
> seem in practice to be a nuisance), but I see from
> https://en.wikipedia.org/wiki/Dependent_type
> that there is a lot more to the subject.
>
>
>> fact, a statically-dependently-typed language can do things like prove
>
>
> sometimes, though not always,
>
>
>> that two different implementations of the same function are in fact
>> implementations of the same function -- this is neat for implementing
>> a non-recursive version of a recursive function, and verifying the
>> change doesn't influence semantics.
>
>
> Using induction, I can prove, for instance, that these two functions
>
> def f_tr(n, val=base): # tail recursive
>    if n:
>        return f_tr(n-1, rec(n, val))
>    else:
>        return val
>
> def f_wi(n, val = base):  # while iterative
>    while n:
>        n, val = n-1, rec(n, val)
>    return val
>
> are equivalent, assuming enough stack and normal procedural Python
> semantics. (And assuming no typos ;-).
>
> f_tr(0) == base == f_wi(0)
> f_tr(n+1) == f_tr(n+1, base) == f_tr(n, rec(n+1, base))
> == by inductive hypothsis
> f_wi(n, rec(n+1, base)) == f_wi(n+1, base) == f_wi(n+1)
>
> So it is not clear to me what such proofs have to do with types as I
> understand the term.
>
>
>> It's definitely true that there is something here that Python's types
>> cannot do, at least not without extremely heavily mangling
>> (technically __instancecheck__ means that type objects are turing
>> complete, but I request that we ignore that). These are the sorts of
>> types that can, in a very real sense, replace unit tests. Just the
>> fact that your program typechecks is proof that certain properties
>> hold, and those properties can be very soothing, such as that example
>> of proving two algorithms return the same output.
>
>
> I can imagine that if one overloads 'type' with enough extra information,
> the procedural reasoning involved in such proofs might reduce to a more
> calculational demonstration. The question is whether the months and years of
> intellectual investment required on the front end pay off in reduced
> intellectual effort across several applications.
>
> I happen to be working on a book on inductive algorithms that will include
> several such equivalences like that above. I will stick with the sort of
> proof that I can write and that most can understand.
>
>
>> One last note about static vs dynamic vs strong typing: I do believe
>> that a strongly typed dynamic+statically typed language is possible
>
>
> Cython with superset features used.
>
>
>> and a good thing. For statically typed code, run-time checks are
>> disabled (speed),
>
>
> I believe Cython disables the run-times checks that it can tell are
> unnecessary. For instance, "for i in range(len(array)): array[i]" does not
> need a runtime check that i is in range(len(array)) and hence can convert
> array[i] to its unchecked C equivalent.
>
>
>  for dynamically typed code they are enabled, and all
>>
>> the while you can make relatively sophisticated queries about the type
>> of a thing using an interactive interpreter (somewhat similar to
>> unifying things in the prolog interpreter, in my view).
>
>
> There is n
>
>
>> So now that we've gotten this far, here's my proposition: It is
>> reasonable to consider type systems in terms of how many errors they
>> eliminate in your programs, rather than just readability/memory-safety
>
>
> And interesting and for me, different viewpoint. I tend to be interested in
> correctness through understanding, or, if you will, logical correctness.
> Textual correctness, especially with what I do with Python, is a secondary
> issue.
>
> Memory-safety does not directly concern me, since I am not writing or
> running servers. But it is a major target of malware.
>
> To really eliminate errors, we need a multiplicity of types. We need not
> just floats, but non-negative floats, positive floats, and 'complex' floats
> that get promoted to complex as necessary. But the checks have to be
> dynamic, since in general, only constants can be statically checked at
> compile time. Or you need relative 'x is bigger than y' pair types so that
> x-y can be determined to be positive.
>
> Hmmm. I guess that one could say that such a statement in the course of a
> proof amounts to a type statement. And the same for loop invariants.
>
>
>> concerns. Also, it is reasonable to have relatively strict standards,
>> if you're used to a very "strong" type system. Such a person might,
>> very reasonably, decide that Python is quite weak.
>
>
> Yes, only objects are (strongly ;-) typed. Names are untyped. Function
> parameters (signatures) are only directly typed as to whether the
> corresponding arguments are required or not and whether the matching of
> arguments and parameters is by position, name, or both. The 'type' of Python
> is something like "object-class based, procedural syntax, certain functional
> features". Strong/weak has nothing to do with Python itself.
>
>
>> I admit that a lot of what I said here is putting words in other
>> peoples' mouths, if you object, pretend I said it instead.
>
>
> No objections except as noted.
>
>
>> I am, after all, a fan of ATS at the moment, and thus have some
>> interest in ridiculously strongly typed languages ;)
>
> I see that in the list in the Wikipedia article.
>
>
>> *phew*. That took a lot of time. I hope I haven't lost your attention.
>
>
> No. I might even have learned something.
>
>
>> [ An example of a simple dependently typed program:
>> http://codepad.org/eLr7lLJd ]
>
>
> Just got it after a minute delay.
>
>
> --
> Terry Jan Reedy
>
> --
> http://mail.python.org/mailman/listinfo/python-list



More information about the Python-list mailing list