[Python-Dev] Allowing u.encode() to return non-strings

Mon Jun 21 22:37:57 EDT 2004

> > The issue is that currently the type inferencer can know that the
> > return type of u.encode(s) is 'unicode', assuming u's type is
> > 'unicode'.
> 
> Um, you don't mean that. u"foo".encode() == "foo", of type str.

Yes, my mistake in haste.

> > But with the proposed change, the return type will depend
> > on the *value* of s, and I don't know how easy it is for the type
> > inferencers to handle that case -- likely, a type inferencer will have
> > to give up and say it returns 'object'.
> 
> When looking for near-C speed, type inferencing is most important
> for a relatively small set of particularly efficiently manipulable
> types: most notably, smallish integers.

If type inferencing only worked for *smallish* ints it would be a
waste of time.  You don't want the program to run 50x faster but
compute the wrong result if some intermediate result is larger than 32
bits.
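
To make the overflow concern concrete, here is a minimal sketch. The
helper names (wrap32, factorial, factorial32) are invented for this
illustration; wrap32 only simulates what fixed-width signed arithmetic
would do and is not taken from any real compiler or inferencer.

```python
def wrap32(n):
    """Reduce n to a signed 32-bit value, as fixed-width machine arithmetic would."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n


def factorial(n):
    """Factorial using Python's arbitrary-precision integers."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


def factorial32(n):
    """Same loop, but every intermediate result is truncated to 32 bits."""
    result = 1
    for i in range(2, n + 1):
        result = wrap32(result * i)
    return result


print(factorial(12), factorial32(12))  # agree: 12! still fits in 32 bits
print(factorial(13), factorial32(13))  # disagree: 13! does not fit
```

An inferencer that assumed "int means machine word" would produce the
second function's answers, which is the silent wrong result described
above.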

> Being able to prove that
> something is a Unicode object just isn't all that useful for
> efficiency, because most of the things you can do to Unicode
> objects aren't all that cheap relative to the cost of finding out
> what they are. Likewise, though perhaps a bit less so, for being
> able to prove that something is a string.

Hm, strings are so fundamental as arguments to other things (used as
keys etc.) that my intuition tells me that it actually would matter.

And there are quite a few fast operations on strings: len(), "iftrue",
even slicing: slices with a fixed size are O(1).
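
A small sketch of the operations meant here, written for a current
CPython rather than the 2004 interpreter. The variable names are
placeholders, and the cost remarks in the comments describe CPython's
behaviour as I understand it, not a guarantee of the language.

```python
s = "x" * 1_000_000  # a large string; its size should not affect the costs below

length = len(s)   # reads a stored length; no scan of the characters
truthy = bool(s)  # the "iftrue" test: only checks whether the string is non-empty
prefix = s[:4]    # copies 4 characters, independent of len(s)

print(length, truthy, prefix)
```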

Also, the type gets propagated to other function calls, so now you
have to analyze those with nothing more than 'object' for some
argument type.

> At least, so it seems to me. Maybe I'm wrong. I suppose the
> extract-one-character operation might be used quite a bit,
> and that could be cheap. But I can't help feeling that
> occasions where (1) the compiler can prove that something
> is a string because it comes from calling an "encode" method,
> (2) it can't prove that any other way, (3) this makes an
> appreciable difference to the speed of the code, and (4)
> there isn't any less-rigorous (Psyco-like, say) way for
> the type to be discovered and efficient code used, are
> likely to be pretty rare, and in particular rare enough
> that supplying some sort of optional type declaration
> won't be unacceptable to users. (I bet that any version
> of Python that achieves near-C speed by doing extensive
> type inference will have optional type declarations.)

Don't forget all those other uses of type inferencing, e.g. for
pointing out latent bugs in programs (pychecker etc.).

> The above paragraph, of course, presupposes that we keep
> the restriction on the return value of u.encode(s), and
> start enforcing it so that the compiler can take advantage.
> 
> > (I've never liked functions whose return type depends on the value of
> > an argument -- I guess my intuition has always anticipated type
> > inferencing. :-)
> 
> def f(x): return x+x
> 
> has that property, even if you pretend that "+" only works
> on numbers.

No, the type of f depends on the *type* of x (unless x has a type
whose '+' operation has a type that depends on the value of x).
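
A hedged sketch of the distinction, using a current Python 3 rather
than the interpreter under discussion. The wrapper name encode is
invented for the example; codecs.encode and the "utf-8" and "rot_13"
codecs are standard-library features, where one codec yields bytes and
the other yields text.

```python
import codecs


def f(x):
    return x + x


# The result type of f follows from the *type* of x alone.
print(type(f(2)), type(f(2.5)), type(f("ab")))


def encode(u, codec_name):
    """Hypothetical wrapper: the result type depends on the *value* of codec_name."""
    return codecs.encode(u, codec_name)


text = "abc"
as_bytes = encode(text, "utf-8")   # bytes-producing codec
as_text = encode(text, "rot_13")   # text-to-text codec, returns str

print(type(as_bytes), type(as_text))
```

Both calls to encode pass a str and a str, yet the results have
different types, so an inferencer that sees only argument types cannot
do better than a union or 'object' for the return value.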

--Guido van Rossum (home page: http://www.python.org/~guido/)