[Python-Dev] Allowing u.encode() to return non-strings

Tue Jun 22 07:53:10 EDT 2004

On Tuesday 2004-06-22 03:37, Guido van Rossum wrote:
>>>             But with the proposed change, the return type will depend
>>> on the *value* of s, and I don't know how easy it is for the type
>>> inferencers to handle that case -- likely, a type inferencer will have
>>> to give up and say it returns 'object'.
>> 
>> When looking for near-C speed, type inferencing is most important
>> for a relatively small set of particularly efficiently manipulable
>> types: most notably, smallish integers.
> 
> If type inferencing only worked for *smallish* ints it would be a
> waste of time.  You don't want the program to run 50x faster but
> compute the wrong result if some intermediate result is larger than 32
> bits.

Either I'm misunderstanding you, or that's a straw man.
I'm not saying type inference is useful if it gives the
wrong answer when non-smallish ints occur. I'm saying
it's still useful if it merely stops providing major
speedups when non-smallish ints occur. Which is what happens in, say,
modern Lisp systems when their type inferencing can
prove that some important intermediate value is an
integer but not that it's small enough to fit in a
single word.
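
A rough sketch of what I mean, in Python for illustration only.
The names (WORD_BITS, fits_in_word, add_ints) and the 64-bit
signed word are my assumptions, not anything a real compiler
does: once the operands are known to be integers, the answer is
always right, and only the speed depends on their size.

```python
# Hypothetical sketch: WORD_BITS, fits_in_word and add_ints are illustrative
# names, and the 64-bit signed word is an assumption.
WORD_BITS = 64
WORD_MIN = -(1 << (WORD_BITS - 1))
WORD_MAX = (1 << (WORD_BITS - 1)) - 1


def fits_in_word(n):
    """True if n fits in a signed machine word of WORD_BITS bits."""
    return WORD_MIN <= n <= WORD_MAX


def add_ints(a, b):
    """Add two values already known (by inference) to be integers.

    Returns (sum, path). path is "fast" when both operands and the sum
    fit in a word, and "slow" when bignum arithmetic would be needed.
    The sum is correct on both paths; only the cost differs.
    """
    total = a + b
    if fits_in_word(a) and fits_in_word(b) and fits_in_word(total):
        return total, "fast"
    return total, "slow"


print(add_ints(2, 3))          # small operands take the fast path
print(add_ints(WORD_MAX, 1))   # overflow falls back, result still correct
```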

>> Being able to prove that
>> something is a Unicode object just isn't all that useful for
>> efficiency, because most of the things you can do to Unicode
>> objects aren't all that cheap relative to the cost of finding out
>> what they are. Likewise, though perhaps a bit less so, for being
>> able to prove that something is a string.
> 
> Hm, strings are so fundamental as arguments to other things (used as
> keys etc.) that my intuition tells me that it actually would matter.

As a Python user I am required by law to have great
respect for your intuition :-), and I would do anyway,
so you may be right here. But surely most places where
strings are used so very heavily almost always *do*
get strings, so their type-checking is just a matter
of, um, checking the type (i.e., no dynamic dispatch
is needed in the common case). If you then need to
do something non-trivial like a dict lookup, the cost
of the type check is small by comparison.

> And there are quite a few fast operations on strings: len(), "iftrue",
> even slicing: slices with a fixed size are O(1).

Yes, though it's O(1) with a rather large constant. (Except
maybe for single-character slices.) I'll agree about len and
iftrue, though.

>> At least, so it seems to me. Maybe I'm wrong. I suppose the
>> extract-one-character operation might be used quite a bit,
>> and that could be cheap. But I can't help feeling that
>> occasions where (1) the compiler can prove that something
>> is a string because it comes from calling an "encode" method,
>> (2) it can't prove that any other way, (3) this makes an
>> appreciable difference to the speed of the code, and (4)
>> there isn't any less-rigorous (Psyco-like, say) way for
>> the type to be discovered and efficient code used, are
>> likely to be pretty rare, and in particular rare enough
>> that supplying some sort of optional type declaration
>> won't be unacceptable to users. (I bet that any version
>> of Python that achieves near-C speed by doing extensive
>> type inference will have optional type declarations.)
>
> Don't forget all those other uses of type inferencing, e.g. for
> pointing out latent bugs in programs (pychecker etc.).

Sure, and I think that's a better argument. If you'd said
"We'll probably do heavy type inferencing eventually for
speed, and it's really helpful for finding bugs too, so
it would be a shame to do anything that interferes with
it" then I'd probably just have agreed :-).

>>> (I've never liked functions whose return type depends on the value of
>>> an argument -- I guess my intuition has always anticipated type
>>> inferencing. :-)
>> 
>> def f(x): return x+x
>> 
>> has that property, even if you pretend that "+" only works
>> on numbers.
> 
> No, the type of f depends on the *type* of x (unless x has a type
> whose '+' operation has a type that depends on the value of x).

Oh, I see. I misunderstood you; sorry about that. How do
you feel about the "eval" function? :-)
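
To make the distinction concrete, here is a small sketch. It
uses the codecs module of a present-day Python 3 as a stand-in
(an assumption on my part; it is not the u.encode() under
discussion), and the helper names double and encode are made up
for illustration. The result type of double follows from the
*type* of its argument, while the result type of encode depends
on the *value* of the codec name.

```python
import codecs


def double(x):
    # Result type is determined by the type of x alone.
    return x + x


def encode(text, codec_name):
    # Result type depends on the value of codec_name:
    # "utf-8" is a text-to-bytes codec, "rot_13" is text-to-text.
    return codecs.encode(text, codec_name)


print(type(double(2)).__name__, type(double("ab")).__name__)
print(type(encode("abc", "utf-8")).__name__,
      type(encode("abc", "rot_13")).__name__)
```

An inferencer that knows only "codec_name is a string" cannot
pick between the two result types of encode.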

Slightly more seriously and digressing a little, my "f" still
has that property if you consider Python's 'int' and 'long'
to be different types (which you certainly need to do if you're
doing type inference for the sake of speed). It is (or will be)
better for most purposes to consider them a single type with
two internal representations; I wonder whether sooner or later
it will be appropriate to take the same view of string and
unicode objects... Probably later rather than sooner, for
various reasons.

-- 
g