[Python-Dev] Allowing u.encode() to return non-strings
Phillip J. Eby
pje at telecommunity.com
Fri Jun 18 14:28:16 EDT 2004
At 09:59 PM 6/17/04 -0400, Jeremy Hylton wrote:
>On Thu, 17 Jun 2004 08:43:15 -0700, Guido van Rossum <guido at python.org> wrote:
> >
> > The issue is that currently the type inferencer can know that the
> > return type of u.encode(s) is 'unicode', assuming u's type is
> > 'unicode'. But with the proposed change, the return type will depend
> > on the *value* of s, and I don't know how easy it is for the type
> > inferencers to handle that case -- likely, a type inferencer will have
> > to give up and say it returns 'object'.
>
>Who cares about the type inference <0.2 wink>. It's harder for the
>reader of the program to understand if encode() returns a different
>type. Would there be some common property that all encode() return
>values would share? Can't think of one myself.

Indeed. What does this proposal offer that writing 'somefunc(u)' in place
of 'u.encode("somecodec")' doesn't? Unicode streams aren't going to work
with this, right? And anything else that already uses '.encode()' is going
to expect a string.

In the former case, you know you have to look at 'somefunc' to know what's
returned, but in the latter, you are encouraged to think that it's a
string, and tempted to worry about the details of the actual encoding
later, even if you don't recognize the codec name.
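
To make the contrast concrete, here is a minimal sketch. It is written for
present-day Python 3 rather than the 2.x interpreter under discussion, and
'rot13_text' is a hypothetical stand-in for 'somefunc'. In Python 3,
str.encode() always returns bytes, while codecs.encode() accepts arbitrary
codecs such as rot13, which maps text to text.

```python
import codecs


def rot13_text(u):
    """Hypothetical 'somefunc': text in, text out, and the docstring says so."""
    return codecs.encode(u, "rot13")


u = "hello"

# Method form: the reader assumes a byte string without checking the codec.
as_bytes = u.encode("utf-8")

# Function form: the reader has to look at rot13_text to learn the return type.
as_text = rot13_text(u)

print(type(as_bytes).__name__, type(as_text).__name__)
```

The point of the sketch is only that the function name gives the reader a
place to look, whereas the method call invites an assumption about the type.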

Anyway, it seems to me that things returned from u.encode() should either
be strings or "stringlike". Maybe implementing the read character buffer
interface should suffice? But I don't think this should be opened up to
any old objects without some kind of defined invariant that they should
satisfy.
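
One way such an invariant might be checked is sketched below. This is an
assumption-laden illustration, not a proposed API: 'is_stringlike' and
'checked_encode' are hypothetical names, and the sketch uses Python 3, where
the closest analogue of the read character buffer interface is the buffer
protocol, probed here with memoryview().

```python
def is_stringlike(obj):
    """Assumed invariant: obj is text, or exposes a readable buffer."""
    if isinstance(obj, str):
        return True
    try:
        # memoryview() raises TypeError for objects without buffer support.
        memoryview(obj)
    except TypeError:
        return False
    return True


def checked_encode(u, encoder):
    """Apply a hypothetical encoder callable and enforce the invariant."""
    result = encoder(u)
    if not is_stringlike(result):
        raise TypeError(
            "encoder returned %s, which is not stringlike"
            % type(result).__name__
        )
    return result


encoded = checked_encode("hi", lambda s: s.encode("utf-8"))
```

With a check like this, an encoder that returned, say, an int or a list
would be rejected at the call site instead of surprising code downstream.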