[Python-Dev] Allowing u.encode() to return non-strings

Fri Jun 18 14:28:16 EDT 2004

At 09:59 PM 6/17/04 -0400, Jeremy Hylton wrote:
>On Thu, 17 Jun 2004 08:43:15 -0700, Guido van Rossum <guido at python.org> wrote:
> >
> > The issue is that currently the type inferencer can know that the
> > return type of u.encode(s) is 'unicode', assuming u's type is
> > 'unicode'.  But with the proposed change, the return type will depend
> > on the *value* of s, and I don't know how easy it is for the type
> > inferencers to handle that case -- likely, a type inferencer will have
> > to give up and say it returns 'object'.
>
>Who cares about the type inference <0.2 wink>.  It's harder for the
>reader of the program to understand if encode() returns a different
>type.  Would there be some common property that all encode() return
>values would share?  Can't think of one myself.

Indeed.  What does this proposal offer that writing 'somefunc(u)' in place 
of 'u.encode("somecodec")' doesn't?  Unicode streams aren't going to work 
with this, right?  And anything else that already uses '.encode()' is going 
to expect a string.

In the former case, you know you have to look at 'somefunc' to find out what 
it returns. In the latter, you are encouraged to assume it's a string, and 
tempted to put off worrying about the details of the actual encoding, even 
if you don't recognize the codec name.
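A minimal sketch of that contrast in present-day Python 3, where `str.encode()` is restricted to returning `bytes`. Here `somefunc` is a hypothetical stand-in. Its name and annotation are what send the reader off to check its result type, whereas `u.encode("utf-8")` reads as a byte string and is one:

```python
def somefunc(u: str) -> list[int]:
    # Hypothetical transform: the explicit name and return annotation tell
    # the reader that the result is not a string.
    return [ord(ch) for ch in u]

u = "h\u00e9llo"

explicit = somefunc(u)       # reader knows to look up somefunc
encoded = u.encode("utf-8")  # reader assumes a byte string, and gets one

print(explicit)  # [104, 233, 108, 108, 111]
print(encoded)   # b'h\xc3\xa9llo'
```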

Anyway, it seems to me that things returned from u.encode() should either 
be strings or "stringlike".  Maybe implementing the read character buffer 
interface would suffice?  But I don't think this should be opened up to 
any old objects without some kind of defined invariant that they should 
satisfy.
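One way to picture such an invariant is as a check applied after encoding. The sketch below assumes Python 3, in which the buffer protocol (tested here with `memoryview()`) stands in for the old read character buffer interface. `checked_encode` is a hypothetical helper, not an existing API. It uses `codecs.encode()`, which does allow arbitrary result types, and rejects any result that does not expose a byte buffer:

```python
import codecs

def checked_encode(u: str, codec: str) -> object:
    """Hypothetical wrapper: encode with any codec, then enforce a
    'stringlike' invariant (the result must expose a byte buffer)."""
    result = codecs.encode(u, codec)
    try:
        memoryview(result)  # raises TypeError unless result supports the buffer protocol
    except TypeError:
        raise TypeError(
            f"codec {codec!r} returned {type(result).__name__}, "
            "which does not expose a buffer"
        ) from None
    return result

print(checked_encode("abc", "utf-8"))  # b'abc'

try:
    checked_encode("abc", "rot_13")  # rot_13 maps str -> str in Python 3
except TypeError as exc:
    print(exc)  # codec 'rot_13' returned str, which does not expose a buffer
```

The check only constrains the shape of the result, not its meaning, which is roughly the kind of minimal guarantee asked for above.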