[Python-Dev] Allowing u.encode() to return non-strings

Thu Jun 17 11:43:15 EDT 2004

> M.-A. Lemburg wrote:
> > Now that more and more codecs become available and the scope
> > of those codecs goes far beyond only encoding from Unicode to
> > strings and back, I am tempted to open up that restriction,
> > thereby opening up u.encode() for applications that wish to
> > use other codecs that return e.g. Unicode objects as well.
> > [...]
> > Note that codecs are not restricted in what they can return
> > for their .encode() or .decode() method, so any object
> > type is acceptable, including subclasses of str or
> > unicode, buffers, mmapped files, etc.
> 
> +1. I find it surprising that the restriction exists. I would have
> thought u.encode('foo') would pretty transparently wrap the foo
> codec's .encode().
> 
> This is also a good reminder that type checking of the result of
> codec or unicode .encode() calls is prudent, anytime.

May I make one tiny objection?  I don't know if it's enough to stop
this (I value it at -0.5 at most), but this will make reasoning about
types harder.  Given that approaches like StarKiller and IronPython
are likely the best way to get near-C speed for Python, I'd like the
standard library at least to make life easy for their approach.

The issue is that currently the type inferencer can know that the
return type of u.encode(s) is 'str', assuming u's type is
'unicode'.  But with the proposed change, the return type will depend
on the *value* of s, and I don't know how easy it is for the type
inferencers to handle that case -- likely, a type inferencer will have
to give up and say it returns 'object'.
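
A minimal sketch of the concern, assuming a present-day Python 3
interpreter rather than the 2004 one under discussion. There,
codecs.encode() already accepts arbitrary codecs, so it can stand in
for the proposed behaviour. encode_any is a hypothetical helper name
used only for illustration.

```python
import codecs


def encode_any(text, codec_name):
    # Hypothetical helper: hands the input straight to the named codec,
    # with no check on the type of object the codec returns.
    return codecs.encode(text, codec_name)


# Same argument types (str, str) in both calls; only the *value* of
# codec_name differs, yet the result types differ.
as_bytes = encode_any("abc", "utf-8")
as_text = encode_any("abc", "rot_13")

print(type(as_bytes).__name__)
print(type(as_text).__name__)
```

Without knowing the codec name at analysis time, the most precise
return type a tool could assign to encode_any is the union of
everything any registered codec might return, which in practice is
'object'.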

(I've never liked functions whose return type depends on the value of
an argument -- I guess my intuition has always anticipated type
inferencing. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)