[Python-Dev] Allowing u.encode() to return non-strings
Brett C.
bac at OCF.Berkeley.EDU
Mon Jun 21 18:21:09 EDT 2004
Guido van Rossum wrote:
>>M.-A. Lemburg wrote:
>>
>>>Now that more and more codecs become available and the scope
>>>of those codecs goes far beyond only encoding from Unicode to
>>>strings and back, I am tempted to open up that restriction,
>>>thereby opening up u.encode() for applications that wish to
>>>use other codecs that return e.g. Unicode objects as well.
>>>[...]
>>>Note that codecs are not restricted in what they can return
>>>for their .encode() or .decode() method, so any object
>>>type is acceptable, including subclasses of str or
>>>unicode, buffers, mmapped files, etc.
>>
>>+1. I find it surprising that the restriction exists. I would have
>>thought u.encode('foo') would pretty transparently wrap the foo
>>codec's .encode().
>>
>>This is also a good reminder that type checking of the result of
>>codec or unicode .encode() calls is prudent, anytime.
>
>
> May I make one tiny objection? I don't know if it's enough to stop
> this (I value it at -0.5 at most), but this will make reasoning about
> types harder. Given that approaches like StarKiller and IronPython
> are likely the best way to get near-C speed for Python, I'd like the
> standard library at least to make life easy for their approach.
>
> The issue is that currently the type inferencer can know that the
> return type of u.encode(s) is 'unicode', assuming u's type is
> 'unicode'. But with the proposed change, the return type will depend
> on the *value* of s, and I don't know how easy it is for the type
> inferencers to handle that case -- likely, a type inferencer will have
> to give up and say it returns 'object'.
>
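To make the concern concrete, here is a minimal sketch of a return type that depends on the *value* of the codec name. It is only an illustration under assumptions: it runs on a modern Python 3 interpreter rather than the one under discussion, it uses the standard codecs.encode() function as a stand-in for the proposed u.encode() behaviour, and encoded_type() is a hypothetical helper of mine.

```python
import codecs


def encoded_type(text, codec_name):
    """Return the type of the object produced by encoding text with codec_name.

    Hypothetical helper for illustration only.
    """
    return type(codecs.encode(text, codec_name))


# Same argument *types* in both calls (str, str); only the codec name differs.
utf8_type = encoded_type("abc", "utf-8")    # a text-to-bytes codec
rot13_type = encoded_type("abc", "rot_13")  # a text-to-text codec

print(utf8_type.__name__, rot13_type.__name__)
```

An inferencer that sees only the argument types cannot tell these two calls apart, so without tracking the string value it has to widen the result.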
If you use something like the Cartesian product algorithm (which is what
StarKiller uses), then a separate return type is inferred for a method for
each distinct call signature. But this pretty much only works with
Python code, since you need full access to the source to redo the analysis.
With the Unicode stuff being done in C, you would have to just take
the lowest-common-denominator result, which would be 'object', since you
can't reanalyze the execution path for different call signatures unless
someone wants to take on the pain of type inferring C code. Otherwise this
type of case can be taken into consideration when developing a type
inferencing framework that deals with C code, but that just seems
painful and overly complicated.
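
A rough sketch of what I mean, as a toy model and not StarKiller's actual implementation: types are plain strings, and Inferencer, infer_add and the "add" function are names I made up for illustration. Each distinct tuple of argument types gets its own inferred return type, and a function with no analyzable source falls back to 'object'.

```python
from itertools import product

OBJECT = "object"  # the lowest-common-denominator type


def infer_add(arg_types):
    """Hypothetical analysis of a pure-Python `def add(a, b): return a + b`."""
    a, b = arg_types
    if a == b and a in ("int", "str", "unicode"):
        return a
    return OBJECT


class Inferencer:
    """Toy per-call-signature inferencer (illustrative only)."""

    def __init__(self):
        self.analyzers = {}  # function name -> analysis callback (source available)
        self.templates = {}  # (function name, argument types) -> return type

    def register(self, name, analyzer):
        self.analyzers[name] = analyzer

    def return_type(self, name, arg_types):
        """Infer once per distinct call signature and cache the result."""
        key = (name, tuple(arg_types))
        if key not in self.templates:
            analyzer = self.analyzers.get(name)
            # No source to reanalyze (e.g. a function written in C): give up.
            self.templates[key] = analyzer(key[1]) if analyzer else OBJECT
        return self.templates[key]

    def call(self, name, arg_type_sets):
        """Take the Cartesian product of the possible argument types."""
        return {self.return_type(name, combo) for combo in product(*arg_type_sets)}


inf = Inferencer()
inf.register("add", infer_add)

print(inf.return_type("add", ("int", "int")))
print(inf.return_type("unicode.encode", ("unicode", "str")))
```

Nothing is registered for "unicode.encode" here, which stands in for the C case: there is no source to analyze again, so every call signature gets 'object'.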
-Brett