[Python-3000] PEP 3138- String representation in Python 3000

M.-A. Lemburg mal at egenix.com
Thu May 8 19:18:06 CEST 2008


On 2008-05-06 16:10, Nick Coghlan wrote:
> Atsuo Ishimoto wrote:
>>>  I proposed to make the Unicode repr() output a regular encoding
>>>  that's being implemented by a codec. You could then easily
>>>  change the encoding to whatever you need for your application
>>>  or console.
>>
>> I think a global setting is not flexible enough. And I see no benefit to
>> a customizable repr() except staying compatible with Python 2, but I
>> think it is easy to migrate existing code to Py3k.
> 
> There's a bigger issue with trying to make whatever repr() does a codec 
> in Py3k. As a Unicode->Unicode transformation, it doesn't mesh well with 
> Py3k's strict Unicode->bytes/bytes->Unicode encoding/decoding philosophy.
> 
> That said, it would be nice to have a way to easily stack 
> Unicode->Unicode transforms on top of text IO streams, or byte->byte 
> transforms on top of binary streams.

+1
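
To make the idea of stacking a Unicode->Unicode transform on a text
stream concrete, here is a minimal sketch. TransformingWriter and
escape_non_ascii are hypothetical names made up for this illustration,
not an existing API, and the escaping rule is just one possible
str -> str transform:

```python
import io


class TransformingWriter:
    """Hypothetical wrapper: applies a str -> str function before writing."""

    def __init__(self, stream, transform):
        self._stream = stream
        self._transform = transform

    def write(self, text):
        # Transform first, then hand the result to the wrapped text stream.
        return self._stream.write(self._transform(text))

    def __getattr__(self, name):
        # Delegate everything else (flush, close, ...) to the wrapped stream.
        return getattr(self._stream, name)


def escape_non_ascii(text):
    # str -> str: replace non-ASCII characters with backslash escapes.
    return text.encode("ascii", "backslashreplace").decode("ascii")


buf = io.StringIO()
out = TransformingWriter(buf, escape_non_ascii)
out.write("caf\u00e9\n")
result = buf.getvalue()  # the text that actually reached the stream
```

Because the wrapper only requires a callable from str to str, several
such wrappers can be nested without the underlying stream knowing
about any of them.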

Here's what I wrote on the ticket for the PEP. I wasn't aware
of that change; otherwise I'd have commented on it earlier:

> On 2008-05-06 19:10, Guido van Rossum wrote:
>> Guido van Rossum <guido at python.org> added the comment:
>>
>> On Tue, May 6, 2008 at 1:26 AM, Marc-Andre Lemburg wrote:
>>>  So you've limited the codec design to just doing Unicode<->bytes
>>>  conversions ?
>>
>> Yes. This was quite a conscious decision that was not taken lightly,
>> with lots of community input, quite a while ago.
>>
>>>  The original codec design was to have the codec decide which
>>>  types to take on input and to generate on output, e.g. to
>>>  escape characters in Unicode (converting Unicode to Unicode),
>>>  work on compressed 8-bit strings (converting 8-bit strings to
>>>  8-bit strings), etc.
>>
>> Unfortunately this design made it hard to reason about the correctness
>> of code, since (especially in Py3k, where bytes and str are more
>> different than str and unicode were in 2.x) it's hard to write code
>> that uses .encode() or .decode() unless it knows which codec is being
>> used.
>>
>> IOW, when translated to 3.0, the design violates the general design
>> principle that the *type* of a function's or method's return value
>> should not depend on the *value* of one of the arguments.
> 
> I understand where this concept originates and usually apply this
> rule to software design as well. However, in the particular case
> of codecs, the codec registry and its helper functions are merely
> interfaces to code that is defined elsewhere.
> 
> In comparison, the approach is very much like getattr() - you know
> what the attribute is called, but know nothing about its type
> until you receive it from the function.
> 
> The reason codecs were designed like this was to be able to
> easily stack them. For this to work, only the interfaces need
> to be defined, without restricting the codecs too much in terms
> of which types may be used.
> 
> I'd suggest lifting the type restrictions from the general
> codecs.c access APIs (PyCodec_*), since they don't really belong
> there, and instead imposing the limitation only on the PyUnicode
> and PyString methods .encode() and .decode().
> 
> If you then also allow those methods to return *either*
> PyUnicode or PyString, you'd still have strong typing
> (only one of two possible types is allowed), and stacking
> streams or having codecs that work on PyUnicode->PyUnicode
> or PyString->PyString would still be accessible via
> .encode()/.decode().
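
For illustration, the split suggested above (unrestricted generic codec
access functions, type-restricted string methods) can be tried out from
Python code. The sketch below assumes a recent CPython 3 (3.4 or later);
this is an assumption about the reader's interpreter and not something
stated in this thread. It uses the module-level codecs.encode() and
codecs.decode() functions together with the standard rot_13, zlib_codec
and hex_codec codecs:

```python
import codecs

# str -> str codec, reachable through the generic access function.
rot = codecs.encode("Hello", "rot_13")

# bytes -> bytes codecs can be stacked: compress first, then hex-encode.
packed = codecs.encode(b"Hello", "zlib_codec")
hexed = codecs.encode(packed, "hex_codec")

# Undo the stack in reverse order.
restored = codecs.decode(codecs.decode(hexed, "hex_codec"), "zlib_codec")

# The str method stays type-restricted: it only accepts text encodings,
# so its result is always bytes regardless of the codec name.
utf8 = "Hello".encode("utf-8")
try:
    "Hello".encode("rot_13")
    method_rejected = False
except LookupError:
    # Raised because rot_13 is not a str -> bytes text encoding.
    method_rejected = True
```

Here the return type of str.encode() never depends on the codec name,
while the generic functions leave the input and output types up to the
codec.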

-- 
Marc-Andre Lemburg
eGenix.com