[Python-3000] PEP 3138- String representation in Python 3000
M.-A. Lemburg
mal at egenix.com
Thu May 8 19:18:06 CEST 2008
On 2008-05-06 16:10, Nick Coghlan wrote:
> Atsuo Ishimoto wrote:
>>> I proposed to make the Unicode repr() output a regular encoding
>>> that's being implemented by a codec. You could then easily
>>> change the encoding to whatever you need for your application
>>> or console.
>>
>> I think a global setting is not flexible enough. And I see no benefit
>> in a customizable repr() except staying compatible with Python 2, but
>> I think it is easy to migrate existing code to Py3k.
>
> There's a bigger issue with trying to make whatever repr() does a codec
> in Py3k. As a Unicode->Unicode transformation, it doesn't mesh well with
> Py3k's strict Unicode->bytes/bytes->Unicode encoding/decoding philosophy.
>
> That said, it would be nice to have a way to easily stack
> Unicode->Unicode transforms on top of text IO streams, or byte->byte
> transforms on top of binary streams.
+1
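
To make the stacking idea concrete, here is a minimal sketch of a
Unicode->Unicode transform layered on top of a text stream. The names
TransformingWriter and escape_non_ascii are hypothetical helpers made up
for this illustration, not an existing API; only io.StringIO and the
"backslashreplace" error handler come from the standard library.

```python
import io


class TransformingWriter:
    """Hypothetical wrapper: applies a str -> str transform on write()."""

    def __init__(self, stream, transform):
        self._stream = stream
        self._transform = transform

    def write(self, text):
        # Transform first, then hand the result to the wrapped stream.
        return self._stream.write(self._transform(text))

    def __getattr__(self, name):
        # Delegate everything else (flush, close, ...) to the wrapped stream.
        return getattr(self._stream, name)


def escape_non_ascii(text):
    """str -> str: replace non-ASCII characters with backslash escapes."""
    return text.encode("ascii", "backslashreplace").decode("ascii")


buffer = io.StringIO()
# Stack two transforms: the outer one (upper-casing) runs first,
# then the inner one (escaping) runs on its output.
out = TransformingWriter(TransformingWriter(buffer, escape_non_ascii), str.upper)
out.write("caf\u00e9\n")
result = buffer.getvalue()
```

Because each layer only needs a write() method that accepts str, the
layers can be stacked in any order without knowing about each other.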
Here's what I wrote on the ticket for the PEP. I wasn't aware
of that change; otherwise I'd have commented on this earlier:
> On 2008-05-06 19:10, Guido van Rossum wrote:
>> Guido van Rossum <guido at python.org> added the comment:
>>
>> On Tue, May 6, 2008 at 1:26 AM, Marc-Andre Lemburg wrote:
>>> So you've limited the codec design to just doing Unicode<->bytes
>>> conversions ?
>>
>> Yes. This was quite a conscious decision that was not taken lightly,
>> with lots of community input, quite a while ago.
>>
>>> The original codec design was to have the codec decide which
>>> types to take on input and to generate on output, e.g. to
>>> escape characters in Unicode (converting Unicode to Unicode),
>>> work on compressed 8-bit strings (converting 8-bit strings to
>>> 8-bit strings), etc.
>>
>> Unfortunately this design made it hard to reason about the correctness
>> of code, since (especially in Py3k, where bytes and str are more
>> different than str and unicode were in 2.x) it's hard to write code
>> that uses .encode() or .decode() unless it knows which codec is being
>> used.
>>
>> IOW, when translated to 3.0, the design violates the general design
>> principle that the *type* of a function's or method's return value
>> should not depend on the *value* of one of the arguments.
>
> I understand where this concept originates and usually apply this
> rule to software design as well, however, in the particular case
> of codecs, the codec registry and its helper functions are merely
> interfaces to code that is defined elsewhere.
>
> In comparison, the approach is very much like getattr() - you know
> what the attribute is called, but know nothing about its type
> until you receive it from the function.
>
> The reason codecs were designed like this was to be able to
> easily stack them. For this to work, only the interfaces need
> to be defined, without restricting the codecs too much in terms
> of which types may be used.
>
> I'd suggest lifting the type restrictions from the general
> codecs.c access APIs (PyCodec_*), since they don't really belong
> there, and instead imposing the limitation only on the PyUnicode
> and PyString methods .encode() and .decode().
>
> If you then also allow those methods to return *both*
> PyUnicode and PyString, you'd still have strong typing
> (only 1 of two possible types is allowed) and stacking
> streams or having codecs that work on PyUnicode->PyUnicode
> or PyString->PyString would still be accessible via
> .encode()/.decode().
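
For illustration only, the split between a general codec access API and
type-restricted methods can be sketched as follows. This assumes a later
Python 3 release (3.4 or newer), where, to the best of my knowledge, the
module-level codecs.encode()/codecs.decode() functions accept arbitrary
codecs while str.encode() is limited to text encodings. It is not a
rendering of the PyCodec_* C API discussed above.

```python
import codecs

# Unicode -> Unicode through the general codec interface.
rot = codecs.encode("Hello", "rot_13")

# bytes -> bytes through the same interface (compression codec).
original = b"aaaa" * 10
packed = codecs.encode(original, "zlib_codec")
unpacked = codecs.decode(packed, "zlib_codec")

# The str method itself stays restricted to str -> bytes text encodings,
# so asking it for a str -> str codec is rejected.
method_error = None
try:
    "Hello".encode("rot_13")
except LookupError as exc:
    method_error = str(exc)
```

Here the restriction lives on the method, not on the codec lookup, so
str->str and bytes->bytes codecs stay reachable through the module-level
functions.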
--
Marc-Andre Lemburg
eGenix.com