[Python-Dev] PEP 460 reboot

Jim J. Jewett jimjjewett at gmail.com
Tue Jan 14 19:11:59 CET 2014



Nick Coghlan wrote:
>> Arbitrary binary data and ASCII compatible binary data are *different
>> things* and the only argument in favour of modelling them with a single 
>> type is because Python 2 did it that way.

Greg Ewing replied:

> I would say that ASCII compatible binary data is a
> *subset* of arbitrary binary data. As such, a type
> designed for arbitrary binary data is a perfectly good
> way of representing ASCII compatible binary data.

But not when you care about the ASCII-compatible part;
then you should use a subclass.

Obviously, it is too late to separate bytes from
AsciiStructuredBytes.  "Practicality beats purity" *may*
even mean that just using the "subclass" for everything
(and just ignoring the ASCII-specific methods when they
aren't appropriate) was always the right implementation choice.
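
As a minimal sketch of what "ignoring the ASCII-specific methods"
means in practice (the names and sample values below are made up
for illustration, not taken from any real protocol):

```python
# ASCII-compatible data: the text-like methods do something meaningful.
command = b"open"
assert command.upper() == b"OPEN"
assert command.isalpha()

# Arbitrary binary data (a made-up 4-byte sample): the same methods
# still exist, but they only change bytes in the ASCII letter range
# and leave every other byte value alone.
sample = bytes([0xFF, 0x61, 0x00, 0x7A])
shouted = sample.upper()
assert shouted == bytes([0xFF, 0x41, 0x00, 0x5A])
```

Nothing stops the second call; it is simply not a sensible thing
to do to a sound sample.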

But in terms of explaining the text model, that
separation is important enough that

    (1)  We should be reluctant to strengthen the
         "it's really just ASCII" messages.
    (2)  It *may* be worth creating a virtual
         split in the documentation.

I'm willing to work on (2) if there is general consensus
that it would be a good idea.  As a rough sketch, I
would change places like

    http://docs.python.org/3/library/stdtypes.html#typebytes

from:

    Bytes objects are immutable sequences of single bytes.
    Since many major binary protocols are based on the ASCII
    text encoding, bytes objects offer several methods that
    are only valid when working with ASCII compatible data
    and are closely related to string objects in a variety
    of other ways.

to something more like:

    Bytes objects are immutable sequences of single bytes.

    A Bytes object could represent anything, and is
    appropriate as the underlying storage for a sound sample
    or image file.

    Virtual subclass ASCIIStructuredBytes
    ====================================
    
    One particularly common use of bytes is to represent
    the contents of a file, or of a network message.  In
    these cases, the bytes will often represent Text
    *in a specific encoding* and that encoding will usually
    be a superset of ASCII.  Rather than create and support
    an ASCIIStructuredBytes subclass, Python simply added
    support for these use cases straight to Bytes objects,
    and assumes that this support simply won't be used when
    it does not make sense. For example, bytes literals
    *could* be used to construct a sound sample, but the
    literals will be far easier to read when they are used
    to represent (encoded) ASCII text, such as "OPEN". 
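
To make the point about literals concrete, here is a rough sketch
(again with made-up values; "sample" is just four arbitrary bytes
standing in for audio data):

```python
# An ASCII-structured value reads naturally as a literal.
command = b"OPEN"
assert list(command) == [79, 80, 69, 78]

# A made-up "sound sample" is legal as a literal, but hard to read.
sample = b"\x00\x7f\xff\x80"
assert list(sample) == [0, 127, 255, 128]

# The repr shows printable ASCII bytes as characters and everything
# else as hex escapes, which is why mixed data looks odd.
mixed = bytes([79, 80, 0, 255])
assert repr(mixed) == "b'OP\\x00\\xff'"
```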

-jJ

-- 

If there are still threading problems with my replies, please 
email me with details, so that I can try to resolve them.  -jJ


