[Python-Dev] PEP 460 reboot

Wed Jan 15 11:57:16 CET 2014

Aside: OK, Guido, ya got me.

I have a separate screed recounting the reasons for my apostasy, but
that's probably not interesting any more.  I'll send it to individuals
on request.

 > But in terms of explaining the text model, that
 > separation is important enough that
 > 
 >     (1)  We should be reluctant to strengthen the
 >          "it's really just ASCII" messages.

True.  I think the right message is "Unless you know why you
*desperately* want this, not only don't you need it, but using it is
the Python equivalent of skydiving without a parachute."

N.B. Don't take the metaphor as an insult.  I think it's become clear
that those who "desperately want this" not only use parachutes, they
pack their own.  No need to worry about them.

 >     (2)  It *may* be worth creating a virtual
 >          split in the documentation.

Please don't.  All we need to tell naive users is:

    Look at the structure of the bytes.  If that structure is "text",
    convert to str using .decode().  Please don't use bytes.

    If that structure isn't text, you're in a specialist domain, and
    it's your problem.  Many structured uses of bytes use ASCII-
    encoded keywords: we provide bytes methods for handling them, but
    you *must* be aware that these methods *cannot* distinguish "bytes
    representing text encoded as ASCII" from "any old bytes".  Be
    warned: They will happily -- and silently -- corrupt the latter.
    Make sure you respect the higher-level structure of your data when
    using them.
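
To make the "silently corrupt" warning concrete, here is a minimal
sketch.  The payload layout (two little-endian 16-bit integers) is
made up for illustration; the only point is that upper() rewrites any
byte in the range 0x61-0x7A, whether or not it was ever a letter.

```python
import struct

# Hypothetical binary payload: two little-endian unsigned 16-bit integers.
payload = struct.pack("<HH", 0x6162, 0x0100)   # the bytes b'ba\x00\x01'

# bytes.upper() cannot tell that these bytes are integers, not ASCII
# letters, so it "uppercases" 0x62 and 0x61 without any complaint.
mangled = payload.upper()

original = struct.unpack("<HH", payload)
corrupted = struct.unpack("<HH", mangled)

print(original)    # the values that were packed
print(corrupted)   # the first value has changed; no exception was raised
```

No error, no warning: the first integer is simply a different number
afterwards.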

 >     Virtual subclass ASCIIStructuredBytes
 >     ====================================
 >     
 >     One particularly common use of bytes is to represent
 >     the contents of a file, or of a network message.  In
 >     these cases, the bytes will often represent Text
 >     *in a specific encoding* and that encoding will usually
 >     be a superset of ASCII.  Rather than create and support
 >     an ASCIIStructuredBytes subclass, Python simply added
 >     support for these use cases straight to Bytes objects,
 >     and assumes that this support simply won't be used when
 >     it does not make sense. For example, bytes literals

This is going in quite the wrong direction, I think.  The only people who
should care about "Text *in a specific encoding* and that encoding
will usually be a superset of ASCII" are codec writers, and by now
writing those is a very rare task.  Everybody else uses ASCII keywords
in a simple formal language.
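
A rough sketch of what I mean by "ASCII keywords in a simple formal
language".  The wire format here (keyword, one space, opaque payload)
and the name parse_command are invented for the example, not taken
from any real protocol.  The ASCII-aware method is applied only to the
part that the format says is a keyword; the payload is never touched.

```python
def parse_command(message: bytes):
    """Split a hypothetical 'KEYWORD payload' message into its two parts."""
    # partition() splits on the first space only, so any later 0x20
    # bytes inside the payload are left alone.
    keyword, _, payload = message.partition(b" ")
    # Case-fold the keyword only; the payload is "any old bytes".
    return keyword.upper(), payload


kw, body = parse_command(b"open \xffabc\x00 xyz")
print(kw)     # the normalized keyword
print(body)   # the payload, byte-for-byte as received
```

Respecting the higher-level structure is the caller's job: nothing in
bytes stops you from calling upper() on the whole message instead.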

 >     *could* be used to construct a sound sample, but the
 >     literals will be far easier to read when they are used
 >     to represent (encoded) ASCII text, such as "OPEN".