[Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5

Stephen Hansen me+python at ixokai.io
Sat Jan 11 10:44:41 CET 2014


For not caring much, your own stubbornness is quite notable throughout this
discussion. Stones and glass houses. :)

That said:

Twisted and Mercurial aren't the only ones who are hurt by this, at all.
I'm aware of at least two other projects that are actively hindered in
their support of, or migration to, Python 3 by the bytes type not having
some basic functionality that "strings" had in 2.0.

The purity crowd in here has brought up that it was an important and
serious decision to split Text from Bytes in Py3, and I actually agree with
that. However, it is missing some very real and very concrete use-cases --
there are multiple situations where there are byte streams which have a
known text-subset which they really, really do need to operate on.

There have been a number of examples given: PDF, HTTP, network streams that
switch inline from text-ish to binary and back again. But we can focus
that down to a very narrow and not at all uncommon situation in the latter.

Look at the HTTP Content-Length header. HTTP headers are fuzzy. My
understanding is, per the RFCs, their value can be arbitrary octets to the
exclusion of line feeds and DELs -- my understanding may be a bit off here,
and please feel free to correct me -- but the relevant specifications are a
bit fuzzy to begin with.

To my understanding of the spec, the header field name is essentially an
ASCII text field (sans separator), and the value is... anything, or nearly
anything. This is HTTP, which is surely one of the most used protocols in
the world.

The need to be able to assemble and disassemble such streams is a real,
valid use-case.

With that in mind, look again at the Content-Length header I mentioned. It
seems those who are declaring a purity priority in bytes/string separation
think it reasonable to do things like:

  headers.append((b"Content-Length", ("%d" % len(content)).encode("ascii")))

Or something. In the middle of processing a stream, you need to convert
this number into a string then encode it into bytes to just represent the
number as the extremely common, widely-accessible 7-bit ascii subset of its
numerical value. This isn't some rare, grandiose or fiendish undertaking,
or trying to merge Strings and Bytes back together: this is the simple
practical recognition that representing a number as its ascii-numerical
value is actually not at all uncommon.
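
To make the round trip concrete, here is a minimal sketch of assembling
that header without any interpolation on bytes. The helper name
content_length_header is hypothetical and used only for illustration; it
is not taken from Twisted, Mercurial, or any other project mentioned here.

```python
def content_length_header(content):
    # Hypothetical helper: the length has to be formatted as text first,
    # then encoded to ASCII, before it can be joined with the other bytes.
    length = str(len(content)).encode("ascii")
    return b"Content-Length: " + length + b"\r\n"


header = content_length_header(b"hello world")
print(header)
```

Every numeric field written into the stream needs the same
format-then-encode step.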

This position seems utterly astonishing in its ridiculousness to me. The
recognition that the number "123" may be represented as b"123" surprises me
as a controversial thing, considering how often I see it in real life.

There is a LOT of code out there which needs a little bit of a middle
ground between bytes and strings; it doesn't mean you are giving way and
allowing strings and bytes to merge and giving up on the Edict of
Separation. But there are real world use-cases where you simply need to be
able to do many basic "String" like operations on byte-streams.

The removal of the ability to use interpolation to construct such byte
strings was a major regression in Python 3 and is a big hurdle for more
than a few projects to upgrade.

I mean, it's not like the "bytes" type lacks knowledge of the subset of
bytes that happen to be 7-bit ascii-compatible and can't perform text-ish
operations on them--

  Python 3.3.3 (v3.3.3:c3896275c0f6, Nov 18 2013, 21:18:40) [MSC v.1600 32 bit (Intel)] on win32
  Type "help", "copyright", "credits" or "license" for more information.
  >>> b"stephen hansen".title()
  b'Stephen Hansen'

How is this not a practical recognition that yes, while bytes are byte
streams and not text, a huge subset of bytes are text-y, and as long as we
maintain the barrier between higher characters and implicit conversion
therein, we're fine?
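
As a further illustration, a few other ASCII-aware methods that bytes
objects already provide in Python 3. The sample header line below is made
up for this sketch.

```python
# A made-up header line, held as bytes the whole time.
line = b"content-length: 123"

# partition() splits on a bytes separator without any decoding.
name, _, value = line.partition(b": ")

print(name.title())    # case conversion only touches ASCII letters
print(value.isdigit()) # checks for ASCII digits 0-9
print(line.upper())
```

None of these calls decode anything; they act on the ASCII range and leave
any other byte values untouched.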

I don't see the difference here. There is a very real, practical need to
interpolate bytes. This very real, practical need includes the very real
recognition that converting 12345 to b'12345' is not something weird,
unusual, and subject to the thorny issues of Encodings. It is not violating
the doctrine of separation of powers between Text and Bytes.
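
A small sketch of where that conversion stands today, assuming Python 3
semantics: parsing ASCII digits from bytes is accepted directly by int(),
while going the other way takes a detour through str, and the obvious
spelling bytes(5) means something else entirely.

```python
n = 12345

# int -> bytes: format as text, then encode.
as_bytes = str(n).encode("ascii")
print(as_bytes)

# bytes -> int: int() accepts ASCII digits in a bytes object directly.
print(int(as_bytes))

# bytes(5) is not b"5": it builds five zero bytes.
print(bytes(5))
```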

Personally, I won't be converting my day job's codebase to Python 3 anytime
soon (where 'soon' is defined as 'within five years', assuming a best-case
scenario in which a number of third-party issues are resolved). But! I'm
aware of and involved with other projects, and this has bitten two of them
specifically. I'm sure there are others who are not aware of this list or
don't feel comfortable talking on it (as it is, I encouraged one of those
projects' coders to speak up, but they thought the question was a lost one
due to previous responses on the original issue ticket and gave up).

On Fri, Jan 10, 2014 at 6:04 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:

> On Fri, 10 Jan 2014 20:53:09 -0500
> "Eric V. Smith" <eric at trueblade.com> wrote:
> >
> > So, I'm -1 on the PEP. It doesn't address the cases laid out in issue
> > 3982. See for example http://bugs.python.org/issue3982#msg180432 .
>
> Then we might as well not do anything, since any attempt to advance
> things is met by stubborn opposition in the name of "not far enough".
>
> (I don't care much personally, I think the issue is quite overblown
> anyway)
>
> Regards
>
> Antoine.