[Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5

Wed Jan 8 21:07:41 CET 2014

On Wed, Jan 8, 2014 at 2:17 PM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Victor Stinner, 06.01.2014 14:24:
>> Abstract
>> ========
>> Add ``bytes % args`` operator and ``bytes.format(args)`` method to
>> Python 3.5.
>
> Here is a counterproposal. Let someone who needs this feature write a
> library that does byte string formatting and handles it properly, as a
> full-featured tool set. Write it in Cython if you need raw speed; that will
> also help in making it run in both Python 2 and Python 3, or in providing
> easy integration with buffers like the array module, various byte
> containers, NumPy, etc.
>
> I'm confident that this will show that the current Py2 code that
> (legitimately) does byte string formatting can actually be improved,
> simplified or sped up, at least in some corners. I'm sure Py2 byte string
> formatting wasn't perfect for this use case either, it just happened to be
> there, so everyone used it and worked around its particular quirks for the
> particular use case at hand. (Think of "%s" % some_unicode_value, for example.)
>
> Instead of waiting for 3.5, a third party library allows users to get
> started porting their code earlier, and to make it work unchanged with
> Python versions before 3.5.

Maybe we can enumerate some of the stated drawbacks of b''.format():

Convenient string processing tools for bytes will make people ignore
Unicode, fail to notice it, or handle it wrongly? (As opposed to the
alternative, which forces them to learn how to process and produce
Unicode correctly?)

Similar APIs on bytes and str will remove the implicit "assert
isinstance(x, str)" check that calling a str-only method provides?
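
To illustrate that point, here is a minimal sketch (not from the thread;
`render` is a hypothetical helper). It assumes a Python 3 in which bytes
has no .format() method, which is the situation the PEP proposes to
change. Under that assumption, calling .format() rejects bytes as a side
effect:

```python
def render(template, name):
    # Hypothetical helper. It never checks types explicitly; it relies on
    # the argument having a .format() method, which str has and (under
    # the assumption stated above) bytes does not.
    return template.format(name)


print(render("Hello, {}", "world"))

try:
    render(b"Hello, {}", b"world")
except AttributeError as exc:
    # bytes is rejected here, close to where it was passed in.
    print("rejected:", exc)
```

If bytes grew the same method, the second call would succeed and the
bytes value would travel further before anyone noticed.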

More-prevalent bytes will propagate through the program, causing bugs?
À la open(b'filename').name vs open('filename').name?
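
A small sketch of the open() example (not from the thread; the temporary
directory and the variable names are just for illustration). The file
object's .name attribute keeps the type of the path it was opened with,
so a bytes path early on yields a bytes name later:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "filename")
    with open(path, "w") as f:
        f.write("data")

    # Same file, opened once with a str path and once with a bytes path.
    with open(path) as f_str:
        str_name = f_str.name
    with open(os.fsencode(path)) as f_bytes:
        bytes_name = f_bytes.name

print(type(str_name).__name__, type(bytes_name).__name__)

try:
    # Code that assumes .name is str breaks once a bytes name reaches it.
    os.path.join(str_name, bytes_name)
except TypeError as exc:
    print("mixing failed:", exc)
```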

It will take a long time.

Hoped-for benefits may include easier porting and greater Py3 adoption,
fewer encoding dances and/or less decoding of non-Unicode data into
Unicode just to make things work, hopefully fewer surrogate-encoded bytes
and therefore fewer encoding-bugs-distant-from-source-of-invalid-text, ...
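
For readers unfamiliar with the surrogate-encoded bytes mentioned above,
a minimal sketch (not from the thread; the sample bytes are made up). It
uses the standard "surrogateescape" error handler to smuggle an invalid
byte into a str, and shows the failure appearing only later, at a strict
encode:

```python
raw = b"caf\xe9"  # Latin-1 encoded text, not valid UTF-8

# Decoding succeeds: the invalid byte becomes a lone surrogate code point.
text = raw.decode("utf-8", errors="surrogateescape")

# ... the str can now travel through the program unnoticed ...

try:
    # A strict encode, possibly far from the decode, is where it blows up.
    text.encode("utf-8")
except UnicodeEncodeError as exc:
    print("late failure:", exc.reason)

# Encoding with the same handler round-trips the original bytes.
restored = text.encode("utf-8", errors="surrogateescape")
print(restored == raw)
```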