[Python-ideas] a new bytestring type?

Stephen J. Turnbull stephen at xemacs.org
Mon Jan 6 11:57:18 CET 2014


Geert Jansen writes:

 > I'm not missing a new type, but I am missing the format method on the
 > binary types.

I'm curious about precisely what your use cases are, and just what
formatting they need.

The problem that Python 2 code has imposed on me over and over is
the temptation to skip the overhead of converting to and then from
unicode when processing text, and just use str instead.  That results
in the equivalent of

    bs1 = returns_a_bytestring_encoded_in_utf8()
    bs2 = returns_a_bytestring_encoded_in_koi8()

    bs3 = b'{0} {1}'.format(bs1, bs2)
    # and lose big when something expects valid UTF-8 in bs3

In low-level code, the assignments to bs1, bs2, and bs3 are likely to
be in three separate contexts, even three separate modules.  I
understand about consenting adults, but it's just too hard to enforce
good practice here if you make it easy to pass around and operate on
encoded bytestrings.  I don't see how you avoid this pitfall, except
by making it easier to pass around Unicode than encoded strings.  And
given that encoding and decoding are unavoidable, that means making
the use of bytestrings with text semantics painful.
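
To make the pitfall concrete, here is a minimal Python 3 sketch.  The
two function bodies and the is_valid_utf8 helper are hypothetical
stand-ins, not taken from any real code base, and the sketch uses
plain concatenation so it does not depend on any bytes formatting
method.

```python
# Hypothetical stand-ins for the two producers in the example above.
def returns_a_bytestring_encoded_in_utf8():
    return "привет".encode("utf-8")

def returns_a_bytestring_encoded_in_koi8():
    return "мир".encode("koi8-r")

def is_valid_utf8(data):
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

bs1 = returns_a_bytestring_encoded_in_utf8()
bs2 = returns_a_bytestring_encoded_in_koi8()

# Nothing complains here: bytes are bytes, whatever their encoding.
bs3 = bs1 + b" " + bs2

# The damage only shows up later, wherever valid UTF-8 is expected.
print(is_valid_utf8(bs1))
print(is_valid_utf8(bs3))

# Decoding at the boundary keeps the encodings from ever mixing.
text = bs1.decode("utf-8") + " " + bs2.decode("koi8-r")
print(is_valid_utf8(text.encode("utf-8")))
```

The failure is detected far from the line that caused it, which is
the enforcement problem described above.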

So to answer my question from my own point of view, for example, I
would have no problem at all with

    b'{0:c}'.format(27) == b'\x1b'           # insert an ASCII ESC character

I would be leery of

    b'{0:s}'.format(b'\x1b[M') == b'\x1b[M'  # insert an ANSI control sequence

for the reason given above (for this use case, I would prefer

    blue_code = ord('M')                    # Or b'M', doesn't matter!
    b'\x1b[{0:c}'.format(blue_code) == b'\x1b[M'

-- and forgive me for not looking up my ANSI color sequences, it's
only luck if that's close) and I would consider

    b'{0:d}'.format(27) == b'27'             # insert the ASCII representation

to be an abomination since there's no reason to suppose that any given
bytestring is encoded in an ASCII-compatible way, or big-endian for
that matter.  Ditto everything else that involves representing a
number as a string of numeric characters.
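
As a rough illustration of that last point, the following Python 3
sketch shows how many different byte sequences can all claim to
"represent 27".  The choice of encodings and of a 16-bit unsigned
field is an assumption made purely for the example.

```python
import struct

number = 27

# ASCII-compatible encodings agree on the digit characters ...
ascii_digits = str(number).encode("ascii")
utf8_digits = str(number).encode("utf-8")

# ... but encodings that are not ASCII-compatible do not.
utf16_digits = str(number).encode("utf-16-le")
ebcdic_digits = str(number).encode("cp037")

# A binary field has no digits at all, and byte order matters.
big_endian = struct.pack(">H", number)
little_endian = struct.pack("<H", number)

for label, value in [
    ("ascii", ascii_digits),
    ("utf-8", utf8_digits),
    ("utf-16-le", utf16_digits),
    ("cp037", ebcdic_digits),
    ("big-endian", big_endian),
    ("little-endian", little_endian),
]:
    print(label, value)
```

A formatting code that silently picks the ASCII digits has chosen one
of these on the caller's behalf, with nothing in the bytestring to
say whether that choice matches the rest of its contents.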


