[Python-ideas] a new bytestring type?
Geert Jansen
geertj at gmail.com
Mon Jan 6 12:19:08 CET 2014
On Mon, Jan 6, 2014 at 11:57 AM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> > I'm not missing a new type, but I am missing the format method on the
> > binary types.
>
> I'm curious about precisely what your use cases are, and just what
> formatting they need.
One use case I came across was creating chunks for the HTTP chunked
transfer encoding. A chunk consists of an ASCII header, a raw or
encoded chunk body, and an ASCII trailer. With a bytes.format method,
it would look like this:
chunk = b'{0:X}\r\n{1}\r\n'.format(len(buf), buf)
This is what I am using now:
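chunk = bytearray()
# Header: the chunk size in hexadecimal, followed by CRLF
chunk.extend('{0:X}\r\n'.format(len(buf)).encode('ascii'))
# Body: the payload bytes, unchanged
chunk.extend(buf)
# Trailer: the closing CRLF
chunk.extend(b'\r\n')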
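For comparison, here is a minimal sketch of the same framing written two other ways. It assumes Python 3.5 or later, where PEP 461 added %-interpolation to bytes (bytes still has no format method). The function names make_chunk and make_chunk_concat are hypothetical, chosen only for this illustration.

```python
def make_chunk(buf: bytes) -> bytes:
    """Frame buf as one chunk: hex size, CRLF, body, CRLF."""
    # %X renders the length as uppercase ASCII hex digits;
    # %b inserts the bytes object unchanged.
    return b'%X\r\n%b\r\n' % (len(buf), buf)


def make_chunk_concat(buf: bytes) -> bytes:
    """Same framing without bytes interpolation (any Python 3)."""
    # Format the size as text, then encode only that small header.
    header = format(len(buf), 'X').encode('ascii')
    return b''.join((header, b'\r\n', buf, b'\r\n'))


chunk = make_chunk(b'hello world')
```

Only the size field passes through text formatting in either version; the body is never decoded or re-encoded.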
Regards,
Geert
>
> The problem that Python 2 code has over and over imposed on me is that
> the temptation to avoid the overhead of conversion to and then from
> unicode when processing text by just using str results in the
> equivalent of
>
> bs1 = returns_a_bytestring_encoded_in_utf8()
> bs2 = returns_a_bytestring_encoded_in_koi8()
>
> bs3 = b'{0} {1}'.format(bs1, bs2)
> # and lose big when something expects valid UTF-8 in bs3
>
> In low-level code, the assignments to bs1, bs2, and bs3 are likely to
> be in three separate contexts, even three separate modules. I
> understand about consenting adults, but it's just too hard to enforce
> good practice here if you make it easy to pass around and operate on
> encoded bytestrings. I don't see how you avoid this pitfall, except
> by making it easier to pass around Unicode than encoded strings. And
> given that encoding and decoding are unavoidable, that means making
> use of bytestrings with text semantics painful.
>
> So to answer my question from my own point of view, for example, I
> would have no problem at all with
>
> b'{0:c}'.format(27) == b'\x1b' # insert an ASCII ESC character
>
> I would be leery of
>
> b'{0:s}'.format(b'\x1b[M') == b'\x1b[M' # insert an ANSI control sequence
>
> for the reason given above (for this use case, I would prefer
>
> blue_code = ord('M') # Or b'M', doesn't matter!
> b'\x1b[{0:c}'.format(blue_code) == b'\x1b[M'
>
> -- and forgive me for not looking up my ANSI color sequences, it's
> only luck if that's close) and I would consider
>
> b'{0:d}'.format(27) == b'27' # insert the ASCII representation
>
> to be an abomination since there's no reason to suppose that any given
> bytestring is encoded in an ASCII-compatible way, or big-endian for
> that matter. Ditto everything else that involves representing a
> number as a string of numeric characters.
>
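The mixed-encoding pitfall described in the quoted text can be reproduced with plain concatenation, so it does not depend on a bytes.format method existing. The following is a rough sketch only: the sample strings are made up, and 'koi8-r' is assumed as the concrete codec for the "koi8" mentioned above.

```python
# Two byte strings that each look fine on their own.
bs1 = 'naïve'.encode('utf-8')      # assumed sample text, UTF-8
bs2 = 'привет'.encode('koi8-r')    # assumed sample text, KOI8-R

# Joining them at the bytes level silently mixes the two encodings.
bs3 = bs1 + b' ' + bs2

try:
    bs3.decode('utf-8')
    mixed_is_valid_utf8 = True
except UnicodeDecodeError:
    # The KOI8-R bytes are not a valid UTF-8 sequence.
    mixed_is_valid_utf8 = False

# Decoding each piece with its own codec and encoding once at the end
# gives a result that is valid UTF-8.
safe = '{0} {1}'.format(bs1.decode('utf-8'),
                        bs2.decode('koi8-r')).encode('utf-8')
```

The failure only shows up when something finally tries to decode bs3, which may be far from the code that built it.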