[Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?]

Tue Jan 7 19:33:19 CET 2014

Nick Coghlan writes:

 > I haven't been following the discussion in detail (linux.conf.au and
 > the Py3 discussions have most of my attention this week), but I'm
 > definitely not clear on how this 7-bit proposal differs meaningfully
 > from just using ascii with the surrogateescape error handler.
 > Cheers, Nick.

It doesn't differ meaningfully to me.  I doubt I'll be writing any
programs in the near future that aren't done just as well and as
efficiently by decoding as ascii with surrogateescape.
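
For anyone who hasn't used it, here is a minimal sketch of the existing
ascii + surrogateescape approach.  The sample bytes are made up for
illustration; only the codec name and error handler are standard.

```python
# Arbitrary wire data: ASCII text followed by two non-ASCII bytes
# (sample data invented for this illustration).
raw = b"Content-Type: text/plain\r\n\xff\xfe"

# Each byte >= 0x80 is decoded to a lone surrogate in U+DC80..U+DCFF.
text = raw.decode("ascii", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
roundtrip = text.encode("ascii", errors="surrogateescape")

print(ascii(text[-2:]))   # '\udcff\udcfe'
print(roundtrip == raw)   # True
```

The result is an ordinary str, so all the usual str methods work on the
ASCII parts, and the non-ASCII bytes survive the round trip untouched.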

It does give you an 8-bit representation, with the benefits that gives
you (very fast encode and fast decode), whereas the ascii +
surrogateescape approach gives you a 16-bit representation whenever
the data contains a non-ASCII byte.  Some people seem to care about
that, e.g., it seems to fit the chunked HTTP use-case perfectly.
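
A rough sketch of where the 16 bits come from, assuming CPython 3.3+
with the flexible string representation.  The variable names and sample
data are illustrative, and the getsizeof figures are an implementation
detail that varies by version and platform.

```python
import sys

pure_ascii = b"a" * 1000
one_high_byte = b"a" * 999 + b"\xff"

s1 = pure_ascii.decode("ascii", errors="surrogateescape")
s2 = one_high_byte.decode("ascii", errors="surrogateescape")

# The escaped byte becomes U+DCFF, which is above U+00FF, so it cannot
# be stored in one byte per character.
widest_1 = max(map(ord, s1))
widest_2 = max(map(ord, s2))
print(hex(widest_1), hex(widest_2))   # 0x61 0xdcff

# On CPython the whole string is widened to fit its widest character,
# so a single escaped byte makes s2 noticeably larger than s1.
print(sys.getsizeof(s1), sys.getsizeof(s2))
```

So one stray non-ASCII byte in an otherwise ASCII payload is enough to
push the whole string out of the 8-bit representation.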

It gives you an 8-bit almost-bytes type without the b prefix on
literals.  I don't know if that would actually be useful to anybody.

Finally (and again, I haven't thought this through) you have a halfway
house that can in principle be mixed more or less freely with either
bytes (and bytearray and memoryview) or Unicode, but not with both.
(There is intentionally no way to get back to the "ascii-compatible"
representation from one of the other str representations, and in the
same way, combining with one of the bytes types would give a bytes
type.)  I realize this probably doesn't work without modification:
as designed it *is* str, so the type system wouldn't be able to
distinguish between the ascii-compatible representation and a str
in another representation.  So maybe this brings us back to the
idea of a new bytestring type.

I'll get back to Steven's post later, but it and others seem to be
stuck in the greylist.  (Hate spam, hate spam, hate what spam does to
us....)