[Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.

Stephen J. Turnbull stephen at xemacs.org
Tue May 31 07:51:47 CEST 2011


Greg Ewing writes:
 > Stephen J. Turnbull wrote:
 > > Greg Ewing writes:
 > > 
 > >  > How would ascii behave when mixed with unicode strings? Should it
 > >  > automatically coerce to unicode,
 > > 
 > > Definitely not!  Bytes are not text, and the programmer must say when
 > > they want those bytes decoded.
 > 
 > But the proposed 'ascii' type *is* text, though.

If it's intended that the 'ascii' type *be* text, I don't see the
point.  It *is* Unicode (with a restricted range), and no coercion is
necessary between str and 'ascii', just a change of representation.
This can be done completely transparently[1], so there is no need for a
new type, except to save the implementer some effort at the cost of
ongoing annoyance for the application programmer.
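The point that ASCII-only text is already ordinary Unicode text can be illustrated in Python 3. This is a minimal sketch, not part of any proposal. `ascii_roundtrip` is a hypothetical helper, and `str.isascii()` assumes Python 3.7 or later. The "change of representation" only becomes visible when the text is turned into bytes. At that point ASCII text takes exactly one byte per character and decodes back unchanged.

```python
# Minimal sketch (assumes Python 3.7+ for str.isascii()).
# ascii_roundtrip is a hypothetical helper, not a proposed API.

def ascii_roundtrip(text: str) -> str:
    """Encode text as ASCII bytes and decode it back.

    Raises UnicodeEncodeError if text contains any code point >= 128.
    """
    data = text.encode("ascii")   # one byte per character for ASCII text
    return data.decode("ascii")   # back to an ordinary str


header = "Content-Type"   # ASCII-only text: just a str in a restricted range
title = "Caf\u00e9"       # non-ASCII text (U+00E9)

# No coercion step: ASCII-only and non-ASCII str values concatenate directly.
combined = header + ": " + title

print(header.isascii(), title.isascii())            # True False
print(ascii_roundtrip(header) == header)            # True
print(len(header.encode("ascii")) == len(header))   # True
```

Calling `ascii_roundtrip(title)` raises `UnicodeEncodeError`. This reflects only the restricted range, and it does not indicate that the two values are different kinds of text.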

But even as a separate type, 'ascii' still can't mix with bytes
safely, for the same reason that str can't mix with bytes: 'ascii' and
str have a known fixed encoding (Unicode), and bytes have an unknown,
variable encoding (possibly the non-encoding 'binary').  YAGNI...
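Python 3 already behaves as described for str and bytes. The following is a minimal sketch using only built-in `str`/`bytes` behaviour. `concat_or_error` and `decode_or_none` are hypothetical helpers written for illustration. The same four bytes are valid `latin_1` text but invalid UTF-8 or ASCII, so nothing in the bytes themselves says which decoding is "right".

```python
# Minimal sketch: the bytes carry no record of their encoding.
# concat_or_error and decode_or_none are hypothetical illustration helpers.

raw = b"caf\xe9"   # four bytes; which codec produced them is unknown


def concat_or_error(text: str, data: bytes):
    """Return text + data, or the TypeError Python 3 raises for mixing them."""
    try:
        return text + data
    except TypeError as exc:
        return exc


def decode_or_none(data: bytes, encoding: str):
    """Decode data with the given codec, returning None if it is invalid."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


print(type(concat_or_error("menu: ", raw)).__name__)   # TypeError
print(decode_or_none(raw, "latin_1") == "caf\u00e9")   # True
print(decode_or_none(raw, "utf-8"))                    # None
print(decode_or_none(raw, "ascii"))                    # None
```

Decoding explicitly, e.g. `"menu: " + raw.decode("latin_1")`, is the step where the programmer states which encoding the bytes are assumed to be in.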



Footnotes: 
[1]  For some use cases it might be useful to allow specifying the
representation in advance, as a micro-optimization.

