[Python-ideas] Python 3.x and bytes

Fri May 20 19:35:12 CEST 2011

Nick Coghlan <ncoghlan at gmail.com> wrote:

> On Fri, May 20, 2011 at 11:05 PM, Ethan Furman <ethan at stoneleaf.us> wrote:
> > which is now
> >
> > field_type = chr(hdr[11])
> 
> This is definitely a modelling problem, and exactly the kind of
> thinking that the bytes model in Py3k is intended to combat.
> 
> Bytes are not text, even when you're dealing primarily with ASCII. The

To me, that's the crux of this issue. It's the reason this will keep
coming up again and again, and the reason people will continue to want
to "improve" the 'bytes' type to be more 'string-like'.

The problem, of course, is that bytes often *are* text, in the sense
that the byte sequence contains an encoded string, and the programmer
both knows that and wants that, even for non-ASCII strings.  Python is
widely used for processing encoded strings of various kinds, and
programmers hate to decode and re-encode just to work on them *as* strings.
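A minimal sketch of the friction described above, assuming a hypothetical 12-byte record header `hdr` whose byte 11 holds a one-letter field type (the names `hdr`, `raw`, and `naive` are illustrative only). Indexing bytes in Python 3 yields an int, which is why code ends up writing `chr(hdr[11])`; slicing keeps a bytes value. Working on encoded text *as* text means an explicit decode/encode round trip (UTF-8 assumed here), and skipping it with bytes methods only handles ASCII:

```python
# Hypothetical 12-byte header: "NAME", seven NUL bytes, then a type letter.
hdr = b"NAME\x00\x00\x00\x00\x00\x00\x00C"

# Indexing a bytes object gives an int (the byte value), not a 1-char string.
type_code = hdr[11]                 # 67

# Converting that int to a 1-character str, as in the quoted code.
field_type = chr(hdr[11])           # 'C'

# Slicing keeps the value as bytes, comparable with a bytes literal.
field_type_bytes = hdr[11:12]       # b'C'

# Treating encoded text as text: decode, operate on str, re-encode.
raw = "Grüße".encode("utf-8")
text = raw.decode("utf-8").upper()  # str.upper() maps 'ß' to 'SS'
result = text.encode("utf-8")

# Skipping the round trip: bytes.upper() only changes ASCII letters,
# so the encoded 'ü' and 'ß' pass through untouched.
naive = raw.upper()
```

The last two results differ (`"GRÜSSE"` versus `"GRüßE"` once decoded), which is one concrete reason the "bytes as strings" shortcut only works reliably for ASCII.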

Mind you, that's exactly the wrong thing to do, in my opinion.  It just
gets us back to the bad old days of Python 2, where strings were often
kept in a sequence of bytes that carried no indication of its encoding.
But changing the mindset of programmers?  Hard to do, very hard to do.
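To make the "no way of indicating what encoding it had" problem concrete, here is a small sketch: one hypothetical byte string decodes to different text depending on which codec the reader guesses, and fails outright under UTF-8. Nothing in the bytes themselves says which reading was intended:

```python
# Hypothetical bytes with no record of the encoding that produced them.
data = b"caf\xe9"

as_latin1 = data.decode("latin-1")  # 'café'
as_cp1252 = data.decode("cp1252")   # 'café' as well
as_cp437 = data.decode("cp437")     # 'cafΘ' (0xE9 is Greek Theta in CP437)

# Under UTF-8, 0xE9 starts a multi-byte sequence that never completes.
try:
    data.decode("utf-8")
except UnicodeDecodeError:
    utf8_ok = False
else:
    utf8_ok = True
```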

Personally, I think a more realistic approach might be to (a) improve
the implementation of 'str()' so that it avoids unnecessary
decode/encode operations, decoding only when necessary (yes, that means
there would be multiple C-level representations for a 'str'), and then
(b) make 'bytes' less useful as strings.
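A rough pure-Python sketch of the idea in (a), assuming a hypothetical `LazyText` class that is not part of Python: the object keeps the original encoded bytes together with their encoding, decodes only when str behaviour is actually needed, and hands back the stored bytes untouched when asked for the same encoding. A real version would live at the C level inside 'str' itself; this only illustrates the "decode only when necessary" policy.

```python
import codecs
from typing import Optional


class LazyText:
    """Hypothetical text value stored as encoded bytes until decoding is needed."""

    def __init__(self, raw: bytes, encoding: str) -> None:
        self._raw = raw
        # Normalise the codec name so "UTF8" and "utf-8" compare equal.
        self._encoding = codecs.lookup(encoding).name
        self._decoded: Optional[str] = None  # filled in on first str use

    @property
    def is_decoded(self) -> bool:
        return self._decoded is not None

    def encoded(self, encoding: str) -> bytes:
        # Fast path: same encoding as stored, so no decode/encode at all.
        if codecs.lookup(encoding).name == self._encoding:
            return self._raw
        # Otherwise decode once (cached) and re-encode to the target.
        return str(self).encode(encoding)

    def __str__(self) -> str:
        # Decode lazily, and only the first time text semantics are needed.
        if self._decoded is None:
            self._decoded = self._raw.decode(self._encoding)
        return self._decoded
```

For example, `LazyText("naïve".encode("utf-8"), "utf-8").encoded("UTF8")` returns the original bytes without ever decoding them, while calling `str()` on the same object triggers a single decode that is then cached.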

Bill