
Nick Coghlan <ncoghlan@gmail.com> wrote:
> On Fri, May 20, 2011 at 11:05 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
> > which is now
> >
> >     field_type = chr(hdr[11])
>
> This is definitely a modelling problem, and exactly the kind of thinking that the bytes model in Py3k is intended to combat.
> Bytes are not text, even when you're dealing primarily with ASCII. The
To me, that's the crux of this issue. It's the reason this will keep coming up again and again, and the reason people will continue to want to "improve" the 'bytes' type to be more 'string-like'.

The problem, of course, is that bytes often *are* text, in the sense that the byte sequence contains an encoded string, and the programmer both knows that and wants that. Even for non-ASCII strings. This comes up because Python is widely used for processing encoded strings of various kinds, and programmers hate to decode and encode just to work on them *as* strings.

Mind you, that's exactly the wrong thing to do, in my opinion. It just gets us back to the bad old days of Python 2, where strings were often kept as sequences of bytes with no way of indicating which encoding they used. But changing the mindset of programmers? Hard to do, very hard to do.

Personally, I think a more realistic approach might be to (a) improve the implementation of 'str()' so that it avoids unnecessary decode/encode operations, decoding only when necessary (yes, that means there would be multiple C-level representations for a 'str'), and then (b) make 'bytes' less useful as strings.

Bill
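For readers less familiar with the Python 3 behavior behind the quoted line: indexing a 'bytes' object yields an 'int', which is why 'chr()' shows up at all. The sketch below contrasts that with staying in the bytes domain (slicing) and with decoding explicitly. The header contents are made up for illustration, and treating byte 11 as ASCII is an assumption, not something stated in the thread.

```python
# Hypothetical 13-byte header: a null-padded name in bytes 0-10,
# a one-letter type code at byte 11, then one trailing null byte.
hdr = b"NAME" + b"\x00" * 7 + b"C" + b"\x00"

# In Python 3, indexing bytes gives the byte's integer value, not a 1-char string.
raw_value = hdr[11]  # 67, the code point of "C"

# The quoted approach: turn that int back into a one-character str.
field_type = chr(hdr[11])

# Staying with bytes: a length-1 slice is still a bytes object,
# so it compares naturally against a bytes literal.
field_type_bytes = hdr[11:12]
is_character_field = field_type_bytes == b"C"

# Treating the bytes as encoded text: decode explicitly, naming the encoding.
# ASCII is an assumption about this hypothetical header.
field_type_text = hdr[11:12].decode("ascii")
```

The slice-and-compare form keeps the data typed as bytes, while the explicit 'decode' makes the encoding assumption visible in the code rather than implicit in a 'chr()' call.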