[Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.

Mon May 30 06:45:10 CEST 2011

On Mon, May 30, 2011 at 12:39 PM, Stephen J. Turnbull
<stephen at xemacs.org> wrote:
> (However, there are use
> cases where it is claimed that 'HELO ' is needed both as str and as
> bytes.)

My current opinion is that all of this still needs more
experimentation outside the core before we start fiddling any further
with the builtins (we blinked once in the lead-up to 3.0 by allowing
bytes and bytearray to retain a lot of string methods that assume an
ASCII compatible encoding, and I now have my doubts about the wisdom
of even that step). I don't have a good answer on how to deal with the
real world situations where the *use case* blurs the bytes/text
distinction (typically by embedding ASCII text inside an otherwise
binary protocol), and given the potential to backslide into the bad
old days of 8-bit strings, I'm not prepared to guess, either.
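A small illustration of the ASCII assumption built into the str-like methods that bytes kept (an editor's sketch, not code from the thread): the case-related methods only recognise bytes in the ASCII range as letters, so any non-ASCII byte passes through unchanged.

```python
# bytes.upper() and bytes.isalpha() treat only ASCII bytes (0x00-0x7F) as letters;
# every other byte value is left alone, whatever encoding the data is really in.
data = "héllo".encode("utf-8")      # b'h\xc3\xa9llo'
upper = data.upper()
print(upper)                         # b'H\xc3\xa9LLO'
print(upper.decode("utf-8"))         # HéLLO  (the é was not uppercased)
print(b"HELO".isalpha(), b"\xc3\xa9".isalpha())  # True False
```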

3.x has largely cleared the decks to allow a better solution to evolve
in this space by making it harder to blur the line accidentally, and
decode()/manipulate/encode() already nicely covers many stateless use
cases. If it turns out we need another type, or some other API, to
deal gracefully with any use cases where that isn't enough, then so be
it. However, I think we need to let the status quo run for a while
longer and see what people actually using the current types in
production come up with. The bytes/text division in Python 3 is by far
the biggest conceptual change between the two languages, so it's going
to take some time before we can figure out how many of the problems
encountered are real issues with the split model not covering some use
cases and how many are just people (including us) taking time to get
used to the sharp division between the two worlds.
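As a rough sketch of the stateless decode()/manipulate/encode() pattern, consider an ASCII command word at the start of an otherwise binary frame. This assumes latin-1 as the intermediate codec (it maps every byte 0-255 to the code point with the same value, so arbitrary bytes round-trip losslessly). The function name `rewrite_command` is hypothetical and chosen only for illustration.

```python
def rewrite_command(frame: bytes) -> bytes:
    """Uppercase the leading ASCII command word of a frame (hypothetical helper)."""
    # Decode: latin-1 turns each byte into one code point, so nothing is lost.
    text = frame.decode("latin-1")
    # Manipulate: ordinary str operations, applied only to the command word.
    command, sep, rest = text.partition(" ")
    text = command.upper() + sep + rest
    # Encode: convert back with the same codec to recover the original bytes.
    return text.encode("latin-1")


print(rewrite_command(b"helo example.org\xff\x00"))  # b'HELO example.org\xff\x00'
```

Note that nothing here carries state between calls. Protocols that need incremental or stateful handling are exactly the cases where this pattern may not be enough.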

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia