[Python-ideas] Adding 'bytes' as alias for 'latin_1' codec.

Nick Coghlan ncoghlan at gmail.com
Fri Jun 3 06:40:36 CEST 2011


On Fri, Jun 3, 2011 at 6:14 AM, Terry Reedy <tjreedy at udel.edu> wrote:
> I am a bit embarrassed that I did not see sooner that characters are for
> people and bytes for computers. Thus Python produces both character and byte
> serializations for objects.

FWIW, even after being involved in the assorted bytes/str design
discussions for Py3k, I didn't really "get it" myself until I made the
changes to urllib.parse in Python 3.2 to get most of the APIs to
accept both str objects and byte sequences.
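
For anyone who hasn't tried it, this is roughly what that looks like
from the caller's side. The URL below is just a made-up example; the
point is that the result type follows the input type:

```python
from urllib.parse import urlsplit

# Same (illustrative) URL, once as text and once as a byte sequence.
text_parts = urlsplit("http://example.com/path?q=1")
bytes_parts = urlsplit(b"http://example.com/path?q=1")

# str in, str out; bytes in, bytes out.
print(type(text_parts.netloc).__name__, text_parts.netloc)
print(type(bytes_parts.netloc).__name__, bytes_parts.netloc)
```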

The contrast between my first attempt (which tried to provide a common
code path that handled both strings and byte sequences without
trashing the encoding of the latter) and my second (which just decodes
and reencodes byte sequences using strict ASCII and punts on malformed
URLs containing non-ASCII values) was amazing. My original plan was to
benchmark them before choosing, but the latter approach was so much
simpler and cleaner than the former that it wasn't even a contest.
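
A minimal sketch of the second approach, to make the idea concrete.
This is not the actual urllib.parse code: coerce_args and
strip_fragment are hypothetical names invented for this example, and
the real implementation may differ in its details. Byte sequences are
decoded as strict ASCII on the way in, the work is done on str, and
the result is encoded back the same way:

```python
def coerce_args(*args):
    """Hypothetical helper: return (str_args, encode_result).

    Byte sequences are decoded with strict ASCII so that all real work
    can be done on str; encode_result converts the answer back to the
    caller's original type.
    """
    is_bytes = isinstance(args[0], (bytes, bytearray))
    for arg in args[1:]:
        if isinstance(arg, (bytes, bytearray)) != is_bytes:
            raise TypeError("cannot mix str and bytes arguments")
    if not is_bytes:
        # Already text: nothing to decode, nothing to re-encode.
        return args, lambda s: s
    decoded = tuple(bytes(a).decode("ascii", "strict") for a in args)
    return decoded, lambda s: s.encode("ascii", "strict")


def strip_fragment(url):
    """Hypothetical API: drop the '#fragment' part of a URL."""
    (url,), encode_result = coerce_args(url)
    # Only one code path is needed, and it only ever sees str.
    return encode_result(url.split("#", 1)[0])
```

With this, strip_fragment("http://a/b#c") gives back a str and
strip_fragment(b"http://a/b#c") gives back bytes, while a byte
sequence containing non-ASCII values fails at the decode step with
UnicodeDecodeError instead of being silently mangled.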

Focusing efforts on things like PEP 393, and perhaps even a
memoryview-based "strview", is likely to be a more fruitful way
forward than trying to shoehorn text-specific concerns into the
general binary storage types (and, as noted, the long release cycle
means the standard library is the wrong place for that kind of
experimentation).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


