
On Fri, Jun 3, 2011 at 6:14 AM, Terry Reedy <tjreedy@udel.edu> wrote:
> I am a bit embarrassed that I did not see sooner that characters are for people and bytes for computers. Thus Python produces both character and byte serializations for objects.
FWIW, even after being involved in the assorted bytes/str design discussions for Py3k, I didn't really "get it" myself until I made the changes to urllib.parse in Python 3.2 to get most of its APIs to accept both str objects and byte sequences.

The contrast between my first attempt and my second was amazing. The first tried to provide a common code path that handled both strings and byte sequences without trashing the encoding of the latter. The second just decodes and re-encodes byte sequences using strict ASCII, and punts on malformed URLs containing non-ASCII values. My original plan was to benchmark them before choosing, but the latter approach was so much simpler and cleaner than the former that it wasn't even a contest.

Focusing efforts on things like PEP 393, and perhaps even a memoryview-based "strview", is likely to be a more fruitful way forward than trying to shoehorn text-specific concerns into the general binary storage types (and, as noted, the long release cycle means the standard library is the wrong place for that kind of experimentation).

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan@gmail.com   |   Brisbane, Australia
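To make the second urllib.parse approach concrete, here is a minimal sketch of the pattern: decode byte input with strict ASCII, run a single str-only code path, then re-encode the result. The helper `split_path` is hypothetical and is not part of urllib.parse. Only the final `urlsplit` call uses the real standard library API, which (since 3.2) returns bytes components when given bytes input.

```python
from typing import List, Union
from urllib.parse import urlsplit


def split_path(path: Union[str, bytes]) -> Union[List[str], List[bytes]]:
    """Hypothetical helper: split a URL path on '/' for str or bytes input."""
    is_bytes = isinstance(path, bytes)
    # Byte input is decoded with strict ASCII; any non-ASCII byte raises
    # UnicodeDecodeError instead of being silently guessed at.
    text = path.decode("ascii", "strict") if is_bytes else path
    # A single str-only code path does the actual work.
    parts = text.split("/")
    # Re-encode so callers get back the same type they passed in.
    if is_bytes:
        return [part.encode("ascii", "strict") for part in parts]
    return parts


print(split_path("a/b/c"))    # ['a', 'b', 'c']
print(split_path(b"a/b/c"))   # [b'a', b'b', b'c']

try:
    split_path(b"a/\xff")
except UnicodeDecodeError:
    print("rejected non-ASCII bytes")

# The real urllib.parse functions follow the same convention:
# str in, str out; bytes in, bytes out.
print(urlsplit(b"http://example.com/a").netloc)  # b'example.com'
```

The bytes case pays for an extra decode/encode round trip, but in exchange there is only one code path to write and maintain, which is the trade-off described above.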