On 1/11/2014 1:44 PM, Stephen J. Turnbull wrote:
We already *have* a type in Python 3.3 that provides text manipulations on arrays of 8-bit objects: str (per PEP 393).
BTW: I don't know why so many people keep asking for use cases. Isn't it obvious that text data with an unknown (but ASCII-compatible) encoding, or with multiple different encodings in a single data chunk, is part of life?
Isn't it equally obvious that if you create or read all such ASCII-compatible chunks as (encoding='ascii', errors='surrogateescape') that you *don't need* string APIs for bytes?
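To make the surrogateescape point concrete, here is a minimal sketch. The sample bytes are made up for illustration; the only claim is that undecodable bytes survive a decode, str-method, encode round trip.

```python
# Hypothetical chunk: ASCII-compatible text with bytes of unknown encoding.
raw = b"Subject: caf\xe9 \xff\xfe\n"

# Non-ASCII bytes become lone surrogates (U+DC80..U+DCFF) instead of errors.
text = raw.decode("ascii", errors="surrogateescape")

# Ordinary str methods work on the ASCII-compatible parts.
header, _, value = text.partition(": ")
shouted = text.upper()

# Encoding with the same error handler restores the original bytes exactly.
restored = text.encode("ascii", errors="surrogateescape")
shouted_bytes = shouted.encode("ascii", errors="surrogateescape")
```

The escaped surrogates have no case mappings, so `upper()` leaves them alone and the non-ASCII bytes come back unchanged.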
Why do these "text chunks" need to be bytes in the first place? That's why we ask for use cases. AFAICS, reading and writing ASCII-compatible text data as 'latin1' is just as fast as bytes I/O. So it's not I/O efficiency, and (since in this model we don't do any en/decoding on bytes/str), it's not redundant en/decoding of bytes to str and back.
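A small sketch of the 'latin1' approach, under the assumption that newline translation is disabled with newline="". The file name is hypothetical. It checks only that the round trip is lossless, not the speed claim.

```python
import os
import tempfile

# Every possible byte value, to show that nothing is lost or rejected.
raw = bytes(range(256))

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "chunk.dat")  # hypothetical file name
    dst = src + ".out"
    with open(src, "wb") as f:
        f.write(raw)

    # latin-1 maps byte N to code point N; newline="" disables translation.
    with open(src, encoding="latin-1", newline="") as f:
        text = f.read()

    with open(dst, "w", encoding="latin-1", newline="") as f:
        f.write(text)

    with open(dst, "rb") as f:
        written = f.read()
```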
The problem with some criticisms of using 'unicode in Python 3' is that there really is no such thing. Unicode in 3.0 to 3.2 used the old internal model inherited from 2.x. Unicode in 3.3+ uses a different internal model that is a game changer with respect to certain issues of space and time efficiency (and cross-platform correctness and portability). So at least some of the criticisms that were valid under the old model are now out of date.
-- Terry Jan Reedy
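A rough way to see the 3.3+ model (PEP 393) from Python itself, assuming CPython 3.3 or later. sys.getsizeof reports an implementation detail, and the helper name per_char is made up for this sketch.

```python
import sys

def per_char(s):
    """Bytes of storage per code point, measured by doubling the string."""
    return (sys.getsizeof(s + s) - sys.getsizeof(s)) // len(s)

ascii_s = "a" * 1000            # ASCII only
latin_s = "\xe9" * 1000         # fits in Latin-1
bmp_s = "\u20ac" * 1000         # Basic Multilingual Plane
astral_s = "\U0001F600" * 1000  # outside the BMP

widths = [per_char(s) for s in (ascii_s, latin_s, bmp_s, astral_s)]
print(widths)         # on CPython 3.3+: [1, 1, 2, 4]
print(len(astral_s))  # 1000; a narrow 3.2 build would have reported 2000
```

The string is widened only as far as its largest code point requires, which is the space saving, and len() counts code points on every platform, which is the portability fix.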