[Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5

Sat Jan 11 22:35:13 CET 2014

On Sat, Jan 11, 2014 at 4:28 PM, Terry Reedy <tjreedy at udel.edu> wrote:
> On 1/11/2014 1:44 PM, Stephen J. Turnbull wrote:
>
>> We already *have* a type in Python 3.3 that provides text
>> manipulations on arrays of 8-bit objects: str (per PEP 393).
>>
>>   > BTW: I don't know why so many people keep asking for use cases.
>>   > Isn't it obvious that text data without known (but ASCII compatible)
>>   > encoding or multiple different encodings in a single data chunk
>>   > is part of life ?
>>
>> Isn't it equally obvious that if you create or read all such ASCII-
>> compatible chunks as (encoding='ascii', errors='surrogateescape') that
>> you *don't need* string APIs for bytes?
>>
>> Why do these "text chunks" need to be bytes in the first place?
>> That's why we ask for use cases.  AFAICS, reading and writing ASCII-
>> compatible text data as 'latin1' is just as fast as bytes I/O.  So
>> it's not I/O efficiency, and (since in this model we don't do any
>> en/decoding on bytes/str), it's not redundant en/decoding of bytes to
>> str and back.
>
>
> The problem with some criticisms of using 'unicode in Python 3' is that
> there really is no such thing. Unicode in 3.0 to 3.2 used the old internal
> model inherited from 2.x. Unicode in 3.3+ uses a different internal model
> that is a game changer with respect to certain issues of space and time
> efficiency (and cross-platform correctness and portability). So at least
> some of the valid criticisms based on the old model are out of date and no
> longer valid.

-1 on adding more surrogateescapes by default. It's a pain to track
down where the encoding errors came from.
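
To make the concern concrete, here is a minimal sketch of the behaviour
under discussion. The header bytes are made up for illustration, and this
only shows the standard codec error handlers, not anything proposed in
PEP 460. Decoding with surrogateescape never fails, so the bad byte is
only noticed later, wherever a strict encode finally happens:

```python
# Hypothetical payload: ASCII framing around one byte in an unknown encoding.
raw = b"Subject: caf\xe9\r\n"

# Decoding succeeds; the undecodable byte 0xE9 is smuggled through
# as the lone surrogate U+DCE9 (PEP 383 behaviour).
text = raw.decode("ascii", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text.encode("ascii", errors="surrogateescape")

# A strict encode, possibly far away from the decode, is where it blows up.
late_error = None
try:
    text.encode("utf-8")
except UnicodeEncodeError as exc:
    late_error = exc

# For comparison, latin-1 also round-trips every byte, but maps 0xE9 to a
# real character (U+00E9), so no error is raised anywhere.
as_latin1 = raw.decode("latin-1")
```

The traceback from the strict encode points at the encode call, not at the
place where the bytes were read, which is the tracking problem described
above.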