[Python-Dev] PEP 467: last round (?)

Sun Sep 4 11:38:06 EDT 2016

On 4 September 2016 at 20:43, Koos Zevenhoven <k7hoven at gmail.com> wrote:
> On Sun, Sep 4, 2016 at 12:51 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> That said, the PEP does propose "getbyte()" and "iterbytes()" for
>> bytes-oriented indexing and iteration, so there's a reasonable
>> consistency argument in favour of also proposing "byte" as the builtin
>> factory function:
>>
>> * data.getbyte(idx) would be a more efficient alternative to byte(data[idx])
>> * data.iterbytes() would be a more efficient alternative to map(byte, data)
>>
>
> .. I don't understand the argument for having 'byte' in these names.
> They should have 'char' or 'chr' in them for exacly the same reason
> that the proposed builtin should have 'chr' in it instead of 'byte'.
> If 'bytes' is an iterable of ints, then get_byte should probably
> return an int
>
> I'm sorry, but this argument comes across as "were're proposing the
> wrong thing here, so for consistency, we might want to do the wrong
> thing in this other part too".

There are two self-consistent sets of names:

    bchr
    bytes.getbchr, bytearray.getbchr
    bytes.iterbchr, bytearray.iterbchr

    byte
    bytes.getbyte, bytearray.getbyte
    bytes.iterbytes, bytearray.iterbytes

The former set emphasises the "stringiness" of this behaviour, by
aligning with the chr() builtin

The latter set emphasises that these APIs are still about working with
arbitrary binary data rather than text, with a Python "byte"
subsequently being a length 1 bytes object containing a single integer
between 0 and 255, rather than "What you get when you index or iterate
over a bytes instance".

Having noticed the discrepancy, my personal preference is to go with
the latter option (since it better fits the "executable pseudocode"
ideal and despite my reservations about "bytes objects contain int
objects rather than byte objects", that shouldn't be any more
confusing in the long run than explaining that str instances are
containers of length-1 str instances). The fact "byte" is much easier
to pronounce than bchr (bee-cher? bee-char?) also doesn't hurt.

However, I suspect we'll need to put both sets of names in front of
Guido and ask him to just pick whichever he prefers to get it resolved
one way or the other.

> And didn't someone recently propose deprecating iterability of str
> (not indexing, or slicing, just iterability)? Then str would also need
> a way to provide an iterable or sequence view of the characters. For
> consistency, the str functionality would probably need to mimic the
> approach in bytes. IOW, this PEP may in fact ultimately dictate how to
> get a iterable/sequence from a str object.

Strings are not going to become atomic objects, no matter how many
times people suggest it.

>> With bchr, those mappings aren't as clear (plus there's a potentially
>> unwanted "text" connotation arising from the use of the "chr"
>> abbreviation).
>
> Which mappings?

The mapping between the builtin name and the method names.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia