On 18 Jan 2014 11:52, "Ethan Furman" <ethan@stoneleaf.us> wrote:
>
> On 01/17/2014 05:27 PM, Steven D'Aprano wrote:
>>
>> On Fri, Jan 17, 2014 at 08:49:21AM -0800, Ethan Furman wrote:
>>>
>>>
>>> Overriding Principles
>>> =====================
>>>
>>> In order to avoid the problems of auto-conversion and Unicode
>>> exceptions that could plague Py2 code, all object checking will
>>> be done by duck-typing, not by values contained in a Unicode
>>> representation [3]_.
>>
>>
>> I don't understand this paragraph. What does "values contained in a
>> Unicode representation" mean?
>
>
> Yeah, that is clunky. I'm trying to convey the idea that we don't want errors based on content, i.e. which characters happen to be in a str.
>
>
>
>> [...]
>>>
>>> %s is restricted in what it will accept::
>>>
>>> - input type supports Py_buffer?
>>> use it to collect the necessary bytes
>>
>>
>> Can you give some examples of what types support Py_buffer? Presumably
>> bytes. Anything else?
>
>
> Anybody? Otherwise I'll go spelunking in the code.
bytes, bytearray, memoryview, ctypes arrays, array.array, numpy.ndarray
It may actually be clearer to express this in terms of memoryview for the benefit of those who aren't familiar with the C API, as that is the closest equivalent Python level API. (While there is an open issue regarding the C only nature of the buffer export API, nobody has volunteered to put together a PEP and implementation for a Python level follow up to the C level PEP 3118. The problem is that the original use cases involve C extensions anyway, so the relevant experts don't have any personal need for a Python level buffer exporter interface. Instead, it's in the "should be done for completeness, and would make some of our testing easier, but doesn't have anyone clamouring for it" bucket.)
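
To make the memoryview framing concrete, here is a rough sketch of "supports Py_buffer" as seen from Python. The helper name raw_bytes is made up for illustration and is not part of the PEP; it only relies on memoryview() raising TypeError for objects that don't export a buffer.

```python
import array


def raw_bytes(obj):
    """Hypothetical helper: copy out the bytes of any buffer exporter."""
    # memoryview() only succeeds for objects that implement the buffer
    # protocol (PEP 3118); everything else raises TypeError.
    with memoryview(obj) as view:
        return view.tobytes()


print(raw_bytes(b"abc"))                          # b'abc'
print(raw_bytes(bytearray(b"abc")))               # b'abc'
print(raw_bytes(array.array("B", [97, 98, 99])))  # b'abc'

# str and int do not export a buffer, so they are rejected by type,
# regardless of the value they happen to hold.
for non_exporter in ("abc", 123):
    try:
        raw_bytes(non_exporter)
    except TypeError as exc:
        print(type(non_exporter).__name__, "->", exc)
```

Note that the rejection of str here depends only on the type, never on which characters it contains, which is the duck-typing principle the PEP is after.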
>
>
>
>>> - input type is something else?
>>> use its __bytes__ method; if there isn't one, raise a TypeError
>>
>>
>> I think you should explicitly state that this is a new special method,
>> and state which built-in types will grow a __bytes__ method (if any).
>
>
> It's not new. I know bytes, str, and numbers /do not/ have __bytes__.
Right, it is already used by the bytes constructor to convert arbitrary objects to a binary representation. The difference is that Py_buffer/memoryview provide access to the raw data without necessarily copying anything, whereas __bytes__ always produces a new bytes object.
str and numbers don't implement __bytes__ as there's no obvious default interpretation. (The b'\x00' * n interpretation of integers is part of the bytes constructor and is now a decision we mostly regret: it should have been a keyword argument or a separate class method.)
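
A small sketch of the distinctions above, using only existing behaviour of bytes() and memoryview. The Header class is a hypothetical example type, not something from the PEP.

```python
class Header:
    """Hypothetical record type with an obvious binary layout."""

    def __init__(self, version, length):
        self.version = version
        self.length = length

    def __bytes__(self):
        # Called by bytes(obj); builds and returns a new bytes object.
        return bytes([self.version, self.length])


packed = bytes(Header(1, 16))
print(packed)  # b'\x01\x10'

# The integer special case lives in the bytes constructor itself, not in
# an int.__bytes__ method: an int argument means "this many zero bytes".
zeros = bytes(3)
print(zeros)  # b'\x00\x00\x00'

# str has no default interpretation, so bytes() refuses it without an
# explicit encoding.
try:
    bytes("abc")
except TypeError as exc:
    print("str ->", exc)

# By contrast, a memoryview exposes the exporter's data in place: a later
# change to the bytearray is visible through the view, with no copy made.
buf = bytearray(b"abc")
view = memoryview(buf)
buf[0] = ord("x")
print(view.tobytes())  # b'xbc'
```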
>
>
>
>>> Unsupported codes
>>> -----------------
>>>
>>> %r (which calls __repr__), and %a (which calls ascii() on __repr__) are not
>>> supported.
>>
>>
>> +1 on not supporting b'%r' (i.e. I agree with the PEP).
>>
>> Why not support b'%a'? That seems to be a strange thing to prohibit.
>
>
> I'll admit to being somewhat on the fence about %a.
>
> It seems there are two possibilities with %a:
>
> 1) have it be ascii(repr(obj))
>
> 2) have it be str(obj).encode('ascii', 'strict')
This gets very close to crossing the line into implicit encoding of text again. Binary interpolation is being added back for the specific use case of working with ASCII compatible segments in binary formats, and it's at best arguable that supporting %a will help with that use case.
However, without it, there may be a greater temptation to inappropriately define __bytes__ just to support binary interpolation, rather than because a type truly has an appropriate translation directly to bytes.
By allowing %a, we avoid that temptation. This is also potentially useful specifically in the case of binary logging formats and as a quick way to request backslash escaping of non-ASCII characters in text.
Call it +0.5 for allowing %a. I don't expect it to be used heavily, but I think it will head off a fair bit of potential misuse of __bytes__.
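
For reference, the two candidate meanings of %a can be emulated with existing builtins. This is only a sketch: the function names are invented for illustration, and I'm assuming option 1 means the builtin ascii(obj), i.e. repr(obj) with non-ASCII characters backslash-escaped.

```python
def percent_a_option_1(obj):
    # Assumed reading of option 1: ascii() escapes anything outside ASCII,
    # so the encode step cannot fail whatever the text contains.
    return ascii(obj).encode("ascii")


def percent_a_option_2(obj):
    # Option 2: strict encoding of str(obj); fails on non-ASCII content.
    return str(obj).encode("ascii", "strict")


text = "caf\u00e9"

print(percent_a_option_1(text))  # b"'caf\\xe9'"

try:
    percent_a_option_2(text)
except UnicodeEncodeError as exc:
    print("option 2 failed:", exc.reason)
```

Option 2 raises depending on which characters happen to be in the string, which is exactly the content-based failure the PEP is trying to avoid. Option 1 never does, and gives the backslash escaping for free.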
Cheers,
Nick.