On 01/18/2014 05:48 AM, Nick Coghlan wrote:
On 18 Jan 2014 11:52, "Ethan Furman" wrote:
I'll admit to being somewhat on the fence about %a.
It seems there are two possibilities with %a:
1) have it be ascii(repr(obj))
2) have it be str(obj).encode('ascii', 'strict')
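A rough sketch of how the two possibilities differ, for illustration only. The function names are made up here, and option 1 is written as ascii(obj) on the assumption that this was the intent, since ascii() already applies repr() itself:

```python
def option_repr_ascii(obj):
    # Option 1 (assumed reading): ascii() calls repr() and backslash-escapes
    # anything outside ASCII, so the result always encodes cleanly.
    return ascii(obj).encode('ascii')


def option_str_strict(obj):
    # Option 2: str() followed by a strict ASCII encode; any non-ASCII
    # character raises UnicodeEncodeError.
    return str(obj).encode('ascii', 'strict')


print(option_repr_ascii('h\u00e9'))   # quotes and escape are part of the output
print(option_str_strict('abc'))
try:
    option_str_strict('h\u00e9')
except UnicodeEncodeError as exc:
    print('strict encoding failed:', exc.reason)
```

The practical difference is that option 1 never fails but includes the repr quoting, while option 2 gives the bare text and fails on non-ASCII input.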
This gets very close to crossing the line into implicit encoding of text again. Binary interpolation is being added back for the specific use case of working with ASCII compatible segments in binary formats, and it's at best arguable that supporting %a will help with that use case.
Agreed.
However, without it, there may be a greater temptation to inappropriately define __bytes__ just to support binary interpolation, rather than because a type truly has an appropriate translation directly to bytes.
True.
By allowing %a, we avoid that temptation. This is also potentially useful specifically in the case of binary logging formats and as a quick way to request backslash escaping of non-ASCII characters in text.
Call it +0.5 for allowing %a. I don't expect it to be used heavily, but I think it will head off a fair bit of potential misuse of __bytes__.
So, if %a is added it would act like:

---------
"%a" % some_obj
---------
tmp = str(some_obj)
res = b''
for ch in tmp:
    if ord(ch) < 256:
        res += bytes([ord(ch)])
    else:
        res += unicode_escape(ch)
---------

where 'unicode_escape' would yield something like "\u0440" ?

--
~Ethan~
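For anyone who wants to try it, the pseudocode above could be made runnable roughly like this. It is only a sketch: percent_a_sketch is a hypothetical name, and the 'unicode_escape' helper is assumed to be equivalent to encoding the character with Python's built-in "unicode_escape" codec.

```python
def percent_a_sketch(obj):
    # Hypothetical helper mirroring the pseudocode; not an actual API.
    tmp = str(obj)
    res = bytearray()
    for ch in tmp:
        if ord(ch) < 256:
            # Code points 0-255 are copied through as a single raw byte.
            res.append(ord(ch))
        else:
            # Assumption: the escape step behaves like the "unicode_escape"
            # codec, which yields ASCII bytes such as \u0440.
            res += ch.encode('unicode_escape')
    return bytes(res)


print(percent_a_sketch('a\u0440'))
print(percent_a_sketch('\xe9'))
# For comparison, the stdlib error handler that escapes everything non-ASCII:
print('\xe9'.encode('ascii', 'backslashreplace'))
```

Note that with a threshold of 256, characters in the range 128-255 come out as raw bytes, so the result is not guaranteed to be pure ASCII. A threshold of 128, or simply the 'backslashreplace' error handler, would escape those as well.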