[Python-Dev] PEP 460 reboot
v+python at g.nevcal.com
Mon Jan 13 10:19:08 CET 2014
On 1/13/2014 12:46 AM, Mark Shannon wrote:
> On 13/01/14 03:47, Guido van Rossum wrote:
>> On Sun, Jan 12, 2014 at 6:24 PM, Ethan Furman <ethan at stoneleaf.us> wrote:
>>> On 01/12/2014 06:16 PM, Ethan Furman wrote:
>>>> If you do:
>>>> --> b'%s' % 'some text'
>>> Ignore what I previously said. With no encoding the result would be:
>>> b"'some text'"
>>> So an encoding should definitely be specified.
>> Yes, but the encoding is no business of %s or %. As far as the
>> formatting operation cares, if the argument is bytes they will be
>> copied literally, and if the argument is a str (or anything else) it
>> will call ascii() on it.
> It seems to me that what people want from '%s' is:
> Convert to a str then encode as ascii for non-bytes
> or copy directly for bytes.
Maybe. But it only takes a small tweak to the parameter to get what they
want... a tweak that works in both Python 2.7 and Python 3.x. Instead of
b"%s" % foo
they must use
b"%s" % foo.encode( explicitEncoding )
which is what they should have been doing in Python 2.7 all along, and
if they were, they need make no change.
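As a minimal sketch of that tweak (the value of foo and the choice of UTF-8 are assumptions made for illustration, and explicit_encoding stands in for explicitEncoding above), this form should run unchanged on Python 2.7 and on any Python 3 that supports % on bytes:

```python
# Hypothetical value standing in for "foo"; the u prefix keeps it a text
# string on both Python 2.7 and Python 3.3+.
foo = u"some text"

# Assumption: UTF-8 is the encoding the application actually intends.
explicit_encoding = "utf-8"

# The caller encodes explicitly, so % only ever sees bytes.
result = b"%s" % foo.encode(explicit_encoding)
print(result)
```

The point of the design is that the encoding decision stays with the caller, where it is visible, and the formatting operation never has to guess.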
Oh, foo was a Python 2.7 str? Converted to Python 3.x str, by default
conversion rules? Already in ASCII? No harm.
Oh, foo was a literal? Add b prefix, instead of the .encode("ASCII"), if
> So why not replace '%s' with '%a' for the ascii case and
> with '%b' for directly inserting bytes.
Because %a and %b don't exist in Python 2.7?
> That way, the encoding is explicit.
The encoding is already explicit. If it is bytes encoded from str, that
transformation had an explicit encoding. If it is b"%s" % str(...), then
there is no encoding, but rather a transformation into an ASCII
representation of the Unicode code points, using escape sequences. Which
isn't likely to be what they want, but see the parameter tweak above.
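To make that rule concrete, here is a rough sketch of the %s behaviour as described in this thread. proposed_percent_s is a hypothetical helper written only for illustration; it is not an existing function, and it models the rule under discussion rather than the behaviour of any released interpreter:

```python
def proposed_percent_s(arg):
    """Hypothetical model of the discussed %s rule for bytes formatting."""
    if isinstance(arg, bytes):
        # bytes arguments are copied literally
        return arg
    # anything else goes through ascii(), which yields an ASCII-only repr
    # with escape sequences for non-ASCII code points
    return ascii(arg).encode("ascii")


print(proposed_percent_s(b"raw"))        # b'raw'
print(proposed_percent_s("some text"))   # b"'some text'"
print(proposed_percent_s("caf\u00e9"))   # b"'caf\\xe9'"
```

Note the quotes and the escape sequence in the last two results: no encoding was chosen, only a representation of the code points.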
> I think it is vital that the encoding is explicit in all cases where
> bytes <-> str conversion occurs.
Since it is explicit, you have no concerns in this area.
Regarding the concern about implicit use of ASCII by certain bytes
methods and proposed interpolations, I'm curious how many standard
encodings exist that do not have an ASCII subset. I can enumerate a
starting list, but if there are others in actual use, I'm unaware of them.
UTF-16 BE & LE
UTF-32 BE & LE
Wikipedia: The vast majority of code pages in current use are supersets
of ASCII <http://en.wikipedia.org/wiki/ASCII>, a 7-bit code representing
128 control codes and printable characters.
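One way to check a candidate encoding against this question is to encode all 128 ASCII characters and compare with the plain ASCII bytes. The sketch below does that; the codec names are Python's standard codec names, and the particular selection (the UTF-16/UTF-32 variants listed above plus UTF-8 and Latin-1 for contrast) is only an assumption about what is worth testing:

```python
# All 128 ASCII code points as text, and their reference byte encoding.
ascii_text = "".join(chr(i) for i in range(128))
ascii_bytes = ascii_text.encode("ascii")

# Assumption: an illustrative selection of codecs, not an exhaustive list.
candidates = [
    "utf-8", "latin-1",
    "utf-16-le", "utf-16-be",
    "utf-32-le", "utf-32-be",
]

# An encoding has an ASCII subset (in the sense used here) if ASCII text
# encodes to exactly the same bytes as it does under the ascii codec.
is_ascii_superset = {
    name: ascii_text.encode(name) == ascii_bytes for name in candidates
}

for name, ok in is_ascii_superset.items():
    print(name, "ASCII-compatible" if ok else "NOT ASCII-compatible")
```

The same loop can be pointed at any other codec someone believes belongs on the list.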