[Python-Dev] PEP 460 reboot

Tue Jan 14 18:47:15 CET 2014

On Tue, Jan 14, 2014 at 12:29 PM, Yury Selivanov <yselivanov.ml at gmail.com> wrote:

> Brett,
>
>
> I like your proposal.  There is one idea I have that could,
> perhaps, improve it:
>
>
> 1. "%s" and "{}" will continue to work for bytes and bytearray in
> the following fashion:
>
>  - Check whether __bytes__/Py_buffer is supported.
>  - If it is, check that the bytes are strictly in the printable
>    ASCII subset (a-z, A-Z, 0-9 + special symbols like ! etc.).
>    Throw an error if the check fails; otherwise concatenate.
>  - Otherwise, try str() and call .encode('ascii', 'strict') on the result.

>
> This way *most* of the use cases of python2 will be covered without
> touching the code. So:
>

See, I'm fine with having people update their format strings to specify a
format spec; it's minor and isn't totally useless, as it expresses what they
mean more explicitly (e.g. "I want this to be an int, I want this to be a
float, and I want this to be an ASCII string" using d, f, and s,
respectively). I want people to have to make a conscious decision to fall
back on an ASCII encoding. What you are suggesting is for people to have to
make a conscious decision **not** to encode to ASCII implicitly, which is
what I'm trying to avoid with this proposal. My goal is to make it easy to
work with ASCII, but as an explicit choice, not by default.

-Brett

>  - b'Hello {}'.format('world')
>    will be the same as b'Hello ' + str('world').encode('ascii', 'strict')
>
>  - b'Hello {}'.format('\u0394') will throw UnicodeEncodeError
>
>  - b'Status: {}'.format(200)
>    will be the same as b'Status: ' + str(200).encode('ascii', 'strict')
>
>  - b'Hello %s' % ('world',) - the same as the first example
>
>  - b'Connection: {}'.format(b'keep-alive') - works
>
>  - b'Hello %s' % (b'\xce\x94',) - will fail, not in the ASCII subset we accept
>
> I think it's OK to check the buffers for the ASCII subset only. Yes, it
> will cost some performance, but string formatting is rarely used to
> concatenate huge buffers.

> 2. New operators {!b} and %b. These will just use '__bytes__' and
> Py_buffer.
>
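A minimal sketch of the fallback order described in point 1 above, written as plain Python so the examples can be tried out. Everything here is an assumption made for illustration: `interpolate_value` is a hypothetical helper (bytes.format() does not exist), the printable subset is taken to be 0x20 through 0x7E, and ValueError stands in for the unspecified "throw an error".

```python
# Hypothetical helper, not an existing API: emulates the proposed fallback
# for "%s" / "{}" inside a bytes template.
# Assumption: "printable ASCII" means the range 0x20 (space) to 0x7E ('~').
_PRINTABLE_ASCII = frozenset(range(0x20, 0x7F))


def interpolate_value(value):
    """Return the bytes that would be spliced into the template."""
    raw = None
    try:
        # Step 1a: objects that support the buffer protocol.
        raw = bytes(memoryview(value))
    except TypeError:
        # Step 1b: objects that define __bytes__.
        to_bytes = getattr(type(value), "__bytes__", None)
        if to_bytes is not None:
            raw = to_bytes(value)
    if raw is not None:
        # Step 2: accept bytes only if they stay in the printable subset.
        # ValueError is an assumption; the proposal does not name the error.
        if not _PRINTABLE_ASCII.issuperset(raw):
            raise ValueError("bytes outside the printable ASCII subset")
        return raw
    # Step 3: everything else goes through str() and a strict ASCII encode.
    return str(value).encode("ascii", "strict")


greeting = b"Hello " + interpolate_value("world")
status = b"Status: " + interpolate_value(200)
```

With this helper, `interpolate_value('\u0394')` fails in the strict encode and `interpolate_value(b'\xce\x94')` fails the subset check, matching the examples listed above.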
> --
> Yury Selivanov
>
> On January 14, 2014 at 11:31:51 AM, Brett Cannon (brett at python.org) wrote:
> >
> > On Mon, Jan 13, 2014 at 5:14 PM, Guido van Rossum wrote:
> >
> > > On Mon, Jan 13, 2014 at 2:05 PM, Brett Cannon wrote:
> > > > I have been going on the assumption that bytes.format() would change
> > > > what '{}' meant for itself and would only interpolate bytes. That
> > > > convenient between Python 2 and 3 since it represents what we want it
> > > > to (str and bytes under the hood, respectively), so it just falls
> > > > through. We could also add a 'b' conversion for bytes() explicitly so
> > > > as to help people not accidentally mix up things in bytes.format() and
> > > > str.format(). But I was not suggesting adding a specific format spec
> > > > for bytes but instead making bytes.format() just do the
> > > > .encode('ascii') automatically to help with compatibility when a
> > > > format spec was present. If people want fancy formatting for bytes
> > > > they can always do it themselves before calling bytes.format().
> > >
> > > This seems hastily written (e.g. verb missing :-), and I'm not clear
> > > on what you are (or were) actually proposing. When exactly would
> > > bytes.format() need .encode('ascii')?
> > >
> > > I would be happy to wait a few hours or days for you to write it up
> > > clearly, rather than responding in a hurry.
> >
> >
> > Sorry about that. Busy day at work + trying to stay on top of this entire
> > conversation was a bit tough. Let me try to lay out what I'm suggesting for
> > bytes.format() in terms of how it changes
> > http://docs.python.org/3/library/string.html#format-string-syntax
> > for bytes.
> >
> > 1. New conversion operator of 'b' that operates as PEP 460 specifies
> > (i.e. tries to get a buffer, else calls __bytes__). The default
> > conversion changes from 's' to 'b'.
> > 2. Use of a format spec adds a step of calling
> > str.encode('ascii', 'strict') on the result returned from calling
> > __format__().
> >
> > That's it. So point 1 means that the following would work in Python 3.5::
> >
> > b'Hello, {}, how are you?'.format(b'Guido')
> > b'Hello, {!b}, how are you?'.format(b'Guido')
> >
> > It would produce an error if you used a text argument for 'Guido' since
> > str doesn't define __bytes__ or a buffer. That gives the EIBTI group their
> > bytes.format() where nothing magical happens.
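A rough sketch of the 'b' conversion from point 1, assuming the lookup order is "buffer first, then __bytes__" as stated. `convert_b` is a hypothetical name used only for illustration, and the TypeError for unsupported arguments is an assumption, since the proposal only says "an error".

```python
def convert_b(value):
    """Hypothetical '!b' conversion: try a buffer, else call __bytes__."""
    try:
        # Objects exposing the buffer protocol (bytes, bytearray, memoryview).
        return bytes(memoryview(value))
    except TypeError:
        pass
    to_bytes = getattr(type(value), "__bytes__", None)
    if to_bytes is None:
        # Assumption: TypeError is used for arguments such as str.
        raise TypeError(
            f"{type(value).__name__!r} object cannot be interpolated as bytes"
        )
    return to_bytes(value)


class Name:
    """Example type that opts in through __bytes__."""

    def __bytes__(self):
        return b"Guido"


line = b"Hello, " + convert_b(Name()) + b", how are you?"
```

Passing the text string 'Guido' to `convert_b` raises, which is the "nothing magical happens" behaviour described above.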
> >
> > For point 2, let's say you have the following in Python 2::
> >
> > 'I have {} bottles of beer on the wall'.format(10)
> >
> > Under my proposal, how would you change it to get the same result in
> > Python 2 and 3?::
> >
> > b'I have {:d} bottles of beer on the wall'.format(10)
> >
> > In Python 2 you're just being more explicit about the format, otherwise
> > it's the same semantics as today. In Python 3, though, this would
> > translate into (under the hood)::
> >
> > b'I have {} bottles of beer on the wall'.format(format(10, 'd').encode('ascii', 'strict'))
> >
> > This leads to the same bytes value in Python 2 (since it's just a string)
> > and in Python 3 (as everything accepted by bytes.format() is either bytes
> > already or converted to ASCII bytes by encoding). While Python 2 users
> > would need to make sure they used a format spec to get the same result in
> > both Python 2 and 3 for ASCII bytes, it's a minor change which also makes
> > the format more explicit, so it's not an inherently bad thing. And for
> > those that don't want to utilize the automatic ASCII encoding, they can
> > just not use a format spec in the format string and just pass in bytes
> > directly (i.e. call __format__() themselves and then call str.encode() on
> > their own). So PBP people get to have a simple way to use bytes.format()
> > in Python 2 and 3 when dealing with things that can be represented as
> > ASCII (just as the bytes methods allow for currently).
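The per-field rule from point 2 can be approximated in current Python 3 without bytes.format(). This is only a sketch under stated assumptions: `format_field` is a hypothetical helper, and its no-spec path checks the buffer protocol only, leaving out the __bytes__ lookup for brevity.

```python
def format_field(value, spec=""):
    """Hypothetical per-field rule for the proposed bytes.format()."""
    if spec:
        # A format spec is the explicit opt-in: format as text first,
        # then encode the result strictly as ASCII.
        return format(value, spec).encode("ascii", "strict")
    # No spec: only bytes-like arguments are accepted, nothing is encoded.
    # Simplification: the __bytes__ fallback is omitted here.
    return bytes(memoryview(value))


# Emulates b'I have {:d} bottles of beer on the wall'.format(10)
line = b"I have " + format_field(10, "d") + b" bottles of beer on the wall"

# Emulates the opt-out: the caller formats and encodes, then passes bytes.
manual = b"I have " + format_field(format(10, "d").encode("ascii")) + b" bottles"
```

Calling `format_field(10)` without a spec raises TypeError, so the ASCII encoding only ever happens when the format string asks for it.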
> >
> > I think this covers your desire to have numbers and anything else that
> > can be represented as ASCII be supported for easy porting while covering
> > my desire that any automatic encoding is clearly explicit in the format
> > string and in no way special-cased for only some types (the introduction
> > of a 'c' converter from PEP 460 is also fine with me).
> >
> > How you would want to translate this proposal with the % operator I'm
> > not sure since it has been quite a while since I last seriously used it
> > and so I don't think I'm in a good position to propose a shift for it.