[Python-Dev] PEP 460 reboot

Chris Barker chris.barker at noaa.gov
Tue Jan 14 18:45:59 CET 2014


On Tue, Jan 14, 2014 at 9:29 AM, Yury Selivanov <yselivanov.ml at gmail.com> wrote:

>  - Try str(), and do ".encode('ascii', 'strict')" on the result.
>

please no -- that's the source of a lot of pain in py2 now.

having a failure as a result of the value, rather than the type, of an
object just makes for hard-to-test bugs. Everything will be hunky dory for
development and testing, then in deployment some idiot ( ;-) ) will pass in
some non-ASCII string and you get a failure. And the person who
gets the failure won't understand why, or they wouldn't have passed in
non-ASCII values in the first place...
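
A minimal sketch of that failure mode, assuming the proposed fallback boils
down to str(obj).encode('ascii', 'strict'). The helper name `interpolate` is
made up for illustration; it is not an existing or proposed API:

```python
def interpolate(prefix, obj):
    """Hypothetical stand-in for the proposed fallback: str(), then strict ASCII."""
    return prefix + str(obj).encode("ascii", "strict")


# Both arguments are str, so a type check cannot tell them apart;
# only the value decides whether the call succeeds.
print(interpolate(b"Hello ", "world"))

try:
    interpolate(b"Hello ", "\u0394")  # non-ASCII value, same type
except UnicodeEncodeError as exc:
    print("fails only for this value:", exc.reason)
```

A test suite that only ever feeds ASCII names through this path will pass, and
the error shows up later with real data.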

Ease of porting is nice, but let's not make it easy to port bug-prone code.

-Chris


>
> This way *most* of the use cases of python2 will be covered without
> touching the code. So:
>
>  - b'Hello {}'.format('world')
>    will be the same as b'Hello ' + str('world').encode('ascii', 'strict')
>
>  - b'Hello {}'.format('\u0394') will throw UnicodeEncodeError
>
>  - b'Status: {}'.format(200)
>    will be the same as b'Status: ' + str(200).encode('ascii', 'strict')
>
>  - b'Hello %s' % ('world',) - the same as the first example
>
>  - b'Connection: {}'.format(b'keep-alive') - works
>
>  - b'Hello %s' % (b'\xce\x94',) - will fail, not in the ASCII subset we accept
>
> I think it’s OK to check the buffers for ASCII-subset only. Yes, it
> will have some sort of sub-optimal performance, but then, it’s quite
> rare when string formatting is used to concatenate huge buffers.
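
A rough sketch of the behaviour listed in the quoted examples, for anyone who
wants to try it out. Everything here is an assumption for illustration: the
names `to_ascii_bytes` and `ascii_format` are made up, only bare '{}' fields
are handled, and ValueError for non-ASCII buffers is a guess, since the quoted
text only says such input "will fail":

```python
def to_ascii_bytes(obj):
    """Hypothetical conversion: ASCII-only buffers pass through, the rest goes via str()."""
    if isinstance(obj, (bytes, bytearray, memoryview)):
        data = bytes(obj)
        # The ASCII-subset check on buffers that the quoted text describes.
        if any(byte > 127 for byte in data):
            raise ValueError("buffer contains non-ASCII bytes")
        return data
    # Raises UnicodeEncodeError (a ValueError subclass) for non-ASCII text.
    return str(obj).encode("ascii", "strict")


def ascii_format(template, *args):
    """Hypothetical stand-in for bytes.format(); supports only bare '{}' fields."""
    parts = template.split(b"{}")
    if len(parts) != len(args) + 1:
        raise IndexError("number of '{}' fields does not match number of arguments")
    out = [parts[0]]
    for arg, tail in zip(args, parts[1:]):
        out.append(to_ascii_bytes(arg))
        out.append(tail)
    return b"".join(out)


print(ascii_format(b"Hello {}", "world"))
print(ascii_format(b"Status: {}", 200))
print(ascii_format(b"Connection: {}", b"keep-alive"))
```

Note that whether a call succeeds still depends on the argument's value, not
its type.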
>
> 2. new operators {!b} and %b. These will just use '__bytes__' and
> Py_buffer.
>
> --
> Yury Selivanov
>
> On January 14, 2014 at 11:31:51 AM, Brett Cannon (brett at python.org) wrote:
> >
> > On Mon, Jan 13, 2014 at 5:14 PM, Guido van Rossum
> > wrote:
> >
> > > On Mon, Jan 13, 2014 at 2:05 PM, Brett Cannon
> > wrote:
> > > > I have been going on the assumption that bytes.format() would change
> > > > what '{}' meant for itself and would only interpolate bytes. That
> > > > convenient between Python 2 and 3 since it represents what we want it
> > > > to (str and bytes under the hood, respectively), so it just falls
> > > > through. We could also add a 'b' conversion for bytes() explicitly so
> > > > as to help people not accidentally mix up things in bytes.format() and
> > > > str.format(). But I was not suggesting adding a specific format spec
> > > > for bytes but instead making bytes.format() just do the
> > > > .encode('ascii') automatically to help with compatibility when a
> > > > format spec was present. If people want fancy formatting for bytes
> > > > they can always do it themselves before calling bytes.format().
> > >
> > > This seems hastily written (e.g. verb missing :-), and I'm not clear
> > > on what you are (or were) actually proposing. When exactly would
> > > bytes.format() need .encode('ascii')?
> > >
> > > I would be happy to wait a few hours or days for you to write it up
> > > clearly, rather than responding in a hurry.
> >
> >
> > Sorry about that. Busy day at work + trying to stay on top of this entire
> > conversation was a bit tough. Let me try to lay out what I'm suggesting for
> > bytes.format() in terms of how it changes
> > http://docs.python.org/3/library/string.html#format-string-syntax
> > for bytes.
> >
> > 1. New conversion operator of 'b' that operates as PEP 460 specifies (i.e.
> > tries to get a buffer, else calls __bytes__). The default conversion
> > changes from 's' to 'b'.
> > 2. Use of the conversion field adds the extra step of calling
> > str.encode('ascii', 'strict') on the result returned from calling
> > __format__().
> >
> > That's it. So point 1 means that the following would work in Python 3.5::
> >
> > b'Hello, {}, how are you?'.format(b'Guido')
> > b'Hello, {!b}, how are you?'.format(b'Guido')
> >
> > It would produce an error if you used a text argument for 'Guido' since str
> > doesn't define __bytes__ or a buffer. That gives the EIBTI group their
> > bytes.format() where nothing magical happens.
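
A small sketch of the 'b' conversion rule from point 1 (try for a buffer, else
call __bytes__), written as ordinary Python. The names `convert_b` and `Header`
are invented for illustration, and the real lookup would happen in C, so treat
this as an approximation of the idea rather than the PEP's implementation:

```python
def convert_b(obj):
    """Hypothetical '!b' conversion: buffer protocol first, then __bytes__, else TypeError."""
    try:
        # memoryview() raises TypeError for objects without buffer support.
        return bytes(memoryview(obj))
    except TypeError:
        pass
    bytes_method = getattr(type(obj), "__bytes__", None)
    if bytes_method is None:
        raise TypeError(
            f"{type(obj).__name__} supports neither the buffer protocol nor __bytes__"
        )
    return bytes_method(obj)


class Header:
    """Example type (made up) that opts in explicitly by defining __bytes__."""

    def __init__(self, name):
        self.name = name

    def __bytes__(self):
        return self.name.encode("ascii")


print(convert_b(b"Guido"))
print(convert_b(bytearray(b"Guido")))
print(convert_b(Header("Guido")))

try:
    convert_b("Guido")  # str has neither a buffer nor __bytes__
except TypeError as exc:
    print("rejected:", exc)
```

Here the text argument is rejected because of its type, whatever characters it
contains.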
> >
> > For point 2, let's say you have the following in Python 2::
> >
> > 'I have {} bottles of beer on the wall'.format(10)
> >
> > Under my proposal, how would you change it to get the same result in
> > Python 2 and 3?::
> >
> > b'I have {:d} bottles of beer on the wall'.format(10)
> >
> > In Python 2 you're just being more explicit about the format, otherwise
> > it's the same semantics as today. In Python 3, though, this would
> > translate into (under the hood)::
> >
> > b'I have {} bottles of beer on the wall'.format(
> >     format(10, 'd').encode('ascii', 'strict'))
> >
> > This leads to the same bytes value in Python 2 (since it's just a string)
> > and in Python 3 (as everything accepted by bytes.format() is either bytes
> > already or converted to bytes by encoding to ASCII). While Python 2 users
> > would need to make sure they used a format spec to get the same result in
> > both Python 2 and 3 for ASCII bytes, it's a minor change which also makes
> > the format more explicit so it's not an inherently bad thing. And for those
> > that don't want to utilize the automatic ASCII encoding they can just not
> > use a format spec in the format string and just pass in bytes directly
> > (i.e. call __format__() themselves and then call str.encode() on their
> > own). So PBP people get to have a simple way to use bytes.format() in
> > Python 2 and 3 when dealing with things that can be represented as ASCII
> > (just as the bytes methods allow for currently).
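
The format-spec rule described above can be sketched per field like this. The
helper name `format_field` is made up, and accepting only buffer objects when
no spec is given is a simplification of point 1 (it ignores __bytes__):

```python
def format_field(value, spec=""):
    """Hypothetical per-field rule: a format spec means __format__ plus strict ASCII encoding."""
    if spec:
        # The explicit spec in the format string is what opts in to encoding.
        return format(value, spec).encode("ascii", "strict")
    # No spec: nothing is encoded; only buffer objects are accepted in this sketch.
    return bytes(memoryview(value))


line = b"I have " + format_field(10, "d") + b" bottles of beer on the wall"
print(line)
print(format_field(b"Guido"))

try:
    format_field(10)  # no spec and not bytes-like
except TypeError as exc:
    print("rejected:", exc)
```

The point of the design is that the encoding step is visible in the format
string itself, and an int without a spec is refused rather than silently
converted.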
> >
> > I think this covers your desire to have numbers and anything else that can
> > be represented as ASCII be supported for easy porting while covering my
> > desire that any automatic encoding is clearly explicit in the format string
> > and in no way special-cased for only some types (the introduction of a 'c'
> > converter from PEP 460 is also fine with me).
> >
> > How you would want to translate this proposal with the % operator I'm not
> > sure since it has been quite a while since I last seriously used it and so
> > I don't think I'm in a good position to propose a shift for it.
> > _______________________________________________
> > Python-Dev mailing list
> > Python-Dev at python.org
> > https://mail.python.org/mailman/listinfo/python-dev
> > Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/yselivanov.ml%40gmail.com
> >
>



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov