Minor changes: updated version numbers, added punctuation.
The current text seems to take into account Guido's latest comments.
Thoughts before asking for acceptance?
PEP: 467
Title: Minor API improvements for binary sequences
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan(a)gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2014-03-30
Python-Version: 3.5
Post-History: 2014-03-30, 2014-08-15, 2014-08-16
Abstract
========
During the initial development of the Python 3 language specification,
the core ``bytes`` type for arbitrary binary data started as the mutable
type that is now referred to as ``bytearray``. Other aspects of
operating in the binary domain in Python have also evolved over the
course of the Python 3 series.
This PEP proposes four small adjustments to the APIs of the ``bytes``,
``bytearray`` and ``memoryview`` types to make it easier to operate
entirely in the binary domain:
* Deprecate passing single integer values to ``bytes`` and ``bytearray``
* Add ``bytes.zeros`` and ``bytearray.zeros`` alternative constructors
* Add ``bytes.byte`` and ``bytearray.byte`` alternative constructors
* Add ``bytes.iterbytes``, ``bytearray.iterbytes`` and
``memoryview.iterbytes`` alternative iterators
Proposals
=========
Deprecation of current "zero-initialised sequence" behaviour
------------------------------------------------------------
Currently, the ``bytes`` and ``bytearray`` constructors accept an
integer argument and interpret it as meaning to create a
zero-initialised sequence of the given size::
>>> bytes(3)
b'\x00\x00\x00'
>>> bytearray(3)
bytearray(b'\x00\x00\x00')
This PEP proposes to deprecate that behaviour in Python 3.6, and remove
it entirely in Python 3.7.
No other changes are proposed to the existing constructors.
Addition of explicit "zero-initialised sequence" constructors
-------------------------------------------------------------
To replace the deprecated behaviour, this PEP proposes the addition of
an explicit ``zeros`` alternative constructor as a class method on both
``bytes`` and ``bytearray``::
>>> bytes.zeros(3)
b'\x00\x00\x00'
>>> bytearray.zeros(3)
bytearray(b'\x00\x00\x00')
It will behave just as the current constructors behave when passed a
single integer.
The specific choice of ``zeros`` as the alternative constructor name is
taken from the corresponding initialisation function in NumPy (although,
as these are 1-dimensional sequence types rather than N-dimensional
matrices, the constructors take a length as input rather than a shape
tuple).
Addition of explicit "single byte" constructors
-----------------------------------------------
As binary counterparts to the text ``chr`` function, this PEP proposes
the addition of an explicit ``byte`` alternative constructor as a class
method on both ``bytes`` and ``bytearray``::
>>> bytes.byte(3)
b'\x03'
>>> bytearray.byte(3)
bytearray(b'\x03')
These methods will only accept integers in the range 0 to 255 (inclusive)::
>>> bytes.byte(512)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: bytes must be in range(0, 256)
>>> bytes.byte(1.0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'float' object cannot be interpreted as an integer
The documentation of the ``ord`` builtin will be updated to explicitly
note that ``bytes.byte`` is the inverse operation for binary data, while
``chr`` is the inverse operation for text data.
Behaviourally, ``bytes.byte(x)`` will be equivalent to the current
``bytes([x])`` (and similarly for ``bytearray``). The new spelling is
expected to be easier to discover and easier to read (especially when
used in conjunction with indexing operations on binary sequence types).
As a separate method, the new spelling will also work better with higher
order functions like ``map``.
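As a rough sketch of the intended behaviour (``bytes.byte`` does not
exist yet, so the ``bytes_byte`` helper below is a hypothetical
stand-in spelled with today's API), the new constructor would compose
naturally with ``map``::

    def bytes_byte(x):
        # Hypothetical stand-in for the proposed bytes.byte()
        return bytes([x])

    print(bytes_byte(3))                  # b'\x03'
    print(list(map(bytes_byte, b'abc')))  # [b'a', b'b', b'c']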
Addition of optimised iterator methods that produce ``bytes`` objects
---------------------------------------------------------------------
This PEP proposes that ``bytes``, ``bytearray`` and ``memoryview`` gain
an optimised ``iterbytes`` method that produces length 1 ``bytes``
objects rather than integers::
for x in data.iterbytes():
    # x is a length 1 ``bytes`` object, rather than an integer
    ...
The method can be used with arbitrary buffer exporting objects by
wrapping them in a ``memoryview`` instance first::
for x in memoryview(data).iterbytes():
    # x is a length 1 ``bytes`` object, rather than an integer
    ...
For ``memoryview``, the semantics of ``iterbytes()`` are defined such that::
memview.tobytes() == b''.join(memview.iterbytes())
This allows the raw bytes of the memory view to be iterated over without
needing to make a copy, regardless of the defined shape and format.
The main advantage this method offers over the ``map(bytes.byte, data)``
approach is that it is guaranteed *not* to fail midstream with a
``ValueError`` or ``TypeError``. By contrast, when using the ``map``
based approach, the type and value of the individual items in the
iterable are only checked as they are retrieved and passed through the
``bytes.byte`` constructor.
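As ``iterbytes`` is only proposed here, a minimal pure-Python sketch of
the intended behaviour (assuming the ``join`` based semantics above, and
making a small copy per item rather than being optimised) could be
written today as::

    def iterbytes(data):
        # Rough emulation of the proposed iterbytes(): yield length 1
        # bytes objects from any buffer-exporting object
        view = memoryview(data).cast('B')   # flatten to unsigned bytes
        for i in range(len(view)):
            yield view[i:i + 1].tobytes()

    assert b''.join(iterbytes(b'abc')) == b'abc'
    assert list(iterbytes(bytearray(b'\x00\x01'))) == [b'\x00', b'\x01']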
Design discussion
=================
Why not rely on sequence repetition to create zero-initialised sequences?
-------------------------------------------------------------------------
Zero-initialised sequences can be created via sequence repetition::
>>> b'\x00' * 3
b'\x00\x00\x00'
>>> bytearray(b'\x00') * 3
bytearray(b'\x00\x00\x00')
However, this was also the case when the ``bytearray`` type was
originally designed, and the decision was made to add explicit support
for it in the type constructor. The immutable ``bytes`` type then
inherited that feature when it was introduced in PEP 3137.
This PEP isn't revisiting that original design decision, just changing
the spelling as users sometimes find the current behaviour of the binary
sequence constructors surprising. In particular, there's a reasonable
case to be made that ``bytes(x)`` (where ``x`` is an integer) should
behave like the ``bytes.byte(x)`` proposal in this PEP. Providing both
behaviours as separate class methods avoids that ambiguity.
References
==========
.. [1] Initial March 2014 discussion thread on python-ideas
(https://mail.python.org/pipermail/python-ideas/2014-March/027295.html)
.. [2] Guido's initial feedback in that thread
(https://mail.python.org/pipermail/python-ideas/2014-March/027376.html)
.. [3] Issue proposing moving zero-initialised sequences to a dedicated API
(http://bugs.python.org/issue20895)
.. [4] Issue proposing to use calloc() for zero-initialised binary sequences
(http://bugs.python.org/issue21644)
.. [5] August 2014 discussion thread on python-dev
(https://mail.python.org/pipermail/python-ideas/2014-March/027295.html)
Copyright
=========
This document has been placed in the public domain.
Hi all,
I noticed __qualname__ is exposed by locals() while defining a class. This
is handy, but I'm not sure about its status: is it standard or just an
artifact of the current implementation? (Btw, the pycodestyle linter,
formerly pep8, rejects its usage.) I was unable to find any reference to
this behavior in PEP 3155 or in the language reference.
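For concreteness, a minimal example of the behaviour I mean (the class
name is arbitrary):

    class Widget:
        # During execution of the class body, CPython exposes
        # __qualname__ in the class namespace, so both of these work:
        print(__qualname__)                   # Widget
        print('__qualname__' in locals())     # True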
Thank you in advance
--
Carlos
Hi,
I see many PEPs accepted for Python 3.6, or still in draft status, but
only a few final PEPs. What is happening?
Reminder: the deadline for new features in Python 3.6 is 2016-09-12,
only 2 months away, and these 2 months are summer in the northern
hemisphere, which means holidays for many of us...
Python 3.6 schedule and What's New in Python 3.6 list some PEPs:
https://www.python.org/dev/peps/pep-0494/
https://docs.python.org/dev/whatsnew/3.6.html
"PEP 499 -- python -m foo should bind sys.modules['foo'] in addition
to sys.modules['__main__']"
https://www.python.org/dev/peps/pep-0499/
=> draft
"PEP 498 -- Literal String Interpolation"
https://www.python.org/dev/peps/pep-0498/
=> accepted -- it's merged in Python 3.6; shouldn't the status be
updated to Final?
"PEP 495 -- Local Time Disambiguation"
https://www.python.org/dev/peps/pep-0495/
=> accepted
Alexander Belopolsky asked for a review of the implementation:
https://mail.python.org/pipermail/python-dev/2016-June/145450.html
"PEP 447 -- Add __getdescriptor__ method to metaclass"
https://www.python.org/dev/peps/pep-0447/
=> draft
"PEP 487 -- Simpler customisation of class creation"
https://www.python.org/dev/peps/pep-0487/
=> draft
"PEP 520 -- Preserving Class Attribute Definition Order"
https://www.python.org/dev/peps/pep-0520/
=> accepted -- what is the status of its implementation?
"PEP 519 -- Adding a file system path protocol"
https://www.python.org/dev/peps/pep-0519/
=> accepted
"PEP 467 -- Minor API improvements for binary sequences"
https://www.python.org/dev/peps/pep-0467
=> draft -- I saw recently some discussions around this PEP (on python-ideas?)
It looks like os.fspath() exists, so the PEP is implemented. Its
status should be Final, but the PEP should also be mentioned in What's
New in Python 3.6 please.
I also see some discussions for even more compact dict implementation.
I wrote 3 PEPs, but I didn't have time recently to work on them (to
make progress on the implementation of FAT Python):
"PEP 509 -- Add a private version to dict"
https://www.python.org/dev/peps/pep-0509/
=> draft
Pyjion, Cython, and Yury Selivanov are interested in using this feature,
but last time I asked Guido, he didn't seem convinced by the
advantages of the PEP.
"PEP 510 -- Specialize functions with guards"
https://www.python.org/dev/peps/pep-0510/
"PEP 511 -- API for code transformers"
https://www.python.org/dev/peps/pep-0511/
These two PEPs are directly related to my FAT Python work. I was asked
to prove that FAT Python makes CPython faster. Sadly, I failed to
prove that. Moreover, it took me almost 2 months (and I'm not done
yet!) to get stable benchmark results on Python. I want to make sure
that my changes don't make Python slower (don't introduce Python
regressions), but the CPython benchmark suite is unstable; some benchmarks
are very unstable. To get more information, follow the
speed(a)python.org mailing list ;-)
I probably forgot some PEPs, there are so many PEPs in the draft state :-(
Victor
Hi,
as you probably already know, today the PyPI index page (
https://pypi.python.org/pypi?%3Aaction=index) was deprecated and ceased to
be.
Among other things it affected PyCharm IDE that relied on that page to
enable packaging related features from the IDE. As a result users of
PyCharm can no longer install/update PyPI packages from PyCharm.
Here is an issue about that in our tracker:
https://youtrack.jetbrains.com/issue/PY-20081
Given that there are several hundred thousand PyCharm users in the
world -- all 3 editions (Professional, Community, and Educational) are
affected -- this can lead to a storm of negative feedback as people
start to face the denial of service.
The deprecation of the index was totally unexpected for us and we weren't
prepared for that. Maybe we missed some announcement.
We would be very happy if the functionality of the index were restored
at least for a short period of time: please give us a couple of weeks.
That will allow us to implement a workaround and provide the fix for
the several latest major versions of PyCharm.
Does anybody know who is responsible for that decision and whom to contact
about it? Please help.
Best regards,
Dmitry Trofimov
PyCharm Team Lead
JetBrains
http://www.jetbrains.com
The Drive To Develop
I was using Py_LIMITED_API under 3.5 with PY_SSIZE_T_CLEAN set; this
causes some functions that are not in the limited API to be used, and the
resulting extension segfaults on Linux. Is that right?
Thanks,
Daniel
I'm in the process of trying to disentangle
http://bugs.python.org/issue27137, which points out some of the
behavioural differences that arise when falling back from the original
C implementation of functools.partial to the pure Python emulation
that uses a closure.
That issue was opened due to a few things that work with the C
implementation but fail with the Python implementation:
- the C version can be pickled (and hence used with multiprocessing)
- the C version can be subclassed
- the C version can be used in "isinstance" checks
- the C version behaves as a static method, the Python version as a
normal instance method (see the sketch below)
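To make that last point concrete, here is a rough sketch (not taken
from the issue itself; partial_closure is just a simplified stand-in
for the closure-based fallback):

    import functools

    def add(x, y):
        return x + y

    def partial_closure(func, *args):
        # Simplified stand-in for a closure-based pure Python partial()
        def inner(*more):
            return func(*args, *more)
        return inner

    class UsesCPartial:
        # The C functools.partial object is not a descriptor, so as a
        # class attribute it behaves like a static method: no binding.
        add_one = functools.partial(add, 1)

    class UsesClosure:
        # A plain function *is* a descriptor, so it gets bound on
        # attribute access and the instance becomes an extra argument.
        add_one = partial_closure(add, 1)

    print(UsesCPartial().add_one(2))   # 3
    try:
        UsesClosure().add_one(2)
    except TypeError as exc:
        print(exc)   # add() takes 2 positional arguments but 3 were given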
While I'm planning to accept the patch that converts the pure Python
version to a full class that matches the semantics of the C version in
these areas as well as in its core behaviour, that last case is one
where the pure Python version merely exhibits different behaviour from
the C version, rather than failing outright.
Given that the issues that arose in this case weren't at all obvious
up front, what do folks think of the idea of updating PEP 399 to
explicitly prohibit class/function mismatches between accelerator
modules and their pure Python counterparts?
The rationale for making such a change is that when it comes to true
drop-in API compatibility, we have reasonable evidence that "they're
both callables" isn't sufficient once the complexities of real world
applications enter the picture.
Regards,
Nick.
--
Nick Coghlan | ncoghlan(a)gmail.com | Brisbane, Australia
On behalf of the Python development community and the Python 3.6 release
team, I'm happy to announce the availability of Python 3.6.0a3.
3.6.0a3 is the third of four planned alpha releases of Python 3.6,
the next major release of Python. During the alpha phase, Python 3.6
remains under heavy development: additional features will be added
and existing features may be modified or deleted. Please keep in mind
that this is a preview release and its use is not recommended for
production environments.
You can find Python 3.6.0a3 here:
https://www.python.org/downloads/release/python-360a3/
The next release of Python 3.6 will be 3.6.0a4, currently scheduled for
2016-08-15.
--Ned
--
Ned Deily
nad(a)python.org -- []
Hi,
I am looking into how the Python compiler generates basic blocks during the CFG generation process, and my expectations from CFG theory seem to be at odds with how the Python compiler actually generates its CFG. Take the following code snippet for example:
def median(pool):
    copy = sorted(pool)
    size = len(copy)
    if size % 2 == 1:
        return copy[(size - 1) // 2]
    else:
        return (copy[size // 2 - 1] + copy[size // 2]) / 2
From my understanding of basic blocks in compilers, the above code snippet should have at least 3 basic blocks as follows:
1. Block 1 - all instructions up to and including those for the if test.
2. Block 2 - all instructions for the if body, i.e. the first return statement.
3. Block 3 - instructions for the else block i.e. the second return statement.
My understanding of the section on Control Flow Graphs in the “Design of the CPython Compiler” also alludes to this -
>> As an example, consider an ‘if’ statement with an ‘else’ block. The guard on the ‘if’ is a basic block which is pointed to by the basic block containing the code leading to the ‘if’ statement. The ‘if’ statement block contains jumps (which are exit points) to the true body of the ‘if’ and the ‘else’ body (which may be NULL), each of which are their own basic blocks. Both of those blocks in turn point to the basic block representing the code following the entire ‘if’ statement.
The CPython compiler, however, seems to generate 2 basic blocks for the above snippet -
1. Block 1 - all instructions up to and including the if statement and the body of the if statement (the first return statement in this case)
2. Block 2 - instructions for the else block (the second return statement)
Is there any reason for this or have I somehow missed something in the CFG generation process?
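For reference, here is a small way to observe what the compiler
actually emitted (just the standard dis module; the disassembly shows a
single conditional jump for the test and two separate return paths):

    import dis

    def median(pool):
        copy = sorted(pool)
        size = len(copy)
        if size % 2 == 1:
            return copy[(size - 1) // 2]
        else:
            return (copy[size // 2 - 1] + copy[size // 2]) / 2

    # The disassembly shows the conditional jump for the `if` test and
    # the two RETURN_VALUE paths as laid out by the compiler.
    dis.dis(median)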
Regards,
Obi