Argument Clinic "converters" specify how to convert an individual
argument to the function you're defining. Although a converter could
theoretically represent any sort of conversion, most of the time they
directly represent types like "int" or "double" or "str".
Because there's such variety in argument parsing, the converters are
customizable with parameters. Many of these are common enough that
Argument Clinic suggests some standard names. Examples: "zeroes=True"
for strings and buffers means "permit internal \0 characters", and
"bitwise=True" for unsigned integers means "copy the bits over, even if
there's overflow/underflow, and even if the original is negative".
A third example is "nullable=True", which means "also accept None for
this parameter". This was originally intended for use with strings
(compare the "s" and "z" format units for PyArg_ParseTuple); however,
it looks like we'll have a use for "nullable ints" in the ongoing
Argument Clinic conversion work.
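For concreteness, here is a minimal sketch of what converters using
these parameters might look like in Clinic input. The function and
parameter names are hypothetical, and the exact spelling of "nullable"
is of course the thing under discussion:

    /*[clinic input]
    example.frobnicate
        timeout: int(nullable=True)
        data: str(zeroes=True)
    [clinic start generated code]*/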
Several people have said they found the name "nullable" surprising,
suggesting I use another name like "allow_none" or "noneable". I, in
turn, find their surprise surprising; "nullable" is a term long
associated with exactly this concept. It's used in C# and SQL, and the
term even has its own Wikipedia page:
http://en.wikipedia.org/wiki/Nullable_type
Most amusingly, Vala *used* to have an annotation called "(allow-none)",
but they've broken it out into two annotations, "(nullable)" and
"(optional)".
http://blogs.gnome.org/desrt/2014/05/27/allow-none-is-dead-long-live-nullab…
Before you say "the term 'nullable' will confuse end users", let me
remind you: this is not user-facing. This is a parameter for an
Argument Clinic converter, and will only ever be seen by CPython core
developers. A group which I hope is not so easily confused.
It's my contention that "nullable" is the correct name. But I've been
asked to bring up the topic for discussion, to see if a consensus forms
around this or around some other name.
Let the bike-shedding begin,
//arry/
Here is some proposed wording. Since it is more of a clarification of
what it takes to garner support -- just a new section -- than a
complete rewrite, I'm including only the diff to make the changes
easier to read.
diff -r 49d18bb47ebc pep-0011.txt
--- a/pep-0011.txt Wed May 14 11:18:22 2014 -0400
+++ b/pep-0011.txt Fri May 16 13:48:30 2014 -0400
@@ -2,22 +2,21 @@
Title: Removing support for little used platforms
Version: $Revision$
Last-Modified: $Date$
-Author: martin(a)v.loewis.de (Martin von Löwis)
+Author: Martin von Löwis <martin(a)v.loewis.de>,
+ Brett Cannon <brett(a)python.org>
Status: Active
Type: Process
Content-Type: text/x-rst
Created: 07-Jul-2002
Post-History: 18-Aug-2007
+ 16-May-2014
Abstract
--------
-This PEP documents operating systems (platforms) which are not
-supported in Python anymore. For some of these systems,
-supporting code might be still part of Python, but will be removed
-in a future release - unless somebody steps forward as a volunteer
-to maintain this code.
+This PEP documents how an operating system (platform) garners
+support in Python as well as documenting past support.
Rationale
@@ -37,16 +36,53 @@
change to the Python source code will work on all supported
platforms.
-To reduce this risk, this PEP proposes a procedure to remove code
-for platforms with no Python users.
+To reduce this risk, this PEP specifies what is required for a
+platform to be considered supported by Python as well as providing a
+procedure to remove code for platforms with little or no Python
+users.
+Supporting platforms
+--------------------
+
+Gaining official platform support requires two things. First, a core
+developer needs to volunteer to maintain platform-specific code. This
+core developer can either already be a member of the Python
+development team or be given contributor rights on the basis of
+maintaining platform support (it is at the discretion of the Python
+development team to decide if a person is ready to have such rights
+even if it is just for supporting a specific platform).
+
+Second, a stable buildbot must be provided [2]_. This guarantees that
+platform support will not be accidentally broken by a Python core
+developer who does not have personal access to the platform. For a
+buildbot to be considered stable it requires that the machine be
+reliably up and functioning (but it is up to the Python core
+developers to decide whether to promote a buildbot to being
+considered stable).
+
+This policy does not disqualify supporting other platforms
+indirectly. Patches which are not platform-specific but still done to
+add platform support will be considered for inclusion. For example,
+if platform-independent changes were necessary in the configure
+script which was motivated to support a specific platform that would
+be accepted. Patches which add platform-specific code such as the
+name of a specific platform to the configure script will generally
+not be accepted without the platform having official support.
+
+CPU architecture and compiler support are viewed in a similar manner
+as platforms. For example, to consider the ARM architecture supported
+a buildbot running on ARM would be required along with support from
+the Python development team. In general it is not required to have
+a CPU architecture run under every possible platform in order to be
+considered supported.
Unsupporting platforms
----------------------
-If a certain platform that currently has special code in it is
-deemed to be without Python users, a note must be posted in this
-PEP that this platform is no longer actively supported. This
+If a certain platform that currently has special code in Python is
+deemed to be without Python users or lacks proper support from the
+Python development team and/or a buildbot, a note must be posted in
+this PEP that this platform is no longer actively supported. This
note must include:
- the name of the system
@@ -69,8 +105,8 @@
forward and offer maintenance.
-Resupporting platforms
-----------------------
+Re-supporting platforms
+-----------------------
If a user of a platform wants to see this platform supported
again, he may volunteer to maintain the platform support. Such an
@@ -101,7 +137,7 @@
release is made. Developers of extension modules will generally need
to use the same Visual Studio release; they are concerned both with
the availability of the versions they need to use, and with keeping
-the zoo of versions small. The Python source tree will keep
+the zoo of versions small. The Python source tree will keep
unmaintained build files for older Visual Studio releases, for which
patches will be accepted. Such build files will be removed from the
source tree 3 years after the extended support for the compiler has
@@ -223,6 +259,7 @@
----------
.. [1] http://support.microsoft.com/lifecycle/
+.. [2] http://buildbot.python.org/3.x.stable/
Copyright
---------
Hi David,
I noticed you run the "Builder x86 Ubuntu Shared" buildbot. It seems
it's running a very old version of Ubuntu. Is there any chance of
getting that updated?
Regards,
Benjamin
As promised in the "Move selected documentation repos to PSF BitBucket
account?" thread, I've written up a PEP for moving selected repositories
from hg.python.org to Github.
You can see this PEP online at: https://www.python.org/dev/peps/pep-0481/
I've also reproduced the PEP below for inline discussion.
-----------------------
Abstract
========
This PEP proposes migrating to Git and Github for certain supporting
repositories (such as the repository for Python Enhancement Proposals) in a way
that is more accessible to new contributors and easier to manage for core
developers. This is offered as an alternative to PEP 474, which aims to achieve
the same overall benefits while continuing to use the Mercurial DVCS and
without relying on a commercial entity.
In particular this PEP proposes changes to the following repositories:
* https://hg.python.org/devguide/
* https://hg.python.org/devinabox/
* https://hg.python.org/peps/
This PEP does not propose any changes to the core development workflow for
CPython itself.
Rationale
=========
As PEP 474 mentions, there are currently a number of repositories hosted on
hg.python.org which are not directly used for the development of CPython but
are instead supporting or ancillary repositories. These supporting repositories
typically have neither complex workflows nor, in many cases, any branches other
than the primary integration branch. This simplicity makes them very good
targets for the "Pull Request" workflow commonly found on sites like Github.
However, where PEP 474 wants to continue using Mercurial and wishes to use an
OSS, self-hosted solution, and therefore restricts itself to only those
options, this PEP expands the scope to include migrating to Git and using
Github.
The existing method of contributing to these repositories generally involves
generating a patch and either uploading it to bugs.python.org or emailing
it to peps(a)python.org. This process is unfriendly towards non-committer
contributors and makes it harder than it needs to be for committers to
accept the patches sent by users. Beyond the benefits of the pull request
workflow itself, this style of workflow also enables non-technical
contributors, especially those who do not know their way around the DVCS of
choice, to contribute using the web-based editor. On the committer side,
Pull Requests make it possible to tell, before merging, whether or not a
particular Pull Request will break anything. They also enable a simple
"push button" merge which does not require checking out the changes
locally. Another such feature, useful in particular for docs, is the
ability to view a "prose" diff. This Github-specific feature shows a diff
of the rendered output, hiding things like paragraph reformatting and
showing what the actual "meat" of the change is.
Why Git?
--------
Looking at the variety of DVCSs available today, it becomes fairly clear
that git has captured the vast majority of mindshare among people currently
using a DVCS. The Open Hub (previously Ohloh) statistics [#openhub-stats]_
show that currently 37% of the repositories indexed by Open Hub are using
git, second only to SVN (which has 48%), while Mercurial has just 2% of the
indexed repositories (beating only bazaar, which has 1%). In addition to
the Open Hub statistics, a look at the top 100 projects on PyPI (ordered by
total download counts) shows that within the Python space itself a majority
of projects use git:
=== ========= ========== ====== === ====
Git Mercurial Subversion Bazaar CVS None
=== ========= ========== ====== === ====
62 22 7 4 1 1
=== ========= ========== ====== === ====
Choosing the DVCS with the larger mindshare makes it more likely that any
particular person with DVCS experience will be able to meaningfully use the
DVCS that we have chosen without having to learn a new tool.
In addition to simply making it more likely that any individual will already
know how to use git, the number of projects and people using it means that the
resources for learning the tool are likely to be more fully fleshed out, and
when you run into problems the likelihood that someone else has had the same
problem, posted a question, and received an answer is also far higher.
Thirdly, using a more popular tool also increases the options for tooling
*around* the DVCS itself. Looking at the various options for hosting
repositories, it's extremely rare to find a hosting solution (whether OSS or
commercial) that supports Mercurial but does not support Git; on the flip
side, there are a number of tools which support Git but do not support
Mercurial. The popularity of git therefore increases the flexibility of our
options going into the future for what toolchain these projects use.
Also, by moving to the more popular DVCS we increase the likelihood that the
knowledge a person gains by contributing to these support repositories will
transfer to projects outside of the immediate CPython project, such as the
larger Python community, which primarily uses Git hosted on Github.
In previous years there was concern about how well supported git was on Windows
in comparison to Mercurial. However, git has grown to support Windows as a
first-class citizen. In addition, for Windows users who are not well acquainted
with the Windows command line, there are GUI options as well.
On a technical level git and Mercurial are fairly similar; however, the git
branching model is significantly better than Mercurial's "Named Branches" for
non-committer contributors. Mercurial does have a "Bookmarks" extension, but
it isn't quite as good as git's branching model. All bookmarks live in the
same namespace, so individual users have to namespace their branch names
themselves lest they risk collisions. It is also an extension, which requires
new users to first discover that they need an extension at all and then
figure out what they need to do to enable it. Since it is an extension,
general support for bookmarks outside of Mercurial core is going to be less
than 100%, in comparison to git, where the feature is built in and core to
using git at all. Finally, users who are not used to Mercurial are unlikely
to discover bookmarks on their own; instead they will likely attempt to use
Mercurial's "Named Branches" which, given the fact that they live "forever",
are often not what a project wants its contributors to use.
Why Github?
-----------
There are a number of software projects and web services which offer
functionality similar to that of Github. These range from commercial web
services such as Bitbucket to self-hosted OSS solutions such as Kallithea or
Gitlab. This PEP proposes that we move these repositories to Github.
There are two primary reasons for selecting Github: popularity and
quality/polish.
Github is currently the most popular repository hosting service according to
Alexa, where it currently has a global rank of 121. Much as with Git itself,
by choosing the most popular tool we gain benefits in the increased
likelihood that a new contributor will have already experienced the
toolchain, in the quality and availability of help, in more and better
tooling being built around it, and in the knowledge transferring to other
projects. A look again at the top 100 projects by download counts on PyPI
shows the following hosting locations:
====== ========= =========== ========= =========== ==========
GitHub BitBucket Google Code Launchpad SourceForge Other/Self
====== ========= =========== ========= =========== ==========
62 18 6 4 3 7
====== ========= =========== ========= =========== ==========
In addition to all of those reasons, Github also has the benefit that, while
many of the options have similar features when viewed in a feature matrix,
the Github version of each of those features tends to work better and be far
more polished. This is hard to quantify objectively, but it is a fairly
common sentiment if you go around and ask people who use these services
often.
Finally, a reason to choose a hosted web service at all over something
self-hosted is to use volunteer time and donated resources more efficiently.
Every additional service hosted by the PSF infrastructure team further
spreads out the time that the volunteers on that team have to spend, and
uses some chunk of resources that could potentially be used for something
for which there is no free or affordable hosted solution available.
One concern that people do have with using a hosted service is the lack of
control and the possibility that at some point in the future the service may
no longer be suitable. It is the opinion of this PEP that Github does not
currently, and has not in the past, engaged in any attempts to lock people
into its platform, and that if at some point in the future Github is no
longer suitable for one reason or another, then at that point we can look at
migrating away from Github to a different solution. In other words, we'll
cross that bridge if and when we come to it.
Example: Scientific Python
--------------------------
One of the key ideas behind the move to both git and Github is that a major
feature of a DVCS, of its repository hosting, and of the workflow used is
the social network and the size of the community using those tools. We can
see this is true by looking at an example from a sub-community of the Python
community: the Scientific Python community. They have already migrated most
of the key pieces of the SciPy stack onto Github using the Pull Request
based workflow, starting with IPython, and as more projects moved over it
became a natural default for new projects.
They claim to have seen a great benefit from this move: it enables casual
contributors to move easily between different projects within their
sub-community without having to learn a special, bespoke workflow and a
different toolchain for each project. They've found that when people can
spend their limited time on actually contributing instead of learning the
different tools and workflows, not only do they contribute more to one
project, they also branch out and contribute to other projects. This move is
also credited with making it commonplace for members of that community to go
so far as to publish their research and educational materials on Github as
well.
This showcases the real power behind moving to a highly popular toolchain
and workflow: each variance introduces yet another hurdle for new and casual
contributors to get past, and it makes the time spent learning that workflow
less reusable with other projects.
Migration
=========
Through the use of hg-git [#hg-git]_ we can easily convert a Mercurial
repository to a Git repository by simply pushing the Mercurial repository to
the Git repository. People who wish to continue to use Mercurial locally can
then use hg-git going forward with the new Github URL; however, they will
need to re-clone their repositories, as using Git as the server appears to
trigger a one-time change of the changeset ids.
As none of the selected repositories have any tags, branches, or bookmarks
other than the ``default`` branch, the migration will simply map the
``default`` branch in Mercurial to the ``master`` branch in git.
In addition, since none of the selected projects have any great need of a
complex bug tracker, they will also migrate their issue handling to GitHub
issues.
In addition to the migration of the repository hosting itself, there are a
number of locations for each particular repository which will require
updating. The bulk of these changes will simply swap commands from the hg
equivalent to the git equivalent.
In particular this will include:
* Updating www.python.org to generate PEPs using a git clone and link to
Github.
* Updating docs.python.org to pull from Github instead of hg.python.org for the
devguide.
* Enabling the ability to send an email to python-checkins(a)python.org for each
push.
* Enabling the ability to send an IRC message to #python-dev on Freenode for
each push.
* Migrating any issues for these projects to their respective bug trackers
on Github.
This will restore these repositories to functionality similar to what they
currently have. The migration will also include enabling testing for each
pull request using Travis CI [#travisci]_ where possible, to ensure that a
new PR does not break the ability to render the documentation or PEPs.
User Access
===========
Moving to Github would involve adding an additional user account that will
need to be managed; however, it also offers finer-grained control, allowing
the ability to grant someone access to only one particular repository
instead of the coarser-grained ACLs available on hg.python.org.
References
==========
.. [#openhub-stats] `Open Hub Statistics <https://www.openhub.net/repositories/compare>`_
.. [#hg-git] `hg-git <https://hg-git.github.io/>`_
.. [#travisci] `Travis CI <https://travis-ci.org/>`_
---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
The current memory layout for dictionaries is
unnecessarily inefficient. It has a sparse table of
24-byte entries containing the hash value, key pointer,
and value pointer.
Instead, the 24-byte entries should be stored in a
dense table referenced by a sparse table of indices.
For example, the dictionary:
d = {'timmy': 'red', 'barry': 'green', 'guido': 'blue'}
is currently stored as:
entries = [['--', '--', '--'],
           [-8522787127447073495, 'barry', 'green'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           [-9092791511155847987, 'timmy', 'red'],
           ['--', '--', '--'],
           [-6480567542315338377, 'guido', 'blue']]
Instead, the data should be organized as follows:
indices = [None, 1, None, None, None, 0, None, 2]
entries = [[-9092791511155847987, 'timmy', 'red'],
           [-8522787127447073495, 'barry', 'green'],
           [-6480567542315338377, 'guido', 'blue']]
Only the data layout needs to change. The hash table
algorithms would stay the same. All of the current
optimizations would be kept, including key-sharing
dicts and custom lookup functions for string-only
dicts. There is no change to the hash functions, the
table search order, or collision statistics.
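To make the indirection concrete, here is a tiny pure-Python model of a
lookup in the proposed layout (my sketch, not Raymond's code; probing is
simplified to linear, whereas the real search order would stay exactly as
it is today):

    def lookup(indices, entries, key):
        mask = len(indices) - 1          # table size is a power of two
        h = hash(key)
        i = h & mask
        while True:
            idx = indices[i]
            if idx is None:              # empty slot: key is absent
                raise KeyError(key)
            eh, ek, ev = entries[idx]
            if eh == h and ek == key:    # cheap hash check, then equality
                return ev
            i = (i + 1) & mask           # simplified linear probe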
The memory savings are significant (from 30% to 95%
compression depending on how full the table is).
Small dicts (size 0, 1, or 2) get the most benefit.
For a sparse table of size t with n entries, the sizes are:

    curr_size = 24 * t
    new_size = 24 * n + sizeof(index) * t
In the above timmy/barry/guido example, the current
size is 192 bytes (eight 24-byte entries) and the new
size is 80 bytes (three 24-byte entries plus eight
1-byte indices). That gives 58% compression.
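For anyone who wants to check that arithmetic, a throwaway sketch
(assuming the 64-bit sizes given above):

    def dict_sizes(t, n, index_size):
        # t: slots in the sparse index table, n: live entries,
        # index_size: bytes per index
        curr_size = 24 * t
        new_size = 24 * n + index_size * t
        return curr_size, new_size

    curr, new = dict_sizes(t=8, n=3, index_size=1)
    print(curr, new, 1 - new / curr)     # 192 80 0.5833...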
Note, the sizeof(index) can be as small as a single
byte for small dicts, two bytes for bigger dicts, and
up to sizeof(Py_ssize_t) for huge dicts.
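Illustratively, the width selection might look like this (a sketch; the
exact thresholds, and reserving sentinel values for "empty" slots, are
implementation details):

    def index_width(t):
        # smallest unsigned width (in bytes) able to index a table
        # of t slots
        if t <= 0xFF:
            return 1
        if t <= 0xFFFF:
            return 2
        if t <= 0xFFFFFFFF:
            return 4
        return 8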
In addition to space savings, the new memory layout
makes iteration faster. Currently, keys(), values(), and
items() loop over the sparse table, skipping over free
slots in the hash table. Now, keys/values/items can
loop directly over the dense table, using fewer memory
accesses.
Another benefit is that resizing is faster and
touches fewer pieces of memory. Currently, every
hash/key/value entry is moved or copied during a
resize. In the new layout, only the indices are
updated. For the most part, the hash/key/value entries
never move (except for an occasional swap to fill a
hole left by a deletion).
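In the same pure-Python model as above, a resize reduces to rebuilding
the small index table:

    def rebuild_indices(entries, new_t):
        # Only the index table is rebuilt; the dense entries array is
        # untouched, so the hash/key/value triples never move.
        indices = [None] * new_t
        mask = new_t - 1
        for idx, (h, key, value) in enumerate(entries):
            i = h & mask
            while indices[i] is not None:    # simplified linear probe
                i = (i + 1) & mask
            indices[i] = idx
        return indices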
With the reduced memory footprint, we can also expect
better cache utilization.
For those wanting to experiment with the design,
there is a pure Python proof-of-concept here:
http://code.activestate.com/recipes/578375
YMMV: Keep in mind that the above size statistics assume a
build with 64-bit Py_ssize_t and 64-bit pointers. The
space savings percentages are a bit different on other
builds. Also, note that in many applications, the size
of the data dominates the size of the container (i.e.
the weight of a bucket of water is mostly the water,
not the bucket).
Raymond
Hi all,
There was some discussion on python-ideas last month about how to make
it easier/more reliable for a module to override attribute access.
This is useful for things like autoloading submodules (accessing
'foo.bar' triggers the import of 'bar'), or for deprecating module
attributes that aren't functions. (Accessing 'foo.bar' emits a
DeprecationWarning, "the bar attribute will be removed soon".) Python
has had some basic support for this for a long time -- if a module
overwrites its entry in sys.modules[__name__], then the object that's
placed there will be returned by 'import'. This allows one to define
custom subclasses of module and use them instead of the default,
similar to how metaclasses allow one to use custom subclasses of
'type'.
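For illustration, here is a minimal sketch of the deprecation example
done this way (hypothetical module layout and attribute names; the
pitfalls with this approach are exactly what the rest of this mail is
about):

    # hypothetical mymodule.py: replace ourselves with a custom subclass
    import sys
    import types
    import warnings

    _bar = "some value"

    class _DeprecatingModule(types.ModuleType):
        def __getattr__(self, name):   # called only for missing attributes
            if name == "bar":
                warnings.warn("the bar attribute will be removed soon",
                              DeprecationWarning, stacklevel=2)
                return _bar
            raise AttributeError(name)

    sys.modules[__name__] = _DeprecatingModule(__name__)
    # Note: the replacement module starts with a fresh, nearly empty
    # __dict__ -- which is exactly the synchronization problem
    # described next.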
In practice though it's very difficult to make this work safely and
correctly for a top-level package. The main problem is that when you
create a new object to stick into sys.modules, this necessarily means
creating a new namespace dict. And now you have a mess, because you
have two dicts: new_module.__dict__, which is the namespace you
export, and old_module.__dict__, which is the globals() for the code
that's trying to define the module namespace. Keeping these in sync is
extremely error-prone -- consider what happens, e.g., when your
package __init__.py wants to import submodules which then recursively
import the top-level package -- so it's difficult to justify for the
kind of large packages that might be worried about deprecating entries
in their top-level namespace. So what we'd really like is a way to
somehow end up with an object that (a) has the same __dict__ as the
original module, but (b) is of our own custom module subclass. If we
can do this then metamodules will become safe and easy to write
correctly.
(There's a little demo of working metamodules here:
https://github.com/njsmith/metamodule/
but it uses ctypes hacks that depend on non-stable parts of the
CPython ABI, so it's not a long-term solution.)
I've now spent some time trying to hack this capability into CPython
and I've made a list of the possible options I can think of to fix
this. I'm writing to python-dev because none of them are obviously The
Right Way so I'd like to get some opinions/ruling/whatever on which
approach to follow up on.
Option 1: Make it possible to change the type of a module object
in-place, so that we can write something like
sys.modules[__name__].__class__ = MyModuleSubclass
Option 1 downside: The invariants required to make __class__
assignment safe are complicated, and only implemented for
heap-allocated type objects. PyModule_Type is not heap-allocated, so
making this work would require lots of delicate surgery to
typeobject.c. I'd rather not go down that rabbit-hole.
----
Option 2: Make PyModule_Type into a heap type allocated at interpreter
startup, so that the above just works.
Option 2 downside: PyModule_Type is exposed as a statically-allocated
global symbol, so doing this would involve breaking the stable ABI.
----
Option 3: Make it legal to assign to the __dict__ attribute of a
module object, so that we can write something like
new_module = MyModuleSubclass(...)
new_module.__dict__ = sys.modules[__name__].__dict__
sys.modules[__name__].__dict__ = {} # ***
sys.modules[__name__] = new_module
The line marked *** is necessary because the way modules are designed,
they expect to control the lifecycle of their __dict__. When the
module object is initialized, it fills in a bunch of stuff in the
__dict__. When the module object (not the dict object!) is
deallocated, it deletes everything from the __dict__. This latter
feature in particular means that having two module objects sharing the
same __dict__ is bad news.
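A quick way to see this in current CPython (a sketch; the exact
clearing behaviour -- values being replaced with None rather than
deleted -- is an implementation detail):

    import types

    m = types.ModuleType("m")
    d = m.__dict__
    d["x"] = 1
    del m            # module dealloc clears the dict it controls
    print(d["x"])    # None in current CPython, not 1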
Option 3 downside: The paragraph above. Also, there's stuff inside the
module struct besides just the __dict__, and more stuff has appeared
there over time.
----
Option 4: Add a new function sys.swap_module_internals, which takes
two module objects and swaps their __dict__ and other attributes. By
making the operation a swap instead of an assignment, we avoid the
lifecycle pitfalls from Option 3. By making it a builtin, we can make
sure it always handles all the module fields that matter, not just
__dict__. Usage:
new_module = MyModuleSubclass(...)
sys.swap_module_internals(new_module, sys.modules[__name__])
sys.modules[__name__] = new_module
Option 4 downside: Obviously a hack.
----
Options 3 and 4 both seem workable; it just depends on which way we
prefer to hold our nose. Option 4 is slightly more correct in that it
works for *all* modules, but OTOH at the moment the only time Option 3
*really* fails is for compiled modules with PEP 3121 metadata, and
compiled modules can already use a module subclass via other means
(since they instantiate their own module objects).
Thoughts? Suggestions on other options I've missed? Should I go ahead
and write a patch for one of these?
-n
--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
Hi,
When I try to iterate through the lines of a file
("openssl-1.0.1j/crypto/bn/asm/x86_64-gcc.c"), I get a
UnicodeDecodeError (in Python 3.4.0 on Ubuntu 14.04). But there is no
such error with Python 2.7.6. What could be the problem?
In [39]: with open("openssl-1.0.1j/crypto/bn/asm/x86_64-gcc.c") as f:
   ....:     for line in f:
   ....:         print (line)
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-39-24a3ae32a691> in <module>()
      1 with open("../openssl-1.0.1j/crypto/bn/asm/x86_64-gcc.c") as f:
----> 2     for line in f:
      3         print (line)
      4

/usr/lib/python3.4/codecs.py in decode(self, input, final)
    311         # decode input (taking the buffer into account)
    312         data = self.buffer + input
--> 313         (result, consumed) = self._buffer_decode(data, self.errors, final)
    314         # keep undecoded input until the next call
    315         self.buffer = data[consumed:]
--
:-)balaji
I have a strong suspicion that I'm missing something; I have been
persuaded in both directions too often to believe I have a grip on the
real issue.
So I'm putting out some assumptions; please tell me if I'm wrong, and
maybe make them more explicit in the PEP.
(1) The change will only affect situations where StopIteration is
currently raised as an Exception -- i.e., it leaks past the bounds of
a loop.
(2) This can happen because of an explicit raise StopIteration. This
is currently a supported idiom, and that is changing with PEP 479.
(2a) Generators in the unwind path will now need to catch and reraise.
(3) It can also happen because of an explicit next() call (as
opposed to the implicit next of a loop).
This is currently supported; after PEP 479, the next() call should
be wrapped in a try statement, so that the intent will be explicit
(a minimal sketch follows after this list).
(4) It can happen because of "yield from" yielding from an iterator,
rather than a generator?
(5) There is no other case where this can happen? (So the generator
comprehension case won't matter unless it also includes one of the
earlier cases.)
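Presumably the explicit form in (3) would look something like this
sketch (my illustration, not wording from the PEP):

    def first_or_none(iterator):
        # handle exhaustion explicitly rather than letting the
        # StopIteration from next() leak out of a generator
        try:
            return next(iterator)
        except StopIteration:
            return None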
-jJ
Hi,
I'm trying to follow the discussion about the PEP 479 (Change
StopIteration handling inside generators), but it's hard to read all
messages. I'm concerned by trollius and asyncio which heavily rely on
StopIteration.
Trollius currently supports running asyncio coroutines: a trollius
coroutine can execute an asyncio coroutine, and an asyncio coroutine
can execute a trollius coroutine.
I modified the Return class of Trollius to not inherit from
StopIteration. All trollius tests pass on Python 3.3 except one
(which makes me happy, the test suite is wide enough to detect bugs
;-)): test_trollius_in_asyncio.
This specific test executes an asyncio coroutine which executes a
trollius coroutine.
https://bitbucket.org/enovance/trollius/src/873d21ac0badec36835ed24d13e2aed…
The problem is that an asyncio coroutine cannot execute a Trollius
coroutine anymore: "yield from coro" raises a Return exception instead
of simply "stopping" the generator and returning the result (the value
passed to Return).
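If I understand the failure correctly, it reduces to something like
this sketch (simplified, hypothetical definitions; the real trollius
classes are more involved):

    # When Return subclassed StopIteration, raising it inside a
    # generator ended the generator, and "yield from" delivered the
    # carried value as the result.
    class Return(Exception):          # no longer a StopIteration subclass
        def __init__(self, value):
            super().__init__()
            self.value = value

    def trollius_coroutine():
        raise Return(42)
        yield                         # unreachable; makes this a generator

    def asyncio_coroutine():
        # Previously result would be 42; now the Return exception
        # propagates out of the "yield from" instead.
        result = yield from trollius_coroutine()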
I don't see how an asyncio coroutine calling "yield from
trollius_coroutine" can handle the Return exception if it doesn't
inherit from StopIteration. Does this mean that I have to drop this
feature in Python 3.5 (or later, when PEP 479 becomes effective)?
I'm talking about the current behaviour of Python 3.3; I didn't try
the PEP 479 behaviour (I don't know if an implementation exists).
Victor
Hi,
I have serious concerns about this PEP, and would ask you to reconsider it.
[ Very short summary:
Generators are not the problem. It is the naive use of next() in an
iterator that is the problem. (Note that all the examples involve calls
to next()).
Change next() rather than fiddling with generators.
]
I have five main concerns with PEP 479.
1. Is the problem, as stated by the PEP, really the problem at all?
2. The proposed solution does not address the underlying problem.
3. It breaks a fundamental aspect of generators, that they are iterators.
4. This will be a hindrance to porting code from Python 2 to Python 3.
5. The behaviour of next() is not considered, even though it is the real
cause of the problem (if there is a problem).
1. The PEP states that "The interaction of generators and StopIteration
is currently somewhat surprising, and can conceal obscure bugs."
I don't believe that to be the case; if someone knows what StopIteration
is and how it is used, then the interaction is entirely as expected.
I believe the naive use of next() in an iterator to be the underlying
problem.
The interaction of generators and next() is just a special case of this.
StopIteration is not a normal exception indicating a problem; rather,
it exists to signal exhaustion of an iterator.
However, next() raises StopIteration for an exhausted iterator, which
really is an error.
Any iterator code (generator or __next__ method) that calls next()
treats the StopIteration as a normal exception and propagates it.
The controlling loop then interprets StopIteration as a signal to stop
and thus stops.
*The problem is the implicit shift from signal to error and back to signal.*
2. The proposed solution does not address this issue at all, but rather
legislates against generators raising StopIteration.
3. Generators and the iterator protocol were introduced in Python 2.2,
13 years ago.
For all of that time the iterator protocol has been defined by the
__iter__(), next()/__next__() methods and the use of StopIteration to
terminate iteration.
Generators are a way to write iterators without the clunkiness of
explicit __iter__() and next()/__next__() methods, but they have always
obeyed the same protocol as all other iterators. This has allowed code
to be rewritten from one form to the other whenever desired.
Do not forget that despite the addition of the send() and throw()
methods and their secondary role as coroutines, generators have
primarily always been a clean and elegant way of writing iterators.
4. Porting from Python 2 to Python 3 seems to be hard enough already.
5. I think I've already covered this in the other points, but to
reiterate (excuse the pun):
Calling next() on an exhausted iterator is, I would suggest, a logical
error.
However, next() raises StopIteration which is really a signal to the
controlling loop.
The fault is with next() raising StopIteration.
Generators raising StopIteration is not the problem.
It is also worth noting that calling next() is the only place a
StopIteration exception is likely to occur outside of the iterator
protocol.
An example
----------
Consider a function to return the value from a set with a single member.
def value_from_singleton(s):
    if len(s) < 2:  # Intentional error here (should be len(s) == 1)
        return next(iter(s))
    raise ValueError("Not a singleton")
Now suppose we pass an empty set to value_from_singleton(s); then we
get a StopIteration exception, which is a bit weird, but not too bad.
It is when we use it in a generator (or in the __next__ method
of an iterator) that we get a serious problem.
Currently the iterator appears to be exhausted early, which is wrong.
With the proposed change we get RuntimeError("generator raised
StopIteration"), which is also wrong, just in a different way.
Solutions
---------
My preferred "solution" is to do nothing except improve the
documentation of next(). Explain that it can raise StopIteration
which, if allowed to propagate, can cause premature exhaustion of an
iterator.
If something must be done then I would suggest changing the behaviour of
next() for an exhausted iterator.
Rather than raise StopIteration it should raise ValueError (or IndexError?).
Also, it might be worth considering making StopIteration inherit from
BaseException, rather than Exception.
Cheers,
Mark.
P.S. Five days seems a rather short time to respond to a PEP.
Could we make it at least a couple of weeks in the future, or, better
still, specify a closing date for comments?