At the moment, the array module of the standard library allows you to
create arrays of different numeric types and to initialize them from
an iterable (e.g., another array).
What's missing is the possibility to specify the final size of the
array (number of items), especially for large arrays.
I'm thinking of suffix arrays (a text indexing data structure) for
large texts, e.g. the human genome and its reverse complement (about 6
billion characters from the alphabet ACGT).
The suffix array is a long int array of the same size (8 bytes per
number, so it occupies about 48 GB of memory).
At the moment I am extending an array in chunks of several million
items at a time, which is slow and not elegant.
The function below also initializes each item in the array to a given
value (0 by default).
Is there a reason why the array.array constructor does not allow you
to simply specify the number of items to be allocated? (I do not
really care about the contents.)
Would this be a worthwhile addition to / modification of the array module?
My suggestion is to modify array creation in such a way that you
could pass an iterable (as now) as the second argument, but if you pass a
single integer value, it would be treated as the number of items to
allocate.
Here is my current workaround (which is slow):
import array

def filled_array(typecode, n, value=0, bsize=(1 << 22)):
    """Return a new array with the given typecode
    (e.g., "l" for long int, as in the array module)
    with n entries, each initialized to the given value (default 0).
    """
    a = array.array(typecode, [value] * bsize)
    x = array.array(typecode)
    r = n
    while r >= bsize:
        x.extend(a)
        r -= bsize
    x.extend(a[:r])
    return x
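As an aside to the workaround above, sequence repetition also allocates the
full array in one step; a sketch (not benchmarked here against the chunked
approach):

```python
import array

def filled_array_by_repeat(typecode, n, value=0):
    # repetition of a one-element array allocates all n items at once
    return array.array(typecode, [value]) * n

a = filled_array_by_repeat("l", 1000)   # 1000 zero entries
```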
I'd like to propose adding the ability for context managers to catch and
handle control passing into and out of them via yield and generator.send():
import os

class pushd:
    def __init__(self, path):
        self.inner_path = path
        self.outer_path = os.getcwd()
    def __enter__(self):
        os.chdir(self.inner_path)
    def __exit__(self, exc_type, exc_val, exc_tb):
        os.chdir(self.outer_path)
    def __yield__(self):
        # suspending: leave the inner directory
        self.inner_path = os.getcwd()
        os.chdir(self.outer_path)
    def __send__(self):
        # resuming: go back to the inner directory
        self.outer_path = os.getcwd()
        os.chdir(self.inner_path)
Here __yield__() would be called when control is yielded through the with
block and __send__() would be called when control is returned via .send()
or .next(). To maintain compatibility, it would not be an error to leave
either __yield__ or __send__ undefined.
The rationale for this is that it's sometimes useful for a context manager
to set global or thread-global state as in the example above, but when the
code is used in a generator, the author of the generator needs to make
assumptions about what the calling code is doing. e.g.
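A sketch of the kind of generator in question, using an ordinary chdir
context manager (all names here are made up; this demonstrates the problem,
not the proposed fix):

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def in_directory(path):
    # an ordinary chdir context manager, with no way to react to
    # the enclosing generator being suspended or resumed
    prev = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(prev)

def my_generator(path):
    # a stand-in for the generator under discussion
    with in_directory(path):
        yield os.getcwd()   # control leaves the with block here
        yield os.getcwd()   # the cwd may have changed while suspended

start = os.getcwd()
d1 = tempfile.mkdtemp()
d2 = tempfile.mkdtemp()
g1 = my_generator(d1)
g2 = my_generator(d2)

first = next(g1)    # inside g1's with block: cwd is d1
next(g2)            # g2 silently moves the process-wide cwd to d2
second = next(g1)   # g1 resumes inside in_directory(d1), yet cwd is d2

g1.close()
g2.close()
os.chdir(start)
```

Here `first` and `second` differ even though both were produced inside the
same with block, which is exactly the surprise described above.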
Even if the author of this generator knows what effect do_something() and
do_something_else() have on the current working directory, the author needs
to assume that the caller of the generator isn't touching the working
directory. For instance, if someone were to create two my_generator()
generators with different paths and advance them alternately, the resulting
behaviour could be most unexpected. With the proposed change, the context
manager would be able to handle this so that the author of the generator
doesn't need to make these assumptions.
Naturally, nested with blocks would be handled by calling __yield__ from
innermost to outermost and __send__ from outermost to innermost.
I rather suspect that if this change were included, someone could come up
with a variant of the contextlib.contextmanager decorator to simplify
writing generators for this sort of situation.
J. D. Bartlett
I think it would be a good idea if Python tracebacks could be translated
into languages other than English - and it would set a good example.
For example, using French as my default local language, instead of
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero
I might get something like
Suivi d'erreur (appel le plus récent en dernier) :
Fichier "<stdin>", à la ligne 1, dans <module>
ZeroDivisionError: division entière ou modulo par zéro
Greg Ewing wrote:
> Mark Shannon wrote:
>> Why not have proper co-routines, instead of hacked-up generators?
> What do you mean by a "proper coroutine"?
A parallel, non-concurrent, thread of execution. It should be able to
transfer control from arbitrary places in its execution, not just from
within generators.
Stackless provides coroutines. Greenlets are also coroutines (I think).
Lua has them, and is implemented in ANSI C, so it can be done portably.
(One of the examples in the paper uses coroutines to implement
generators, which is obviously not required in Python :) )
Here's an updated version of the PEP reflecting my
recent suggestions on how to eliminate 'codef'.
Author: Gregory Ewing <greg.ewing(a)canterbury.ac.nz>
Type: Standards Track
A syntax is proposed for defining and calling a special type of generator
called a 'cofunction'. It is designed to provide a streamlined way of
writing generator-based coroutines, and allow the early detection of
certain kinds of error that are easily made when writing such code, which
otherwise tend to cause hard-to-diagnose symptoms.
This proposal builds on the 'yield from' mechanism described in PEP 380,
and describes some of the semantics of cofunctions in terms of it. However,
it would be possible to define and implement cofunctions independently of
PEP 380 if so desired.
A cofunction is a special kind of generator, distinguished by the presence
of the keyword ``cocall`` (defined below) at least once in its body. It may
also contain ``yield`` and/or ``yield from`` expressions, which behave as
they do in other generators.
From the outside, the distinguishing feature of a cofunction is that it cannot
be called the same way as an ordinary function. An exception is raised if an
ordinary call to a cofunction is attempted.
Calls from one cofunction to another are made by marking the call with
a new keyword ``cocall``. The expression
cocall f(*args, **kwds)
is evaluated by first checking whether the object ``f`` implements
a ``__cocall__`` method. If it does, the cocall expression is
yield from f.__cocall__(*args, **kwds)
except that the object returned by __cocall__ is expected to be an
iterator, so the step of calling iter() on it is skipped.
If ``f`` does not have a ``__cocall__`` method, or the ``__cocall__``
method returns ``NotImplemented``, then the cocall expression is
treated as an ordinary call, and the ``__call__`` method of ``f`` is
invoked.
Objects which implement __cocall__ are expected to return an object
obeying the iterator protocol. Cofunctions respond to __cocall__ the
same way as ordinary generator functions respond to __call__, i.e. by
returning a generator-iterator.
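In current Python, an object obeying this protocol could be sketched with
plain generators (``CoFunc`` and ``counting`` are made-up names; the
``cocall`` syntax itself does not exist):

```python
class CoFunc:
    # a sketch: responds to __cocall__ the way a generator function
    # responds to __call__, i.e. by returning a generator-iterator
    def __init__(self, genfunc):
        self.genfunc = genfunc
    def __cocall__(self, *args, **kwds):
        return self.genfunc(*args, **kwds)

def counting(x):
    yield x
    yield x + 1

f = CoFunc(counting)
results = list(f.__cocall__(10))   # the returned object is an iterator
```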
Certain objects that wrap other callable objects, notably bound methods,
will be given ``__cocall__`` implementations that delegate to the
underlying object.
The full syntax of a cocall expression is described by the following
grammar:
atom: cocall | <existing alternatives for atom>
cocall: 'cocall' atom cotrailer* '(' [arglist] ')'
cotrailer: '[' subscriptlist ']' | '.' NAME
Note that this syntax allows cocalls to methods and elements of sequences
or mappings to be expressed naturally. For example, the following are valid:
y = cocall self.foo(x)
y = cocall funcdict[key](x)
y = cocall a.b.c[i].d(x)
Also note that the final calling parentheses are mandatory, so that for example
the following is invalid syntax:
y = cocall f # INVALID
New builtins, attributes and C API functions
To facilitate interfacing cofunctions with non-coroutine code, there will
be a built-in function ``costart`` whose definition is equivalent to
def costart(obj, *args, **kwds):
    try:
        m = obj.__cocall__
    except AttributeError:
        result = NotImplemented
    else:
        result = m(*args, **kwds)
    if result is NotImplemented:
        raise TypeError("Object does not support cocall")
    return result
There will also be a corresponding C API function
PyObject *PyObject_CoCall(PyObject *obj, PyObject *args, PyObject *kwds)
It is left unspecified for now whether a cofunction is a distinct type
of object or, like a generator function, is simply a specially-marked
function instance. If the latter, a read-only boolean attribute
``__iscofunction__`` should be provided to allow testing whether a given
function object is a cofunction.
Motivation and Rationale
The ``yield from`` syntax is reasonably self-explanatory when used for the
purpose of delegating part of the work of a generator to another function. It
can also be used to good effect in the implementation of generator-based
coroutines, but it reads somewhat awkwardly when used for that purpose, and
tends to obscure the true intent of the code.
Furthermore, using generators as coroutines is somewhat error-prone. If one
forgets to use ``yield from`` when it should have been used, or uses it when it
shouldn't have, the symptoms that result can be extremely obscure and confusing.
Finally, sometimes there is a need for a function to be a coroutine even though
it does not yield anything, and in these cases it is necessary to resort to
kludges such as ``if 0: yield`` to force it to be a generator.
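For instance, the kludge looks like this (a minimal sketch):

```python
import types

def background_task():
    # the 'if 0: yield' kludge: forces this function to compile as a
    # generator even though it never yields a value
    if 0:
        yield
    # ...non-yielding coroutine work would go here...

g = background_task()
is_gen = isinstance(g, types.GeneratorType)   # True: it is a generator
values = list(background_task())              # and it yields nothing
```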
The ``cocall`` construct addresses the first issue by making the syntax
directly reflect the intent, that is, that the function being called
forms part of a coroutine.
The second issue is addressed by making it impossible to mix coroutine and
non-coroutine code in ways that don't make sense. If the rules are violated, an
exception is raised that points out exactly what and where the problem is.
Lastly, the need for dummy yields is eliminated by making it possible for a
cofunction to call both cofunctions and ordinary functions with the same syntax,
so that an ordinary function can be used in place of a cofunction that
yields no values.
Record of Discussion
An earlier version of this proposal required a special keyword ``codef`` to be
used in place of ``def`` when defining a cofunction, and disallowed calling an
ordinary function using ``cocall``. However, it became evident that these
features were not necessary, and the ``codef`` keyword was dropped in the
interests of minimising the number of new keywords required.
The use of a decorator instead of ``codef`` was also suggested, but the current
proposal makes this unnecessary as well.
It has been questioned whether some combination of decorators and functions
could be used instead of a dedicated ``cocall`` syntax. While this might be
possible, to achieve equivalent error-detecting power it would be necessary
to write cofunction calls as something like
yield from cocall(f)(args)
making them even more verbose and inelegant than an unadorned ``yield from``.
It is also not clear whether it is possible to achieve all of the benefits of
the cocall syntax using this kind of approach.
An implementation of an earlier version of this proposal in the form of patches
to Python 3.1.2 can be found here:
If this version of the proposal is received favourably, the implementation will
be updated to match.
This document has been placed in the public domain.
On Wed, Apr 25, 2012 at 2:35 PM, Matt Joiner <anacrolix(a)gmail.com> wrote:
> If this is to be done I'd like to see all special methods supported. One of
> particular interest to modules is __getattr__...
For What It's Worth, supporting __setattr__ and __getattr__ is one of
the few reasons that I have considered subclassing modules.
The workarounds of either offering public set_varX and get_varX
functions, or moving configuration to a separate singleton, just feel
clunky.
Since those module methods would be defined at the far left, I don't
think it would mess up understanding any more than they already do on
regular classes. (There is always *some* surprise, just because they
are invoked implicitly.)
That said, I personally tend to view modules as a special case of
classes, so I wouldn't be shocked if others found it more confusing
than I would -- particularly as to whether or not the module's
__getattr__ would somehow affect the lookup chain for classes defined
within the module.
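A sketch of the module-subclassing workaround mentioned above (all names
invented):

```python
import types

class ConfigModule(types.ModuleType):
    # a class-level __getattr__ is consulted only when normal
    # attribute lookup on the module object fails
    def __getattr__(self, name):
        if name == "debug":
            return False                 # a computed default
        raise AttributeError(name)

mod = ConfigModule("config")    # an instance usable like a module
mod.level = 3                   # ordinary attribute access still works
```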
I've written up a PEP for the sys.implementation idea. Feedback is welcome!
You'll notice some gaps, which I'll be working to fill in over the
next couple of days. Don't mind the gaps. <wink> They are in less
critical (?) portions and I wanted to get this out to you before the
Title: Adding sys.implementation
Author: Eric Snow <ericsnowcurrently(a)gmail.com>
Type: Standards Track
This PEP introduces a new variable for the sys module: ``sys.implementation``.
The variable holds consolidated information about the implementation of
the running interpreter. Thus ``sys.implementation`` is the source to
which the standard library may look for implementation-specific
information.
The proposal in this PEP is in line with a broader emphasis on making
Python friendlier to alternate implementations. It describes the new
variable and the constraints on what that variable contains. The PEP
also explains some immediate use cases for ``sys.implementation``.
For a number of years now, the distinction between Python-the-language
and CPython (the reference implementation) has been growing. Most of
this change is due to the emergence of Jython, IronPython, and PyPy as
viable alternate implementations of Python.
Consider, however, the nearly two decades of CPython-centric Python
(i.e. most of its existence). That focus has understandably contributed
to quite a few CPython-specific artifacts both in the standard library
and exposed in the interpreter. Though the core developers have made an
effort in recent years to address this, quite a few of the artifacts
remain.
Part of the solution is presented in this PEP: a single namespace on
which to consolidate implementation specifics. This will help focus
efforts to differentiate the implementation specifics from the language.
Additionally, it will foster a multiple-implementation mindset.
We will add ``sys.implementation``, in the sys module, as a namespace to
contain implementation-specific information.
The contents of this namespace will remain fixed during interpreter
execution and through the course of an implementation version. This
ensures that behaviors depending on those variables don't change between
versions.
``sys.implementation`` is a dictionary, as opposed to any form of "named"
tuple (a la ``sys.version_info``). This is partly because it doesn't
have meaning as a sequence, and partly because it's a potentially more
variable data structure.
The namespace will contain at least the variables described in the
`Required Variables`_ section below. However, implementations are free
to add other implementation information there. Some possible extra
variables are described in the `Other Possible Variables`_ section.
This proposal takes a conservative approach in requiring only two
variables. As more become appropriate, they may be added with discretion.
These are variables in ``sys.implementation`` on which the standard
library would rely, meaning they would need to be defined:

name
   the name of the implementation (case sensitive).

version
   the version of the implementation, as opposed to the version of the
   language it implements. This would use a standard format, similar to
   ``sys.version_info`` (see `Version Format`_).
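Given the dictionary form chosen above, CPython's entry might look something
like this (values are purely illustrative):

```python
# a sketch: the values shown are illustrative, not authoritative
implementation = {
    "name": "cpython",                 # case-sensitive implementation name
    "version": (3, 3, 0, "final", 0),  # implementation (not language)
                                       # version, sys.version_info-style
}
```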
Other Possible Variables

These variables could be useful, but don't necessarily have a clear use
case yet:

cache_tag
   a string used for the PEP 3147 cache tag (e.g. 'cpython33' for
   CPython 3.3). The name and version from above could be used to
   compose this, though an implementation may want something else.
   However, module caching is not a requirement of implementations, nor
   is the use of cache tags.

repository
   the implementation's repository URL.

repository_revision
   the revision identifier for the implementation.

build_toolchain
   identifies the tools used to build the interpreter.

url (or website)
   the URL of the implementation's site.

site_prefix
   the preferred site prefix for this implementation.

runtime
   the run-time environment in which the interpreter is running.

gc_type
   the type of garbage collection used.

XXX same as sys.version_info?
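The composition mentioned above for the cache tag could be as simple as the
following sketch (names and values illustrative):

```python
# composing a PEP 3147 cache tag from the two required variables
name = "cpython"
version = (3, 3)
cache_tag = "{}{}{}".format(name, version[0], version[1])
# cache_tag is now 'cpython33'
```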
The status quo for implementation-specific information gives us that
information in a more fragile, harder to maintain way. It's spread out
over different modules or inferred from other information, as we see with
``platform.python_implementation()``.
This PEP is the main alternative to that approach. It consolidates the
implementation-specific information into a single namespace and makes
explicit that which was implicit.
With the single-namespace-under-sys approach being so straightforward,
no alternatives have been considered for this PEP.
The topic of ``sys.implementation`` came up on the python-ideas list in
2009, where the reception was broadly positive. I revived the
discussion recently while working on a pure-python ``imp.get_tag()``.
The messages in `issue 14673`_ are also relevant.
"explicit is better than implicit"
The platform module guesses the python implementation by looking for
clues in a couple of different sys variables. However, this approach
is fragile. Beyond that, it's limited to those implementations that core
developers have blessed by special-casing them in the platform module.
With ``sys.implementation`` the various implementations would *explicitly*
set the values in their own version of the sys module.
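The clue-sniffing amounts to something like the following sketch (NOT the
actual platform-module code; the checks shown are common idioms for
detecting each implementation):

```python
import sys

def guess_implementation():
    # a sketch only -- not the real platform.python_implementation()
    if hasattr(sys, "pypy_version_info"):   # attribute set by PyPy
        return "PyPy"
    if sys.platform.startswith("java"):     # Jython reports 'java...'
        return "Jython"
    if "IronPython" in sys.version:         # IronPython's sys.version
        return "IronPython"
    return "CPython"
```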
Aside from the guessing, another concern is that the platform module is
part of the stdlib, which ideally would hold a minimum of the sort of
implementation details that would move to ``sys.implementation``.
Any overlap between ``sys.implementation`` and the platform module would
simply defer to ``sys.implementation`` (with the same interface in
platform wrapping it).
Cache Tag Generation in Frozen Importlib
PEP 3147 defined the use of a module cache and cache tags for file names.
The importlib bootstrap code, frozen into the Python binary as of 3.3,
uses the cache tags during the import process. Part of the project to
bootstrap importlib has been to clean out of Python/import.c any code that
did not need to be there.
The cache tag defined in Python/import.c was hard-coded to
``"cpython" MAJOR MINOR``. For importlib the options are either
hard-coding it in the same way, or guessing the implementation in the
same way as ``platform.python_implementation()`` does.
As long as the hard-coded tag is limited to CPython-specific code, it's
livable. However, inasmuch as other Python implementations use the
importlib code to work with the module cache, a hard-coded tag would
become a problem.
Directly using the platform module in this case is a non-starter. Any
module used in the importlib bootstrap must be built-in or frozen,
neither of which apply to the platform module. This is the point that
led to the recent interest in ``sys.implementation``.
Regardless of how the implementation name is obtained, the version to use
for the cache tag is more likely to be the implementation version rather
than the language version. That implementation version is not readily
identified anywhere in the standard library.
Jython's ``os.name`` Hack
Impact on CPython
Feedback From Other Python Implementors
XXX PEP 3139
XXX PEP 399
* What are the long-term objectives for sys.implementation?
- pull in implementation detail from the main sys namespace and
elsewhere (PEP 3137 lite).
* Alternatives to the approach dictated by this PEP?
* ``sys.implementation`` as a proper namespace rather than a dict. It
would be its own module or an instance of a concrete class.
The implementation of this PEP is covered in `issue 14673`_.
.. _issue 14673
This document has been placed in the public domain.
While I really like the argparse module, I've run into a case I think
it ought to handle that it doesn't.
So I'm asking here to see if 1) I've overlooked something, and it can
do this, or 2) there's a good reason for it not to do this or maybe 3)
this is a bad idea.
The usage I ran into looks like this:
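A minimal sketch of the pattern in question (the option name and default
path are stand-ins):

```python
import argparse
import tempfile

parser = argparse.ArgumentParser()
# '--configfile' and the default path are stand-ins for the original's
parser.add_argument("--configfile", type=open,
                    default="/my/default/config")

# when the argument *is* provided, argparse applies type=open to it:
with tempfile.NamedTemporaryFile("w", suffix=".cfg", delete=False) as f:
    f.write("hello")
args = parser.parse_args(["--configfile", f.name])
content = args.configfile.read()
```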
If I provide the argument, everything works fine, and it opens the
named file for me. If I don't, parser.configfile is set to the string,
which doesn't work very well when I try to use its read method.
Unfortunately, setting default to open('/my/default/config') has the
side effect of opening the file. Or raising an exception if the file
doesn't exist (which is a common reason for wanting to provide an
argument).
Could default handling be made smarter, so that if 1) type is set
and 2) the value of default is a string, the value of default is
passed to type? Or maybe a flag to make that happen, or even a
default_factory argument (incompatible with default) that would accept
something like default_factory=lambda: open('/my/default/config')?
Mike Meyer <mwm(a)mired.org> http://www.mired.org/
Independent Software developer/SCM consultant, email for more information.
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org
I've formatted and finished Rebert's solution to this issue, but the
question of where to put it is still open (shutil.open vs.
shutil.launch vs. os.startfile):
1. `shutil.open()` will break anyone who does `from shutil import *`, or
who edits the shutil.py file and tries to use the builtin open() after the
import.
2. `shutil.launch()` is better than shutil.open() due to reduced breakage,
but not as simple or DRY or reverse-compatible as putting it in
os.startfile() in my mind. This fix just implements the functionality of
os.startfile() for non-Windows OSes.
3. `shutil.startfile()` was recommended against by a developer or two on
this mailing list, but seems appropriate to me. The only upstream
"breakage" for an os.startfile() location that I can think of is the
failure to raise exceptions on non-Windows OSes. Any legacy (<3.0) code
that relies on os.startfile() exceptions in order to detect a non-windows
OS is misguided and needs re-factoring anyway, IMHO. Though their only
indication of a "problem" in their code would be the successful launching
of a viewer for whatever path they pointed to...
4. `os.launch()` anyone? Not me.
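For reference, the usual cross-platform sketch of this functionality looks
like the following (not a stdlib API; the external commands are platform
conventions, and this ignores the edit-vs-view preference discussed above):

```python
import os
import subprocess
import sys

def launch(path):
    """Open *path* with the platform's default application (a sketch)."""
    if hasattr(os, "startfile"):          # Windows
        os.startfile(path)
    elif sys.platform == "darwin":        # macOS
        subprocess.check_call(["open", path])
    else:                                 # most freedesktop systems
        subprocess.check_call(["xdg-open", path])
```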
On Mon, 23 Apr 2012 at 13:21 +0800, Hobson Lane <hobsonlane(a)gmail.com>
wrote, under the subject "[Python-ideas] Anyone working on a
platform-agnostic os.startfile()":
> There is significant interest in a cross-platform
> file-launcher. The ideal implementation would be
> an operating-system-agnostic interface that launches a file for editing or
> viewing, similar to the way os.startfile() works for Windows, but
> generalized to allow caller-specification of view vs. edit preference and
> support all registered os.name operating systems, not just 'nt'.
> Mercurial has a mature python implementation for cross-platform launching
> of an editor (either GUI editor or terminal-based editor like vi).
> The python std lib os.startfile obviously works for Windows.
> The Mercurial functionality could be rolled into os.startfile() with
> additional named parameters for edit or view preference and gui or non-gui
> preference. Perhaps that would enable backporting below Python 3.x. Or is
> there a better place to incorporate this multi-platform file launching
> functionality?
> : http://selenic.com/repo/hg-stable/file/2770d03ae49f/mercurial/ui.py