At the moment, the array module of the standard library allows one to
create arrays of different numeric types and to initialize them from
an iterable (e.g., another array).
What's missing is the possibility to specify the final size of the
array (number of items), especially for large arrays.
I'm thinking of suffix arrays (a text indexing data structure) for
large texts, e.g., the human genome and its reverse complement (about 6
billion characters from the alphabet ACGT).
The suffix array is a long int array of the same size (8 bytes per
number, so it occupies about 48 GB of memory).
At the moment I am extending an array in chunks of several million
items at a time, which is slow and not elegant.
The function below also initializes each item in the array to a given
value (0 by default).
Is there a reason why the array.array constructor does not allow one
to simply specify the number of items that should be allocated? (I do
not really care about the contents.)
Would this be a worthwhile addition to / modification of the array module?
My suggestion is to modify array creation in such a way that you
could pass an iterable (as now) as the second argument, but if you pass a
single integer value, it should be treated as the number of items to
allocate.
Here is my current workaround (which is slow):
import array

def filled_array(typecode, n, value=0, bsize=(1 << 22)):
    """Returns a new array with the given typecode
    (e.g., "l" for long int, as in the array module)
    and n entries, initialized to the given value (default 0).
    """
    a = array.array(typecode, [value] * bsize)  # one pre-filled chunk
    x = array.array(typecode)
    r = n
    while r >= bsize:
        x.extend(a)   # append a whole chunk at a time
        r -= bsize
    x.extend(a[:r])   # append the remaining r items
    return x
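A scaled-down usage sketch (the size here is hypothetical; typecode "q"
guarantees 8-byte signed integers, which suits a suffix array):

    sa = filled_array("q", 10000000)   # ten million entries, all zero
    assert len(sa) == 10000000 and sa[0] == 0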
I think it would be a good idea if Python tracebacks could be translated
into languages other than English - and it would set a good example.
For example, using French as my default local language, instead of
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero
I might get something like
Suivi d'erreur (appel le plus récent en dernier) :
Fichier "<stdin>", à la ligne 1, dans <module>
ZeroDivisionError: division entière ou modulo par zéro
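One conceivable hook point, as a very rough sketch (the catalog name is
hypothetical, and a real solution would need catalogs keyed on the constant
parts of each message, since traceback lines contain variable data such as
file names and line numbers):

    import sys, gettext, traceback

    _ = gettext.translation("python-tracebacks", fallback=True).gettext

    def translated_excepthook(exc_type, exc, tb):
        for line in traceback.format_exception(exc_type, exc, tb):
            sys.stderr.write(_(line))

    sys.excepthook = translated_excepthook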
Greg Ewing wrote:
> Mark Shannon wrote:
>> Why not have proper co-routines, instead of hacked-up generators?
> What do you mean by a "proper coroutine"?
A parallel, non-concurrent, thread of execution.
It should be able to transfer control from arbitrary places in the
execution, not only from within generators.
Stackless provides coroutines. Greenlets are also coroutines (I think).
Lua has them, and is implemented in ANSI C, so it can be done portably.
(One of the examples in the paper uses coroutines to implement
generators, which is obviously not required in Python :) )
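For example, here is a minimal sketch with the third-party greenlet
package, which switches control at arbitrary call depth rather than only
at yield points:

    from greenlet import greenlet   # third-party: pip install greenlet

    def ping():
        print("ping")
        gr2.switch()          # transfer control mid-function
        print("ping again")   # resumes here when pong switches back

    def pong():
        print("pong")
        gr1.switch()

    gr1 = greenlet(ping)
    gr2 = greenlet(pong)
    gr1.switch()              # prints: ping, pong, ping again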
Here's an updated version of the PEP reflecting my
recent suggestions on how to eliminate 'codef'.
Author: Gregory Ewing <greg.ewing(a)canterbury.ac.nz>
Type: Standards Track
A syntax is proposed for defining and calling a special type of generator
called a 'cofunction'. It is designed to provide a streamlined way of
writing generator-based coroutines, and allow the early detection of
certain kinds of error that are easily made when writing such code, which
otherwise tend to cause hard-to-diagnose symptoms.
This proposal builds on the 'yield from' mechanism described in PEP 380,
and describes some of the semantics of cofunctions in terms of it. However,
it would be possible to define and implement cofunctions independently of
PEP 380 if so desired.
A cofunction is a special kind of generator, distinguished by the presence
of the keyword ``cocall`` (defined below) at least once in its body. It may
also contain ``yield`` and/or ``yield from`` expressions, which behave as
they do in other generators.
From the outside, the distinguishing feature of a cofunction is that it cannot
be called the same way as an ordinary function. An exception is raised if an
ordinary call to a cofunction is attempted.
Calls from one cofunction to another are made by marking the call with
a new keyword ``cocall``. The expression
cocall f(*args, **kwds)
is evaluated by first checking whether the object ``f`` implements
a ``__cocall__`` method. If it does, the cocall expression is equivalent to
yield from f.__cocall__(*args, **kwds)
except that the object returned by __cocall__ is expected to be an
iterator, so the step of calling iter() on it is skipped.
If ``f`` does not have a ``__cocall__`` method, or the ``__cocall__``
method returns ``NotImplemented``, then the cocall expression is
treated as an ordinary call, and the ``__call__`` method of ``f`` is invoked.
Objects which implement __cocall__ are expected to return an object
obeying the iterator protocol. Cofunctions respond to __cocall__ the
same way as ordinary generator functions respond to __call__, i.e. by
returning a generator-iterator.
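In Python terms, the semantics just described correspond roughly to the
following sketch (the helper name is purely illustrative; the real construct
is syntax, not a function):

    def _cocall_expansion(f, *args, **kwds):
        # what `result = cocall f(*args, **kwds)` does, per the rules above
        m = getattr(f, '__cocall__', None)
        if m is not None:
            it = m(*args, **kwds)
            if it is not NotImplemented:
                # delegate as `yield from` does, skipping the iter() step
                return (yield from it)
        # no usable __cocall__: fall back to an ordinary call
        return f(*args, **kwds)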
Certain objects that wrap other callable objects, notably bound methods,
will be given __cocall__ implementations that delegate to the underlying object.
The full syntax of a cocall expression is described by the following grammar:
atom: cocall | <existing alternatives for atom>
cocall: 'cocall' atom cotrailer* '(' [arglist] ')'
cotrailer: '[' subscriptlist ']' | '.' NAME
Note that this syntax allows cocalls to methods and elements of sequences
or mappings to be expressed naturally. For example, the following are valid:
y = cocall self.foo(x)
y = cocall funcdict[key](x)
y = cocall a.b.c[i].d(x)
Also note that the final calling parentheses are mandatory, so that for example
the following is invalid syntax:
y = cocall f # INVALID
New builtins, attributes and C API functions
To facilitate interfacing cofunctions with non-coroutine code, there will
be a built-in function ``costart`` whose definition is equivalent to
def costart(obj, *args, **kwds):
    try:
        m = obj.__cocall__
    except AttributeError:
        result = NotImplemented
    else:
        result = m(*args, **kwds)
    if result is NotImplemented:
        raise TypeError("Object does not support cocall")
    return result
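For example, ordinary (non-coroutine) code could drive a cofunction to
completion with a loop like the following sketch (``FakeCofunction`` is a
stand-in, not part of the proposal):

    class FakeCofunction:
        def __cocall__(self, x):
            yield                # one suspension point
            return x * 2         # delivered via StopIteration

    g = costart(FakeCofunction(), 21)
    try:
        while True:
            g.send(None)         # run to the next suspension point
    except StopIteration as e:
        print("cofunction returned", e.value)   # -> 42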
There will also be a corresponding C API function
PyObject *PyObject_CoCall(PyObject *obj, PyObject *args, PyObject *kwds)
It is left unspecified for now whether a cofunction is a distinct type
of object or, like a generator function, is simply a specially-marked
function instance. If the latter, a read-only boolean attribute
``__iscofunction__`` should be provided to allow testing whether a given
function object is a cofunction.
Motivation and Rationale
The ``yield from`` syntax is reasonably self-explanatory when used for the
purpose of delegating part of the work of a generator to another function. It
can also be used to good effect in the implementation of generator-based
coroutines, but it reads somewhat awkwardly when used for that purpose, and
tends to obscure the true intent of the code.
Furthermore, using generators as coroutines is somewhat error-prone. If one
forgets to use ``yield from`` when it should have been used, or uses it when it
shouldn't have, the symptoms that result can be extremely obscure and confusing.
Finally, sometimes there is a need for a function to be a coroutine even though
it does not yield anything, and in these cases it is necessary to resort to
kludges such as ``if 0: yield`` to force it to be a generator.
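For instance, the kludge in question looks like this:

    def coroutine_that_yields_nothing():
        if 0:
            yield    # unreachable, but its presence makes this a generator
        # ... useful non-yielding work goes here ...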
The ``cocall`` construct addresses the first issue by making the syntax directly
reflect the intent, that is, that the function being called forms part of a
coroutine.
The second issue is addressed by making it impossible to mix coroutine and
non-coroutine code in ways that don't make sense. If the rules are violated, an
exception is raised that points out exactly what and where the problem is.
Lastly, the need for dummy yields is eliminated by making it possible for a
cofunction to call both cofunctions and ordinary functions with the same syntax,
so that an ordinary function can be used in place of a cofunction that yields
nothing.
Record of Discussion
An earlier version of this proposal required a special keyword ``codef`` to be
used in place of ``def`` when defining a cofunction, and disallowed calling an
ordinary function using ``cocall``. However, it became evident that these
features were not necessary, and the ``codef`` keyword was dropped in the
interests of minimising the number of new keywords required.
The use of a decorator instead of ``codef`` was also suggested, but the current
proposal makes this unnecessary as well.
It has been questioned whether some combination of decorators and functions
could be used instead of a dedicated ``cocall`` syntax. While this might be
possible, to achieve equivalent error-detecting power it would be necessary
to write cofunction calls as something like
yield from cocall(f)(args)
making them even more verbose and inelegant than an unadorned ``yield from``.
It is also not clear whether it is possible to achieve all of the benefits of
the cocall syntax using this kind of approach.
An implementation of an earlier version of this proposal in the form of patches
to Python 3.1.2 can be found here:
If this version of the proposal is received favourably, the implementation will
be updated to match.
This document has been placed in the public domain.
(reposting this from the Google Group once more, as the previous post missed
the mailing list because I was not subscribed via Mailman)
*Static module/package inspection*
- static: without execution (as opposed to dynamic)
- module/package: .py or __init__.py file
- inspection: get an overview of the contents
*What should this do?*
This is a proposal to add a mechanism to the Python interpreter to get an
outline of a module's or package's contents without importing or executing
the module/package. The outline includes the names of classes, functions,
and variables. It should also contain values for variables that can be
provided without sophisticated calculation (e.g. a string or integer, but
probably not expressions, as evaluating those may lead to security issues).
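Such an outline could plausibly be built on the stdlib ast module; here is a
minimal sketch (the function name and result structure are mine, not part of
the proposal):

    import ast

    def outline(path):
        # Statically collect top-level class, function and simple
        # variable names from a module, without importing it.
        with open(path, encoding="utf-8") as f:
            tree = ast.parse(f.read(), filename=path)
        classes, functions, variables = [], [], {}
        for node in tree.body:
            if isinstance(node, ast.ClassDef):
                classes.append(node.name)
            elif isinstance(node, ast.FunctionDef):
                functions.append(node.name)
            elif isinstance(node, ast.Assign):
                for target in node.targets:
                    if isinstance(target, ast.Name):
                        try:
                            # literal values only, never expressions
                            variables[target.id] = ast.literal_eval(node.value)
                        except ValueError:
                            pass
        return {"classes": classes, "functions": functions,
                "variables": variables}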
*user story PEPx.001:*
As a Python package maintainer, I find it bothersome to repeatedly write
boilerplate code (e.g. setup.py) to package my single-file module. The
reason I should write setup.py is to provide version and description info.
This info is already available in my module source code. So I need to
either copy/paste the info from the module manually, or to import (and
hence execute) my module during packaging and installation, which I don't
want either, because modules are often installed with root privileges.
With this PEP, a packaging tool will be able to extract meta information from
my module without executing it and without me manually copying version
fields into some 'package configuration file'.
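A hypothetical packaging-tool use of the outline sketch above:

    meta = outline("mymodule.py")
    version = meta["variables"].get("__version__")   # e.g. "1.0"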
*user story PEPx.002:*
As a Python application developer, I find it really complicated to provide
a plugin extension subsystem for my users. Users need a mechanism to switch
between different versions of a plugin, and this mechanism is usually
provided by an external tool such as setuptools to manage and install multiple
versions of plugins in a local Python package repository. It is rather hard
to create an alternative approach, because you are forced to maintain
external meta-data about your plugin modules even when it is already
available inside the module.
With this PEP, a Python application will be able to inspect the
meta-data embedded inside plugins before choosing which version to load.
This will also provide a standard mechanism for applications to check
modules returned by packaging tools without executing them. This will
greatly simplify writing and debugging custom plugin loaders on different
platforms.
At this stage I'd like to get a community response to two separate questions:
1. Whether this functionality would be useful for Python
2. Whether the solution is technically feasible
As you may know, the python-ideas list is open only to subscribers. This
is inconvenient, because:
1. it requires a three-step subscription process
2. it is impossible to post a reply to an existing thread/idea
There is a web-interface in Google Groups at
https://groups.google.com/forum/#!forum/python-ideas that can solve the
problems above and provide some more nifty features such as embedded
search. But there is another problem: messages posted through the
group don't end up in the list, because the list requires subscription. I've
already tried to find a solution, but ran out of time, so I summarized the
proposal at http://wiki.python.org/moin/MailmanWithGoogleGroups
I may or may not be able to publish outcomes of my research, so it would be
nice to get some help in investigating the problem and publishing a
solution on aforementioned wiki page. Thanks.
I had a tough time finding a connection to the internet :)
@Arnaud + Terry
Thx, code corrected to use the right terminology;
at least my code really tests for the right rules :)
But being precise matters, so I changed all the terms to fit.
<< is a symbol indicating an asymmetry between the left and right values, and I
carefully tried to have symmetrical behaviour for left & right:
associativity, distributivity, commutativity.
The whole point of using a symbol is that it follows rules.
That is the reason why the consistent_addition class is all about testing
linear-algebraic rules (hopefully the right way, and now, I hope, with
the right terminology).
But I have reason to think that there is more than one way to do
addition. Linear algebra is the most used and most intuitive one for
non-developers, while developers find set operations more intuitive.
I guess it all comes down to one point: what is a dict as a
mathematical object? If there is more than one choice, which has the
most benefits? Should there be more than one definition?
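To make the vector reading concrete, here is a rough sketch of the
conservation-obeying, elementwise addition I have in mind (the class name is
illustrative only):

    class VecDict(dict):
        def __add__(self, other):
            # shared keys add their values (recursing via the values'
            # own +); unshared keys are carried over, so nothing is lost
            result = VecDict(self)
            for key, value in other.items():
                result[key] = result[key] + value if key in result else value
            return result

    # commutative and associative whenever the values' + is:
    assert VecDict(a=1) + VecDict(a=2, b=3) == VecDict(a=3, b=3)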
@Guido & Eric
In my book there are rules for sets.
The question is: is a dict the same as a vector, or as a set?
Since sets use the logical operators for operations that are roughly
set operations, why not use & ^ | for set operations on dicts?
It would be pretty consistent with the set operations.
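For what it's worth, dict item views already support these operators, which
sketches what set-style dict operations could mean (treating key -> value
pairs as the set elements):

    a, b = dict(x=1, y=2), dict(y=2, z=3)
    union        = dict(a.items() | b.items())   # {'x': 1, 'y': 2, 'z': 3}
    intersection = dict(a.items() & b.items())   # {'y': 2}
    symmetric    = dict(a.items() ^ b.items())   # {'x': 1, 'z': 3}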
( @jakkob )
In this case I would expect an inconsistency error when doing
dict( a = 1 ) | dict( a = 2 , b = 2 ) (thus key -> value pairs are seen as
atomic elements of the set, unless a "value collision" operator is
given, and this collision operator would recursively and naturally
apply to the descendants),
because I expect both a | b == b | a and a + b == b + a.
The problem would indeed be
list + list
Vectorish behaviour is my preferred one, of course.
But if we shift the record-algebra addition to another symbol (let's
assume << ) we resolve the conflict. But then we have a problem of notation:
we may then be tempted to use & | ^ on lists too (since they would be used
for dicts), and then we have a problem:
should [ 1 , 2 ] & [ 3 , 2 ] mean the set operation, or applying & to
the elements of the same rank?
I'm pretty stuck myself in trying to be consistent.
Just for the record, why did you not use "." (dot) for concatenation ?
I know it is typographically unreadable on a messy screen, but is
there a better reason ?
@terry & Nathan, regarding Counter:
You may notice that I do have all the properties of Counter, but Counter
does not, as this dict does, aggregate values and (sub)totals at the same
time.
It is very convenient for map/reduce since it does some re-reduce
operations at reduce time. Smart, no? <:o)
Sorry for making assertive and carefree statements regarding strong
resentment. My bad.
The truth is I aim at cosine similarity for dicts, and I imagine a map/reduce
on dicts representing the values of a model you want:
ideal = dict( blue = 1 , height = 180, weight = 70, wage = 500 )
In the filter before the map/reduce you would want to keep all
records close to this one by using cosine similarity in this way:
filter( lambda model : cos( ideal, model ) > .7 , all_model )
Of course I will try to advocate dot products, and metrics.
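Here is a sketch of such a cos for numeric dicts (my own, with missing keys
counting as zero components):

    import math

    def cos(u, v):
        dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
        nu = math.sqrt(sum(x * x for x in u.values()))
        nv = math.sqrt(sum(x * x for x in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0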
As a result my real goal is to make people consider dicts as vectors of
path-to-value => value, not as sets of key => value.
To solve issues such as weighted decisions and non-linear choices
(based on a trigger, or on a value belonging to a set) I can fairly easily
conceive a projector (which would be a class transforming a vector into
a vector, not with a matrix but with computed rules).
+1 are you a telepath ?
Yes, a key collision operator would be nice:
given for instance an Apache log, I will have segments of a path, and I
may want the keys to be the accumulated graph of a user on a website. So
my collision rule might be
(the key being the referrer, the value the page visited afterwards):
dict( a = b, a = c ) + dict( b = e ) = dict( a = dict( b = e ), a = c )
I thought of it, but I really loved the conservation rule. And I
feared people would think of it as too complicated. Keep It Simple,
Stupid.
Wrapping everything into logical sense:
I was thinking of supersets of object (thus outside the stdlib) that
would have different algebras and could be used as casts on objects to
redefine + - * / cos & | ^
and that would behave consistently for different objects, e.g.:
vector( dict ), vector( list ), vector( string )
would make dict, list and string behave like vectors, thus mainly
supporting elementwise operations;
sets( dict ), sets( string ) ...
would make dict and string be sets of elements ... (with the original dict
addition design of GvR et al).
Each superset would have a unit test for the operations, verifying that
the behaviour is consistent (associativity, distributivity, ...). What
about this solution?
On Sat, Dec 31, 2011 at 2:16 AM, julien tayon <julien(a)tayon.net> wrote:
> Dear All :)
> 2011/12/30 Eric Snow <ericsnowcurrently(a)gmail.com>:
> > On Fri, Dec 30, 2011 at 10:02 AM, Guido van Rossum <guido(a)python.org>
> >> What I meant is similar to set union on the keys, where if a key exists in
> >> both dicts, the value in the result is equal to one of the values in the
> >> operands (and if the value is the same for both operands, that value is
> >> the result value).
> > +1
> > This is the one I was thinking of too.
> Well, since I have coded way too much in Perl, my altered sense of
> reality has come to a concept I may be introducing too early, which
> is: algebras.
> strings, lists, ... have a record algebra.
> ndarray, accudict have a linear algebra
> sets ... have a set algebra.
> And many more algebras exist, which all exist not only in my
> imagination, but also in math (which I quite dislike). (Abelian
> groups, for instance.)
> All of these algebras are consistent as long as every object in the
> chain of algebras follows the same rules.
> And each of these is very legitimate (even though of course my dict
> addition is the best, without trying to be obnoxious).
> I was kind of thinking of
> 1) giving a property to object called .. __algebrae__,
> 2) and through some magic being able to change the algebra of an
> object on the fly.
> My twisted sense of reality inherited from Perl (but a little less
> than my math books) tells me There Is More Than One Way To
> Consistently Add/Mul/Div/Sub It.
> As a Proof of Concept I could deliver a monkeypatching of list() that
> makes it behave like a numpy array.
Please don't present this in terms of modifications to existing
functions/types/methods. Please use subclasses, new modules, new functions,
and so on.
> But at first I wish to concentrate on dict addition, since I can only
> steal a few hours of connectivity per day ... So I will try to answer
> everyone, since I saw some spoilers of what I had hidden in my mind :)
--Guido van Rossum (python.org/~guido)
Don't despair! I have tried to get people to warm up to dict addition too
-- in fact it was my counter-proposal at the time when we were considering
adding sets to the language. I will look at your proposal, but I have a
point of order first: this should be discussed on python-ideas, not on
python-dev. I have added python-ideas to the thread and moved python-dev to
Bcc, so followups will hopefully all go to python-ideas.
On Fri, Dec 30, 2011 at 7:26 AM, julien tayon <julien(a)tayon.net> wrote:
> Sorry to annoy the very busy core devs :) out of the blue
> I quite noticed people were
> 1) wanting to have a new dict for Xmas
> 2) strongly resenting dict addition.
> Even though I am not a good developer, I have come to a definition of
> addition that would follow algebraic rules, and not something of a
> Dutch logic. (it is a jest, not a troll)
> I propose the following code to validate my point of view regarding
> the dictionary addition as a proof of concept:
> It follows all my dusty math books regarding addition + it has the
> amiability of obeying conservation rules.
> I pretty much see a real advantage in this behaviour in functional
> programming (map/reduce) (see demonstrate.py), and it makes sense
> (if a dict can be seen as a vector).
> I have been told to be a troll, but I am pretty serious.
> Since I coded with luck, no internet, intuition, and a complete
> ignorance of the real meaning of the magic methods most of the time,
> the actual implementation of the addition surely needs a complete
> review.
> Happy holidays
--Guido van Rossum (python.org/~guido)
On Tuesday, December 27, 2011 10:53:56 PM UTC+3, RunThePun wrote:
> On Tue, Dec 27, 2011 at 7:01 PM, anatoly techtonik <tech...(a)gmail.com>wrote:
>> As you may know, the python-ideas list is open only to subscribers.
>> This is inconvenient, because:
>> 1. it requires a three-step subscription process
>> 2. it is impossible to post a reply to an existing thread/idea
>> There is a web-interface in Google Groups at
>> https://groups.google.com/forum/#!forum/python-ideas that
>> can solve the problems above and provide some more nifty features such as
>> embedded search. But there is another problem: messages posted
>> through the group don't end up in the list, because the list requires
>> subscription. I've already tried to find a solution, but ran out of time,
>> so I summarized the proposal at
>> http://wiki.python.org/moin/MailmanWithGoogleGroups
>> I may or may not be able to publish outcomes of my research, so it would
>> be nice to get some help in investigating the problem and publishing a
>> solution on aforementioned wiki page. Thanks.
> Concerning the search problem I've used google queries as such:
> list comprehensions site:
> I agree that having a "nosey" or "star" feature like issue trackers could
> be nice, though I'm not sure Google Groups is the most modern
> infrastructure to solve all our problems. I remember hearing that open
> google groups get a lot of spam for example.
Over the last six months I found only 3 spam messages sent from
https://groups.google.com/forum/#!forum/python-ideas and for some reason I
think that my mailbox filter would be smart enough to put them into the
appropriate folder even if they came from Mailman.
Maybe mailman can be improved?
> eg would it help if the PyPI login cookie allowed you to post on
> mailman? If mailman allowed starring threads?
Certainly. But the threads should be stacked in a different order than just
by month, because some threads can span several months.