At the moment, the array module of the standard library allows to
create arrays of different numeric types and to initialize them from
an iterable (eg, another array).
What's missing is the possiblity to specify the final size of the
array (number of items), especially for large arrays.
I'm thinking of suffix arrays (a text indexing data structure) for
large texts, eg the human genome and its reverse complement (about 6
billion characters from the alphabet ACGT).
The suffix array is a long int array of the same size (8 bytes per
number, so it occupies about 48 GB memory).
At the moment I am extending an array in chunks of several million
items at a time at a time, which is slow and not elegant.
The function below also initializes each item in the array to a given
value (0 by default).
Is there a reason why there the array.array constructor does not allow
to simply specify the number of items that should be allocated? (I do
not really care about the contents.)
Would this be a worthwhile addition to / modification of the array module?
My suggestions is to modify array generation in such a way that you
could pass an iterator (as now) as second argument, but if you pass a
single integer value, it should be treated as the number of items to
Here is my current workaround (which is slow):
def filled_array(typecode, n, value=0, bsize=(1<<22)):
"""returns a new array with given typecode
(eg, "l" for long int, as in the array module)
with n entries, initialized to the given value (default 0)
a = array.array(typecode, [value]*bsize)
x = array.array(typecode)
r = n
while r >= bsize:
r -= bsize
I just spent a few minutes staring at a bug caused by a missing comma
-- I got a mysterious argument count error because instead of foo('a',
'b') I had written foo('a' 'b').
This is a fairly common mistake, and IIRC at Google we even had a lint
rule against this (there was also a Python dialect used for some
specific purpose where this was explicitly forbidden).
Now, with modern compiler technology, we can (and in fact do) evaluate
compile-time string literal concatenation with the '+' operator, so
there's really no reason to support 'a' 'b' any more. (The reason was
always rather flimsy; I copied it from C but the reason why it's
needed there doesn't really apply to Python, as it is mostly useful
Would it be reasonable to start deprecating this and eventually remove
it from the language?
--Guido van Rossum (python.org/~guido)
This idea is already casually mentioned, but sank deep into the threads
of the discussion. Raise it up.
Currently reprs of classes and functions look as:
<built-in method from_bytes of type object at 0x826cf60>
<built-in function open>
>>> import collections
<bound method Counter.fromkeys of <class 'collections.Counter'>>
<function namedtuple at 0xb6fc4adc>
What if change default reprs of classes and functions to just full
qualified name __module__ + '.' + __qualname__ (or just __qualname__ if
__module__ is builtins)? This will look more neatly. And such reprs are
Looking at pep 492, it seems to me the handling of "for" loops has use
outside of just asyncio. The primary use-case I can think of is
multiprocessing and multithreading.
For example, you could create a multiprocessing pool, and let the pool
handle the items in a "for" loop, like so:
from multiprocessing import Pool
mypool = Pool(10, maxtasksperchild=2)
mypool for item in items:
Or something similar with third-party modules:
from greenlet import greenlet
greenlet for item in items:
Of course this sort of thing is possible with iterators and maps today, but
I think a lot of the same advantages that apply to asyncio also apply to
these sorts of cases. So I think that, rather than having a special
keyword just for asyncio, I think it would be better to have a more
flexible approach. Perhaps something like a "__for__" magic method that
lets a class implement "for" loop handling, along with the corresponding
changes in how the language processes the "for" loop.
Looking through PEP 492 I dislike the terminology.
With generators I can do:
>>> def f(): yield
>>> g = f()
<generator object f at 0x7fb81dadc798>
So f is a generator function and g is generator. That's all pretty
clear and "generator" is IMO a great name. The term "generator
function" is also unambiguous for f.
With PEP 492 it seems that I would get something like:
>>> async def af(): pass
>>> ag = af()
<coroutine_object object af at 0x7fb81dadc828>
According to the terminology in the PEP af is a coroutine but since
the word coroutine also refers to some generators in a looser sense I
should use "native coroutine" to disambiguate. ag is a "coroutine
object" but again I should use "native coroutine object" to explicitly
name the type because without the word "native" it also refers to
I think that if the terminology is already ambiguous with related
language constructs before it's introduced then it would be better to
change it now. The word coroutine (without "native" modifier) is used
by the PEP to refer to both generator based coroutines and the new
async def functions. I think it's reasonable to use the word as a
generic term in this sense. Python's idea of coroutines (both types)
doesn't seem to match with the pre-existing general definitions but
they are at least generalisations of functions so it can be reasonable
to describe them that way loosely.
Greg's suggestion to call an async def function (af above) an "async
function" seems like a big improvement. It clearly labels the purpose:
a function for use in a asynchronous execution probably with the
asyncio module. It also matches directly with the syntax: a function
prefixed with the word "async". There would be no ambiguity between
which of af or ag is referred to by the term.
It seems harder to think of a good name for ag though. ag is a wrapper
around a suspended call stack with methods to resume the stack so
describing what is is doesn't lead to anything helpful. OTOH the
purpose of ag is often described as implementing a "minithread" so the
word "minithread" makes intuitive sense to me. That way async code is
done by writing async functions that return minithreads. An event loop
schedules the minthreads etc. (I'm trying to imagine explaining this
I'm not sure what the best word is for a coroutine object but the
current terminology clearly has room for improvement. For a start
using the word "object" in the name of a type is a bit rubbish in a
language where everything is an object. Worse the PEP is reusing words
that have already been used with different meanings so that it's
already ambiguous. A concrete builtin language type such as the
coroutine object deserves a good, short, one-word name that will not
be confused with other things.
On Mon, Mar 23, 2015 at 2:08 AM, anatoly techtonik <techtonik(a)gmail.com>
> That's nice to know, but IIRC datetime is from the top 10 Python
> modules that need a redesign. Things contained therein doesn't pass
> human usability check, and are not used as a result.
Where have you been when PEP 3108 was discussed? I have not seen any other
list of Python modules that needed a redesign, so I cannot tell what's on
your top ten list.
Speaking of the datetime module, in what sense does it not "pass human
usability check"? It does have a few quirks, for example I would rather
see date accept a single argument in the constructor which may be a string,
another date or a tuple, but I am not even sure this desire is shared by
many other humans. It would be nice if datetime classes were named in
CamelCase according to PEP 8 conventions, but again this is a very minor
In my view, if anyone is to blame for the "human usability" of the datetime
module, it would be Pope Gregory XIII, Benjamin Franklin and scores of
unnamed astronomers who made modern timekeeping such a mess.
Well, after a few days no-one has responded to my post on another
thread about this , but the more I thought about it the more this
seemed like a good idea, so I wrote up a little more-formal proposal
(attached) for letting context managers react to 'yield's that occur
within their 'with' block.
This should in many ways be treated as a straw man proposal -- there's
tons I don't know about how async code is written in Python these days
-- but it seems like a good idea to me and I'd like to hear what
everyone else thinks :-).
Nathaniel J. Smith -- http://vorpus.org
Here is an idea that perhaps will help to prepare Python 2 code for
converting to Python 3.
Currently bytes is just an alias of str in Python 2, and the "b" prefix
of string literals is ignored. There are no differences between natural
strings and bytes. I propose to add special bit to str instances and set
it for bytes literals and strings created from binary sources (read from
binary files, received from sockets, the result of unicode.encode() and
struct.pack(), etc). With -3 flag operations with binary strings that
doesn't allowed for bytes in Python 3 (e.g. encoding or coercing to
unicode) will emit a warning. Unfortunately we can't change the bytes
constructor in minor version, it should left an alias to str in 2.7. So
the result of bytes() will be not tagged as binary string.
May be it is too late for this.
On Sat, Apr 18, 2015 at 12:28 AM, Yury Selivanov
> Hello python-ideas,
> Here's my proposal to add async/await in Python.
> I believe that PEPs 380 and 3156 were a major breakthrough for Python 3,
I am also interested in this topic --- from the other side.
As a teacher of python it is my finding that the
terminology/documentation around generators is rather chaotic and
bar = foo()
what do we call foo and what do we call bar?
It is suggested that foo is "generator-function" and bar is "generator-object"
Unfortunately python does not aid this distinction; witness
>>> def foo():
... yield 1
>>> bar = foo()
I asked about this on the python list
And it seems that many more dark corners emerged in the docs on this
subject in that discussion.
Should I start a separate thread?
I propose adding the ability to compare range objects using methods (e.g.
range.issubrange) and/or regular operators. Example:
In : range(0, 10, 3).issubrange(range(10))
In : range(0, 10, 3) <= range(10)
In : range(10) <= range(0, 10, 3)
I'll write a patch if you decide that this idea is worth implementing.