At the moment, the array module of the standard library lets you create
arrays of different numeric types and initialize them from an iterable
(e.g., another array).
What's missing is the possibility to specify the final size of the
array (its number of items) up front, especially for large arrays.
I'm thinking of suffix arrays (a text-indexing data structure) for
large texts, e.g. the human genome and its reverse complement (about 6
billion characters from the alphabet ACGT).
The suffix array is a long int array of the same size (8 bytes per
number, so it occupies about 48 GB of memory).
At the moment I am extending an array in chunks of several million
items at a time, which is slow and not elegant.
The function below also initializes each item in the array to a given
value (0 by default).
Is there a reason why the array.array constructor does not allow you
to simply specify the number of items that should be allocated? (I do
not really care about the contents.)
Would this be a worthwhile addition to / modification of the array module?
My suggestion is to modify array construction so that you could still
pass an iterable (as now) as the second argument, but if you pass a
single integer instead, it would be treated as the number of items to
allocate.
Here is my current workaround (which is slow):
import array

def filled_array(typecode, n, value=0, bsize=(1 << 22)):
    """Return a new array with the given typecode
    (e.g., "l" for long int, as in the array module)
    and n entries, each initialized to the given value (default 0).
    """
    a = array.array(typecode, [value] * bsize)
    x = array.array(typecode)
    r = n
    while r >= bsize:
        x.extend(a)
        r -= bsize
    x.extend(array.array(typecode, [value] * r))
    return x
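For what it's worth, here are two preallocation idioms that avoid the chunked loop entirely; both allocate the target size in a single step at C speed:

```python
import array

n = 1_000_000  # item count; the genome case above would use ~6_000_000_000

# Sequence repetition: build a one-item array, then repeat it n times.
a = array.array('l', [0]) * n

# Construct from a zero-filled buffer; itemsize accounts for the
# platform-dependent width of the 'l' typecode.
b = array.array('l', bytes(n * array.array('l').itemsize))

assert len(a) == len(b) == n
```

The repetition idiom also works for any nonzero fill value, so it can serve as a faster drop-in for the chunked function above; the proposed constructor change would simply make this spelling unnecessary.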
I just spent a few minutes staring at a bug caused by a missing comma
-- I got a mysterious argument count error because instead of foo('a',
'b') I had written foo('a' 'b').
This is a fairly common mistake, and IIRC at Google we even had a lint
rule against this (there was also a Python dialect used for some
specific purpose where this was explicitly forbidden).
Now, with modern compiler technology, we can (and in fact do) evaluate
compile-time string literal concatenation with the '+' operator, so
there's really no reason to support 'a' 'b' any more. (The reason was
always rather flimsy; I copied it from C, but the reason why it's
needed there doesn't really apply to Python, as it is mostly useful
there in combination with preprocessor macros.)
Would it be reasonable to start deprecating this and eventually remove
it from the language?
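To make the failure mode concrete, here is a minimal reproduction of the bug described above (the function name is just for illustration):

```python
def foo(a, b):
    return (a, b)

assert foo('a', 'b') == ('a', 'b')   # the intended two-argument call

# Drop the comma and 'a' 'b' is concatenated at compile time,
# so foo receives the single argument 'ab':
try:
    foo('a' 'b')
except TypeError as e:
    print(e)   # a mysterious argument count error
```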
--Guido van Rossum (python.org/~guido)
This idea was already casually mentioned, but it sank deep into the
threads of the discussion, so I am raising it again.
Currently the reprs of classes and functions look like this:
>>> int.from_bytes
<built-in method from_bytes of type object at 0x826cf60>
>>> open
<built-in function open>
>>> import collections
>>> collections.Counter.fromkeys
<bound method Counter.fromkeys of <class 'collections.Counter'>>
>>> collections.namedtuple
<function namedtuple at 0xb6fc4adc>
What if we changed the default reprs of classes and functions to just
the fully qualified name, __module__ + '.' + __qualname__ (or just
__qualname__ if __module__ is builtins)? This would look much neater,
and such reprs would be shorter and more informative.
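A sketch of the proposed repr as a plain function (a hypothetical helper, just to illustrate the rule):

```python
import collections

def proposed_repr(obj):
    # Fully qualified name; the module prefix is dropped for builtins.
    mod = getattr(obj, '__module__', None)
    qual = getattr(obj, '__qualname__', type(obj).__name__)
    return qual if mod in (None, 'builtins') else '{}.{}'.format(mod, qual)

print(proposed_repr(collections.namedtuple))        # collections.namedtuple
print(proposed_repr(len))                           # len
print(proposed_repr(collections.Counter.fromkeys))  # collections.Counter.fromkeys
```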
On Mon, Mar 23, 2015 at 2:08 AM, anatoly techtonik <techtonik(a)gmail.com> wrote:
> That's nice to know, but IIRC datetime is from the top 10 Python
> modules that need a redesign. Things contained therein doesn't pass
> human usability check, and are not used as a result.
Where have you been when PEP 3108 was discussed? I have not seen any other
list of Python modules that needed a redesign, so I cannot tell what's on
your top ten list.
Speaking of the datetime module, in what sense does it not "pass human
usability check"? It does have a few quirks, for example I would rather
see date accept a single argument in the constructor which may be a string,
another date or a tuple, but I am not even sure this desire is shared by
many other humans. It would be nice if the datetime classes were named in
CamelCase according to PEP 8 conventions, but again this is a very minor
issue.
In my view, if anyone is to blame for the "human usability" of the datetime
module, it would be Pope Gregory XIII, Benjamin Franklin and scores of
unnamed astronomers who made modern timekeeping such a mess.
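The single-argument date constructor wished for above could be sketched as a thin wrapper (hypothetical helper name; `fromisoformat` requires Python 3.7+):

```python
from datetime import date

def make_date(value):
    # Accept another date, an ISO 8601 string, or a (year, month, day) tuple.
    if isinstance(value, date):
        return date(value.year, value.month, value.day)
    if isinstance(value, str):
        return date.fromisoformat(value)
    if isinstance(value, tuple):
        return date(*value)
    raise TypeError('expected date, str, or tuple')

print(make_date('2015-03-23') == make_date((2015, 3, 23)))  # True
```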
recently I posted PEP 487, a simpler customization of class creation.
For newcomers: I propose the introduction of an __init_subclass__ classmethod
which initializes subclasses of a class, providing a simpler way to do much
of what metaclasses can do.
It took me a while to digest all the ideas from the list here, but well, we're
not in a hurry. So, I updated PEP 487, and pushed the changes to github
I applied the following changes:
PEP 487 contained the possibility to set a namespace for subclasses.
The most important use case for this feature would be to have an
OrderedDict as the class definition namespace. As Eric Snow pointed
out, that will soon be standard anyway, so I took this feature out of
the PEP. The implementation on PyPI now just uses an OrderedDict
as the namespace, anticipating the expected changes to CPython.
I also did some reading on possible use cases for PEP 487, so that it
may actually be used by someone. A Traits-like library is a standard
use case, so I looked especially at IPython's traitlets, which are a
simple example of that use case.
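As a rough illustration of that use case, a Traits-like base class could collect its descriptors at class-creation time with the proposed hook (invented names, not the real traitlets API; this runs on Python 3.6+, where the hook eventually landed):

```python
class Trait:
    """Minimal stand-in for a traitlets-style descriptor."""

class HasTraits:
    def __init_subclass__(cls, **kwargs):
        # Runs once per subclass, replacing a metaclass __new__/__init__.
        super().__init_subclass__(**kwargs)
        cls._traits = {name for name, value in vars(cls).items()
                       if isinstance(value, Trait)}

class Point(HasTraits):
    x = Trait()
    y = Trait()

print(Point._traits == {'x', 'y'})  # True
```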
Currently traitlets use both __new__ and __init__ of a metaclass. So I
tried to also introduce a __new_subclass__ the same way I introduced
__init_subclass__. This turned out much harder than I thought, actually
impossible, because it is type.__new__ that sets the method resolution
order, so making super() work in a __new_subclass__ hook is a
chicken-egg problem: we need the MRO to find the next base class to
call, but the most basic base class is the one creating the MRO. Nick,
how did you solve that problem in PEP 422?
Anyhow, I think that traitlets can also be written using just
__init_subclass__. There is just this weird hint in the docs that you should
use __new__ for metaclasses, not __init__, a hint I never understood,
as the reasons for choosing between __new__ and __init__ are precisely the
same for normal classes and metaclasses. So I think we don't miss out on
much by not having __new_subclass__.
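For readers who have not followed the PEP, the basic shape of the hook (shown here with the semantics that later shipped in Python 3.6) is a subclass registry like this:

```python
class PluginBase:
    plugins = []

    def __init_subclass__(cls, **kwargs):
        # Called once for every new subclass, at class-creation time.
        super().__init_subclass__(**kwargs)
        PluginBase.plugins.append(cls)

class MyPlugin(PluginBase):
    pass

print(PluginBase.plugins)  # the list now contains MyPlugin
```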
I also updated the implementation of PEP 487; it's still at
I want Python to have macros. This is obviously a hard sell. I'm willing
to do some legwork to demonstrate value.
What would a good proposal include? Are there good examples of failed
proposals on this topic?
Is the core team open to this topic?
Thank you for your time,
- Mathew Rocklin
Sometimes we need a simple class to hold some mutable attributes,
provide a nice repr, support == for testing, and support iterable
unpacking, so you can write:
>>> p = Point(3, 4)
>>> x, y = p
That's very much like the classes built by namedtuple, but mutable.
I propose we add to the collections module another class factory. I am
calling it plainclass, but perhaps we can think of a better name. Here
is how it would be used:
>>> import collections
>>> Point = collections.plainclass('Point', 'x y')
The signature of the plainclass function would be exactly the same as
namedtuple's, supporting the same alternative ways of naming the fields.
The semantics of the generated Point class would be like this code:
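The code sample itself did not survive in this copy of the message; the following is a plausible reconstruction based only on the requirements stated above (mutable attributes, a nice repr, == support, iterable unpacking), not the author's actual code:

```python
class Point:
    # Hypothetical expansion of collections.plainclass('Point', 'x y').
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __iter__(self):               # enables: x, y = p
        yield self.x
        yield self.y

    def __repr__(self):
        return 'Point(x={!r}, y={!r})'.format(self.x, self.y)

    def __eq__(self, other):
        if isinstance(other, Point):
            return (self.x, self.y) == (other.x, other.y)
        return NotImplemented

p = Point(3, 4)
x, y = p          # iterable unpacking, as with namedtuple
p.x = 5           # but mutable, unlike namedtuple
```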
What do you think?
PS. I am aware that there are "Namespace" classes in the standard
library (e.g. ). They solve a different problem.
| Author of Fluent Python (O'Reilly, 2015)
| Professor at: http://python.pro.br
| Twitter: @ramalhoorg
(Sorry for the manual thread continuation; I wasn't previously on the list.)
> I presume 'support' could and would include a version of given() that
> accessed .__annotations__.
This is on my list of planned features independently of PEP 484, FWIW. It's
on the long list of nice-to-have but minor things that didn't make it into
the first release.
One of the problems is that I am for the moment somewhat committed to
supporting Python 2.7 as well as Python 3 (I really wish I wasn't), or I
would probably have built Hypothesis on annotations from the start.
To a degree, though, it's fortunate that I didn't, given PEP 484. It doesn't
look like I would be able to build the full range of features Hypothesis
needs for @given using only type annotations, and 484 suggests the goal is
to move to those being the only valid ones.
The reason for this is that Hypothesis deliberately lets you mix strategies
in with the specifiers used for annotation. So for example you can do
Palindromes = strategy(str).map(lambda x: x + x[::-1])
This is now a SearchStrategy object that still returns strings, but only
ones that are palindromes.
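The map pattern can be sketched without the library, using a toy stand-in for SearchStrategy (not the real Hypothesis API):

```python
import random

class SearchStrategy:
    # Toy model: a strategy is just a callable that produces examples,
    # and .map() wraps every produced value with a function.
    def __init__(self, produce):
        self.produce = produce

    def map(self, fn):
        return SearchStrategy(lambda: fn(self.produce()))

strings = SearchStrategy(lambda: ''.join(
    random.choice('abc') for _ in range(random.randrange(5))))
palindromes = strings.map(lambda x: x + x[::-1])

s = palindromes.produce()
print(s == s[::-1])  # True: every produced value is a palindrome
```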
You can now do something like
# I'm not actually sure what interesting things you can say about a list
So the point is that you can use a SearchStrategy deep inside the "type"
signatures that Hypothesis uses, despite SearchStrategy not actually being
at all type-like.
(It actually used to be the case that an earlier version of SearchStrategy
allowed you to test whether a value could have come from that strategy,
which would have made them a bit more "type-like", but this proved to be a
bad idea, and the work that let me remove it was one of the best decisions I
made in the design of Hypothesis.)
So basically I would fully intend for Hypothesis to become a consumer of
types and annotations, but I don't think they can be used as a primary
building block of it.
(Note: I don't consider this a problem with PEP 484, which looks like a
good proposal. Hypothesis's requirements are just weird and specialized and
don't quite match type checking)
> Of course, if every 'test_xyz' were to be
> decorated with the same decorator, then the decorator could be omitted
> and the fuzzing moved into the test runner.
I'd be moderately against this being the primary mode of using Hypothesis,
even without the above caveats. One of the things I like about Hypothesis
right now is that because it just gives you Python functions it can be very
unopinionated about what you're using to run your tests (even if in
practice basically everyone is using unittest or pytest). However, it
should be relatively easy to support this mode alongside the current one
once I've done some much-needed refactoring of the core example
exploration code.