At the moment, the array module of the standard library allows to
create arrays of different numeric types and to initialize them from
an iterable (eg, another array).
What's missing is the possiblity to specify the final size of the
array (number of items), especially for large arrays.
I'm thinking of suffix arrays (a text indexing data structure) for
large texts, eg the human genome and its reverse complement (about 6
billion characters from the alphabet ACGT).
The suffix array is a long int array of the same size (8 bytes per
number, so it occupies about 48 GB memory).
At the moment I am extending an array in chunks of several million
items at a time at a time, which is slow and not elegant.
The function below also initializes each item in the array to a given
value (0 by default).
Is there a reason why there the array.array constructor does not allow
to simply specify the number of items that should be allocated? (I do
not really care about the contents.)
Would this be a worthwhile addition to / modification of the array module?
My suggestions is to modify array generation in such a way that you
could pass an iterator (as now) as second argument, but if you pass a
single integer value, it should be treated as the number of items to
Here is my current workaround (which is slow):
def filled_array(typecode, n, value=0, bsize=(1<<22)):
"""returns a new array with given typecode
(eg, "l" for long int, as in the array module)
with n entries, initialized to the given value (default 0)
a = array.array(typecode, [value]*bsize)
x = array.array(typecode)
r = n
while r >= bsize:
r -= bsize