At the moment, the array module of the standard library allows one to
create arrays of different numeric types and to initialize them from
an iterable (e.g. another array).
What's missing is the possibility to specify the final size of the
array (number of items), especially for large arrays.
I'm thinking of suffix arrays (a text indexing data structure) for
large texts, e.g. the human genome and its reverse complement (about 6
billion characters from the alphabet ACGT).
The suffix array is a long int array of the same size (8 bytes per
number, so it occupies about 48 GB of memory).
At the moment I am extending an array in chunks of several million
items at a time, which is slow and not elegant.
The function below also initializes each item in the array to a given
value (0 by default).
Is there a reason why the array.array constructor does not allow one
to simply specify the number of items that should be allocated? (I do
not really care about the contents.)
Would this be a worthwhile addition to / modification of the array module?
My suggestion is to modify array creation in such a way that you
could pass an iterable (as now) as the second argument, but if you pass a
single integer value, it should be treated as the number of items to allocate.
Here is my current workaround (which is slow):
import array
def filled_array(typecode, n, value=0, bsize=(1<<22)):
    """Return a new array with the given typecode
    (e.g. "l" for long int, as in the array module)
    with n entries, initialized to the given value (default 0)."""
    a = array.array(typecode, [value]*bsize)  # one pre-filled block
    x = array.array(typecode)
    r = n
    while r >= bsize:  # append whole blocks while they fit
        x.extend(a)
        r -= bsize
    x.extend([value]*r)  # append the remainder
    return x
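Without any change to the module, sequence repetition on a one-element array already allocates the whole result in a single step and fills it in C, and a bytes initializer is handed to frombytes(), which gives a zero-filled array. A minimal sketch; filled_array_fast and zeroed_array are hypothetical names, not part of the array module.

```python
import array


def filled_array_fast(typecode: str, n: int, value=0) -> array.array:
    """Hypothetical helper: n items, all equal to value.

    Repeating a one-element array allocates the full result once;
    the copying happens in C rather than in a Python loop.
    """
    return array.array(typecode, [value]) * n


def zeroed_array(typecode: str, n: int) -> array.array:
    """Hypothetical helper: n zero items.

    A bytes initializer is passed to the array's frombytes() method,
    and all-zero bytes decode to 0 (or 0.0) for the numeric typecodes.
    """
    itemsize = array.array(typecode).itemsize
    return array.array(typecode, bytes(n * itemsize))


a = filled_array_fast("l", 5, 7)
print(a)  # array('l', [7, 7, 7, 7, 7])
```

Note that zeroed_array briefly holds a second, full-size copy of the data as a bytes object, so for a 48 GB suffix array the repetition form is the gentler of the two on peak memory.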
I just spent a few minutes staring at a bug caused by a missing comma
-- I got a mysterious argument count error because instead of foo('a',
'b') I had written foo('a' 'b').
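The failure mode is easy to reproduce: adjacent literals are merged by the parser, so the call receives one argument instead of two. A minimal sketch; foo here is a hypothetical two-argument function standing in for the real one.

```python
def foo(first, second):
    """Hypothetical two-argument function."""
    return (first, second)


try:
    foo('a' 'b')  # missing comma: the literals merge into the single argument 'ab'
except TypeError as exc:
    error = str(exc)

print(error)  # reports that the second positional argument is missing

# The same slip inside a list silently produces fewer elements:
items = ['a' 'b', 'c']
print(items)  # ['ab', 'c']
```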
This is a fairly common mistake, and IIRC at Google we even had a lint
rule against this (there was also a Python dialect used for some
specific purpose where this was explicitly forbidden).
Now, with modern compiler technology, we can (and in fact do) evaluate
compile-time string literal concatenation with the '+' operator, so
there's really no reason to support 'a' 'b' any more. (The reason was
always rather flimsy; I copied it from C but the reason why it's
needed there doesn't really apply to Python, as it is mostly useful
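The compile-time folding of '+' between string literals can be observed in CPython by looking at the constants of a compiled snippet. This is a CPython implementation detail rather than a language guarantee, so treat the sketch as illustrative only.

```python
import dis

# CPython folds 'a' + 'b' while compiling, so the joined string is stored
# as a ready-made constant instead of being concatenated at run time.
code = compile("x = 'a' + 'b'", "<example>", "exec")
print(code.co_consts)  # contains the folded constant 'ab'
dis.dis(code)          # no run-time addition of the two literals is emitted
```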
Would it be reasonable to start deprecating this and eventually remove
it from the language?
--Guido van Rossum (python.org/~guido)
Stephen J. Turnbull wrote:
> Vernon D. Cole writes:
>> I cannot compile a Python extension module with any Microsoft compiler
>> I can obtain.
> Your pain is understood, but it's not simple to address it.
FWIW, I'm working on making the compiler easily obtainable. The VS 2008 link that was posted is unofficial, and could theoretically disappear at any time (I'm not in control of that), but the Windows SDK for Windows 7 and .NET 3.5 SP1 (http://www.microsoft.com/en-us/download/details.aspx?id=3138) should be around for as long as Windows 7 is supported. The correct compiler (VC9) is included in this SDK, but unfortunately does not install the vcvarsall.bat file that distutils expects. (Though it's pretty simple to add one that will switch on %1 and call the correct vcvars(86|64|...).bat.)
The SDK needed for Python 3.3 and 3.4 (VC10) is even worse - there are many files missing. I'm hoping we'll be able to set up some sort of downloadable package/tool that will fix this. While we'd obviously love to move CPython onto our latest compilers, it's simply not possible (for good reason). Python 3.4 is presumably locked to VC10, but hopefully 3.5 will be able to use whichever version is current when that decision is made.
> The basic problem is that the ABI changes. Therefore it's going to require
> a complete new set of *all* C extensions for Windows, and the duplication
> of download links for all those extensions from quite a few different vendors
> is likely to confuse a lot of users.
Specifically, the CRT changes. The CRT is an interesting mess of data structures that are exposed in header files, which means while you can have multiple CRTs loaded, they cannot touch each other's data structures at all or things will go bad/crash, and there's no nice way to set it up to avoid this (my colleague who currently owns MSVCRT suggested a not-very-nice way to do it, but I don't think it's going to be reliable enough). Python's stable ABI helps, but does not solve this problem.
The file APIs are the worst culprits. The layout of FILE* objects can and does change between CRT versions, and file descriptors are simply indices into an array of these objects that is exposed through macros rather than function calls. As a result, you cannot mix either FILE pointers or file descriptors between CRTs. The only safe option is to build with the matching CRT, and for MSVCRT, this means with the matching compiler. It's unfortunate, and the responsible teams are well aware of the limitation, but it's history at this point, so we have no choice but to work with it.
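Since the matching compiler is what matters, it helps to know which one built the running interpreter: on MSVC builds, platform.python_compiler() returns a string of the form "MSC v.NNNN ...", where NNNN is the compiler's _MSC_VER (1500 is VC9 / Visual Studio 2008, 1600 is VC10 / Visual Studio 2010). A minimal sketch; msvc_toolset and its lookup table are hypothetical helpers that only cover the two toolsets discussed here.

```python
import platform
import re
from typing import Optional

# _MSC_VER values for the two toolsets discussed above (hypothetical table).
MSC_TOOLSETS = {
    1500: "VC9 (Visual Studio 2008)",
    1600: "VC10 (Visual Studio 2010)",
}


def msvc_toolset(compiler_string: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: map an 'MSC v.NNNN' build string to a toolset name.

    Returns None for non-MSVC builds (GCC, Clang) or versions not in the table.
    """
    if compiler_string is None:
        compiler_string = platform.python_compiler()
    match = re.search(r"MSC v\.(\d{4})", compiler_string)
    if match is None:
        return None
    return MSC_TOOLSETS.get(int(match.group(1)))


print(platform.python_compiler(), "->", msvc_toolset())
```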
I guess this is so obvious that someone must have suggested it before:
in list comprehensions you can currently exclude items based on the if clause:
[n for n in range(1,1000) if n % 4 == 0]
Why not extend this filtering by allowing a while statement in addition to
if, as in:
[n for n in range(1,1000) while n < 400]
Trivial effect, I agree, in this example since you could achieve the same by
using range(1,400), but I hope you get the point.
This intuitively understandable extension would provide a big speed-up for
sorted lists where processing all the input is unnecessary.
some_names = ["Adam", "Andrew", "Arthur", "Bob", "Caroline", "Lancelot"]  # a sorted list of names
[n for n in some_names if n.startswith("A")]
# certainly gives a list of all names starting with A, but...
[n for n in some_names while n.startswith("A")]
# would have saved two comparisons
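For comparison, the stop-at-first-failure behaviour the proposed while clause describes is what itertools.takewhile already provides. The sketch below counts how often the predicate runs (the counting predicate is illustrative instrumentation only): the if filter tests all six names, while takewhile stops at "Bob" after four tests, i.e. the two saved comparisons mentioned above.

```python
from itertools import takewhile

some_names = ["Adam", "Andrew", "Arthur", "Bob", "Caroline", "Lancelot"]  # sorted

calls = []  # records every name the predicate is asked about


def starts_with_a(name: str) -> bool:
    """Predicate that also logs each evaluation."""
    calls.append(name)
    return name.startswith("A")


# "if" filtering: the predicate is evaluated for every element.
filtered = [n for n in some_names if starts_with_a(n)]
calls_if = len(calls)

# takewhile: evaluation stops at the first failing element ("Bob").
calls.clear()
prefix = list(takewhile(starts_with_a, some_names))
calls_while = len(calls)

print(filtered, calls_if)   # ['Adam', 'Andrew', 'Arthur'] 6
print(prefix, calls_while)  # ['Adam', 'Andrew', 'Arthur'] 4
```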
On Mon, Dec 30, 2013 at 10:39 AM, anatoly techtonik <techtonik(a)gmail.com> wrote:
> Nothing personal, but that code is awful.
You're right, it's buggy because I didn't test it. Superfluous close
parenthesis there. But my point is that the architecture is right
there in the name, so you can save yourself the trouble and just use
what comes from uname directly.
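Using what uname reports directly could look like the sketch below: the machine field is passed through unchanged instead of being translated into a private naming scheme. download_url, the base URL and the filename template are hypothetical; only platform.uname() is real.

```python
import platform


def download_url(base: str, filename_template: str) -> str:
    """Hypothetical helper: build a download URL from the machine name
    reported by uname (e.g. 'x86_64', 'AMD64', 'arm64'), used verbatim."""
    machine = platform.uname().machine
    return f"{base}/{filename_template.format(arch=machine)}"


print(download_url("https://example.com/dl", "tool-{arch}.zip"))
```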
It's a fairly standard pattern to see things like this:
try:
    import foo
except ImportError:
    foo = None
(and of course, variants with from...import et cetera). These can
potentially add a lot of clutter to the imports section of a file, given
that it requires 4 lines to do a conditional import.
It seems like it'd be useful and clean to have a syntax that looked like this:
maybe import foo
from bar maybe import baz
from qux maybe import quy as quz
Where the behavior would essentially be as above - attempt to run the
import normally, and in cases where the import fails, map the name to a
value of None instead. Users who want a different behavior are still free
to use the long-form syntax. A possible variant might be to also only run
the import if the name isn't already bound, so that you could do something like:
from frobber_a maybe import frob as frobber
from frobbler_b maybe import frobble as frobber
from frobber_c maybe import frobit as frobber
...to potentially try different fallback options if the first choice for an
interface provider isn't available.
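Until such syntax exists, roughly the same behaviour can be approximated with a function built on importlib.import_module. A minimal sketch; maybe_import and first_available are hypothetical helpers, and unlike a real from-import, the attribute lookup here does not import submodules.

```python
import importlib
from typing import Any, Optional


def maybe_import(module_name: str, attr: Optional[str] = None) -> Any:
    """Hypothetical stand-in for `maybe import`: return the module (or one of
    its attributes), or None if the import fails."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:  # also covers ModuleNotFoundError
        return None
    if attr is None:
        return module
    # `from bar import baz` raises ImportError when baz is missing; map it to None.
    return getattr(module, attr, None)


def first_available(*candidates: tuple) -> Any:
    """Try (module, attribute) pairs in order and return the first one found,
    mirroring the chain of fallback imports above."""
    for module_name, attr in candidates:
        obj = maybe_import(module_name, attr)
        if obj is not None:
            return obj
    return None


json_module = maybe_import("json")
frobber = first_available(("frobber_a", "frob"), ("json", "dumps"))  # falls back to json.dumps
```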
On Mon, Dec 30, 2013 at 10:57 AM, anatoly techtonik <techtonik(a)gmail.com> wrote:
> On Mon, Dec 30, 2013 at 2:42 AM, Chris Angelico <rosuav(a)gmail.com> wrote:
>> On Mon, Dec 30, 2013 at 10:39 AM, anatoly techtonik <techtonik(a)gmail.com> wrote:
>>> Nothing personal, but that code is awful.
>> You're right, it's buggy because I didn't test it. Superfluous close
>> parenthesis there. But my point is that the architecture is right
>> there in the name, so you can save yourself the trouble and just use
>> what comes from uname directly.
> I write code for aliens. They might be confused that x86 < x64.
Who cares? You're building up a URL and going and downloading it. The
user needn't even be aware of it. You're actually writing code for
I would very much appreciate your opinion on my proposal for improvement of
Compared to other languages such as Scala and C#, Python’s futures
fall significantly behind in functionality, especially in the ability to chain
computations and compose different futures without blocking and waiting for
the result. New packages that provide their own futures implementation continue
to emerge (*asyncio*), making composition even more difficult.
The proposed improvement implements a Scala-like Future as a monadic construct.
It allows performing multiple kinds of operations on a Future’s result
without blocking, enabling reactive programming in Python. It implements
the common pattern of separating the *Future* and *Promise* interfaces, making it very
easy for 3rd-party systems to use futures in their API.
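To illustrate the kind of non-blocking chaining meant here, the sketch below builds a map-style combinator on top of today's concurrent.futures using add_done_callback. It is not the API of the referenced rxpython implementation; then is a hypothetical helper, and since the concurrent.futures documentation says Future instances should normally be created by an executor, constructing one by hand is this sketch's stand-in for the Promise side.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable


def then(fut: Future, fn: Callable[[Any], Any]) -> Future:
    """Hypothetical combinator: return a new Future holding fn(fut's result).

    The work is attached as a done-callback, so the caller never blocks;
    exceptions and cancellation propagate down the chain.
    """
    out: Future = Future()  # plays the role of a Promise completed below

    def _on_done(f: Future) -> None:
        if f.cancelled():
            out.cancel()
            return
        exc = f.exception()
        if exc is not None:
            out.set_exception(exc)
            return
        try:
            out.set_result(fn(f.result()))
        except Exception as e:  # a failing fn fails the derived future
            out.set_exception(e)

    fut.add_done_callback(_on_done)
    return out


with ThreadPoolExecutor() as pool:
    chained = then(then(pool.submit(sum, [1, 2, 3]), lambda x: x * 10), str)
    print(chained.result())  # 60
```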
Please have a look at this PEP
and reference implementation <https://github.com/mikhtonyuk/rxpython> (as
I’m very interested in:
- How PEPable is this?
- What are your thoughts on backward compatibility (current implementation
does not sacrifice any design points for it, but better compatibility can
- Thoughts on Future-based APIs in other packages?