[Numpy-discussion] fromiter and objects

Tim Hochberg tim.hochberg at ieee.org
Sun Nov 19 13:01:57 EST 2006


I was looking at fromiter again today with an eye toward extending it to 
accept iterators of sequences instead of just iterators of scalars. For 
example:

    fromiter(([x, x // 2, x+5] for x in range(1000)), dtype=int)

This would result in a shape-(1000,3) array. At first glance at least, 
this looks straightforward: one would simply have to correctly deduce 
the size of the sequence for the given dtype, and I imagine that I can 
enlist existing numpy machinery to do this for me without a problem. But 
enough about that. I won't be able to try this till next week, and these 
things are often not as easy as they appear. The real reason I'm writing 
is this comment that explains why object arrays are disallowed in 
fromiter (multiarraymodule.c::PyArray_FromIter):

            /* We would need to alter the memory RENEW code to decrement any
               reference counts before just throwing away the memory.
             */

This doesn't seem right. The array that we would be RENEWing is a bunch 
of PyObject*s. The reference counts don't reside there, but in the 
objects themselves. When we do the RENEW, we don't want the reference 
counts to change at all. The one tricky case is if we run out of memory. 
I'm not certain that the current setup correctly deals with reference 
counts in this case, although it appears likely that it works, since 
ret->data should still point to a valid chunk of memory and decreffing 
ret should result in the subsequent deallocation of all the stored 
objects.
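
As a rough illustration of the reference-count argument, here is a 
Python-level analogy rather than the C code in fromiter. A CPython list 
keeps its elements in an array of PyObject* pointers and reallocates that 
array as it grows, much as the RENEW would. The sketch assumes CPython; 
the class name Marker and the function name are made up for the example 
and are not part of numpy.

```python
import sys


class Marker:
    """Hypothetical placeholder; any ordinary Python object would do."""


def refcount_across_growth(extra=10_000):
    """Return obj's reference count before and after its container grows."""
    obj = Marker()
    holder = [obj]                    # the list's buffer holds one pointer to obj
    before = sys.getrefcount(obj)
    holder.extend([None] * extra)     # growing the list reallocates that buffer
    after = sys.getrefcount(obj)
    return before, after


before, after = refcount_across_growth()
print(before == after)
```

The two counts should come out equal: moving the pointer array to a new 
block copies the pointers and never touches the counts stored in the 
objects themselves.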

So, it looks like objects should either just work, or should work with a 
minimal amount of tweaking. However, it's possible that I'm getting 
rusty at Python extension writing (or more to the point, reading). Does 
anyone remember if this check was added to address a specific problem? 
If so, do you also remember what it was? I suppose I can track back 
through the revision history if no one remembers, but I figured I'd try 
the lazy approach first and ask about it.

-tim



