[Numpy-discussion] fromiter

Sat Jun 10 17:42:03 EDT 2006

On Sat, Jun 10, 2006 at 01:18:05PM -0700, Tim Hochberg wrote:
> 
> I finally got around to cleaning up and checking in fromiter. As Travis 
> suggested, this version does not require that you specify count. From 
> the docstring:
> 
>     fromiter(...)
>         fromiter(iterable, dtype, count=-1) returns a new 1d array
>     initialized from iterable. If count is nonnegative, the new array
>     will have count elements, otherwise its size is determined by the
>     generator.
> 
> If count is specified, it allocates the full array ahead of time. If it 
> is not, it periodically reallocates space for the array, allocating 50% 
> extra space each time and reallocating back to the final size at the end 
> (to give realloc a chance to reclaim any extra space).
> 
> Speedwise, "fromiter(iterable, dtype, count)" is about twice as fast as 
> "array(list(iterable),dtype=dtype)". Omitting count slows things down by 
> about 15%; still much faster than using "array(list(...))".  It also is 
> going to chew up more memory than if you include count, at least 
> temporarily, but still should typically use much less than the 
> "array(list(...))" approach.
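
For reference, the three spellings compared above can be sketched as follows.
This is a minimal illustration that assumes the fromiter(iterable, dtype,
count=-1) signature from the quoted docstring; the squares() generator is a
made-up example, not something from the checkin.

```python
import numpy as np

def squares(n):
    # Hypothetical example generator; any iterable of scalars would do.
    for i in range(n):
        yield i * i

# count given: the full array can be allocated up front.
a = np.fromiter(squares(5), dtype=float, count=5)

# count omitted: the size is determined by exhausting the iterable.
b = np.fromiter(squares(5), dtype=float)

# The older spelling, which builds an intermediate Python list first.
c = np.array(list(squares(5)), dtype=float)
```

All three produce the same 1d array; they differ only in how much temporary
memory is used along the way.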

Can this be integrated into array() so that array(iterable, dtype=dtype)
does the expected thing?

Can you try to find the length of the iterable, with PySequence_Size() on
the original object? This gets a bit iffy, as that might not be correct
(but it could be used as a hint).
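
A rough Python-level sketch of the "use it as a hint" idea, not a proposal
for the actual C code: fromiter_hinted is a hypothetical name, and
operator.length_hint stands in here for PySequence_Size(). The hint only
sets the initial allocation; if it turns out too small the buffer grows by
about 50%, and if it is too large the result is trimmed at the end.

```python
import operator
import numpy as np

def fromiter_hinted(iterable, dtype):
    """Hypothetical helper: build a 1d array, treating the length as a hint."""
    # length_hint falls back to 0 for objects with no usable length,
    # such as generators.
    hint = operator.length_hint(iterable, 0)
    out = np.empty(max(hint, 1), dtype=dtype)
    n = 0
    for item in iterable:
        if n == len(out):
            # The hint was too small: grow by roughly 50% and copy over.
            grown = np.empty(n + n // 2 + 1, dtype=dtype)
            grown[:n] = out
            out = grown
        out[n] = item
        n += 1
    # Trim back to the number of items actually seen.
    return out[:n].copy()
```

The point of the design is that a wrong length costs only a reallocation or
a trim, never a wrong result.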

What about iterables that return, say, tuples? Maybe add a shape argument,
so that fromiter(iterable, dtype, count, shape=(None, 3)) expects elements
from iterable that can be turned into arrays of shape (3,)? That could
replace count, too.
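
To make the tuple case concrete, here is one way to emulate it today on top
of the one-dimensional fromiter. This is only a sketch: the shape argument
above is a suggestion, not an existing parameter, and fromiter_rows and
width are hypothetical names. It flattens the tuples with
itertools.chain.from_iterable and reshapes afterwards.

```python
from itertools import chain
import numpy as np

def fromiter_rows(iterable, dtype, width, count=-1):
    """Hypothetical helper: each element of iterable is a tuple of `width` items."""
    # Translate a row count into a flat element count for fromiter.
    flat_count = -1 if count < 0 else count * width
    flat = np.fromiter(chain.from_iterable(iterable), dtype, count=flat_count)
    # Note: rows of the wrong length are not detected individually; they
    # either shift later values or make the reshape fail.
    return flat.reshape(-1, width)

points = ((i, 2 * i, 3 * i) for i in range(4))
arr = fromiter_rows(points, float, width=3)
```

Here arr has shape (4, 3), which is what shape=(None, 3) would be expected
to give.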

-- 
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke                      http://arbutus.physics.mcmaster.ca/dmc/
|cookedm at physics.mcmaster.ca