numpy.array does not take generators
Hi All,

I want to construct a numpy array based on Python objects. In the code below, opts is a list of tuples, for example:

    opts = [('C', 100, 3, 'A'), ('K', 200, 5.4, 'B')]

If I use a generator like the following, it does not work:

    K = numpy.array(o[2]/1000.0 for o in opts)

I have to use:

    K = numpy.array([o[2]/1000.0 for o in opts])

Is this behavior intended?

By the way, it is quite inefficient to create a numpy array this way, because I have to build a regular Python list first and then construct the numpy array from it. But I do not want to store everything in vector form initially, as it is more natural to store the data in objects, and easier to use when organizing it. Does anyone know a better way?

Thanks,
Geoffrey
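[A minimal sketch of the behavior described above, using the opts data from the post. In the NumPy versions I have seen, passing the bare generator produces a 0-d object array wrapping the generator itself rather than iterating it, so only the working list-comprehension form is shown with a result:]

```python
import numpy as np

opts = [('C', 100, 3, 'A'), ('K', 200, 5.4, 'B')]

# The list-comprehension form works: the list is built first,
# then numpy.array infers shape and dtype from it.
K = np.array([o[2] / 1000.0 for o in opts])
```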
Geoffrey Zhu wrote:
Yes. With arbitrary generators, there is no good way to do the kind of mind-reading that numpy.array() usually does with sequences: it would have to unroll the whole generator anyway. fromiter() works for this, but it is restricted to 1-D arrays, for which the mind-reading is much easier to implement.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
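[To illustrate the fromiter() suggestion on the example from the original post — a sketch, noting that the dtype must be supplied up front because fromiter cannot infer it without consuming the iterator:]

```python
import numpy as np

opts = [('C', 100, 3, 'A'), ('K', 200, 5.4, 'B')]

# fromiter consumes the generator directly into a 1-D array,
# avoiding the intermediate Python list.
K = np.fromiter((o[2] / 1000.0 for o in opts), dtype=float)

# An optional count lets fromiter preallocate the result.
K2 = np.fromiter((o[2] / 1000.0 for o in opts), dtype=float,
                 count=len(opts))
```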
Is there a reason not to add an argument to fromiter that specifies the final size of the n-d array?

Reading this discussion, I realized that there are several places in my code where I create 2-D arrays like this:

    arr = N.array([d.data() for d in list_of_data_containers])

where d.data() returns a buffer object. I would guess that this paradigm causes lots of memory copying. The more efficient solution, I think, would be to preallocate the array and then assign each row in a loop. The current version is so much clearer, however, that I've kept it as is in the code.

So, what if I could do something like

    arr = N.fromiter((d.data() for d in list_of_data_containers), shape=(x, y))

with the contract that fromiter will throw an exception if any of the d.data() are not of size y, or if there are more than x elements in list_of_data_containers?

Just a thought for discussion.

barry
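[The preallocate-and-assign alternative described above can be sketched as follows. Container and its data() method are hypothetical stand-ins for the data containers in the post, assumed to return a buffer of y float64 values:]

```python
import numpy as np

class Container:
    """Hypothetical stand-in for a data container whose data()
    method returns a buffer of float64 values."""
    def __init__(self, values):
        self._buf = np.asarray(values, dtype=np.float64).tobytes()

    def data(self):
        return self._buf

list_of_data_containers = [Container([1.0, 2.0, 3.0]),
                           Container([4.0, 5.0, 6.0])]
x, y = len(list_of_data_containers), 3

# Preallocate once, then copy each row directly into place,
# skipping the intermediate list of row arrays.
arr = np.empty((x, y), dtype=np.float64)
for i, c in enumerate(list_of_data_containers):
    arr[i] = np.frombuffer(c.data(), dtype=np.float64)
```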
On 8/17/07, Barry Wark <barrywark@gmail.com> wrote:
I don't know that there's any theoretical problem with doing something like this, but there are a couple of practical issues. One is that it would significantly increase the implementation complexity of fromiter, which right now is about as simple as it can reasonably be. Someone would need to step forward to write and test the code.

The second issue is the interface. The interface that you propose isn't really right. The current interface is:

    fromiter(iterable, dtype, count=-1)

where count indicates how many items to extract from the iterable (-1 iterates until it is empty). 'shape' as you propose would couple to this in an unnatural way. Adding another keyword argument that indicates just the shape of the elements would make more sense, but it starts to seem a bit clunky:

    fromiter(iterable, dtype, count=-1, itemshape=())

For this particular application, there doesn't seem to be any problem with simply defining yourself a little utility function to do this for you:

    def from_shaped_iter(iterable, dtype, shape):
        a = numpy.empty(shape, dtype)
        for i, x in enumerate(iterable):
            a[i] = x
        return a

I expect this would have decent performance if the y dimension is reasonably large.

regards,
-tim
--
Timothy Hochberg
tim.hochberg@ieee.org
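[For reference, a self-contained run of the from_shaped_iter utility from the reply above, on made-up sample rows:]

```python
import numpy as np

def from_shaped_iter(iterable, dtype, shape):
    # Preallocate the full array, then fill one row per item.
    a = np.empty(shape, dtype)
    for i, x in enumerate(iterable):
        a[i] = x
    return a

# Any iterable of length-3 rows works, including a generator.
rows = (row for row in ([1, 2, 3], [4, 5, 6]))
arr = from_shaped_iter(rows, dtype=float, shape=(2, 3))
```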
participants (5)
- Alan G Isaac
- Barry Wark
- Geoffrey Zhu
- Robert Kern
- Timothy Hochberg