Numpy Performance
Robert Kern
robert.kern at gmail.com
Thu Apr 23 22:30:13 EDT 2009
On 2009-04-23 10:32, timlash wrote:
> Still fairly new to Python. I wrote a program that used a class
> called RectangularArray as described here:
>
> class RectangularArray:
>     def __init__(self, rows, cols, value=0):
>         self.arr = [None]*rows
>         self.row = [value]*cols
>     def __getitem__(self, (i, j)):
>         return (self.arr[i] or self.row)[j]
>     def __setitem__(self, (i, j), value):
>         if self.arr[i] == None: self.arr[i] = self.row[:]
>         self.arr[i][j] = value
>
> This class was found in a 14-year-old post:
> http://www.python.org/search/hypermail/python-recent/0106.html
>
> This worked great and let me process a few hundred thousand data
> points with relative ease. However, I soon wanted to start sorting
> arbitrary portions of my arrays and to transpose others. I turned to
> Numpy rather than reinventing the wheel with custom methods within the
> serviceable RectangularArray class. However, once I refactored with
> Numpy I was surprised to find that the execution time for my program
> doubled! I expected a purpose-built array module to be more efficient,
> not less.
It depends on how much you refactored your code. numpy tries to optimize bulk
operations. If you are doing a lot of __getitem__s and __setitem__s on
individual elements, as you would with RectangularArray, numpy has to do a
lot of extra work creating and deleting a scalar object for each access.
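
To make the difference concrete, here is a minimal sketch (mine, not from
the original post; the array size is arbitrary) contrasting per-element
access with a single vectorized operation:

    import numpy as np

    arr = np.zeros((1000, 1000))

    # Slow: one Python-level __setitem__ call, and one boxed scalar
    # object, per element.
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            arr[i, j] = i + j

    # Fast: one bulk operation; the loop runs in compiled code.
    rows = np.arange(arr.shape[0])[:, np.newaxis]  # column of row indices
    cols = np.arange(arr.shape[1])                 # row of column indices
    arr = rows + cols                              # broadcasting fills it at once

The vectorized version produces the same array but crosses the Python/C
boundary a handful of times instead of a million.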
> I'm not doing any linear algebra with my data. I'm working with
> rectangular datasets, evaluating individual rows, grouping, sorting
> and summarizing various subsets of rows.
>
> Is a Numpy implementation overkill for my data handling uses? Should
> I evaluate prior array modules such as Numeric or Numarray?
No. Numeric and numarray are the deprecated predecessors of numpy; there is
no reason to go back to them. Row-wise sorting, grouping, and summarizing is
exactly the kind of work numpy's bulk operations are meant for, once you
restructure the code to use them instead of element-at-a-time access.
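
As a hedged sketch of that kind of row-wise work (the data and column
meanings here are made up for illustration), the sorting, subsetting, and
summarizing can all be expressed as bulk operations:

    import numpy as np

    data = np.random.rand(100000, 4)  # hypothetical rectangular dataset

    # Sort all rows by column 0: argsort gives the row order in one call.
    data_sorted = data[np.argsort(data[:, 0])]

    # Select a subset of rows with a boolean mask, then summarize it.
    subset = data[data[:, 1] > 0.5]
    total, average = subset[:, 2].sum(), subset[:, 2].mean()

    # Transposing is essentially free: .T returns a view, not a copy.
    transposed = data.T

Each of those lines replaces an explicit Python loop over rows, which is
where the refactoring pays off.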
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco