Numpy Performance
Robert Kern
robert.kern at gmail.com
Thu Apr 23 22:30:13 EDT 2009
On 2009-04-23 10:32, timlash wrote:
> Still fairly new to Python. I wrote a program that used a class
> called RectangularArray as described here:
>
> class RectangularArray:
>     def __init__(self, rows, cols, value=0):
>         self.arr = [None]*rows
>         self.row = [value]*cols
>     def __getitem__(self, (i, j)):
>         return (self.arr[i] or self.row)[j]
>     def __setitem__(self, (i, j), value):
>         if self.arr[i] == None: self.arr[i] = self.row[:]
>         self.arr[i][j] = value
>
> This class was found in a 14-year-old post:
> http://www.python.org/search/hypermail/python-recent/0106.html
>
> This worked great and let me process a few hundred thousand data
> points with relative ease. However, I soon wanted to start sorting
> arbitrary portions of my arrays and to transpose others. I turned to
> Numpy rather than reinventing the wheel with custom methods within the
> serviceable RectangularArray class. However, once I refactored with
> Numpy I was surprised to find that the execution time for my program
> doubled! I expected a purpose-built array module to be more efficient,
> not less.
It depends on how much you refactored your code. numpy tries to optimize bulk
operations. If you are doing a lot of __getitem__s and __setitem__s on
individual elements, as you would with RectangularArray, numpy has to do a
lot of extra work creating and deleting a scalar object for each access.
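
To make the difference concrete, here is a minimal sketch (mine, not from
the original post; the array size is arbitrary) contrasting per-element
access with a single vectorized operation:

    import numpy as np

    arr = np.zeros((1000, 1000))

    # Slow: one Python-level __setitem__ call, and one boxed scalar
    # object, per element.
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            arr[i, j] = i + j

    # Fast: one bulk operation; the loop runs in compiled code.
    rows = np.arange(arr.shape[0])[:, np.newaxis]  # column of row indices
    cols = np.arange(arr.shape[1])                 # row of column indices
    arr = rows + cols                              # broadcasting fills it at once

The vectorized version produces the same array but crosses the Python/C
boundary a handful of times instead of a million.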
> I'm not doing any linear algebra with my data. I'm working with
> rectangular datasets, evaluating individual rows, grouping, sorting
> and summarizing various subsets of rows.
>
> Is a Numpy implementation overkill for my data handling uses? Should
> I evaluate prior array modules such as Numeric or Numarray?
No. Numeric and numarray are the deprecated predecessors of numpy; there is
no reason to go back to them. Row-wise sorting, grouping, and summarizing is
exactly the kind of work numpy's bulk operations are meant for, once you
restructure the code to use them instead of element-at-a-time access.
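
As a hedged sketch of that kind of row-wise work (the data and column
meanings here are made up for illustration), the sorting, subsetting, and
summarizing can all be expressed as bulk operations:

    import numpy as np

    data = np.random.rand(100000, 4)  # hypothetical rectangular dataset

    # Sort all rows by column 0: argsort gives the row order in one call.
    data_sorted = data[np.argsort(data[:, 0])]

    # Select a subset of rows with a boolean mask, then summarize it.
    subset = data[data[:, 1] > 0.5]
    total, average = subset[:, 2].sum(), subset[:, 2].mean()

    # Transposing is essentially free: .T returns a view, not a copy.
    transposed = data.T

Each of those lines replaces an explicit Python loop over rows, which is
where the refactoring pays off.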
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco