Hi all, I'm looking for some advice on how to implement a 'unit checking proxy array'. Background: I'm writing a package to run simulations which make extensive use of linear algebra, for which I'm using numpy. However - it is important to my package that quantities can have dimesions, so I've written a class Quantity subclassed from float which includes the physical dimensions and raises exceptions etc. if you try to add volts to kilograms or whatever. This seems to work fine, but I also wanted arrays containing physical dimensions. I wrote a class qarray subclassed from numpy.ndarray which mostly just copies the functionality of an array with dtype=object. Here is the issue. The code for my package internally doesn't use these qarray objects, it uses numpy arrays because it would obviously be terribly slow to be repeatedly checking if the dimensions were consistent all the time. However, I want to present a (somewhat) consistent strict unit-checking front to the user of this package. As one part of this, I would like a class that appears to act exactly like a qarray (which itself is supposed to act very much like a numpy array) but maintains a connection with its numpy array counterpart in my package. Suppose I have an array X in my package, then I want an object Y which the user is free to manipulate like an array, but if they do something like Y[0]=2*volt then the object Y does (1) check that Y[0] has volt units, (2) if so, update Y[0] and also X[0]. My question is about what is the best way to implement something like this. So far, what I've done is written a proxy array class (call it pqarray say) derived from qarray (which is pretty much just array with dtype=object but not exactly as it does some other things too). When my package creates a pqarray for the user, it does the following: take the underlying numpy array X; create Y a qarray based on X with the correct units (this necessarily copies the data in X because the dtype is different); create Z from X, Y, a pqarray that knows that Y is based on X. The class pqarray simply overrides the __setitem__ method to update itself and X if the units are correct (or raise an exception). Now, this does actually work, but it's not terribly elegant and there is a problem. The standard way that this proxy array will be accessed is through another object, let's just call it P. P has a property val (it has to be a property in general, because the internal numpy array could change at any moment). If you write P.val it returns a pqarray corresponding to the internal representation of the data. Let's suppose P.val has length 1000. Suppose the user writes: for i in range(1000): P.val[i] = i*volt Now what happens is that this creates a new pqarray 1000 times, and this involves copying 1000 elements each time for something of the order of 1M operations total when it should really only be of the order of 1000 operations total. Now, obviously given the way it's written, the user 'ought' to write: val = P.val for i in range(1000): val[i] = i*volt or even better (and what you would do in practice): P.val = qarray(range(1000))*volt and then it will be O(1000) operations, but (a) the user shouldn't need to know this, and (b) my pqarray class seems horrifically ugly and I want something better. It seems to me that my options are: (1) Use my current solution, and instruct the user about the potential problem (which probably won't come up too often anyway). (2) Write pqarray to be a subclass of numpy.ndarray so that no data is copied in creating the pqarray, and give pqarray __getitem__ and __setitem__ methods that return a Quantity with the right units and do unit checking, respectively. I have actually implemented this, and it sort of works but I have some worries about it. For example, if Y is a pqarray of the numpy array X then it seems print Y will give the same result as print X (i.e. it forgets the units). I could presumably look this up and override this behaviour but that's a game which has no end (or at least, a very distant end that might end up with me having reimplemented the whole of numpy.ndarray). (3) Have some sort of caching mechanism, so that P.val returns the cached value unless the underlying numpy array is changed. The problem with this is that you have to keep track of whether or not the underlying numpy array has changed which could be fiddly and rather unsafe. (4) Give up hoping that P.val can be a subclass of an array, make it a new class that keeps a copy of the necessary units and the underlying array, and can do some basic array operations. Since it's not an array type object itself, the user can be told not to expect it to behave like one. This would be quite a sad solution though, because I want the user to be able to write things like mean(P.val) and have them just work as they'd expect (which works at the moment). (5) Something I haven't thought of yet...? Any advice of any sort would be very much appreciated. I did search for relevant information but nothing much came up. Plenty about how to do subclassing of numpy array's, but that's not really my problem here - it's more specific than that. Many thanks in advance, -- Dan Goodman http://thesamovar.net/contact
On 13/02/2008, Dan Goodman <dg.gmane.comp.python.numeric.general@thesamovar.net> wrote:
Background: I'm writing a package to run simulations which make extensive use of linear algebra, for which I'm using numpy. However - it is important to my package that quantities can have dimesions, so I've written a class Quantity subclassed from float which includes the physical dimensions and raises exceptions etc. if you try to add volts to kilograms or whatever. This seems to work fine, but I also wanted arrays containing physical dimensions. I wrote a class qarray subclassed from numpy.ndarray which mostly just copies the functionality of an array with dtype=object. Here is the issue. The code for my package internally doesn't use these qarray objects, it uses numpy arrays because it would obviously be terribly slow to be repeatedly checking if the dimensions were consistent all the time. However, I want to present a (somewhat) consistent strict unit-checking front to the user of this package. As one part of this, I would like a class that appears to act exactly like a qarray (which itself is supposed to act very much like a numpy array) but maintains a connection with its numpy array counterpart in my package.
I'm not sure I understand all the different layers you're talking about here. I think that I would attack your problem in the following way: Use ScientificPython's PhysicalQuantity (alone and in object arrays) or "unum"s to represent quantities with units: http://books.google.com/books?id=3nR75KSvsq4C&pg=PA169&lpg=PA169&dq=numpy+physicalquantity&source=web&ots=_cmXEC0qpx&sig=JBEz9BlegnIQJor-1BwnAdRTKLE Object arrays are fairly horrible, but having one unit per array shouldn't be very inefficient. It could cause some headaches, since it's sometimes useful to have an array with mixed units (differential equation solvers often require this when you convert a higher-order problem to first-order, for example), but it should be quite efficient computationally. So I would pull together an array that held a collection of quantities all in the same units. Type checking then becomes a once-per-array operation, which is relatively cheap. The way I'd implement this is (by preference) by grabbing a version off the net, or (if that fails) as a subclass of ndarray. If I want to do some heavy-duty array mangling without type checking getting in my way, I'll just convert to an ndarray (either normalizing the units or saving them), do the heavy lifting, and reapply the units afterwards. I don't really see why you're talking about proxy classes... Perhaps your problem is actually two problems: implementing unit arrays, and doing something I haven't understood. Is there any reason not to separate the two, and use one of the existing solutions for unit arrays? Anne
Hi Anne, Thanks for your reply. As you say there are a few different problems in here. The first is about implementing a system of physical quantities with units. I reviewed various options for this including the ones you mentioned (unum and ScientificPython), but they all had some important features missing, so I implemented my own class which does this (building on the ideas they used). This is finished now and I don't need to worry about it. The problem I'm working on now is the best way to deal with arrays of quantities with units, and efficient computations. At the moment, what I'm going for is an internal implementation (the code in the package) which knows about the units of the arrays, but stores them as numpy arrays for speed. What I want to do though is to have it so that as far as the user of the package is concerned, everything appears to have units (single quantities or arrays of quantities). One way of doing this is as you have suggested, defining an array class that has a single unit. This is quite appealing and I'm trying to implement this in a way that does everything I want it to do (Unum does this but it has some odd behaviours that make it unsuitable for what I want to do). There is still a problem though that one of the main data structures in my package is a 2d array with each column having a different unit. This wouldn't be a huge problem because I can pass the user slices of that array which have a single unit. Perhaps a more specific question I can ask is: what considerations do you need to take into account when subclassing numpy arrays in a way that changes their semantics in a substantial way like adding units to them? The documentation that I have found only seems to deal with fairly simple cases where you are just adding something to the array (like in the http://scipy.org/Subclasses document). For example, when I tried to add a single unit to a numpy array, I tried overriding __getitem__(self,i) so that it returned numpy.ndarray.__getitem__(self,i)*self.unit which kind of works except that when I print the array you don't see the units, just the underlying float values. As I said before, I can probably override this behaviour, but how many other similar things like that would I need to do? Dan
participants (2)
-
Anne Archibald -
Dan Goodman