Trying to debug code written by an undergrad working for a colleague of mine who ported code over from MATLAB, I am seeing an ugly melange of matrix objects and ndarrays that are interacting poorly with each other and various functions in SciPy/other libraries. In particular, there was a custom minimizer function containing a line "a * b" that was receiving an Nx1 "matrix" and an N-length array and computing an outer product. Hence the unexpected 6 GB of memory usage and weird results...
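The failure mode is easy to reproduce; a minimal sketch (N here is illustrative, the real N was of course much larger):

```python
import numpy as np

N = 5
a = np.matrix(np.ones((N, 1)))  # Nx1 matrix, as in the ported code
b = np.ones(N)                  # plain length-N ndarray

# matrix.__mul__ means matrix multiplication: b is coerced to a 1xN
# matrix, so the product is NxN -- an outer product, not elementwise.
result = a * b
print(result.shape)  # (5, 5)
```

With a large N, the NxN float64 result easily accounts for gigabytes of memory.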
We've had this discussion before and it seems that the matrix class isn't going anywhere (I *really* wish it would at least be banished from the top-level namespace), but it has its adherents for pedagogical reasons. Could we at least consider putting a gigantic warning on all the functions for creating matrix objects (matrix, mat, asmatrix, etc.) that they may not behave quite so predictably in some situations and should be avoided when writing nontrivial code?
There are already such warnings scattered about on SciPy.org but the situation is so bad, in my opinion (bad from a programming perspective and bad from a new user perspective, asking "why doesn't this work? why doesn't that work? why is this language/library/etc. so stupid, inconsistent, etc.?") that the situation warrants steering people still further away from the matrix object.
I apologize for ranting, but it pains me when people give up on Python/NumPy because they can't figure out inconsistencies that aren't really there for a good reason. IMHO, of course.
David
On Apr 26, 2010, at 7:19 PM, David Warde-Farley wrote:
Trying to debug code written by an undergrad working for a colleague of mine who ported code over from MATLAB, I am seeing an ugly melange of matrix objects and ndarrays that are interacting poorly with each other and various functions in SciPy/other libraries. In particular there was a custom minimizer function that contained a line "a * b", that was receiving an Nx1 "matrix" and a N-length array and computing an outer product. Hence the unexpected 6 GB of memory usage and weird results...
Overloading '*' and '**' while convenient does have consequences. It would be nice if we could have a few more infix operators in Python to allow separation of element-by-element calculations and "dot-product" calculations.
A proposal was made to allow "calling a NumPy array" to infer dot product:
a(b) is equivalent to dot(a,b)
a(b)(c) would be equivalent to dot(dot(a,b),c)
This seems rather reasonable.
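To make the proposal concrete, here is a rough sketch of the idea as a plain ndarray subclass (purely illustrative; the class name is made up, and this is not how it would actually be wired into NumPy):

```python
import numpy as np

class CallableArray(np.ndarray):
    """Illustrative subclass: calling the array performs a dot product."""
    def __call__(self, other):
        return np.dot(self, np.asarray(other)).view(CallableArray)

a = np.arange(6.0).reshape(2, 3).view(CallableArray)
b = np.arange(12.0).reshape(3, 4).view(CallableArray)
c = np.arange(4.0).view(CallableArray)

# a(b)(c) chains left to right, like dot(dot(a, b), c)
result = a(b)(c)
```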
While I don't have any spare cycles to push it forward and we are already far along on the NumPy port to 3.0, I had wondered if we couldn't use the leverage of Python core developers wanting NumPy to be ported to Python 3 to actually add a few more infix operators to the language.
One of the problems of moving to Python 3.0 for many people is that there are no "new features" to outweigh the hassle of moving. Having a few more infix operators would be a huge incentive for the NumPy community to move to Python 3.
Anybody willing to lead the charge with the Python developers?
-Travis
On Wed, Apr 28, 2010 at 11:05, Travis Oliphant oliphant@enthought.com wrote:
On Apr 26, 2010, at 7:19 PM, David Warde-Farley wrote:
Trying to debug code written by an undergrad working for a colleague of mine who ported code over from MATLAB, I am seeing an ugly melange of matrix objects and ndarrays that are interacting poorly with each other and various functions in SciPy/other libraries. In particular there was a custom minimizer function that contained a line "a * b", that was receiving an Nx1 "matrix" and a N-length array and computing an outer product. Hence the unexpected 6 GB of memory usage and weird results...
Overloading '*' and '**' while convenient does have consequences. It would be nice if we could have a few more infix operators in Python to allow separation of element-by-element calculations and "dot-product" calculations.
A proposal was made to allow "calling a NumPy array" to infer dot product:
a(b) is equivalent to dot(a,b)
a(b)(c) would be equivalent to dot(dot(a,b),c)
This seems rather reasonable.
While I don't have any spare cycles to push it forward and we are already far along on the NumPy to 3.0, I had wondered if we couldn't use the leverage of Python core developers wanting NumPy to be ported to Python 3 to actually add a few more infix operators to the language.
One of the problems of moving to Python 3.0 for many people is that there are not "new features" to outweigh the hassle of moving. Having a few more infix operators would be a huge incentive to the NumPy community to move to Python 3.
Anybody willing to lead the charge with the Python developers?
There is currently a moratorium on language changes. This will have to wait.
Robert Kern robert.kern@gmail.com writes:
Overloading '*' and '**' while convenient does have consequences. It would be nice if we could have a few more infix operators in Python to allow separation of element-by-element calculations and "dot-product" calculations.
http://www.python.org/dev/peps/pep-0225/ was considered and rejected. But that was in 2000...
While I don't have any spare cycles to push it forward and we are already far along on the NumPy to 3.0, I had wondered if we couldn't use the leverage of Python core developers wanting NumPy to be ported to Python 3 to actually add a few more infix operators to the language.
I don't think that stands a chance: http://www.python.org/dev/peps/pep-3003/
Best,
-Nikolaus
Nikolaus Rath wrote:
Robert Kern robert.kern@gmail.com writes:
Overloading '*' and '**' while convenient does have consequences. It would be nice if we could have a few more infix operators in Python to allow separation of element-by-element calculations and "dot-product" calculations.
http://www.python.org/dev/peps/pep-0225/ was considered and rejected. But that was in 2000...
See also: http://www.python.org/dev/peps/pep-0211/ http://mail.python.org/pipermail/python-dev/2008-November/083493.html (The .html link Fernando gives is dead but the reST source is below in that message)
On Apr 28, 2010, at 11:50 AM, Nikolaus Rath wrote:
Robert Kern robert.kern@gmail.com writes:
Overloading '*' and '**' while convenient does have consequences. It would be nice if we could have a few more infix operators in Python to allow separation of element-by-element calculations and "dot-product" calculations.
http://www.python.org/dev/peps/pep-0225/ was considered and rejected. But that was in 2000...
While I don't have any spare cycles to push it forward and we are already far along on the NumPy to 3.0, I had wondered if we couldn't use the leverage of Python core developers wanting NumPy to be ported to Python 3 to actually add a few more infix operators to the language.
I don't think that stands a chance: http://www.python.org/dev/peps/pep-3003/
Frankly, I still think we should move forward. It will take us as long as the moratorium is in effect to figure out what operators we want anyway and we can do things like put attributes on arrays in the meantime to implement the infix operators we think we need.
It's too bad we don't have more of a voice with the Python core team. This is our fault of course (we don't have people with spare cycles to spend the time interfacing), but it's still too bad.
-Travis
On Apr 28, 2010, at 11:19 AM, Robert Kern wrote:
On Wed, Apr 28, 2010 at 11:05, Travis Oliphant oliphant@enthought.com wrote:
On Apr 26, 2010, at 7:19 PM, David Warde-Farley wrote:
Trying to debug code written by an undergrad working for a colleague of mine who ported code over from MATLAB, I am seeing an ugly melange of matrix objects and ndarrays that are interacting poorly with each other and various functions in SciPy/other libraries. In particular there was a custom minimizer function that contained a line "a * b", that was receiving an Nx1 "matrix" and a N-length array and computing an outer product. Hence the unexpected 6 GB of memory usage and weird results...
Overloading '*' and '**' while convenient does have consequences. It would be nice if we could have a few more infix operators in Python to allow separation of element-by-element calculations and "dot-product" calculations.
A proposal was made to allow "calling a NumPy array" to infer dot product:
a(b) is equivalent to dot(a,b)
a(b)(c) would be equivalent to dot(dot(a,b),c)
This seems rather reasonable.
While I don't have any spare cycles to push it forward and we are already far along on the NumPy to 3.0, I had wondered if we couldn't use the leverage of Python core developers wanting NumPy to be ported to Python 3 to actually add a few more infix operators to the language.
One of the problems of moving to Python 3.0 for many people is that there are not "new features" to outweigh the hassle of moving. Having a few more infix operators would be a huge incentive to the NumPy community to move to Python 3.
Anybody willing to lead the charge with the Python developers?
There is currently a moratorium on language changes. This will have to wait.
Exceptions can always be made for the right reasons. I don't think this particular question has received sufficient audience with Python core developers. The reason they want the moratorium is for stability, but they also want Python 3k to be adopted.
-Travis
On Wed, Apr 28, 2010 at 15:50, Travis Oliphant oliphant@enthought.com wrote:
On Apr 28, 2010, at 11:19 AM, Robert Kern wrote:
On Wed, Apr 28, 2010 at 11:05, Travis Oliphant oliphant@enthought.com wrote:
On Apr 26, 2010, at 7:19 PM, David Warde-Farley wrote:
Trying to debug code written by an undergrad working for a colleague of mine who ported code over from MATLAB, I am seeing an ugly melange of matrix objects and ndarrays that are interacting poorly with each other and various functions in SciPy/other libraries. In particular there was a custom minimizer function that contained a line "a * b", that was receiving an Nx1 "matrix" and a N-length array and computing an outer product. Hence the unexpected 6 GB of memory usage and weird results...
Overloading '*' and '**' while convenient does have consequences. It would be nice if we could have a few more infix operators in Python to allow separation of element-by-element calculations and "dot-product" calculations.
A proposal was made to allow "calling a NumPy array" to infer dot product:
a(b) is equivalent to dot(a,b)
a(b)(c) would be equivalent to dot(dot(a,b),c)
This seems rather reasonable.
While I don't have any spare cycles to push it forward and we are already far along on the NumPy to 3.0, I had wondered if we couldn't use the leverage of Python core developers wanting NumPy to be ported to Python 3 to actually add a few more infix operators to the language.
One of the problems of moving to Python 3.0 for many people is that there are not "new features" to outweigh the hassle of moving. Having a few more infix operators would be a huge incentive to the NumPy community to move to Python 3.
Anybody willing to lead the charge with the Python developers?
There is currently a moratorium on language changes. This will have to wait.
Exceptions can always be made for the right reasons. I don't think this particular question has received sufficient audience with Python core developers.
It received plenty of audience on python-dev in 2008. But no one from our community cared enough to actually implement it.
http://fperez.org/py4science/numpy-pep225/numpy-pep225.html
The reason they want the moratorium is for stability, but they also want Python 3k to be adopted.
This is not something that will justify an exception. Things like "oh crap, this old feature has a lurking flaw that we've never noticed before and needs a language change to fix" are possible exceptions to the moratorium, not something like this. PEP 3003 quite clearly lays out the possible exceptions:
""" Case-by-Case Exemptions
New methods on built-ins
The case for adding a method to a built-in object can be made.
Incorrect language semantics
If the language semantics turn out to be ambiguous or improperly implemented based on the intention of the original design then the semantics may change.
Language semantics that are difficult to implement
Because other VMs have not begun implementing Python 3.x semantics there is a possibility that certain semantics are too difficult to replicate. In those cases they can be changed to ease adoption of Python 3.x by the other VMs. """
This feature falls into none of these categories. It does fall into this one:
""" Cannot Change ... Language syntax
The grammar file essentially becomes immutable apart from ambiguity fixes. """
Guido is taking a hard line on this.
On Wed, Apr 28, 2010 at 10:05 AM, Travis Oliphant oliphant@enthought.com wrote:
On Apr 26, 2010, at 7:19 PM, David Warde-Farley wrote:
Trying to debug code written by an undergrad working for a colleague of mine who ported code over from MATLAB, I am seeing an ugly melange of matrix objects and ndarrays that are interacting poorly with each other and various functions in SciPy/other libraries. In particular there was a custom minimizer function that contained a line "a * b", that was receiving an Nx1 "matrix" and a N-length array and computing an outer product. Hence the unexpected 6 GB of memory usage and weird results...
Overloading '*' and '**' while convenient does have consequences. It would be nice if we could have a few more infix operators in Python to allow separation of element-by-element calculations and "dot-product" calculations.
A proposal was made to allow "calling a NumPy array" to infer dot product:
a(b) is equivalent to dot(a,b)
a(b)(c) would be equivalent to dot(dot(a,b),c)
This seems rather reasonable.
I like this too. A similar proposal that recently showed up on the list was to add a dot method to ndarrays so that a(b)(c) would be written a.dot(b).dot(c).
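Assuming such a dot method exists (it did later land in NumPy), chaining reads in multiplication order:

```python
import numpy as np

a = np.arange(6.0).reshape(2, 3)
b = np.arange(12.0).reshape(3, 4)
c = np.arange(8.0).reshape(4, 2)

# a.dot(b).dot(c) chains left to right...
chained = a.dot(b).dot(c)
# ...and is equivalent to the nested function form
nested = np.dot(np.dot(a, b), c)
```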
While I don't have any spare cycles to push it forward and we are already far along on the NumPy to 3.0, I had wondered if we couldn't use the leverage of Python core developers wanting NumPy to be ported to Python 3 to actually add a few more infix operators to the language.
One of the problems of moving to Python 3.0 for many people is that there are not "new features" to outweigh the hassle of moving. Having a few more infix operators would be a huge incentive to the NumPy community to move to Python 3.
Anybody willing to lead the charge with the Python developers?
Problem is that we couldn't decide on an appropriate operator. Adding a keyword that functioned like "and" would likely break all sorts of code, so it needs to be something that is not currently seen in the wild.
Chuck
On 2010-04-28, at 12:05 PM, Travis Oliphant wrote:
a(b) is equivalent to dot(a,b)
a(b)(c) would be equivalent to dot(dot(a,b),c)
This seems rather reasonable.
Indeed, and it leads to a rather pleasant way of permuting syntax to change the order of operations, i.e. a(b(c)) vs. a(b)(c).
David
On Wed, Apr 28, 2010 at 1:30 PM, David Warde-Farley dwf@cs.toronto.edu wrote:
On 2010-04-28, at 12:05 PM, Travis Oliphant wrote:
a(b) is equivalent to dot(a,b)
a(b)(c) would be equivalent to dot(dot(a,b),c)
This seems rather reasonable.
Indeed, and it leads to a rather pleasant way of permuting syntax to change the order of operations, i.e. a(b(c)) vs. a(b)(c).
I like the explicit dot method much better; __call__ (parentheses) can mean anything, and reading the code would be more difficult (especially when switching from MATLAB).
Josef
David
josef.pktd@gmail.com wrote:
On Wed, Apr 28, 2010 at 1:30 PM, David Warde-Farley dwf@cs.toronto.edu wrote:
On 2010-04-28, at 12:05 PM, Travis Oliphant wrote:
a(b) is equivalent to dot(a,b)
a(b)(c) would be equivalent to dot(dot(a,b),c)
This seems rather reasonable.
Indeed, and it leads to a rather pleasant way of permuting syntax to change the order of operations, i.e. a(b(c)) vs. a(b)(c).
I like the explicit dot method much better, __call__ (parentheses) can mean anything, and reading the code will be more difficult. (especially when switching from matlab)
Yes, you have a point, though I do think it would read easier as a method, a.dot(b.dot(c)) or a.dot(b).dot(c) rather than as a prefix/function.
(Down the line, another concern is the fact that dot() always incurs a memory allocation for the result, which matters sometimes when you have a huge amount of data and limited RAM. As I currently understand the situation, it shouldn't be hard to add an optional 'out' argument or a gaxpy primitive that maps directly onto the BLAS call; I just haven't had time yet to investigate it further.)
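(A sketch of the preallocated-output pattern being described; np.dot did later grow an optional out argument, where the output buffer must match the result's shape and dtype exactly:)

```python
import numpy as np

a = np.ones((200, 300))
b = np.ones((300, 100))

# Allocate the result buffer once and reuse it across iterations,
# instead of paying for a fresh NxM allocation on every call.
out = np.empty((200, 100))
for _ in range(3):
    np.dot(a, b, out=out)
```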
David
On 4/28/2010 12:05 PM, Travis Oliphant wrote:
A proposal was made to allow "calling a NumPy array" to infer dot product:
a(b) is equivalent to dot(a,b)
a(b)(c) would be equivalent to dot(dot(a,b),c)
Here is a related ticket that proposes a more explicit alternative: adding a ``dot`` method to ndarray. http://projects.scipy.org/numpy/ticket/1456
fwiw, Alan
On Wed, Apr 28, 2010 at 2:12 PM, Alan G Isaac aisaac@american.edu wrote:
On 4/28/2010 12:05 PM, Travis Oliphant wrote:
A proposal was made to allow "calling a NumPy array" to infer dot product:
a(b) is equivalent to dot(a,b)
a(b)(c) would be equivalent to dot(dot(a,b),c)
Here is a related ticket that proposes a more explicit alternative: adding a ``dot`` method to ndarray. http://projects.scipy.org/numpy/ticket/1456
FWIW, I have borrowed a convenience function chain_dot, originally from pandas, that works for me as a stopgap for more readable code.
def chain_dot(*arrs):
    """
    Returns the dot product of the given matrices.

    Parameters
    ----------
    arrs : argument list of ndarrays

    Returns
    -------
    Dot product of all arguments.

    Example
    -------
    >>> import numpy as np
    >>> from scikits.statsmodels.tools import chain_dot
    >>> A = np.arange(1, 13).reshape(3, 4)
    >>> B = np.arange(3, 15).reshape(4, 3)
    >>> C = np.arange(5, 8).reshape(3, 1)
    >>> chain_dot(A, B, C)
    array([[1820],
           [4300],
           [6780]])
    """
    return reduce(lambda x, y: np.dot(y, x), arrs[::-1])
Skipper
Wed, 28 Apr 2010 14:12:07 -0400, Alan G Isaac wrote: [clip]
Here is a related ticket that proposes a more explicit alternative: adding a ``dot`` method to ndarray. http://projects.scipy.org/numpy/ticket/1456
I kind of like this idea. Simple, obvious, and leads to clear code:
a.dot(b).dot(c)
or in another multiplication order,
a.dot(b.dot(c))
And here's an implementation:
http://github.com/pv/numpy-work/commit/414429ce0bb0c4b7e780c4078c5ff71c11305...
I think I'm going to apply this, unless someone complains, as I don't see any downsides (except maybe adding one more to the huge list of methods ndarray already has).
Cheers, Pauli
Hi,
I kind of like this idea. Simple, obvious, and leads to clear code:
a.dot(b).dot(c)
or in another multiplication order,
a.dot(b.dot(c))
And here's an implementation:
http://github.com/pv/numpy-work/commit/414429ce0bb0c4b7e780c4078c5ff71c11305...
I think I'm going to apply this, unless someone complains, as I don't see any downsides (except maybe adding one more to the huge list of methods ndarray already has).
Excellent excellent excellent. Once again, I owe you a beverage of your choice.
Matthew
On Apr 29, 2010, at 2:30 PM, Pauli Virtanen wrote:
Wed, 28 Apr 2010 14:12:07 -0400, Alan G Isaac wrote: [clip]
Here is a related ticket that proposes a more explicit alternative: adding a ``dot`` method to ndarray. http://projects.scipy.org/numpy/ticket/1456
I kind of like this idea. Simple, obvious, and leads to clear code:
a.dot(b).dot(c)
or in another multiplication order,
a.dot(b.dot(c))
And here's an implementation:
http://github.com/pv/numpy-work/commit/414429ce0bb0c4b7e780c4078c5ff71c11305...
I think I'm going to apply this, unless someone complains, as I don't see any downsides (except maybe adding one more to the huge list of methods ndarray already has).
+1
-Travis
On Thu, Apr 29, 2010 at 1:30 PM, Pauli Virtanen pav@iki.fi wrote:
Wed, 28 Apr 2010 14:12:07 -0400, Alan G Isaac wrote: [clip]
Here is a related ticket that proposes a more explicit alternative: adding a ``dot`` method to ndarray. http://projects.scipy.org/numpy/ticket/1456
I kind of like this idea. Simple, obvious, and leads to clear code:
a.dot(b).dot(c)
or in another multiplication order,
a.dot(b.dot(c))
And here's an implementation:
http://github.com/pv/numpy-work/commit/414429ce0bb0c4b7e780c4078c5ff71c11305...
I think I'm going to apply this, unless someone complains, as I don't see any downsides (except maybe adding one more to the huge list of methods ndarray already has).
Hey, that was my weekend project 8-) But obviously I think it is a good idea.
Chuck
On Thu, Apr 29, 2010 at 1:30 PM, Pauli Virtanen pav@iki.fi wrote:
Wed, 28 Apr 2010 14:12:07 -0400, Alan G Isaac wrote: [clip]
Here is a related ticket that proposes a more explicit alternative: adding a ``dot`` method to ndarray. http://projects.scipy.org/numpy/ticket/1456
I kind of like this idea. Simple, obvious, and leads to clear code:
a.dot(b).dot(c)
or in another multiplication order,
a.dot(b.dot(c))
And here's an implementation:
http://github.com/pv/numpy-work/commit/414429ce0bb0c4b7e780c4078c5ff71c11305...
I think I'm going to apply this, unless someone complains, as I don't see any downsides (except maybe adding one more to the huge list of methods ndarray already has).
That should do it. I was going to link directly to the code in multiarraymodule, maybe break it out into a separate dot.c file, but the call to the python function gets to the goal with less effort.
Chuck
Pauli Virtanen wrote:
Wed, 28 Apr 2010 14:12:07 -0400, Alan G Isaac wrote: [clip]
Here is a related ticket that proposes a more explicit alternative: adding a ``dot`` method to ndarray. http://projects.scipy.org/numpy/ticket/1456
I kind of like this idea. Simple, obvious, and leads to clear code:
a.dot(b).dot(c)
or in another multiplication order,
a.dot(b.dot(c))
And here's an implementation:
http://github.com/pv/numpy-work/commit/414429ce0bb0c4b7e780c4078c5ff71c11305...
I think I'm going to apply this, unless someone complains, as I don't see any downsides (except maybe adding one more to the huge list of methods ndarray already has).
Wonderful. Thanks for jumping on it.
David
On Thu, Apr 29, 2010 at 07:30:31PM +0000, Pauli Virtanen wrote:
a.dot(b.dot(c))
And here's an implementation:
http://github.com/pv/numpy-work/commit/414429ce0bb0c4b7e780c4078c5ff71c11305...
/me very happy.
Gaël
On Thu, Apr 29, 2010 at 12:30 PM, Pauli Virtanen pav@iki.fi wrote:
I think I'm going to apply this, unless someone complains, as I don't see any downsides (except maybe adding one more to the huge list of methods ndarray already has).
But one of the most badly needed ones, so run, don't walk to commit it :)
Thanks!
f
On Thu, Apr 29, 2010 at 12:30 PM, Pauli Virtanen pav@iki.fi wrote:
Wed, 28 Apr 2010 14:12:07 -0400, Alan G Isaac wrote: [clip]
Here is a related ticket that proposes a more explicit alternative: adding a ``dot`` method to ndarray. http://projects.scipy.org/numpy/ticket/1456
I kind of like this idea. Simple, obvious, and leads to clear code:
a.dot(b).dot(c)
or in another multiplication order,
a.dot(b.dot(c))
And here's an implementation:
http://github.com/pv/numpy-work/commit/414429ce0bb0c4b7e780c4078c5ff71c11305...
I think I'm going to apply this, unless someone complains,
I have a big one: NO DOCSTRING!!! We're just perpetuating the errors of the past, people! Very discouraging!
DG
as I don't see any downsides (except maybe adding one more to the huge list of methods ndarray already has).
Cheers, Pauli
David Warde-Farley wrote:
Trying to debug code written by an undergrad working for a colleague of mine who ported code over from MATLAB, I am seeing an ugly melange of matrix objects and ndarrays that are interacting poorly with each other and various functions in SciPy/other libraries. In particular there was a custom minimizer function that contained a line "a * b", that was receiving an Nx1 "matrix" and a N-length array and computing an outer product. Hence the unexpected 6 GB of memory usage and weird results...
If this was in a library function of some sort, I think they should always call np.asarray on the input arguments. That converts matrices to normal arrays.
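A small illustration of the difference that coercion makes: for the matrix subclass '*' means matrix multiplication, while after np.asarray (a cheap, copy-free view for matrices) it is the elementwise rule that library code expects:

```python
import numpy as np

m = np.matrix([[1, 2], [3, 4]])

# With the matrix subclass, '*' is matrix multiplication
mat_product = m * m          # [[ 7, 10], [15, 22]]

# After np.asarray, '*' is elementwise, as for any ndarray
arr = np.asarray(m)
elementwise = arr * arr      # [[ 1,  4], [ 9, 16]]
```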
It could have been Python lists-of-lists, other PEP 3118 objects -- in Python an object can be anything in general, and I think it is very proper for most reusable functions to either validate the type of their arguments or take some steps to convert.
That said, I second that it would be good to deprecate the matrix class from NumPy. The problem for me is not the existence of a matrix class as such, but the fact that it subclasses np.ndarray and is so similar to it, breaking a lot of rules of OO programming in the process.
(Example: I happen to have my own oomatrix.py which allows me to do
P, L = (A * A.H).cholesky()
y = L.solve_right(x)
This works fine because the matrices don't support any NumPy operations, and so I don't confuse them. But it helps to have the habit of calling np.asarray in reusable functions so that errors are caught early.
I do this so that A above can be either sparse, dense, triangular, diagonal, etc. -- i.e. polymorphic linear algebra. On the other hand, they don't even support single-element lookups, although that's just because I've been too lazy to implement it. Iteration is out of the question, it's just not the level of abstraction I'd like a "matrix" to work at.)
Dag Sverre
We've had this discussion before and it seems that the matrix class isn't going anywhere (I *really* wish it would at least be banished from the top-level namespace), but it has its adherents for pedagogical reasons. Could we at least consider putting a gigantic warning on all the functions for creating matrix objects (matrix, mat, asmatrix, etc.) that they may not behave quite so predictably in some situations and should be avoided when writing nontrivial code?
There are already such warnings scattered about on SciPy.org but the situation is so bad, in my opinion (bad from a programming perspective and bad from a new user perspective, asking "why doesn't this work? why doesn't that work? why is this language/library/etc. so stupid, inconsistent, etc.?") that the situation warrants steering people still further away from the matrix object.
I apologize for ranting, but it pains me when people give up on Python/NumPy because they can't figure out inconsistencies that aren't really there for a good reason. IMHO, of course.
David
On Wed, Apr 28, 2010 at 10:08 AM, Dag Sverre Seljebotn <dagss@student.matnat.uio.no> wrote:
David Warde-Farley wrote:
Trying to debug code written by an undergrad working for a colleague of mine who ported code over from MATLAB, I am seeing an ugly melange of matrix objects and ndarrays that are interacting poorly with each other and various functions in SciPy/other libraries. In particular there was a custom minimizer function that contained a line "a * b", that was receiving an Nx1 "matrix" and a N-length array and computing an outer product. Hence the unexpected 6 GB of memory usage and weird results...
If this was in a library function of some sort, I think they should always call np.asarray on the input arguments. That converts matrices to normal arrays.
It could have been Python lists-of-lists, other PEP 3118 objects -- in Python an object can be everything in general, and I think it is very proper for most reusable functions to either validate the type of their arguments or take some steps to convert.
That said, I second that it would be good to deprecate the matrix class from NumPy. The problem for me is not the existance of a matrix class as such, but the fact that it subclasses np.ndarray and is so similar with it, breaking a lot of rules for OO programming in the process.
Yeah. Masked arrays have similar problems. Pierre has done so much work to have masked versions of the various functions that it might as well be a standalone class.
<snip>
Chuck
On 4/28/2010 12:08 PM, Dag Sverre Seljebotn wrote:
it would be good to deprecate the matrix class from NumPy
Please let us not have this discussion all over again.
The matrix class is very useful for teaching. In economics for example, the use of matrix algebra is widespread, while algebra with arrays that are not matrices is very rare. I can (and do) use NumPy matrices even in undergraduate courses.
If you do not like them, do not use them.
If you want `matrix` replaced with a better matrix object, offer a replacement for community consideration.
Thank you, Alan Isaac
PS There is one change I would not mind: let A * M be undefined if A is an ndarray and M is a NumPy matrix.
On 28 April 2010 14:30, Alan G Isaac aisaac@american.edu wrote:
On 4/28/2010 12:08 PM, Dag Sverre Seljebotn wrote:
it would be good to deprecate the matrix class from NumPy
Please let us not have this discussion all over again.
I think you may be too late on this, but it's worth a try.
The matrix class is very useful for teaching. In economics for example, the use of matrix algebra is widespread, while algebra with arrays that are not matrices is very rare. I can (and do) use NumPy matrices even in undergraduate courses.
If you do not like them, do not use them.
This is the problem: lots of people start using numpy and think "hmm, I want to store two-dimensional data so I'll use a matrix", and have no idea that "matrix" means anything different from "two-dimensional array". It was this that inspired David's original post, and it's this that we're trying to find a solution for.
If you want `matrix` replaced with a better matrix object, offer a replacement for community consideration.
Thank you, Alan Isaac
PS There is one change I would not mind: let A * M be undefined if A is an ndarray and M is a NumPy matrix.
I can definitely vote for this, in the interest of catching as many inadvertent matrix users as possible.
Anne
On 2010-04-28, at 2:30 PM, Alan G Isaac wrote:
Please let us not have this discussion all over again.
Agreed. See my preface to this discussion.
My main objection is that it's not easy to explain to a newcomer what the difference precisely is, how they interact, why two of them exist, how they are sort-of-compatible-but-not...
The matrix class is very useful for teaching. In economics for example, the use of matrix algebra is widespread, while algebra with arrays that are not matrices is very rare. I can (and do) use NumPy matrices even in undergraduate courses.
Would it be acceptable to retain the matrix class but not have it imported in the default namespace, and have to import e.g. numpy.matlib to get at them?
If you do not like them, do not use them.
The problem isn't really with seasoned users of NumPy not liking them, but rather new users being confused by the presence of (what seems to be) two primitives, array and matrix.
A few things tend to happen:
a) Example code that expects arrays instead receives matrices. If these aren't "cast" with asarray(), mayhem ensues at the first sight of *.
b) Users of the matrix class call a proper function that correctly coerces input to ndarray, but it returns an ndarray. Users are thus confused that, thinking of the function as a black box, putting matrices 'in' doesn't result in getting matrices 'out'. It doesn't take long to get the hang of it if you really sit down and work it through, but it also "doesn't take long" to go back to MATLAB or whatever else. My interest is in having as few conceptual stumbling stones as possible.
c) Complicating the situation further, people try to use functions e.g. from scipy.optimize which expect a 1d array by passing in column or row matrices. Even when coerced to array, these have the wrong rank and you get unexpected results (you could argue that we should instead use asarray(..).squeeze() on all incoming arguments, but this may not generalize well).
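To make (c) concrete: even after coercion, a column matrix keeps the wrong rank for a 1-d API, and squeeze() only helps if you remember to call it:

```python
import numpy as np

col = np.matrix([[1.0], [2.0], [3.0]])  # Nx1 column "vector"

a = np.asarray(col)
print(a.shape)             # (3, 1) -- still rank 2
print(a.squeeze().shape)   # (3,)  -- what a 1-d API actually wants
```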
PS There is one change I would not mind: let A * M be undefined if A is an ndarray and M is a NumPy matrix.
What about the other binary ops? I would say, matrix goes with matrix, array with array, never the two shall meet unless you explicitly coerce. The ability to mix the two in a single expression does more harm than good, IMHO.
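To illustrate the point about the other binary ops (a sketch; the shapes are arbitrary): the mixed result type is decided silently, while explicit coercion makes the intent visible in either direction.

```python
import numpy as np

a = np.ones(3)                      # ndarray
M = np.asmatrix(np.ones((1, 3)))    # matrix

# The other ops also mix silently: the result comes back as a matrix,
# because matrix takes precedence over ndarray in mixed expressions.
s = a + M
assert isinstance(s, np.matrix) and s.shape == (1, 3)

# Explicit coercion, as proposed, states which semantics you want:
elementwise = np.asarray(M) * a     # elementwise product, shape (1, 3)
assert elementwise.shape == (1, 3)

product = np.asmatrix(a) * M.T      # 1x3 times 3x1 -> 1x1 matrix
assert product.shape == (1, 1)
```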
David
On Apr 28, 2010, at 4:46 PM, David Warde-Farley wrote:
PS There is one change I would not mind: let A * M be undefined if A is an ndarray and M is a NumPy matrix.
What about the other binary ops? I would say, matrix goes with matrix, array with array, never the two shall meet unless you explicitly coerce. The ability to mix the two in a single expression does more harm than good, IMHO.
This could be done in NumPy 2.0. I'm +1 on not allowing a mix of the two.
-Travis
Alan wrote:
There is one change I would not mind: let A * M be undefined if A is an ndarray and M is a NumPy matrix.
On 4/28/2010 5:46 PM, David Warde-Farley wrote:
What about the other binary ops? I would say, matrix goes with matrix, array with array, never the two shall meet unless you explicitly coerce. The ability to mix the two in a single expression does more harm than good, IMHO.
I would be fine with this. I mentioned ``*`` explicitly, because it is the dangerous one (most likely to cause surprises).
Since M.A gives the array representation of the matrix M, explicit "coercion" is almost costless.
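A quick check of the "almost costless" claim (a sketch): both directions of the conversion are views of the same buffer, not copies.

```python
import numpy as np

M = np.matrix([[1., 2.], [3., 4.]])

# M.A is a base-class view of the same data -- no copy is made.
A = M.A
assert type(A) is np.ndarray        # plain ndarray, not matrix
assert np.shares_memory(A, M)       # same underlying buffer

# np.asmatrix(A) goes back the other way, also without copying.
assert np.shares_memory(np.asmatrix(A), A)
```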
Alan Isaac
PS Just to be clear, I'm just a user ...
On 4/28/2010 5:46 PM, David Warde-Farley wrote:
Would it be acceptable to retain the matrix class but not have it imported in the default namespace, and have to import e.g. numpy.matlib to get at them?
If we can have A * M undefined, then I do not think this is a needed addition. But I do not have a strong opinion. It does not create a significant barrier in teaching (and might even have some advantages).
Alan
On Wed, Apr 28, 2010 at 2:46 PM, David Warde-Farley dwf@cs.toronto.edu wrote:
Would it be acceptable to retain the matrix class but not have it imported in the default namespace, and have to import e.g. numpy.matlib to get at them?
+1
Jarrod
Alan G Isaac wrote:
On 4/28/2010 12:08 PM, Dag Sverre Seljebotn wrote:
it would be good to deprecate the matrix class from NumPy
Please let us not have this discussion all over again.
The matrix class is very useful for teaching. In economics for example, the use of matrix algebra is widespread, while algebra with arrays that are not matrices is very rare. I can (and do) use NumPy matrices even in undergraduate courses.
If you do not like them, do not use them.
If you want `matrix` replaced with a better matrix object, offer a replacement for community consideration.
Point taken, I'm sorry.
(If or when my own matrix linear algebra code matures I'll open source it for sure, but it's not there yet, and I don't believe it will fit within NumPy.)
Dag Sverre