[Numpy-discussion] Matrix Class

Sat Feb 14 20:21:43 EST 2015

On Sat, Feb 14, 2015 at 4:27 PM, Charles R Harris
<charlesr.harris at gmail.com> wrote:
>
>
> On Sat, Feb 14, 2015 at 12:36 PM, <josef.pktd at gmail.com> wrote:
>>
>> On Sat, Feb 14, 2015 at 12:05 PM, cjw <cjw at ncf.ca> wrote:
>> >
>> > On 14-Feb-15 11:35 AM, josef.pktd at gmail.com wrote:
>> >>
>> >> On Wed, Feb 11, 2015 at 4:18 PM, Ryan Nelson <rnelsonchem at gmail.com>
>> >> wrote:
>> >>>
>> >>> Colin,
>> >>>
>> >>> I currently use Py3.4 and Numpy 1.9.1. However, I built a quick test conda
>> >>> environment with Python2.7 and Numpy 1.7.0, and I get the same:
>> >>>
>> >>> ############
>> >>> Python 2.7.9 |Continuum Analytics, Inc.| (default, Dec 18 2014, 16:57:52)
>> >>> [MSC v.1500 64 bit (AMD64)]
>> >>> Type "copyright", "credits" or "license" for more information.
>> >>>
>> >>> IPython 2.3.1 -- An enhanced Interactive Python.
>> >>> Anaconda is brought to you by Continuum Analytics.
>> >>> Please check out: http://continuum.io/thanks and https://binstar.org
>> >>> ?         -> Introduction and overview of IPython's features.
>> >>> %quickref -> Quick reference.
>> >>> help      -> Python's own help system.
>> >>> object?   -> Details about 'object', use 'object??' for extra details.
>> >>>
>> >>> In [1]: import numpy as np
>> >>>
>> >>> In [2]: np.__version__
>> >>> Out[2]: '1.7.0'
>> >>>
>> >>> In [3]: np.mat([4,'5',6])
>> >>> Out[3]:
>> >>> matrix([['4', '5', '6']],
>> >>>         dtype='|S1')
>> >>>
>> >>> In [4]: np.mat([4,'5',6], dtype=int)
>> >>> Out[4]: matrix([[4, 5, 6]])
>> >>> ###############
>> >>>
>> >>> As to your comment about coordinating with Statsmodels, you should see the
>> >>> links in the thread that Alan posted:
>> >>> http://permalink.gmane.org/gmane.comp.python.numeric.general/56516
>> >>> http://permalink.gmane.org/gmane.comp.python.numeric.general/56517
>> >>> Josef's comments at the time seem to echo the issues the devs (and others)
>> >>> have with the matrix class. Maybe things have changed with Statsmodels.
>> >>
>> >> Not changed, we have a strict policy against using np.matrix.
>> >>
>> >> Generic, efficient versions of linear operators, Kronecker or sparse
>> >> block-matrix style operations would be useful, but I would use array
>> >> semantics, similar to using dot or linalg functions on ndarrays.
>> >>
>> >> Josef
>> >> (long reply canceled because I'm writing too much that might only be
>> >> of tangential interest or has been in some of the matrix discussion
>> >> before.)
>> >
>> > Josef,
>> >
>> > Many thanks.  I have gained the impression that there is some antipathy to
>> > np.matrix. Perhaps this is because, as others have suggested, the array
>> > doesn't provide an appropriate framework.
>>
>> It's not directly antipathy, it's cost-benefit analysis.
>>
>> np.matrix has few advantages, but makes reading and maintaining code
>> much more difficult.
>> Having to watch out for multiplication `*` is a lot of extra work.
>>
>> Checking shapes and fixing bugs with unexpected dtypes is also a lot
>> of work, but we have large benefits.
>> For a long time the policy in statsmodels was to keep pandas out of
>> the core of functions (i.e. out of the actual calculations) and
>> restrict it to inputs and returns. However, pandas is becoming more
>> popular and can do some things much better than plain numpy, so it is
>> slowly moving inside some of our core calculations.
>> It's still an easy source of bugs, but we do gain something.
>
>
> Any bits of Pandas that might be good for numpy/scipy to steal?

I'm not a Pandas expert.
Some of it comes into statsmodels because we need the data handling
also inside a function, e.g. keeping track of labels, indices, and so
on. Another reason is that contributors are more familiar with
pandas's way of solving a problem, even if I suspect numpy would be
more efficient.

However, a recent change replaces code where I would have used np.unique
with pandas.factorize, which is supposed to be faster.
https://github.com/statsmodels/statsmodels/pull/2213
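
To make the np.unique versus pandas.factorize comparison concrete, here
is a minimal sketch on made-up integer labels. It is not the code from
the PR above; the array and variable names are only for illustration.
Both calls produce integer codes plus the unique values, but np.unique
sorts the uniques while pandas.factorize keeps them in order of first
appearance unless sort=True is passed.

```python
import numpy as np
import pandas as pd

# Hypothetical group labels, chosen only for illustration.
labels = np.array([20, 10, 30, 10, 20])

# numpy: uniques come back sorted, codes index into the sorted uniques.
uniques_np, codes_np = np.unique(labels, return_inverse=True)

# pandas: uniques come back in order of first appearance.
codes_pd, uniques_pd = pd.factorize(labels)

# With sort=True the pandas result lines up with the numpy one.
codes_sorted, uniques_sorted = pd.factorize(labels, sort=True)

# In every case the original labels can be rebuilt from codes and uniques.
rebuilt_np = uniques_np[codes_np]
rebuilt_pd = uniques_pd[codes_pd]
```

Which one is faster depends on the data and the library versions, so it
is worth timing both on the real input.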

Two or three years ago my numpy way of group handling (using
np.unique, bincount and similar) was still faster than the pandas
`apply` version; I'm not sure that's still true.
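
As a rough illustration of that numpy style of group handling, the
sketch below computes per-group counts, sums and means with np.unique
and np.bincount. The data and names are invented for the example, and
it is only one way to do this, not the statsmodels implementation.

```python
import numpy as np

# Hypothetical data: one group label and one value per observation.
groups = np.array(["a", "b", "a", "c", "b", "a"])
values = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 5.0])

# Map each label to an integer code; group_names is sorted.
group_names, codes = np.unique(groups, return_inverse=True)

# bincount without weights counts members per group,
# with weights it sums the values per group.
counts = np.bincount(codes)
sums = np.bincount(codes, weights=values)
means = sums / counts
```

Here group_names is ['a', 'b', 'c'] and means holds one entry per group
in the same order.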

And to emphasize: all our heavy stuff, especially the big models, still
has only numpy and scipy inside (with the exception of one model
waiting in a PR).
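
On the `*` point from the quoted text earlier in this message, a small
sketch of why it needs watching: the same expression means elementwise
multiplication for an ndarray and matrix multiplication for np.matrix.
The 2x2 values are arbitrary, and newer numpy versions may emit a
deprecation warning when np.matrix is constructed.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
m = np.matrix(a)  # same data, matrix semantics

# ndarray: `*` is elementwise, matrix product needs dot.
elementwise = a * a
product_arr = a.dot(a)

# np.matrix: `*` is already the matrix product.
product_mat = m * m
```

So a function that receives either type gives different results for the
same line of code, which is what makes mixed code hard to review.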

Josef

>
> <snip>
>
> Chuck