[Numpy-discussion] .T Transpose shortcut for arrays again

Bill Baxter wbaxter at gmail.com
Fri Jul 7 02:17:36 EDT 2006


On 7/7/06, Robert Kern <robert.kern at gmail.com> wrote:
>
> Bill Baxter wrote:
> > I am also curious, given the number of times I've heard this nebulous
> > argument of "there are lots of kinds of numerical computing that don't
> > involve linear algebra", that no one ever seems to name any of these
> > "lots of kinds".  Statistics, maybe?  But you can find lots of linear
> > algebra in statistics.
>
> That's because I'm not waving my hands at general fields of application. I'm
> talking about how people actually use array objects on a line-by-line basis. If
> I represent a dataset as an array and fit a nonlinear function to that dataset,
> am I using linear algebra at some level? Sure! Does having a .T attribute on
> that array help me at all? No. Arguing about how fundamental linear algebra is
> to numerical endeavors is entirely beside the point.


Ok.  If line-by-line usage is what everyone really means, then I'll get off
the linear algebra soap box, but that's not what it sounded like to me.

So, if you want to talk line-by-line, I really can't talk about much besides
my own code.  But I just grepped through it and out of 2445 non-empty lines
of code:

927 lines contain '='
390 lines contain a '['
75 lines contain matrix, asmatrix, or mat
==>  47 lines contain a '.T' or '.transpose' of some sort.  <==
33 lines contain array, or asarray, or asanyarray
24 lines contain 'rand('  --- I use it for generating bogus test data a lot
17 lines contain 'newaxis' or 'NewAxis'
16 lines contain 'zeros('
13 lines contain 'dot('
12 lines contain 'empty('
8 lines contain 'ones('
7 lines contain 'inv('

I'm pretty new to numpy, so that's all the code I've got right now.  I'm sure
I've written many more lines of emails about numpy than I have lines of
actual numpy code.  :-/

But from that, I can say that -- at least in my code -- transpose is pretty
common.  If someone can point me to some larger codebases written in numpy
or numeric, I'd be happy to do a similar analysis of those.
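
For anyone who wants to repeat the count on their own code, here is a minimal
sketch in Python.  The function name count_matching_lines and the pattern
table are made up for illustration, and the regular expressions only
approximate the greps above, so the numbers need not agree exactly with a
plain grep.

```python
import re

# Hypothetical pattern table; these regexes only approximate the greps above.
PATTERNS = {
    "assignment": re.compile(r"="),
    "indexing": re.compile(r"\["),
    "transpose": re.compile(r"\.T\b|\.transpose\b"),
    "dot": re.compile(r"\bdot\("),
    "newaxis": re.compile(r"newaxis|NewAxis"),
}


def count_matching_lines(source):
    """Return (number of non-empty lines, {label: lines matching that pattern})."""
    lines = [line for line in source.splitlines() if line.strip()]
    counts = {label: 0 for label in PATTERNS}
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return len(lines), counts


# A tiny made-up sample standing in for a real source file.
sample = """
a = rand(3, 4)
b = dot(a.T, a)

c = a[:, newaxis, 0]
d = b.transpose()
"""

total, counts = count_matching_lines(sample)
print(total, counts)
```

To run it on a real file, pass the file's text (for example, the result of
open(path).read()) instead of the sample string.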


> I'm not saying that people who do use arrays for linear algebra are rare or
> unimportant. It's that syntactical convenience for one set of conventional ways
> to use an array object, by itself, is not a good enough reason to add stuff to
> the core array object.


I wish I had a way to magically find out the distribution of array
dimensions used by all numpy and numeric code out there.  My guess is it
would be something like 1-d: 50%,  2-d: 30%, 3-d: 10%, everything else:
10%.  I can't think of a good way to even get an estimate on that.  But in
any event, I'm positive ndims==2 is a significant percentage of all usages.
It seems like the opponents of this idea are suggesting the distribution is
flatter than that.  But whatever the distribution is, it has to have a
fairly light tail since memory usage is exponential in ndim.  If ndim == 20,
then it takes 8 megabytes just to store the smallest possible non-degenerate
array of float64s (i.e. a 2x2x2x2x... array: 2**20 elements at 8 bytes each).
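
The arithmetic behind that figure can be checked with a few lines of plain
Python.  The helper name min_nondegenerate_bytes is hypothetical, and it
assumes 8 bytes per element (float64) and length 2 along every axis.

```python
def min_nondegenerate_bytes(ndim, itemsize=8):
    """Bytes needed for an array with `ndim` axes of length 2 each.

    itemsize defaults to 8, the size in bytes of one float64.
    """
    return (2 ** ndim) * itemsize


# Storage grows by a factor of 2 for every extra axis.
for ndim in (1, 2, 3, 10, 20):
    print(ndim, min_nondegenerate_bytes(ndim))
```

For ndim == 20 this gives 8 * 2**20 bytes, which is the 8 megabytes quoted
above.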

It seems crazy to even be arguing this.  Transposing is not some specialized
esoteric operation.  It's important enough that R and S give it a one-letter
function, and Matlab, Scilab, and K all give it a single-character operator.
[*]   Whoever designed the numpy.matrix class also thought it was worthy of
a shortcut, and I think came up with a pretty good syntax for it.   And the
people who invented math itself decided it was worth assigning a 1-character
exponent to it.

So I think there's a clear argument for having a .T attribute.  But ok,
let's say you're right, and a lot of people won't use it.  Fine.   IT WILL
DO THEM ABSOLUTELY NO HARM.  They don't have to use it if they don't like
it!   Just ignore it.  Unlike a t() function, .T doesn't pollute any
namespace users can define symbols in, so you really can just ignore it if
you're not interested in using it.  It won't get in your way.
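
To make the namespace point concrete, here is a small sketch in plain Python.
Grid2D is a hypothetical stand-in written for this example, not numpy's
ndarray, and its T property returns a copy rather than whatever a real
implementation would do.  The point is only that an attribute named T lives
on the object, so user variables called t or T are unaffected.

```python
class Grid2D:
    """Hypothetical stand-in for a 2-d array; not numpy's ndarray."""

    def __init__(self, rows):
        self.rows = [list(row) for row in rows]

    @property
    def T(self):
        # Swap rows and columns; returns a new Grid2D (a copy, not a view).
        return Grid2D(zip(*self.rows))


# User-defined names that a module-level t() or T would collide with.
t = 0.5      # e.g. a time step
T = 300.0    # e.g. a temperature

g = Grid2D([[1, 2, 3], [4, 5, 6]])
print(g.T.rows)   # the transposed rows
print(t, T)       # the user's own names are untouched
```

By contrast, a function named t imported into the same module would be
replaced by the assignment t = 0.5.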

For the argument that ndarray should be pure like the driven snow, just a
raw container for n-dimensional data, I think that's what the basearray
thing that goes into Python itself should be.  ndarray is part of numpy and
numpy is for numerical computing.

Regards,
--Bill

[*] Full disclosure: I did find two counter-examples -- Maple and
Mathematica.  Maple has only a transpose() function and Mathematica has only
Transpose[] (but you can use [esc]tr[esc] as a shortcut).  However, both of
those packages are primarily known for their _symbolic_ math capabilities,
not their number crunching, so they are less similar to numpy than
R, S, K, Matlab and Scilab in that regard.