[Numpy-discussion] ndarray.T2 for 2D transpose

Thu Apr 7 11:56:37 EDT 2016

On Thu, Apr 7, 2016 at 11:42 AM, Todd <toddrjen at gmail.com> wrote:
> On Thu, Apr 7, 2016 at 11:35 AM, <josef.pktd at gmail.com> wrote:
>>
>> On Thu, Apr 7, 2016 at 11:13 AM, Todd <toddrjen at gmail.com> wrote:
>> > On Wed, Apr 6, 2016 at 5:20 PM, Nathaniel Smith <njs at pobox.com> wrote:
>> >>
>> >> On Wed, Apr 6, 2016 at 10:43 AM, Todd <toddrjen at gmail.com> wrote:
>> >> >
>> >> > My intention was to make linear algebra operations easier in numpy.
>> >> > With the @ operator available, it is now very easy to do basic linear
>> >> > algebra on arrays without needing the matrix class.  But getting an
>> >> > array into a state where you can use the @ operator effectively is
>> >> > currently pretty verbose and confusing.  I was trying to find a way to
>> >> > make the @ operator more useful.
>> >>
>> >> Can you elaborate on what you're doing that you find verbose and
>> >> confusing, maybe paste an example? I've never had any trouble like
>> >> this doing linear algebra with @ or dot (which have similar semantics
>> >> for 1d arrays), which is probably just because I've had different use
>> >> cases, but it's much easier to talk about these things with a concrete
>> >> example in front of us to put everyone on the same page.
>> >>
>> >
>> > Let's say you want to do a simple matrix multiplication example.  You
>> > create two example arrays like so:
>> >
>> >    a = np.arange(20)
>> >    b = np.arange(10, 50, 10)
>> >
>> > Now you want to do
>> >
>> >     a.T @ b
>> >
>> > First you need to turn a into a 2D array.  I can think of 10 ways to do
>> > this off the top of my head, and there may be more:
>> >
>> >     1a) a[:, None]
>> >     1b) a[None]
>> >     1c) a[None, :]
>> >     2a) a.shape = (1, -1)
>> >     2b) a.shape = (-1, 1)
>> >     3a) a.reshape(1, -1)
>> >     3b) a.reshape(-1, 1)
>> >     4a) np.reshape(a, (1, -1))
>> >     4b) np.reshape(a, (-1, 1))
>> >     5) np.atleast_2d(a)
>> >
>> > 5 is pretty clear, and will work fine with any number of dimensions, but
>> > is also long to type out when trying to do a simple example.  The
>> > different variants of 1, 2, 3, and 4, however, will only work with 1D
>> > arrays (making them less useful for functions), do not make it
>> > immediately obvious to me what the result will be (I always need to try
>> > it to make sure the result is what I expect), and are easy to get mixed
>> > up in my opinion.  They also require people to keep a mental list of lots
>> > of ways to do what should be a very simple task.
>> >
>> > Basically, my argument here is the same as the argument from pep465 for
>> > the inclusion of the @ operator:
>> >
>> > https://www.python.org/dev/peps/pep-0465/#transparent-syntax-is-especially-crucial-for-non-expert-programmers
>> >
>> > "A large proportion of scientific code is written by people who are
>> > experts in their domain, but are not experts in programming. And there
>> > are many university courses run each year with titles like "Data
>> > analysis for social scientists" which assume no programming background,
>> > and teach some combination of mathematical techniques, introduction to
>> > programming, and the use of programming to implement these mathematical
>> > techniques, all within a 10-15 week period. These courses are more and
>> > more often being taught in Python rather than special-purpose languages
>> > like R or Matlab.
>> >
>> > For these kinds of users, whose programming knowledge is fragile, the
>> > existence of a transparent mapping between formulas and code often means
>> > the difference between succeeding and failing to write that code at
>> > all."
>>
>> This doesn't work because of the ambiguity between column and row vector.
>>
>> In most cases 1d vectors in statistics/econometrics are column
>> vectors. Sometimes it takes me a long time to figure out whether an
>> author uses a row or column vector for the transpose.
>>
>> I.e. I often need x.T dot y, which works for 1d and 2d to produce the
>> inner product, but the outer product would most of the time require a
>> column vector, so it's defined as x dot x.T.
>>
>> I think keeping around explicitly 2d arrays if necessary is less error
>> prone and confusing.
>>
>> But I wouldn't mind a shortcut for atleast_2d   (although more often I
>> need atleast_2dcol to translate formulas)
>>
>
> At least from what I have seen, in all cases in numpy where a 1D array is
> treated as a 2D array, it is always treated as a row vector, the examples I
> can think of being atleast_2d, hstack, vstack, and dstack. So using this
> convention would be in line with how it is used elsewhere in numpy.

AFAIK, linear algebra works differently: 1-D is special.

>>> xx = np.arange(20).reshape(4,5)
>>> yy = np.arange(4)
>>> xx.dot(yy)
Traceback (most recent call last):
  File "<pyshell#52>", line 1, in <module>
    xx.dot(yy)
ValueError: objects are not aligned

>>> yy = np.arange(5)
>>> xx.dot(yy)
array([ 30,  80, 130, 180])
>>> xx.dot(yy[:,None])
array([[ 30],
       [ 80],
       [130],
       [180]])

>>> yy[:4].dot(xx)
array([70, 76, 82, 88, 94])

>>> np.__version__
'1.6.1'
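
For what it's worth, here is a minimal sketch of the same behaviour with the
@ operator, assuming a NumPy recent enough to have it (1.10 or later, on
Python 3.5+). The names right, left, and col are only for illustration:

```python
import numpy as np

xx = np.arange(20).reshape(4, 5)
yy = np.arange(5)

# 1-D operand on the right: used like a column, result comes back 1-D.
right = xx @ yy            # shape (4,)

# 1-D operand on the left: used like a row, result is again 1-D.
left = yy[:4] @ xx         # shape (5,)

# An explicit 2-D column keeps the result 2-D.
col = xx @ yy[:, None]     # shape (4, 1)

# .T does nothing to a 1-D array, so it cannot pick row vs. column.
transpose_is_noop = yy.T.shape == yy.shape

print(right.shape, left.shape, col.shape, transpose_is_noop)
```

So the same 1-D array is used as a column or as a row depending on which side
of the product it sits on, which is why I'd call it special rather than a row
vector.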

I don't think numpy treats 1d arrays as row vectors. numpy has a C-order
axis preference, which in many cases coincides with row vector behavior.

>>> np.concatenate(([[1,2,3]], [4,5,6]))
Traceback (most recent call last):
  File "<pyshell#63>", line 1, in <module>
    np.concatenate(([[1,2,3]], [4,5,6]))
ValueError: arrays must have same number of dimensions

It's not an uncommon exception for me.
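
A sketch of the usual ways around that exception, assuming the 1-D operand
is meant to be a row. The names row2d, v, and the result variables are only
for illustration:

```python
import numpy as np

row2d = np.array([[1, 2, 3]])   # already 2-D, shape (1, 3)
v = np.array([4, 5, 6])         # 1-D, shape (3,)

# Mixing 2-D and 1-D inputs in concatenate raises ValueError.
try:
    np.concatenate((row2d, v))
    raised = False
except ValueError:
    raised = True

# Making the row explicit works, either by indexing or with atleast_2d.
by_index = np.concatenate((row2d, v[None, :]))
by_atleast = np.concatenate((row2d, np.atleast_2d(v)))

# vstack promotes 1-D inputs to rows itself.
by_vstack = np.vstack((row2d, v))

print(raised, by_index.shape, by_atleast.shape, by_vstack.shape)
```

Only the helpers that promote explicitly (atleast_2d, vstack) apply the row
convention; concatenate itself does no promotion at all.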

Josef
