Mailman 3 basic matrix object - Python-ideas

newer
Improve error message for trying...

basic matrix object

older
Add parse_duration to datetime - a...

Stefano Borini

Aug. 13, 2020

3:17 p.m.

Excuse me if I am out of the loop and this is already available, but I haven't seen it and googling is not exactly easy as numpy introduces considerable noise. With the introduction of the statistics module, the standard library provides basic statistical functions that can be useful in many scenarios, including teaching. The math module has plenty of mathematical functions that are very interesting, but no Matrix object. When newcomers want to have a matrix object, they end up implementing a list of lists (as suggested by the documentation, e.g. see https://docs.python.org/3/tutorial/datastructures.html?highlight=matrix#nest... and https://docs.python.org/3/faq/programming.html?highlight=matrix#how-do-i-cre...) but it's ambiguous (is each entry a row or a column?), and does not enforce types or equal length of each entry. Generally, when a Matrix object is needed, numpy is the point of reference, but numpy requires understanding pip, installing modules, maybe creating a virtual environment and finally understanding numpy. I would propose a simple, straightforward, low performance object to perform trivial operations such as matrix multiplication, transposition, addition, linear problem solving, determinant. The absolute bare minimum for an introductory linear algebra course. The syntax should mimic numpy to train newcomers to the numpy syntax whenever possible, and of course should implement the @ operator. There is already a similar entity in the "array" module, which is a simple version of a numpy array, except that is only one-dimensional. -- Kind regards, Stefano Borini

Show replies by date

David Mertz

August 2020

4:11 p.m.

This is a lot to add to Python itself to poorly reproduce well-tested functionally in a very popular library. There are many Python distributions that come with that extra battery included, just not the one from the PSF. On Thu, Aug 13, 2020, 6:20 PM Stefano Borini <stefano.borini@gmail.com> wrote:

...

Steven D'Aprano

6:15 p.m.

On Thu, Aug 13, 2020 at 11:17:00PM +0100, Stefano Borini wrote:

...

The math module has plenty of mathematical functions that are very interesting, but no Matrix object.

Funny you mention this, I have been working on a Matrix object for precisely the use-case you discuss (secondary school maths), where performance is not critical and the dimensions of the matrix is typically single digits. I, um, got distracted with some over-engineering and then distracted further by work and life, but perhaps this is a good opportunity for me to get it back on track.

...

And numpy also offers a surprising interface that doesn't match matrices. (Well, surprising to those who aren't heavy users of numpy :-) py> A = numpy.array([[1, 2], [3, 4]]) py> A*A array([[ 1, 4], [ 9, 16]]) To get *matrix multiplication* you have to use the `@` multiplication operator from PEP 465: py> A@A array([[ 7, 10], [15, 22]]) https://www.python.org/dev/peps/pep-0465/ But what's really surprising about numpy arrays is broadcasting: py> A = numpy.array([[1, 2], [3, 4]]) py> B = numpy.array([[10, 20]]) py> A + B array([[11, 22], [13, 24]]) I don't know if broadcasting is actually useful or not, but it's not what you want when doing matrix arithmetic at a secondary school level. -- Steven

Matthias Bussonnier

7:06 p.m.

numpy arrays are ... arrays, if you want * to do matmul, use numpy.matrix (which is soft deprecated since now we have @), and will do what you expect with square matrix times vector. broadcasting is the natural extension of array `op` scalar on... arrays. Say you have an array Pressure with 3 coord : longitude , latitude , time, and a typical Mean value M with coord: latitude , longitude Then you can "just" do `Pressure - M`, if actually your Mean is just function of time then you can same, do P - M and it broadcast on the other axes. So yes, extremely useful. If you use things like xarray then it will not match dimension base on their neame, and not on their size, so it's less "magical" and more accurate when some dimensions have the same size. But of course that's for arrays, not matrices. -- M On Thu, 13 Aug 2020 at 18:17, Steven D'Aprano <steve@pearwood.info> wrote:

...

Guido van Rossum

7:22 p.m.

I was going to say that such a matrix module would be better of in PyPI, but then I recalled how the statistics module got created, and I think that the same reasoning from PEP 450 applies here too ( https://www.python.org/dev/peps/pep-0450/#rationale). So I'd say go for it! I think it would even be okay to deviate from the numpy.matrix API (though not needlessly). On Thu, Aug 13, 2020 at 6:20 PM Steven D'Aprano <steve@pearwood.info> wrote:

...

-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

Stephen J. Turnbull

9:04 p.m.

Guido van Rossum writes:

...

I disagree that that rationale applies. Let's consider where the statistics module stopped, and why I think that's the right place to stop. Simple statistics on *single* variables, as proposed by PEP 450, are useful in many contexts to summarize data sets. You see them frequently in newpaper and Wikipedia articles, serious blog posts, and even on Twitter. PEP 450 mentions, but does not propose to provide, linear regression (presumably ordinary least squares -- as with median and mode, there are many ways to compute a regression line). The PEP mentions covariance and correlation coefficients once each, and remarks that the API design is unclear. I think that omission was the better part of valor. Even on Twitter, it's hard to abuse the combination of mean and standard deviation (without outright lies about the values, of course). But most uses of correlation and regression are more or less abusive. That's true even in more serious venues (Murray & Herrnstein's "The Bell Curve" comes immediately to mind). Almost all uses of multiple regression outside of highly technical academic publications are abusive. I don't mean to say "keep power tools out of the reach of the #ToddlerInChief and his minions".[1] Rather, I mean to say that most serious languages and packages for these applications (such as R, and closer to home numpy and pandas) provide substantial documentation suggesting *appropriate* use and pointing to the texts on algorithms and caveats. Steven doesn't do that with statistics -- and correctly so, he doesn't need to. None of the calculations he implements are in any way controversial as calculations. To the extent that different users might want slightly different definitions of mode or median, the ones provided are good enough for stdlib purposes. Nor are the interpretations of the results of the various calculations at all controversial.[2] But even a two-dimensional regression y on x is fraught. Should we include the Anscombe[3] data and require a plotting function so users can see what they're getting into? I think Steven would say that's *way* beyond the scope of his package -- and I agree. Let's not go there. At all. Let users who need that stuff use packages that encourage them and help them do it right. I don't find the "teaching high school linear/matrix algebra" use case persuasive. I taught "MBA Statistics for Liberal Arts Majors" for a decade. Writing simple matrix classes was an assignment, and then they used their own classes to implement covariance, correlation, and OLS. I don't think having a "canned" matrix class would have been of benefit to them -- a substantial fraction (10% < x < 50%) did get some idea of what was going on "inside" those calculations by programming them themselves plus a bit of printf debugging, which neither the linear algebra equations nor the sum operator I wrote on the whiteboard did. I will say I wish I had Steven's implementation of sum() at hand back then to show them to give them some idea of the care that numerical accuracy demands. I cannot speak to engineering uses of matrix computations. If someone produces use cases that fit into the list of operations proposed (addition, negation, multiplication, inverse, transposition, scalar multiplication, solving linear equation systems, and determinants, I will concede it's useful and fall back to +/- 0. However, I think the comparison to multivariate statistics is enlightening. You see many two-dimensional tables in public discussions (even on Twitter!) but they are not treated as matrices. Now, it's true that *every* basic matrix calculation (except multiplication by a scalar) requires the same kind of care that statistics.sum exerts, but having provided that, what have you bought? Not a lot, as far as I can see -- applications of matrix algebra are extremely diverse, and many require just as much attention to detail as the basic operations do. In sum, I suspect that simply providing numerically stable algorithms for those computations isn't enough for useful engineering work -- as with multivariate statistics, you're not even halfway to useful and accurate computations, and the diversity is explosive. How to choose? Footnotes: [1] Any fans of cubic regression models for epidemiology? No? OK, then. [2] They can be abused. I did so myself just this morning, to tease a statistically literate friend. But it takes a bit of effort. [3] https://stat.ethz.ch/R-manual/R-patched/library/datasets/html/anscombe.html

Guido van Rossum

9:33 p.m.

That's food for thought. I have to admit that I have forgotten almost everything about linear algebra that I was ever taught -- and I was never taught numerical accuracy concerns in this context, since we were allowed to use only pencil and paper (in high school as well as college), so the problems were always constructed to ensure that correct answers contained nothing more complicated than 0, 0.5, 1 or sqrt(2). At this point I have a hard time reproducing multiplication for two 2x2 matrices (the only thing I remember is that it was the first example of something where AxB != BxA). What gives me hope though is that Steven has been thinking about this somewhat seriously already, and given that he successfully chose what to include or exclude for the statistics module, I trust him to know how much to put into a Matrix class as well. Certainly I trust him to come up with a reasonable strawman whose tires we can all kick. My own strawman would be to limit a Matrix to 2-dimensionality -- I believe that even my college linear algebra introduction (for math majors!) didn't touch upon higher dimensionality, and I doubt that what I learned in high school about the topic went beyond 3x3 (it might not even have treated non-square matrices). In terms of numerical care (that topic to which I never warmed up), which operations from the OP's list need more than statistics._sum() when limited to NxM matrices for single-digit N and M? (He named "matrix multiplication, transposition, addition, linear problem solving, determinant.") On Thu, Aug 13, 2020 at 9:04 PM Stephen J. Turnbull < turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:

...

Guido van Rossum writes:

...
I was going to say that such a matrix module would be better of in PyPI, but then I recalled how the statistics module got created, and I think that the same reasoning from PEP 450 applies here too (https://www.python.org/dev/peps/pep-0450/#rationale).

So I'd say go for it!

I disagree that that rationale applies. Let's consider where the statistics module stopped, and why I think that's the right place to stop. Simple statistics on *single* variables, as proposed by PEP 450, are useful in many contexts to summarize data sets. You see them frequently in newpaper and Wikipedia articles, serious blog posts, and even on Twitter.

PEP 450 mentions, but does not propose to provide, linear regression (presumably ordinary least squares -- as with median and mode, there are many ways to compute a regression line). The PEP mentions covariance and correlation coefficients once each, and remarks that the API design is unclear. I think that omission was the better part of valor. Even on Twitter, it's hard to abuse the combination of mean and standard deviation (without outright lies about the values, of course). But most uses of correlation and regression are more or less abusive. That's true even in more serious venues (Murray & Herrnstein's "The Bell Curve" comes immediately to mind). Almost all uses of multiple regression outside of highly technical academic publications are abusive.

I don't mean to say "keep power tools out of the reach of the #ToddlerInChief and his minions".[1] Rather, I mean to say that most serious languages and packages for these applications (such as R, and closer to home numpy and pandas) provide substantial documentation suggesting *appropriate* use and pointing to the texts on algorithms and caveats. Steven doesn't do that with statistics -- and correctly so, he doesn't need to. None of the calculations he implements are in any way controversial as calculations. To the extent that different users might want slightly different definitions of mode or median, the ones provided are good enough for stdlib purposes. Nor are the interpretations of the results of the various calculations at all controversial.[2]

But even a two-dimensional regression y on x is fraught. Should we include the Anscombe[3] data and require a plotting function so users can see what they're getting into? I think Steven would say that's *way* beyond the scope of his package -- and I agree. Let's not go there. At all. Let users who need that stuff use packages that encourage them and help them do it right.

I don't find the "teaching high school linear/matrix algebra" use case persuasive. I taught "MBA Statistics for Liberal Arts Majors" for a decade. Writing simple matrix classes was an assignment, and then they used their own classes to implement covariance, correlation, and OLS. I don't think having a "canned" matrix class would have been of benefit to them -- a substantial fraction (10% < x < 50%) did get some idea of what was going on "inside" those calculations by programming them themselves plus a bit of printf debugging, which neither the linear algebra equations nor the sum operator I wrote on the whiteboard did. I will say I wish I had Steven's implementation of sum() at hand back then to show them to give them some idea of the care that numerical accuracy demands.

I cannot speak to engineering uses of matrix computations. If someone produces use cases that fit into the list of operations proposed (addition, negation, multiplication, inverse, transposition, scalar multiplication, solving linear equation systems, and determinants, I will concede it's useful and fall back to +/- 0.

However, I think the comparison to multivariate statistics is enlightening. You see many two-dimensional tables in public discussions (even on Twitter!) but they are not treated as matrices. Now, it's true that *every* basic matrix calculation (except multiplication by a scalar) requires the same kind of care that statistics.sum exerts, but having provided that, what have you bought? Not a lot, as far as I can see -- applications of matrix algebra are extremely diverse, and many require just as much attention to detail as the basic operations do.

In sum, I suspect that simply providing numerically stable algorithms for those computations isn't enough for useful engineering work -- as with multivariate statistics, you're not even halfway to useful and accurate computations, and the diversity is explosive. How to choose?

Footnotes: [1] Any fans of cubic regression models for epidemiology? No? OK, then.

[2] They can be abused. I did so myself just this morning, to tease a statistically literate friend. But it takes a bit of effort.

[3] https://stat.ethz.ch/R-manual/R-patched/library/datasets/html/anscombe.html

-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

Steven D'Aprano

12:23 a.m.

On Thu, Aug 13, 2020 at 09:33:46PM -0700, Guido van Rossum wrote:

...

That's food for thought. I have to admit that I have forgotten almost everything about linear algebra that I was ever taught

In Australia, at least, secondary schools don't spend a lot of time teaching matrices for linear algebra. It is, at most, only a portion of the matrix course. The main use for matrices is for (basic) financial problems and networks/graphs. Here is the topic list from one of my students: - Matrix arithmetic - Applications of matrix arithmetic - Binary, permutation, communication and dominance matrices - Solving simultaneous linear equations - Transition matrices Very little of the course is the traditional 1980s style linear algebra that you (Guido) and I would have learned. Two examples of practical questions taught these days: (1) Five students compete against each other in judo. The dominance matrix D shows the winner of each bout ... Rank the students with a one-step dominance score. Calculate the two-step dominance matrix. Determine the matrix T = D + D**2 and use it to rank the players. (This is an example where the matrices are all integer values, starting with just zeroes and ones. Numerical precision doesn't come into it.) (2) A car insurance company finds that 22% of car drivers involved in an accident this year are also expected to be involved in a accident next year; 9% of drivers who are not involved in an accident are expected to be involved in an accident next year. If the company insures 80,000 drivers, how many of them are expected to be involved in an accident during the next five years?

...

What gives me hope though is that Steven has been thinking about this somewhat seriously already, and given that he successfully chose what to include or exclude for the statistics module, I trust him to know how much to put into a Matrix class as well. Certainly I trust him to come up with a reasonable strawman whose tires we can all kick.

Thank you for the vote of confidence :-)

...

My own strawman would be to limit a Matrix to 2-dimensionality -- I believe that even my college linear algebra introduction (for math majors!) didn't touch upon higher dimensionality, and I doubt that what I learned in high school about the topic went beyond 3x3 (it might not even have treated non-square matrices).

Totally agree -- matrices are by definition limited to two dimensional arrays, including the limiting case of 1xN or Nx1 row/column matrices (vectors). Tensors generalise matrices to an arbitrary number of dimensions, 3D and above. https://medium.com/@quantumsteinke/whats-the-difference-between-a-matrix-and... I have never studied tensors and wouldn't know where to even begin a tensor library :-) -- Steven

Stephen J. Turnbull

10:29 a.m.

Guido van Rossum writes:

...

That's food for thought.

Thank you. Let me confirm to the proponents that "food for thought" is all I intended. I know a fair amount about statistics, and almost as much about linear algebra, but nothing about physics or engineering.

...

Certainly I trust [Steven D'Aprano] to come up with a reasonable strawman whose tires we can all kick.

I do, too. It's only fair to give him a preview of (some?) of the arguments against inclusion in the stdlib, that's all. ;-)

...

[W]hich operations from the OP's list need more than statistics._sum() when limited to NxM matrices for single-digit N and M? (He named "matrix multiplication, transposition, addition, linear problem solving, determinant.")

I believe determinant can be efficiently implemented with statistics._sum. Linear problem solving (to which I would add the closely related operation of square matrix inversion) involves the same kind of principle, but I don't think it can be implemented with statistics._sum. The same methods that one would use for solving/ inversion for small M and N will work efficiently for large M and N. And the algorithms are as obvious as statistics._sum (in hindsight). I'm not sure whether matrices should try to implement all the different types that statistics._sum does, although I can imagine it might be pedagogically useful to support Fraction and Decimal. Steve

Stefano Borini

4:11 p.m.

On Fri, 14 Aug 2020 at 05:37, Guido van Rossum <guido@python.org> wrote:

...

My own strawman would be to limit a Matrix to 2-dimensionality -- I believe that even my college linear algebra introduction (for math majors!) didn't touch upon higher dimensionality, and I doubt that what I learned in high school about the topic went beyond 3x3 (it might not even have treated non-square matrices).

Absolutely agree, dimensionality 2 is a must. For two reasons. First, it's easy to understand and implement. Second it actively discourages those who would like to use this matrix for real calculations. The main point is basic and simple: teaching, geometry computation (e.g. on point rotation/translation) etc. Things like that. Anything else is beyond the scope of such a module and should be implemented using numpy. Calculations about determinant, element-wise operations, matrix multiplication, inverse, etc are also basic operations that are likely to come up during either a course or trivial computations for e.g. geometry. I would not add things like QR or Cholesky factorization. -- Kind regards, Stefano Borini

Tobias Kohn

9:10 a.m.

+1 for the idea of a module for matrices and vectors in the stdlib. One aspect that I would like to highlight is that NumPy is not always so easily available. I got feedback from many schools that they were using tablet computers or something like Chrome books and thus basically limited to anything available in the browser. Projekts like Skulpt have done a tremendous deal here to make Python available even under such restricted circumstances. However, porting the entire NumPy library to an "alternative" implementation is a huge project and usually well beyond the possible. A simple module for vectors and matrices, on the other hand, might be readily portable and thus make a big difference! Vectors and matrices are also very versatile tools for a lot of applications. I have actually seen quite a number of programs that start all by defining some sort of vector class (particularly in graphics, as Greg Ewing pointed out). But as a physics teacher, say, you usually simply don't have time to start defining vectors and matrices in your course, but just want to use them. And if Python came out of the box with it, that would in turn help make a point for learning Python at schools. Long story short: I fully support the idea and think there are valid use cases where NumPy is not really an option (for whatever reason).

David Mertz

10:48 a.m.

The more this discussion goes on, the more convinced I am of my initial opinion. A matrix library doesn't belong in the standard library. What is happening in the discussion is that each potential user who comments finds some related capability that is relevant to their particular needs. But the union of all those wishes is something substantially large... Maybe not NumPy large, but heading in that direction. On the other hand, trimming such a thing to a bare minimum would mean that everyone using it would quickly hit limits that make it not really practical. Different limits for different people, but limits certainly. If the purpose is JUST teaching a certain kind of course in linear algebra, that didn't seem like sufficient motivation for standard library, but it does sound like an excellent purpose for a third party library.

Christopher Barker

5:01 p.m.

On Sun, Aug 16, 2020 at 10:50 AM David Mertz <mertz@gnosis.cx> wrote:

...

What is happening in the discussion is that each potential user who comments finds some related capability that is relevant to their particular needs. But the union of all those wishes is something substantially large... Maybe not NumPy large, but heading in that direction.

...

On the other hand, trimming such a thing to a bare minimum would mean

Indeed. that everyone using it would quickly hit limits that make it not really practical. Different limits for different people, but limits certainly. which is why numpy is as large as it is :-) -- and the "scipy stack" even larger. If the purpose is JUST teaching a certain kind of course in linear algebra,

...

that didn't seem like sufficient motivation for standard library, but it does sound like an excellent purpose for a third party library.

I think it goes a bit beyond this -- a matrix / linear algebra system would be useful for other (also pedagogical) uses, like physics classes mentioned. In fact, I see really two packages being asked for in this thread: 1) The OP's Matrix object, and associated classes 2) a pure python and/or stdlib nd-array package -- i.e. numpy-lite I've thought (2) would be a good addition to the stdlib for years -- as long as it is pretty compatible with numpy -- there really are some good reasons for 2 or 3 D arrays that aren't just nested lists. But the fact is, that as David alluded to, folks will hit limits, and then we'll want to expand it, and before you know it, you've got numpy all over again :-) And frankly it's just not that hard to get and install numpy. NOTE: there is desire (3) -- something like numpy but with a different API -- to that, I say: please don't! Final point: (1) and (2) above really are different. General purpose ND arrays for computation, and matrices for linear algebra are NOT the same -- and trying to cram them together does not work out great: MATLAB is a Matrix library that has been extended for general purpose computations -- numpy is a computational array library that has a few matrix features (and has a deprecated Matrix class). I find the numpy approach a lot more useful, and the fact that the numpy Matrix class has been depreciated indicates that I"m not the only one. Anyway, I would like to see a nice linear algebra lib -- but not 'cause I'd use it, only because I find it interesting. Honestly, I'm quite surprised that no one has done it yet -- at least not in a widely used package. -CHB PS: it may be hard to find, but way back when (~15 years ago, maybe?) someone wrote a prototype of a version of numpy (maybe Numeric?) written in pure python. The experiment then was to run it through Psyco and see how well it performed -- it actually did pretty well. It used an array.array as the internal storage. Once upon a time I started building on it to make a pure python numpy clone, for use in some of my code that required numpy, but didn't really need the performance. But it never got to a usable state. There was also an effort to make a numpy clone for PyPy -- though I think they abandoned that in favor of being able to call numpy directly -- turns out that the whole numpy ecosystem depends VERY heavily on C and Foirtran extensions, so you really need it to be compatible at the binary level, not just the Python level. -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

Stephen J. Turnbull

11:12 p.m.

Christopher Barker writes:

...

Anyway, I would like to see a nice linear algebra lib -- but not 'cause I'd use it, only because I find it interesting.

SymPy. Except that in this conversation, "linear algebra" is likely neither restricted to linearity nor so much algebraic as computational, so SymPy likely won't do. :-/

Oscar Benjamin

8:34 a.m.

On Mon, 17 Aug 2020 at 07:14, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:

...

Christopher Barker writes:

...
Anyway, I would like to see a nice linear algebra lib -- but not 'cause I'd use it, only because I find it interesting.

SymPy.

Except that in this conversation, "linear algebra" is likely neither restricted to linearity nor so much algebraic as computational, so SymPy likely won't do. :-/

SymPy is exactly what I would recommend for the OP use case of high school students doing basic linear algebra (although perhaps as maintainer I would say that...). SymPy's matrix class doesn't have the complicated broadcasting rules of numpy's ndarray and uses exact calculations rather than floating point (or limited precision integers). I'm not sure why it would be considered unsuitable for this. Here's a demonstration:

...

...
...
from sympy import Matrix

...

...
...
M = Matrix([[1, 2], [3, 4]])

...

...
...
M Matrix([ [1, 2], [3, 4]])

...

...
...
M * M Matrix([ [ 7, 10], [15, 22]])

...

...
...
3 * M Matrix([ [3, 6], [9, 12]])

...

...
...
M ** -1 Matrix([ [ -2, 1], [3/2, -1/2]])

...

...
...
M.rank() 2

...

...
...
M.det() -2

...

...
...
M.eigenvals() {5/2 - sqrt(33)/2: 1, 5/2 + sqrt(33)/2: 1}

There are no funny broadcasting rules. You can multiply two matrices with * (or @) and if the shapes are compatible in the standard mathematical rules of matrix multiplication then you will get the matrix product. You can also multiply a matrix and a scalar as you would in an equation. No other kind of multiplication is permitted. You can add two matrices iff they have the same shape. Using ** gives matrix powers so M**2 or M**-1 are as you would expect when seeing superscripts in a mathematical equation. If you don't put floats in the matrix then all of the calculations will be exact with no overflow and no rounding error. The only thing to be careful of is Python's division so 1/2 should be sympy's Rational(1, 2). If you enter the above in a jupyter notebook (after calling sympy's init_printing()) then the matrices will all display nicely using latex->Mathjax under the hood. The same also works in a Qt console e.g. you can see a nice mathematical display when using a "sympy console" in the spyder editor. Of course as the matrices get larger then sympy will be noticeably slower than numpy. Even the algorithms for computing the eigenvalues symbolically can breakdown for matrices larger than 4x4 so there are good reasons why numerical libraries are more commonly used in many areas. Simple calculations are still faster than the blink of an eye though and I'm sure some of the high-school students would benefit from being able to do calculations with things like sqrt(2) or symbols like x, y, z at some point. If your high-school students are okay with using Python then I would certainly recommend SymPy for this kind of calculation. On the other hand adding something like this to the stdlib is a very slippery slope. SymPy's matrices package has 30k lines of code and counting even though it leverages much of the rest of the sympy codebase to do the calculations it does. There are a lot of different features you can add once you have a matrix class. Oscar

Stephen J. Turnbull

7:03 p.m.

Oscar Benjamin writes:

...

On Mon, 17 Aug 2020 at 07:14, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:

...

...
SymPy.

Except that in this conversation, "linear algebra" is likely neither restricted to linearity nor so much algebraic as computational, so SymPy likely won't do. :-/

...

SymPy is exactly what I would recommend for the OP use case of high school students doing basic linear algebra (although perhaps as maintainer I would say that...). SymPy's matrix class doesn't have the complicated broadcasting rules of numpy's ndarray and uses exact calculations rather than floating point (or limited precision integers). I'm not sure why it would be considered unsuitable for this.

I didn't say it was unsuitable for high school students. It is, that's why I suggested it. My point was that most of the folks in this conversation don't really want linear algebra, they want fewer comprehensions and 'for' statements and maybe numerical stability (although Ricky's boring friction example doesn't need any special care for stability). Take broadcasting. It is really convenient, for example in computing central moments: vector = [ ... ] # known non-empty mean = sum(vector) / len(vector) variance = ((vector - mean) @ (vector - mean)) / len(vector) I'm not sure you want to allow that in a linear algebra class (although it's a mild and straightforward generalization of scalar multiplication, so maybe a little deemphasis on the "linear"?)

...

Here's a demonstration:

Very nice! <img src="che.png" alt="clapping hands emoji">

...

If your high-school students are okay with using Python then I would certainly recommend SymPy for this kind of calculation.

Sure. I just think they should write their own programs for four functions (3 binary operations and inversion, or 2 binary operations and both inversions) first. Once they've got that it's not obvious to me that at that level they need SymPy. (Although it's easy to imagine that in a class of junior high students the ones getting into it will run into numerical instability issues pretty quick, and of course anybody can mess up on conformance.)

...

On the other hand adding something like this to the stdlib is a very slippery slope. SymPy's matrices package has 30k lines of code and counting even though it leverages much of the rest of the sympy codebase to do the calculations it does. There are a lot of different features you can add once you have a matrix class.

Thank you for pointing that out! Steve

Ricky Teachey

2:54 a.m.

On Fri, Aug 14, 2020, 12:05 AM Stephen J. Turnbull < turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:

...

I cannot speak to engineering uses of matrix computations. If someone

produces use cases that fit into the list of operations proposed

...

(addition, negation, multiplication, inverse, transposition, scalar multiplication, solving linear equation systems, and determinants, I will concede it's useful and fall back to +/- 0.

I'll try to speak to that below. On Fri, Aug 14, 2020, 1:48 AM Marco Sulla <Marco.Sulla.Python@gmail.com> wrote: On Fri, 14 Aug 2020 at 03:16, Steven D'Aprano <steve@pearwood.info> wrote:

...

Funny you mention this, I have been working on a Matrix object for precisely the use-case you discuss (secondary school maths), where performance is not critical and the dimensions of the matrix is typically single digits.

This is really good. I think that numpy is a practical project, but I feel it hard to understand for people that come from Python and not from Matlab. Maybe the API for such a module can be released initially as a provisional API, as at the pathlib and asyncio beginnings? I agree this could be great. I would characterize the steps to get to matrix arithmetic in python as currently something like: 1. Learn python, with typical experience of: - I learned it last night! Everything is so simple! 2. Learn numpy, with the typical experience of: - Wait, is this array thing really what I need? I am looking for matrix. - This is kind of intimidating and after a few hours I barely feel confident I can add matrices together correctly... also, why did this tutorial cause me to spend half a day figuring out virtual environments? And this other tutorial doesn't even mention them? And this other one says never to install things without using them? Did I hurt my computer by putting numpy on it? It's tough. However, engineers aren't usually doing a lot of linear algebra. Linear algebra isn't even a required course in ABET accredited university engineer programs. I walked out of my linear algebra course the first day, saying to myself, "nope, I have written my last proof." What is used is mostly NOT even matrix arithmetic of the high school variety (although it's occasionally useful to solve a system of equations). I'll give a longish example below. The other thing I would say is there are some nice third party tools out there that assume numpy is being used (I use pint a lot for units, and of course there's notebooks like jupyter). Might be a good thing to see if the API for basic matrix objects can be made to work well with some of these existing tools. Here's my example code-- I utilize a lot of this sort of thing. Skip down this large block of code to see the lines that exemplify the reason I'm using a numpy array in the first place. # Nominal skin friction capacity, Pn, of drilled shafts/piers ## depths below grade to 20 ft in 1 foot increments z = np.linspace(1, 20, num=20) ## create array containing unit weight of soil at each depth ### unit weight log ### for looking up unit weights (pcf) based on depth (ft) ### of form {depth: unit_weight} log = {3: 120, 9: 118, 12: 122, 15: 106, 18: 102, 21: 110} ### fill in gaps in unit weight log #### assume first layer extends to ground surface log[0] = log[next(iter(log))] #### assume unit weight change occurs after mid point between each layer log_depths = sorted(list(log.keys())) log_filled = {} for depth, next_depth in zip(log_depths[:-1], log_depths[1:]): # layer boundary log_filled[depth] = log[depth] # layer change occurs after midpoint of boundary log_filled[ (depth+next_depth)/2 ] = log[depth] #### don't forget bottom of last layer log_filled[next_depth] = log[next_depth] #### final per foot unit weights, γ γ_per_ft = np.array([]) for depth in z: log_depth = next(d for d in log_filled if d >= depth) γ_per_ft = np.append(uwt_per_ft, log_filled[log_depth]) ## calculate per foot stress ### unit weight of water (pcf) γ_w = 62.4 ### effective unit weight given high water table γ_prime = γ_per_foot - γ_w q_s = γ_prime * z ## Nominal capacity ### shaft diameter (ft) D_shaft = 6 ### shaft circumference C_shaft = D_shaft * np.pi # friction coefficient friction_angle = 17 # degrees μ_s = np.tan( friction_angle * np.pi/180 ) # nominal vertical capacity P_n_per_foot = μ_s * q_s * C_shaft P_n = np.sum(P_n_per_foot) This operation: q_s = γ_prime * z And these operations: P_n_per_foot = μ_s * q_s * C_shaft P_n = np.sum(P_n_per_foot) ...which are just scalar or element-wise multiplication and summation, are pretty much the extent of the kinds of mathematical things an engineer is doing with numpy arrays- at least the majority of the time. We just generally are not doing linear algebra. That being said, I do think it would be nice to have a "matrix" type in the std lib that supported both element-wise operations like this and basic matrix arithmetic.

...

fribl＠fastmail.fm

7:03 a.m.

In my role as an engineering professor and instructor, I find linear algebra to be essential. To my knowledge, almost any book on engineering mathematics contains a large section on linear algebra. numpy is an extremely useful package, but is overwhelming for some novices (e.g. dealing with the complexity of various broadcasting rules and indexing methods). So, if the python standard library had a matrix type and associated capabilities to do basic operations, it would help me as an instructor. -Brad

Greg Ewing

8:32 a.m.

Another class of folks who might find something like this useful is those playing around with computer graphics using pygame, pyglet, etc. where coordinate transformations are used a lot. Bringing in numpy for that can seem like massive overkill for a tiny game. -- Greg

Stephen J. Turnbull

10:32 a.m.

Ricky Teachey writes:

...

I'll try to speak to [what would engineers use a matrix library for] below.

I don't get it. Among other things, the object uwt_per_ft isn't defined. I think it's a typo for the one-dimensional array γ_per_foot, but not sure. I don't see any advantage to numpy or a matrix library in this code, unless you're desperate for speed (see my last version where z is a Python list).

...

This operation:

q_s = γ_prime * z

And these operations:

P_n_per_foot = μ_s * q_s * C_shaft P_n = np.sum(P_n_per_foot)

Are you saying that the operations above would be clearer in a four-function calculator version of a matrix library? I don't see how the above is less clear than q_s = γ_prime * z P_n_per_foot = μ_s * q_s * C_shaft P_n = math.fsum(P_n_per_foot) or either is more clear than z = list(range(1, 21)) q_s = [γ_prime * x for x in z] P_n_per_foot = [μ_s * x * C_shaft for x in q_s] P_n = math.fsum(P_n_per_foot) although they might be faster for large enough values of "20".

...

That being said, I do think it would be nice to have a "matrix" type in the std lib that supported both element-wise operations like this and basic matrix arithmetic.

If the answer to the question above is "no", then nice for what? If "yes", do you think the stdlib 'matrix' module should have a 'linspace' class/function? Or that z = matrix.Matrix(columns=1, data=range(1, 21)) is in general an adequate substitute?

Ricky Teachey

11:37 a.m.

On Fri, Aug 14, 2020 at 1:32 PM Stephen J. Turnbull < turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:

...

Ricky Teachey writes:

...
I'll try to speak to [what would engineers use a matrix library for] below.

I don't get it. Among other things, the object uwt_per_ft isn't defined. I think it's a typo for the one-dimensional array γ_per_foot, but not sure. I don't see any advantage to numpy or a matrix library in this code, unless you're desperate for speed (see my last version where z is a Python list).

Sorry, yes, the code I wrote had a couple typos. I'll put the correct version (tested) at the very bottom for clarity.

...

This operation:

...
q_s = γ_prime * z

And these operations:

P_n_per_foot = μ_s * q_s * C_shaft P_n = np.sum(P_n_per_foot)

Are you saying that the operations above would be clearer in a four-function calculator version of a matrix library? I don't see how the above is less clear than

q_s = γ_prime * z P_n_per_foot = μ_s * q_s * C_shaft P_n = math.fsum(P_n_per_foot)

or either is more clear than

z = list(range(1, 21)) q_s = [γ_prime * x for x in z] P_n_per_foot = [μ_s * x * C_shaft for x in q_s] P_n = math.fsum(P_n_per_foot)

although they might be faster for large enough values of "20".

Well keep in mind that generally an engineer submits calculations-- not code-- to be reviewed. This leaves me with a few options: 1. Write code the works and then submit a "cleaned up" version of the code, with parts that don't look like math omitted 2. Write the code, and then write MORE code that prints out the clean, looks-like-math version 3. Use a notebook (in practice this is not much different than #1) I often end up doing #1 because it is actually easier and faster. Hiding the portion of code that looks too much like programming (and not enough like equations) takes some time, but if I do not do this it will inevitably lead to a 40 minute phone call from an official from the NY Department of Transportation, delays, and more hours billed to my client. One easy way to do this code hiding is to simply copy and paste the *result* of the calculation and delete the code that created it, kind of like this: print(f"γ_per_ft = ({', '.join(f'{v:.0f}' for v in γ_per_ft)}) pcf") Anyway typing the mathematical equations, and not a list comprehension, is far clearer for the engineers who will be reading my calculations, and who are not programmers. Otherwise I end up typing everything twice, just like I end up having to do using Excel: the code that actually does the calculation, and a "pretty version" to display to the reviewer what the calculation is doing. Something like the equation P_n = μ_s * q_s * C_shaft is directly out of the AASHTO LRFD and NY DoT highway code: # this looks like gobbledegook to, for example, New York State DoT official reviewing my calculations # so i will hide this part of the code, somehow P_n_per_foot = [μ_s * x * C_shaft for x in q_s] # this is what the official is expecting to see P_n_per_foot = μ_s * q_s * C_shaft Furthermore, the same NYDoT reviewer might come back and say: *"Rejected: calculation does not take into account change in friction angle with depth."* The official would be saying that friction (μ_s) is not a scalar, it should be another array variable, because the soils apply different friction along the shaft depth. So now, because the code expects μ_s to be a scalar, I have to go change not just the value of μ_s but also the code to turn it into a comprehension that will work for multiple lists, instead of one list: μ_s = [ <some list of effective friction coefficients> ] P_n_per_foot = [μ_s * x * C_shaft for f,x in zip(μ_s, q_s)] If I could write the code like this, I could just change μ_s to an array, and everything works fine as-is, AND I would not have to take steps to hide the code, I can just print it out with no changes, which would be wonderful: P_n_per_foot = μ_s * q_s * C_shaft

...

...
That being said, I do think it would be nice to have a "matrix" type in the std lib that supported both element-wise operations like this and basic matrix arithmetic.

If the answer to the question above is "no", then nice for what? If "yes", do you think the stdlib 'matrix' module should have a 'linspace' class/function? Or that

z = matrix.Matrix(columns=1, data=range(1, 21))

is in general an adequate substitute?

The answer is yes. It's a good question on the linspace() function -- I don't think so. Something like this would do fine: ## depths below grade to 20 ft in 1 foot increments z = matrix(range(1, 21)) However I would certainly hide this part of the code, and include the copy and pasted result from this: print(f"γ_per_ft = ({', '.join(f'{v:.0f}' for v in z)}) pcf") ------------ Sidebar: it would be pretty fantastic if a native matrix library could display values using f-strings like this:

...

...
...
x = 1.1 print(f"{x=:.0f} pcf") 1 pcf γ_per_ft = matrix( [120, 122, 118, 117] ) print(f"{γ_per_ft=:.0f} pcf") (120, 122, 118, 117) pcf

The idea here is that python knows the datatype of γ_per_ft is float, so it can format all the members of the array *one at a time*. End of sidebar. ------------ Here is the (now tested) code snippet. I'm sorry it was incorrect before: import numpy as np # Nominal skin friction capacity, Pn, of drilled shafts/piers ## depths below grade to 20 ft in 1 foot increments z = np.linspace(1, 20, num=20) ## create array containing unit weight of soil at each depth ### unit weight log ### for looking up unit weights (pcf) based on depth (ft) ### of form {depth: unit_weight} log = {3: 120, 9: 118, 12: 122, 15: 106, 18: 102, 21: 110} ### fill in gaps in unit weight log #### assume first layer extends to ground surface log[0] = log[next(iter(log))] #### assume unit weight change occurs after mid point between each layer log_depths = sorted(list(log.keys())) log_filled = {} for depth, next_depth in zip(log_depths[:-1], log_depths[1:]): # layer boundary log_filled[depth] = log[depth] # layer change occurs after midpoint of boundary log_filled[ (depth+next_depth)/2 ] = log[depth] #### don't forget bottom of last layer log_filled[next_depth] = log[next_depth] #### final per foot unit weights, γ γ_per_ft = np.array([]) for depth in z: log_depth = next(d for d in log_filled if d >= depth) γ_per_ft = np.append(γ_per_ft, log_filled[log_depth]) ## calculate per foot stress ### unit weight of water (pcf) γ_w = 62.4 ### effective unit weight given high water table γ_prime = γ_per_ft - γ_w q_s = γ_prime * z ## Nominal capacity ### shaft diameter (ft) D_shaft = 6 ### shaft circumference C_shaft = D_shaft * np.pi # friction coefficient friction_angle = 17 # degrees μ_s = np.tan( friction_angle * np.pi/180 ) # nominal vertical capacity P_n_per_foot = μ_s * q_s * C_shaft P_n = np.sum(P_n_per_foot) print(f"{P_n/2000:0.0f} tonf") Output:

...

...
...
30 tonf

--- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler

Stephen J. Turnbull

8:20 a.m.

Ricky Teachey writes:

...

Well keep in mind that generally an engineer submits calculations-- not code-- to be reviewed. This leaves me with a few options:

[...]

...

Hiding the portion of code that looks too much like programming (and not enough like equations) takes some time, but if I do not do this it will inevitably lead to a 40 minute phone call [...].

Ah, I misinterpreted your statement that engineers don't much use linear algebra. You *do* use it for *presentation*, but it's not very useful in the programming that gets your results.

...

Furthermore, the same NYDoT reviewer might come back and say:

*"Rejected: calculation does not take into account change in friction angle with depth."*

The official would be saying that friction (μ_s) is not a scalar, it should be another array variable, because the soils apply different friction along the shaft depth.

I don't think Steven's proposed matrix class can substitute for numpy in general. It will work for your case because '*' can be both scalar multiplication and matrix multiplication. That is, the scalar μ_s (float, I presume?) doesn't know how to self.__mul__(q_s), but q_s.__rmul__(μ_s) works fine. Then tmp.__mul__(C_shaft) (C_shaft a scalar or a vector of the same length) works. But if you have *two* variables, and the reviewer claims they have different (scalar) coefficients, you're going to need numpy-style broadcasting. Also, you're going to run into the issue that Steven is probably going to use '@' for matrix multiplication for numpy compatibility as requested by several people, while * will be elementwise multiplication. Finally you lose the convenience function like linspace. Overall, it may still be worthwhile, and Steven may give you the operator you want (whether it's '*' or '@' for matrix multiplication).

...

The answer is yes. It's a good question on the linspace() function -- I don't think so. [...] However I would certainly hide this part of the code, and include the copy and pasted result from this:

print(f"γ_per_ft = ({', '.join(f'{v:.0f}' for v in z)}) pcf")

Aaarrrrrggggggh! What we do for money and to please reviewers ....

...

Sidebar: it would be pretty fantastic if a native matrix library could display values using f-strings like this:

...
...
...
x = 1.1 print(f"{x=:.0f} pcf") 1 pcf γ_per_ft = matrix( [120, 122, 118, 117] ) print(f"{γ_per_ft=:.0f} pcf") (120, 122, 118, 117) pcf

The idea here is that python knows the datatype of γ_per_ft is float, so it can format all the members of the array *one at a time*.

I think all you need to do is provide an appropriate __format__ method on matrix objects. That method will have access to the format spec (ie, everything after the colon), and if there isn't a parser you can just import, I'm sure there are plenty you can copy. At least for this case you can just apply it to the individual elements. It's not obvious to me that elementwise is the "right" way to implement matrix.__format__ (there may be "matrix-level" formatting that takes precedence, for example how to treat rows wider than the page width), but it's quite plausible. Of course you can also derive a simple subclass with elementwise __format__.

Ricky Teachey

9:26 a.m.

On Sat, Aug 15, 2020, 11:20 AM Stephen J. Turnbull < turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:

...

Ricky Teachey writes:

...
Well keep in mind that generally an engineer submits calculations-- not code-- to be reviewed. This leaves me with a few options:

[...]

...
Hiding the portion of code that looks too much like programming (and not enough like equations) takes some time, but if I do not do this it will inevitably lead to a 40 minute phone call [...].

Ah, I misinterpreted your statement that engineers don't much use linear algebra. You *do* use it for *presentation*, but it's not very useful in the programming that gets your results.

Yes I'd say this is generally true most of the time. There are certainly instances where I've needed to used matrices to solve a system of equations in an automated way. But most of time it's simply not needed, and if I do need to do it, I can always use Mathcad. General comment: a lot of my "I wish" thoughts regarding python syntax have to do with the idea that code could look more like handwritten mathematical calculations. But I recognize that a balance has to be maintained.

...

...
Furthermore, the same NYDoT reviewer might come back and say:

*"Rejected: calculation does not take into account change in friction angle with depth."*

The official would be saying that friction (μ_s) is not a scalar, it should be another array variable, because the soils apply different friction along the shaft depth.

I don't think Steven's proposed matrix class can substitute for numpy in general. It will work for your case because '*' can be both scalar multiplication and matrix multiplication. That is, the scalar μ_s (float, I presume?) doesn't know how to self.__mul__(q_s), but q_s.__rmul__(μ_s) works fine. Then tmp.__mul__(C_shaft) (C_shaft a scalar or a vector of the same length) works. But if you have *two* variables, and the reviewer claims they have different (scalar) coefficients, you're going to need numpy-style broadcasting.

I actually started trying to write a proof of concept for this last night. I'm sure it will be far far less usable and correct then Steven's but if I get it working I'll share it just for comparison of ideas.

...

...
...
...
γ_per_ft = matrix( [120, 122, 118, 117] )

...
print(f"{γ_per_ft=:.0f} pcf") (120, 122, 118, 117) pcf

The idea here is that python knows the datatype of γ_per_ft is float, so it can format all the members of the array *one at a time*.

I think all you need to do is provide an appropriate __format__ method on matrix objects. That method will have access to the format spec (ie, everything after the colon), and if there isn't a parser you can just import, I'm sure there are plenty you can copy. At least for this case you can just apply it to the individual elements.

It's not obvious to me that elementwise is the "right" way to implement matrix.__format__ (there may be "matrix-level" formatting that takes precedence, for example how to treat rows wider than the page width), but it's quite plausible. Of course you can also derive a simple subclass with elementwise __format__.

I'm going to try to solve this with my toy implementation, too. If it get it working I'll be sure to pass it along for comment.

...

Greg Ewing

5:06 p.m.

On 16/08/20 4:26 am, Ricky Teachey wrote:

...

If we're going to have a class that supports matrix multiplication, I think we should at least have the ability to invert a square matrix. And once you have the machinery for that, it's only a small step to use it for solving systems of equations, so we might as well provide that too. It would cost very little extra, and might be useful. -- Greg

Stephen J. Turnbull

11:18 p.m.

Greg Ewing writes:

...

Nobody disagrees that a "toy" (sorry-not-sorry) or "educational" matrix class should have those. But those use cases are quite well served by modules on PyPI, and arguably are far more educational in most cases if the student implements their own. It's getting the fiddly stuff right (numerical stability and accuracy, catching edge cases in algorithms) so that you can use it with confidence for work that has consequences if you get it wrong that (to me, anyway) would justify inclusion in the stdlib. That's assuming that such a class would be featureful enough, which is why I ask "how would engineers use a 'four-function' matrix class?" Ricky's post was a fair-enough answer I think, but I also agree with the posters whose feeling is that almost everybody would want something more, and we'd end up with a kudzu emulation of NumPy in the stdlib. But that's just a feeling. (Enough for me to still be -1, but also willing to continue exploring the subject.) Steve

Christopher Barker

11:49 p.m.

On Sun, Aug 16, 2020 at 11:20 PM Stephen J. Turnbull < turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:

...

And for that, you want a robust, well supported library, like, say LAPACK, or EIGEN, or ...., you know like what scipy uses ;-) The python standard library really isn't the place for that kind of code. However, a pure python implementation could, optionally, reach out to an external lib for specific functionality like that though. -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

Christopher Barker

10:18 p.m.

Well, I was trained and an engineer, but now call myself an oceanographer, but in any case, I don't need to submit my calculations for review to anyone. Though I do need to present algorithms / calculations / methods in written form that others can understand -- so a similar problem. But frankly, Python is simply not the right tool for the job of presenting calculations that look iike math -- Ricky Teachey mentioned MathCAD -- which is indeed a tool for that job. Let's not try to squeeze Python into that role. So I don't think that trying to make Python look more like math is a reason to add a Matrix library to Python. I'm also very confused about why folks don't want to use numpy for this stuff? numpy is the right tool for the job, is robust and well supported, and virtually anyone can pip install it -- yes it took a couple decades to get to that point, but we really are there now. But: numpy has a Matrix object, which is pretty much deprecated, and there is a good reason for that. In a nutshell, the reason is that the only thing that Matrix gave you for "real work" was a nice operator for matrix multiplication, and what it lost you was compatibility with the rest of numpy -- it was really pretty painful to switch between Matrix and regular numpy arrays. -- when we finally realized that matmult really was the only important thing -- then adding the @ operator was an excellent solution -- and here we are. However, that does NOT mean that a Matrix class isn't useful, just that it's not useful for doing calculations within teh numpy ecosystem, and when you are doing 'real work", the rest of the numpy ecosystem is very important. I highly encourage anyone interested in this topic to search the archives of the numpy lists to find the many long discussions of this issue -- it will be very informative. But I'll try to hit a couple highlights, from my memory: 1) The primary motivator for a "proper" matrix object is pedagogical: being able to write code that is more or less a direct translation of what one sees in a linear algebra textbook. Why is that? Because in production code there may be one or two lines that do a classic linear algebra calculation, and hundreds of lines that do other things with arrays -- and a numpy-style ndarray is better suited for those other hundred lines. And when you add @ -- now it's maybe one, rather than two out of hundreds :-). (I saw a note earlier in this thread about how horrible broadcasting is: I'm sorry, they are "just wrong" -- broadcasting is really, really, useful -- far more useful than one or two "proper" linear algebra operations.) 2) The other reason the numpy.Matrix object didn't really work out is that it didn't quite go far enough: I highly suggest that if one writes a linear algebra package that seeks to model what we learn in textbooks, that you have not just a Matrix, but also a Scalar (duh, but MATLAB, for instance didn't really make that distinction), and a RowVector and ColumnVector objects -- you could do what numpy Matrix and MATLAB did, and have all "things" be 2D, and a 1xN matrix is a row vector, and an (Nx1) Matrix is a column vector -- but I don't recommend it. This was a source of problems for the numpy Matrix: numpy has a 1d array (i.e. vector) but no way to distinguish Row vs Column -- it gets awkward. There may be other examples I can't think of now, but in short: if you are going to do it, really go for it, make it all about linear algebra, and not try to be a general purpose array computation system -- we have numpy for that, and we know that mixing and matching those two does not work out well. Another suggestion: to leave the door open to better performance (with C or Cython), I'd consider using an array.array for internal storage, and support the buffer protocol for interchange with numpy, and image processing libs, and .... If you want to support "non-standard" numeric types, like Decimal and Fraction, you could have a generic number type and use a list to store those. On the other hand, maybe performance simply isn't the point here. So: does such a thing belong in the standard library? I don't think so. But then again, I didn't think statistics did either. Though this really is less useful that the statistics module -- as said above it's really about teaching more than anything. Yes, a generally useful array computation lib would be nice -- but if we went that way, a "numpy-lite" would be the thing to do -- not a Matrix object. But in any case, it sure as heck could start out as a PyPi package to hammer out the wrinkles. BTW: There MUST be a Python Matrix lib out there already! Here's one that's not too old. https://pypi.org/project/matrix7/ I have no idea if it's any good but looking for prior art certainly makes sense. -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

Ricky Teachey

1:10 p.m.

On Sun, Aug 16, 2020 at 1:19 AM Christopher Barker <pythonchb@gmail.com> wrote: Well, I was trained and an engineer, but now call myself an oceanographer, but in any case, I don't need to submit my calculations for review to anyone. Though I do need to present algorithms / calculations / methods in written form that others can understand -- so a similar problem. But frankly, Python is simply not the right tool for the job of presenting calculations that look iike math -- Ricky Teachey mentioned MathCAD -- which is indeed a tool for that job. Let's not try to squeeze Python into that role. So I don't think that trying to make Python look more like math is a reason to add a Matrix library to Python. If you are right, I find it sad. I love python to pieces. It's so very close to being what I want it to be in this aspect (displaying "just math, man"). I will say that at least python is much much closer to doing the job of presenting calculations than the most popular tool actually used for this purpose in the industry: that's right, it's Excel. Oh the devastating ennui. And python is also free. I'm the only person I know with a copy of mathcad on their machine and as a result I can't share any of my calculation tools with clients and colleagues-- all the tools I write that will ever be used (rather than just read) by others are currently written in Excel. I've been creating workflows to turn python into that tool for quite a while now. Still not there. The closest I have come is what I described in my previous message: write code that works, copy and paste it into a separate text file and replace the bits that don't look enough like math. I'm also very confused about why folks don't want to use numpy for this stuff? numpy is the right tool for the job, is robust and well supported, and virtually anyone can pip install it -- yes it took a couple decades to get to that point, but we really are there now. It's certainly what I use for all those reasons! But there is still the occasional downside. Example: I just downloaded 3.9 rc1 yesterday and tried pip installing numpy, and building the wheel for the virtual env took forever (and btw pandas would not install at all). And there are still situations where putting numpy on a machine is difficult or impossible (Tobias Kohn described one). A native, non performance critical library that doesn't have to be installed would just be really cool. But: numpy has a Matrix object, which is pretty much deprecated, and there is a good reason for that. In a nutshell, the reason is that the only thing that Matrix gave you for "real work" was a nice operator for matrix multiplication, and what it lost you was compatibility with the rest of numpy -- it was really pretty painful to switch between Matrix and regular numpy arrays. -- when we finally realized that matmult really was the only important thing -- then adding the @ operator was an excellent solution -- and here we are. However, that does NOT mean that a Matrix class isn't useful, just that it's not useful for doing calculations within teh numpy ecosystem, and when you are doing 'real work", the rest of the numpy ecosystem is very important. I highly encourage anyone interested in this topic to search the archives of the numpy lists to find the many long discussions of this issue -- it will be very informative. But I'll try to hit a couple highlights, from my memory: 1) The primary motivator for a "proper" matrix object is pedagogical: being able to write code that is more or less a direct translation of what one sees in a linear algebra textbook. Why is that? Because in production code there may be one or two lines that do a classic linear algebra calculation, and hundreds of lines that do other things with arrays -- and a numpy-style ndarray is better suited for those other hundred lines. And when you add @ -- now it's maybe one, rather than two out of hundreds :-). This was really helpful for me to understand why numpy works the way it does. Thanks. (I saw a note earlier in this thread about how horrible broadcasting is: I'm sorry, they are "just wrong" -- broadcasting is really, really, useful -- far more useful than one or two "proper" linear algebra operations.) Totally agree. Broadcasting is not something I use a lot personally but I do think it is critical to the basic utility of numpy for many others. May not be critical to bring to a native python matrix class. Just not sure. Another suggestion: to leave the door open to better performance (with C or Cython), I'd consider using an array.array for internal storage, and support the buffer protocol for interchange with numpy, and image processing libs, and .... If you want to support "non-standard" numeric types, like Decimal and Fraction, you could have a generic number type and use a list to store those. On the other hand, maybe performance simply isn't the point here. I don't think it is. Maybe down the road. So: does such a thing belong in the standard library? I don't think so. But then again, I didn't think statistics did either. Though this really is less useful that the statistics module -- as said above it's really about teaching more than anything. I'm far from the right person to say whether it belongs or not but if it was there, I am pretty sure I'd use it a lot for creation of calculations. Even if it didn't have broadcasting (which I only need sometimes). Just because of the convenience of not having to install it. Yes, a generally useful array computation lib would be nice -- but if we went that way, a "numpy-lite" would be the thing to do -- not a Matrix object. But in any case, it sure as heck could start out as a PyPi package to hammer out the wrinkles. I agree a numpy lite is probably better, so long as it implements matmul to make the educators happy. BTW: There MUST be a Python Matrix lib out there already! Here's one that's not too old. https://pypi.org/project/matrix7/ I have no idea if it's any good but looking for prior art certainly makes sense. -CHB I spent last night throwing together my own numpy lite thing. I'll probably share it later this week once I polish it a bit (if I can get over the fear of actually submitting something intended for others to look at). --- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler

Marco Sulla

10:43 p.m.

On Fri, 14 Aug 2020 at 03:16, Steven D'Aprano <steve@pearwood.info> wrote:

...

David Mertz

August 2020

4:11 p.m.

...

Steven D'Aprano

6:15 p.m.

On Thu, Aug 13, 2020 at 11:17:00PM +0100, Stefano Borini wrote:

...

The math module has plenty of mathematical functions that are very interesting, but no Matrix object.

...

Matthias Bussonnier

7:06 p.m.

...

Guido van Rossum

7:22 p.m.

...

-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

Stephen J. Turnbull

9:04 p.m.

Guido van Rossum writes:

...

Guido van Rossum

9:33 p.m.

...

Guido van Rossum writes:

...
I was going to say that such a matrix module would be better of in PyPI, but then I recalled how the statistics module got created, and I think that the same reasoning from PEP 450 applies here too (https://www.python.org/dev/peps/pep-0450/#rationale).

So I'd say go for it!

I disagree that that rationale applies. Let's consider where the statistics module stopped, and why I think that's the right place to stop. Simple statistics on *single* variables, as proposed by PEP 450, are useful in many contexts to summarize data sets. You see them frequently in newpaper and Wikipedia articles, serious blog posts, and even on Twitter.

PEP 450 mentions, but does not propose to provide, linear regression (presumably ordinary least squares -- as with median and mode, there are many ways to compute a regression line). The PEP mentions covariance and correlation coefficients once each, and remarks that the API design is unclear. I think that omission was the better part of valor. Even on Twitter, it's hard to abuse the combination of mean and standard deviation (without outright lies about the values, of course). But most uses of correlation and regression are more or less abusive. That's true even in more serious venues (Murray & Herrnstein's "The Bell Curve" comes immediately to mind). Almost all uses of multiple regression outside of highly technical academic publications are abusive.

I don't mean to say "keep power tools out of the reach of the #ToddlerInChief and his minions".[1] Rather, I mean to say that most serious languages and packages for these applications (such as R, and closer to home numpy and pandas) provide substantial documentation suggesting *appropriate* use and pointing to the texts on algorithms and caveats. Steven doesn't do that with statistics -- and correctly so, he doesn't need to. None of the calculations he implements are in any way controversial as calculations. To the extent that different users might want slightly different definitions of mode or median, the ones provided are good enough for stdlib purposes. Nor are the interpretations of the results of the various calculations at all controversial.[2]

But even a two-dimensional regression y on x is fraught. Should we include the Anscombe[3] data and require a plotting function so users can see what they're getting into? I think Steven would say that's *way* beyond the scope of his package -- and I agree. Let's not go there. At all. Let users who need that stuff use packages that encourage them and help them do it right.

I don't find the "teaching high school linear/matrix algebra" use case persuasive. I taught "MBA Statistics for Liberal Arts Majors" for a decade. Writing simple matrix classes was an assignment, and then they used their own classes to implement covariance, correlation, and OLS. I don't think having a "canned" matrix class would have been of benefit to them -- a substantial fraction (10% < x < 50%) did get some idea of what was going on "inside" those calculations by programming them themselves plus a bit of printf debugging, which neither the linear algebra equations nor the sum operator I wrote on the whiteboard did. I will say I wish I had Steven's implementation of sum() at hand back then to show them to give them some idea of the care that numerical accuracy demands.

I cannot speak to engineering uses of matrix computations. If someone produces use cases that fit into the list of operations proposed (addition, negation, multiplication, inverse, transposition, scalar multiplication, solving linear equation systems, and determinants, I will concede it's useful and fall back to +/- 0.

However, I think the comparison to multivariate statistics is enlightening. You see many two-dimensional tables in public discussions (even on Twitter!) but they are not treated as matrices. Now, it's true that *every* basic matrix calculation (except multiplication by a scalar) requires the same kind of care that statistics.sum exerts, but having provided that, what have you bought? Not a lot, as far as I can see -- applications of matrix algebra are extremely diverse, and many require just as much attention to detail as the basic operations do.

In sum, I suspect that simply providing numerically stable algorithms for those computations isn't enough for useful engineering work -- as with multivariate statistics, you're not even halfway to useful and accurate computations, and the diversity is explosive. How to choose?

Footnotes: [1] Any fans of cubic regression models for epidemiology? No? OK, then.

[2] They can be abused. I did so myself just this morning, to tease a statistically literate friend. But it takes a bit of effort.

[3] https://stat.ethz.ch/R-manual/R-patched/library/datasets/html/anscombe.html

-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

Steven D'Aprano

August 2020

7:23 a.m.

On Thu, Aug 13, 2020 at 09:33:46PM -0700, Guido van Rossum wrote:

...

That's food for thought. I have to admit that I have forgotten almost everything about linear algebra that I was ever taught

...

What gives me hope though is that Steven has been thinking about this somewhat seriously already, and given that he successfully chose what to include or exclude for the statistics module, I trust him to know how much to put into a Matrix class as well. Certainly I trust him to come up with a reasonable strawman whose tires we can all kick.

Thank you for the vote of confidence :-)

...

My own strawman would be to limit a Matrix to 2-dimensionality -- I believe that even my college linear algebra introduction (for math majors!) didn't touch upon higher dimensionality, and I doubt that what I learned in high school about the topic went beyond 3x3 (it might not even have treated non-square matrices).

Stephen J. Turnbull

5:29 p.m.

Guido van Rossum writes:

...

That's food for thought.

...

Certainly I trust [Steven D'Aprano] to come up with a reasonable strawman whose tires we can all kick.

I do, too. It's only fair to give him a preview of (some?) of the arguments against inclusion in the stdlib, that's all. ;-)

...

[W]hich operations from the OP's list need more than statistics._sum() when limited to NxM matrices for single-digit N and M? (He named "matrix multiplication, transposition, addition, linear problem solving, determinant.")

Stefano Borini

11:11 p.m.

On Fri, 14 Aug 2020 at 05:37, Guido van Rossum <guido@python.org> wrote:

...

My own strawman would be to limit a Matrix to 2-dimensionality -- I believe that even my college linear algebra introduction (for math majors!) didn't touch upon higher dimensionality, and I doubt that what I learned in high school about the topic went beyond 3x3 (it might not even have treated non-square matrices).

Tobias Kohn

4:10 p.m.

David Mertz

5:48 p.m.

Christopher Barker

12:01 a.m.

On Sun, Aug 16, 2020 at 10:50 AM David Mertz <mertz@gnosis.cx> wrote:

...

What is happening in the discussion is that each potential user who comments finds some related capability that is relevant to their particular needs. But the union of all those wishes is something substantially large... Maybe not NumPy large, but heading in that direction.

...

On the other hand, trimming such a thing to a bare minimum would mean

...

that didn't seem like sufficient motivation for standard library, but it does sound like an excellent purpose for a third party library.

Stephen J. Turnbull

August 2020

6:12 a.m.

Christopher Barker writes:

...

Anyway, I would like to see a nice linear algebra lib -- but not 'cause I'd use it, only because I find it interesting.

SymPy. Except that in this conversation, "linear algebra" is likely neither restricted to linearity nor so much algebraic as computational, so SymPy likely won't do. :-/

Oscar Benjamin

3:34 p.m.

On Mon, 17 Aug 2020 at 07:14, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:

...

Christopher Barker writes:

...
Anyway, I would like to see a nice linear algebra lib -- but not 'cause I'd use it, only because I find it interesting.

SymPy.

Except that in this conversation, "linear algebra" is likely neither restricted to linearity nor so much algebraic as computational, so SymPy likely won't do. :-/

...

...
...
from sympy import Matrix

...

...
...
M = Matrix([[1, 2], [3, 4]])

...

...
...
M Matrix([ [1, 2], [3, 4]])

...

...
...
M * M Matrix([ [ 7, 10], [15, 22]])

...

...
...
3 * M Matrix([ [3, 6], [9, 12]])

...

...
...
M ** -1 Matrix([ [ -2, 1], [3/2, -1/2]])

...

...
...
M.rank() 2

...

...
...
M.det() -2

...

...
...
M.eigenvals() {5/2 - sqrt(33)/2: 1, 5/2 + sqrt(33)/2: 1}

Stephen J. Turnbull

2:03 a.m.

Oscar Benjamin writes:

...

On Mon, 17 Aug 2020 at 07:14, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:

...

...
SymPy.

Except that in this conversation, "linear algebra" is likely neither restricted to linearity nor so much algebraic as computational, so SymPy likely won't do. :-/

...

SymPy is exactly what I would recommend for the OP use case of high school students doing basic linear algebra (although perhaps as maintainer I would say that...). SymPy's matrix class doesn't have the complicated broadcasting rules of numpy's ndarray and uses exact calculations rather than floating point (or limited precision integers). I'm not sure why it would be considered unsuitable for this.

...

Here's a demonstration:

Very nice! <img src="che.png" alt="clapping hands emoji">

...

If your high-school students are okay with using Python then I would certainly recommend SymPy for this kind of calculation.

...

On the other hand adding something like this to the stdlib is a very slippery slope. SymPy's matrices package has 30k lines of code and counting even though it leverages much of the rest of the sympy codebase to do the calculations it does. There are a lot of different features you can add once you have a matrix class.

Thank you for pointing that out! Steve

Ricky Teachey

9:54 a.m.

On Fri, Aug 14, 2020, 12:05 AM Stephen J. Turnbull < turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:

...

I cannot speak to engineering uses of matrix computations. If someone

produces use cases that fit into the list of operations proposed

...

(addition, negation, multiplication, inverse, transposition, scalar multiplication, solving linear equation systems, and determinants, I will concede it's useful and fall back to +/- 0.

I'll try to speak to that below. On Fri, Aug 14, 2020, 1:48 AM Marco Sulla <Marco.Sulla.Python@gmail.com> wrote: On Fri, 14 Aug 2020 at 03:16, Steven D'Aprano <steve@pearwood.info> wrote:

...

Funny you mention this, I have been working on a Matrix object for precisely the use-case you discuss (secondary school maths), where performance is not critical and the dimensions of the matrix is typically single digits.

...

fribl＠fastmail.fm

2:03 p.m.

Greg Ewing

3:32 p.m.

Stephen J. Turnbull

August 2020

10:32 a.m.

Ricky Teachey writes:

...

I'll try to speak to [what would engineers use a matrix library for] below.

...

This operation:

q_s = γ_prime * z

And these operations:

P_n_per_foot = μ_s * q_s * C_shaft P_n = np.sum(P_n_per_foot)

...

That being said, I do think it would be nice to have a "matrix" type in the std lib that supported both element-wise operations like this and basic matrix arithmetic.

Ricky Teachey

11:37 a.m.

On Fri, Aug 14, 2020 at 1:32 PM Stephen J. Turnbull < turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:

...

Ricky Teachey writes:

...
I'll try to speak to [what would engineers use a matrix library for] below.

I don't get it. Among other things, the object uwt_per_ft isn't defined. I think it's a typo for the one-dimensional array γ_per_foot, but not sure. I don't see any advantage to numpy or a matrix library in this code, unless you're desperate for speed (see my last version where z is a Python list).

Sorry, yes, the code I wrote had a couple typos. I'll put the correct version (tested) at the very bottom for clarity.

...

This operation:

...
q_s = γ_prime * z

And these operations:

P_n_per_foot = μ_s * q_s * C_shaft P_n = np.sum(P_n_per_foot)

Are you saying that the operations above would be clearer in a four-function calculator version of a matrix library? I don't see how the above is less clear than

q_s = γ_prime * z P_n_per_foot = μ_s * q_s * C_shaft P_n = math.fsum(P_n_per_foot)

or either is more clear than

z = list(range(1, 21)) q_s = [γ_prime * x for x in z] P_n_per_foot = [μ_s * x * C_shaft for x in q_s] P_n = math.fsum(P_n_per_foot)

although they might be faster for large enough values of "20".

...

...
That being said, I do think it would be nice to have a "matrix" type in the std lib that supported both element-wise operations like this and basic matrix arithmetic.

If the answer to the question above is "no", then nice for what? If "yes", do you think the stdlib 'matrix' module should have a 'linspace' class/function? Or that

z = matrix.Matrix(columns=1, data=range(1, 21))

is in general an adequate substitute?

...

...
...
x = 1.1 print(f"{x=:.0f} pcf") 1 pcf γ_per_ft = matrix( [120, 122, 118, 117] ) print(f"{γ_per_ft=:.0f} pcf") (120, 122, 118, 117) pcf

...

...
...
30 tonf

--- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler

Stephen J. Turnbull

8:20 a.m.

Ricky Teachey writes:

...

Well keep in mind that generally an engineer submits calculations-- not code-- to be reviewed. This leaves me with a few options:

[...]

...

Hiding the portion of code that looks too much like programming (and not enough like equations) takes some time, but if I do not do this it will inevitably lead to a 40 minute phone call [...].

Ah, I misinterpreted your statement that engineers don't much use linear algebra. You *do* use it for *presentation*, but it's not very useful in the programming that gets your results.

...

Furthermore, the same NYDoT reviewer might come back and say:

*"Rejected: calculation does not take into account change in friction angle with depth."*

The official would be saying that friction (μ_s) is not a scalar, it should be another array variable, because the soils apply different friction along the shaft depth.

...

The answer is yes. It's a good question on the linspace() function -- I don't think so. [...] However I would certainly hide this part of the code, and include the copy and pasted result from this:

print(f"γ_per_ft = ({', '.join(f'{v:.0f}' for v in z)}) pcf")

Aaarrrrrggggggh! What we do for money and to please reviewers ....

...

Sidebar: it would be pretty fantastic if a native matrix library could display values using f-strings like this:

...
...
...
x = 1.1 print(f"{x=:.0f} pcf") 1 pcf γ_per_ft = matrix( [120, 122, 118, 117] ) print(f"{γ_per_ft=:.0f} pcf") (120, 122, 118, 117) pcf

The idea here is that python knows the datatype of γ_per_ft is float, so it can format all the members of the array *one at a time*.

Ricky Teachey

9:26 a.m.

On Sat, Aug 15, 2020, 11:20 AM Stephen J. Turnbull < turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:

...

Ricky Teachey writes:

...
Well keep in mind that generally an engineer submits calculations-- not code-- to be reviewed. This leaves me with a few options:

[...]

...
Hiding the portion of code that looks too much like programming (and not enough like equations) takes some time, but if I do not do this it will inevitably lead to a 40 minute phone call [...].

Ah, I misinterpreted your statement that engineers don't much use linear algebra. You *do* use it for *presentation*, but it's not very useful in the programming that gets your results.

...

...
Furthermore, the same NYDoT reviewer might come back and say:

*"Rejected: calculation does not take into account change in friction angle with depth."*

The official would be saying that friction (μ_s) is not a scalar, it should be another array variable, because the soils apply different friction along the shaft depth.

I don't think Steven's proposed matrix class can substitute for numpy in general. It will work for your case because '*' can be both scalar multiplication and matrix multiplication. That is, the scalar μ_s (float, I presume?) doesn't know how to self.__mul__(q_s), but q_s.__rmul__(μ_s) works fine. Then tmp.__mul__(C_shaft) (C_shaft a scalar or a vector of the same length) works. But if you have *two* variables, and the reviewer claims they have different (scalar) coefficients, you're going to need numpy-style broadcasting.

...

...
...
...
γ_per_ft = matrix( [120, 122, 118, 117] )

...
print(f"{γ_per_ft=:.0f} pcf") (120, 122, 118, 117) pcf

The idea here is that python knows the datatype of γ_per_ft is float, so it can format all the members of the array *one at a time*.

I think all you need to do is provide an appropriate __format__ method on matrix objects. That method will have access to the format spec (ie, everything after the colon), and if there isn't a parser you can just import, I'm sure there are plenty you can copy. At least for this case you can just apply it to the individual elements.

It's not obvious to me that elementwise is the "right" way to implement matrix.__format__ (there may be "matrix-level" formatting that takes precedence, for example how to treat rows wider than the page width), but it's quite plausible. Of course you can also derive a simple subclass with elementwise __format__.

I'm going to try to solve this with my toy implementation, too. If it get it working I'll be sure to pass it along for comment.

...