Proposal: loosen the coupling between scipy.sparse and np.matrix
Dear scipy developers,

TL;DR: instead of making big changes to the scipy.sparse interface so that it becomes more array-like, I propose to reduce users' exposure specifically to np.matrix.

Firstly, I'm new to the list, so I apologize in advance if I overlooked relevant previous discussions. I checked the issue tracker, the roadmap, and the last few months of the mailing list. Having found nothing that contradicts my take, I hope my proposal will improve the state of scipy.sparse.

---

# Current state of the art

For historical reasons, scipy.sparse was designed to mimic the np.matrix interface. However, due to various interface warts, numpy matrices became unpopular over time, and hence less well known to users. They are slowly being removed from various codebases, and numpy itself does not recommend using matrices (although it doesn't quite issue a deprecation warning yet). I assume that there's an overall community preference to get rid of numpy matrices over time, also from the scipy side.

There are two main ways in which scipy.sparse.spmatrix relates to np.matrix:

1. The interface of spmatrix itself is similar to np.matrix. Some examples of this behavior are:
   - slicing a spmatrix returns another spmatrix even in cases when numpy would return a vector
   - __mul__ is __matmul__, and not element-wise multiplication
   - properties .H, .A, etc. mimic np.matrix
2. spmatrix produces numpy matrices. Some examples of this behavior are the spmatrix.todense method and the addition of an ndarray and a spmatrix.

Changing the first aspect is hard, both for reasons of backwards compatibility and because of the amount of required changes in the codebase. The second takes less work, yet it is likely to improve the usability of spmatrix. A natural step would be to deprecate spmatrix.todense (or potentially switch it to return an array). Shortly after I opened an issue [1] proposing to do so, I learned that spmatrix.todense continues to confuse new users [2], and even some contributors [3].
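The behaviors listed above can be sketched in a few lines. This is only an illustration, assuming a SciPy version in which csr_matrix still follows the np.matrix conventions described in this post; the variable names are made up for the example.

```python
import numpy as np
from scipy import sparse

dense = np.array([[1.0, 2.0], [3.0, 4.0]])
s = sparse.csr_matrix(dense)

# todense() hands back an np.matrix, while toarray() gives a plain ndarray.
as_matrix = s.todense()
as_array = s.toarray()

# Adding an ndarray to a spmatrix also produces an np.matrix.
summed = s + dense

# `*` is matrix multiplication for spmatrix, not element-wise multiplication.
product = (s * s).toarray()
elementwise = s.multiply(s).toarray()

# Indexing a single row keeps the result two-dimensional,
# where the same index on an ndarray would return a 1-D vector.
row = s[0]

# The usual workaround once an np.matrix has leaked into the code.
summed_as_array = np.asarray(summed)

print(type(as_matrix).__name__, type(as_array).__name__, type(summed).__name__)
print(row.shape, dense[0].shape)
```

The explicit np.asarray call at the end is the kind of patch that tends to get sprinkled around once a matrix has been produced unintentionally.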
I think the current usage pattern capturing these cases is as follows (I certainly went through these steps a bunch of times):

1. Do something with sparse.spmatrix that produces an np.matrix without realizing it.
2. See a bug, exception, or deprecation warning.
3. Figure out what's going wrong, play whack-a-mole, and modify the code to produce an array instead.

The other part of the interface (replicating np.matrix behavior) is also not without its problems. For example, carbon-copying the semantics of matrix.A implements a silent conversion of a spmatrix to an array on attribute access, and therefore hides computational complexity.

# Proposal

I recognize that scipy.sparse is a mature and widely used codebase, and that backwards compatibility is extremely important. At the same time, I believe that exposing users and developers to np.matrix also has a continued user and maintainer cost.

Therefore I propose to deprecate and then remove the ability to produce np.matrix from the scipy.sparse codebase. I also propose to revisit the parts of the np.matrix interface that were copied and that can cause trouble, spmatrix.A being a prime example [4].

What do you all think?

Best,
Anton

[1]: https://github.com/scipy/scipy/issues/14494
[2]: https://github.com/scipy/scipy/issues/14131
[3]: https://github.com/scipy/scipy/pull/14488#discussion_r678451919
[4]: https://github.com/scipy/scipy/issues/14503
participants (1)
-
Anton Akhmerov