[SciPy-Dev] Sketching-based Matrix Computations

Jordi Montes jomsdev at gmail.com
Wed Mar 22 02:07:05 EDT 2017


*# I started this discussion on GitHub
<https://github.com/scipy/scipy/issues/7122>. This email is a digested
version of the conversation to check what people think about this on the
mailing list.*


TL;DR: *I think having an implementation for some randomize matrix
algorithms would be great. I would like to discuss if the community finds
it useful too.*


Hello,

I am Jordi. I have been working on a xdata
<http://www.darpa.mil/program/xdata> open source project for the last year
called libSkylark <https://xdata-skylark.github.io/libskylark/>. The
library is suitable for general statistical data analysis and optimization
applications, but it is heavily focused on distributed systems.

*Randomized matrix algorithms* have been a hot topic in research the last
years. Recent developments have shown their utility in large-scale machine
learning and statistical data analysis applications.

Although many ML applications would take advantage of the implementation of
these algorithms the scope of them goes far beyond. I believe it would be a
good idea integrate some of them in Scipy and I would like to know the
opinion of others to check if it would be worth it.

To my eyes, Randomized Numerical Linear Algebra is big/useful enough to has
its submodule. However, this is something that the community has to decide.

Meanwhile, to demonstrate that these methods can be useful I would
implement them under, scipy.linalg where the scipy linear algebra functions
live. The idea is to bring value since the first moment. I will implement
Blendenpik <http://epubs.siam.org/doi/10.1137/090767911>, a least-squares
solver   based on these techniques that outperforms LAPACK by significant
factors and scales significantly better than any QR-based solver.

The proposal is:

   -

   Implement CWT
   <http://delivery.acm.org/10.1145/2490000/2488620/p81-clarkson.pdf?ip=198.4.83.52&id=2488620&acc=ACTIVE%20SERVICE&key=90259622823F2C32%2E66F8B2C43167A3DD%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35&CFID=736713469&CFTOKEN=50269727&__acm__=1489010918_f1abc3d5c4650844f4503dc1a27484ad>
   . CWT lets us create a matrix *S* such as *SA* is shorter than *A
   but conservating some properties*.
   -

   Compute *QR = SA* using scipy.
   -

   From here, there are two potential features:
   -

      Implement Blendenpik <http://epubs.siam.org/doi/10.1137/090767911> for
      solving Linear Square problems faster without losing any accuracy.
      -

      Compute the squared row norms to obtain leverage scores.

Would be better to start with Blendenpik, because it solves a more common
problem by far.

I am thinking about doing it for sparse matrices first because CWT can take
advantage of it. However, I wonder if the QR method on script takes
advantage of sparsity too.

I have already set up the development environment so I could start as soon
as we decide what to do. As I said, I am open to discussion, so anyone who
could be interested is welcome.

Thank you,

Jordi.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/scipy-dev/attachments/20170321/463e63cd/attachment.html>


More information about the SciPy-Dev mailing list