scipy.sparse: add save and load functions for sparse matrices

Hello,

I would like to propose new save and load functionality for sparse matrices in SciPy.

So far, the scipy.io.savemat/loadmat functions allow saving and loading sparse matrices in the MATLAB file format (versions 4 and 5). However, this has some serious drawbacks. Large (sparse) matrices cannot be stored in a mat file (versions 4 and 5), since at most 2^31 bytes per variable are supported. In addition, sparse matrices are always stored in a mat file in csc format, so the original matrix format is not preserved. If another format is used, the matrix has to be converted to csc before saving and back to the original format after loading. For large matrices this can take a lot of time. Moreover, the indices must be sorted in a mat file, which can take a lot of additional time. Finally, since sparse matrices are always stored in csc format, the disk-space advantages of other matrix formats cannot be exploited. For example, some suitable block matrices need much less disk space in bsr format than in csc format.

I propose to store the data arrays of the sparse matrix directly, together with the matrix format, in one file using NumPy's savez and savez_compressed functions. The matrix can then be reconstructed on loading without much effort. This can be done easily for the csc, csr, bsr, dia and coo formats. (The remaining dok and lil formats should only be used for constructing sparse matrices anyway, and then be converted to another matrix format.)

This would make it possible to store large sparse matrices and to benefit from the advantages of the different matrix formats.

A pull request (for the csc, csr and bsr matrix formats) is here: https://github.com/scipy/scipy/pull/6394

Best regards,
Joscha Reimer
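
P.S. To illustrate the idea, here is a minimal sketch for the csc, csr and bsr formats. It is not the code from the pull request: the function names save_sparse and load_sparse and the key names used inside the file (format, shape, data, indices, indptr) are hypothetical and chosen only for this example.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical mapping from format name to matrix class (sketch only).
_CLASSES = {"csc": sp.csc_matrix, "csr": sp.csr_matrix, "bsr": sp.bsr_matrix}


def save_sparse(file, matrix, compressed=True):
    """Store the raw arrays of a csc/csr/bsr matrix plus its format name."""
    if matrix.format not in _CLASSES:
        raise NotImplementedError(
            f"format {matrix.format!r} is not handled in this sketch"
        )
    saver = np.savez_compressed if compressed else np.savez
    saver(
        file,
        format=np.array(matrix.format),  # 0-d string array, no pickling needed
        shape=np.array(matrix.shape),
        data=matrix.data,
        indices=matrix.indices,
        indptr=matrix.indptr,
    )


def load_sparse(file):
    """Rebuild the matrix in its original format from the stored arrays."""
    with np.load(file) as loaded:
        fmt = loaded["format"].item()
        shape = tuple(int(n) for n in loaded["shape"])
        arrays = (loaded["data"], loaded["indices"], loaded["indptr"])
        return _CLASSES[fmt](arrays, shape=shape)
```

A call such as save_sparse("matrix.npz", m) followed by load_sparse("matrix.npz") should give back a matrix in the same format as m, without any conversion to csc and without sorting the indices. The dia and coo formats would need their own sets of arrays (for example offsets, or row and col) and are left out here.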