Fast versions of scipy.io.mmread/write

Hi, I'm the author of fast_matrix_market, a package that reads and writes Matrix Market files: https://github.com/alugowski/fast_matrix_market/tree/main/python The Python binding is API compatible with scipy.io.mmio, but about 25x faster. A few other handy improvements include 64-bit indices (only if necessary), direct csc/csr writes with no coo intermediary, longdouble. It passes the SciPy mmio test suite. Would there be any interest in somehow integrating this library into SciPy? The speed difference really does make a big difference for large files. The package is written in C++17 and pybind11. Threading is with a simple thread pool based on c++11 threads, but can be changed. A significant part of the speed increase is parsing with std::from_chars instead of strto*, which alone contributes a massive improvement.

On Wed, Apr 12, 2023 at 9:57 AM <alugowski@gmail.com> wrote:
Hi,
I'm the author of fast_matrix_market, a package that reads and writes Matrix Market files: https://github.com/alugowski/fast_matrix_market/tree/main/python
The Python binding is API compatible with scipy.io.mmio, but about 25x faster. A few other handy improvements include 64-bit indices (only if necessary), direct csc/csr writes with no coo intermediary, longdouble. It passes the SciPy mmio test suite.
Would there be any interest in somehow integrating this library into SciPy?
The speed difference really does make a big difference for large files.
The package is written in C++17 and pybind11. Threading is with a simple thread pool based on c++11 threads, but can be changed. A significant part of the speed increase is parsing with std::from_chars instead of strto*, which alone contributes a massive improvement.
Thanks for this package and proposal Adam! I had a quick look at your package, and it looks good to me. This kind of upstreaming of code, when there's a clear performance benefit and the code is maintained, seems nice to me. Just to make sure: are you planning to continue maintaining this code? Either within SciPy only, or also as a separate package and keeping the two in sync? The C++17 should be fine as far as I can tell - our http://scipy.github.io/devdocs/dev/toolchain.html has an upgrade to C++17 marked for 2022 already, so as long as things build with what MSVC provides (the usual limiting factor), that should be good. Cheers, Ralf

Great! Ralf Gommers wrote:
On Wed, Apr 12, 2023 at 9:57 AM alugowski@gmail.com wrote: Just to make sure: are you planning to continue maintaining this code? Either within SciPy only, or also as a separate package and keeping the two in sync?
Yes I plan to keep maintaining it. I do support C++ bindings as well (Eigen, GraphBLAS, etc) so it would likely have to be both inside SciPy and separate and kept in sync.
The C++17 should be fine as far as I can tell - our http://scipy.github.io/devdocs/dev/toolchain.html has an upgrade to C++17 marked for 2022 already, so as long as things build with what MSVC provides (the usual limiting factor), that should be good. Cheers, Ralf
I already went through the MSVC pain to build Windows wheels using cibuildwheel. So that shouldn't be an issue. -Adam

Great! Ralf Gommers wrote:
Hi, I'm the author of fast_matrix_market, a package that reads and writes Matrix Market files: https://github.com/alugowski/fast_matrix_market/tree/main/python The Python binding is API compatible with scipy.io.mmio, but about 25x faster. A few other handy improvements include 64-bit indices (only if necessary), direct csc/csr writes with no coo intermediary, longdouble. It passes the SciPy mmio test suite. Would there be any interest in somehow integrating this library into SciPy? The speed difference really does make a big difference for large files. The package is written in C++17 and pybind11. Threading is with a simple thread pool based on c++11 threads, but can be changed. A significant part of the speed increase is parsing with std::from_chars instead of strto*, which alone contributes a massive improvement. Thanks for this package and proposal Adam! I had a quick look at your
On Wed, Apr 12, 2023 at 9:57 AM alugowski@gmail.com wrote: package, and it looks good to me. This kind of upstreaming of code, when there's a clear performance benefit and the code is maintained, seems nice to me. Just to make sure: are you planning to continue maintaining this code? Either within SciPy only, or also as a separate package and keeping the two in sync?
Yes, I'm seeing increasing interest in the package so I'm going to keep maintaining it. I do have a lot of use of the C++ bindings, so it makes sense to keep it as a separate package in addition to within SciPy.
The C++17 should be fine as far as I can tell - our http://scipy.github.io/devdocs/dev/toolchain.html has an upgrade to C++17 marked for 2022 already, so as long as things build with what MSVC provides (the usual limiting factor), that should be good. Cheers, Ralf
I've gone through the MSVC pains already to build Windows wheels with cibuildwheel, so that shouldn't be an issue. -Adam

On Fri, Apr 14, 2023 at 9:38 AM <alugowski@gmail.com> wrote:
Great!
Ralf Gommers wrote:
Hi, I'm the author of fast_matrix_market, a package that reads and writes Matrix Market files: https://github.com/alugowski/fast_matrix_market/tree/main/python The Python binding is API compatible with scipy.io.mmio, but about 25x faster. A few other handy improvements include 64-bit indices (only if necessary), direct csc/csr writes with no coo intermediary, longdouble. It passes the SciPy mmio test suite. Would there be any interest in somehow integrating this library into SciPy? The speed difference really does make a big difference for large files. The package is written in C++17 and pybind11. Threading is with a simple thread pool based on c++11 threads, but can be changed. A significant part of the speed increase is parsing with std::from_chars instead of strto*, which alone contributes a massive improvement. Thanks for this package and proposal Adam! I had a quick look at your
On Wed, Apr 12, 2023 at 9:57 AM alugowski@gmail.com wrote: package, and it looks good to me. This kind of upstreaming of code, when there's a clear performance benefit and the code is maintained, seems nice to me. Just to make sure: are you planning to continue maintaining this code? Either within SciPy only, or also as a separate package and keeping the two in sync?
Yes, I'm seeing increasing interest in the package so I'm going to keep maintaining it. I do have a lot of use of the C++ bindings, so it makes sense to keep it as a separate package in addition to within SciPy.
The C++17 should be fine as far as I can tell - our http://scipy.github.io/devdocs/dev/toolchain.html has an upgrade to C++17 marked for 2022 already, so as long as things build with what MSVC provides (the usual limiting factor), that should be good. Cheers, Ralf
I've gone through the MSVC pains already to build Windows wheels with cibuildwheel, so that shouldn't be an issue.
That all sounds good, thanks Adam. I think the next steps are then to proceed with your proposal integration and open a PR. Here is what I suggest: - add your code in a new private directory `scipy/io/_fast_matrix_market/src/` with a README explaining where the code comes from next to the `src` - integrate it in the SciPy build system in `scipy/io/meson.build - you have to (unfortunately) also still deal with the setup.py-based build in one way or the other. We're going to throw it out soon, so you can either ignore it and keep the slow `_mmio.py` implementation for that, or integrate the C++ code in `scipy/io/setup.py`. The latter is a little nicer, but in case it's tricky to get to work then keeping the slow code is also okay. The only reason we will still keep it around for the 1.11.0 release (probably) is conda-forge on Windows, so that's a limited number of users who'd get the slow code. Cheers, Ralf

Hi Ralf, On Fri, Apr 14, 2023, at 08:57, Ralf Gommers wrote:
- you have to (unfortunately) also still deal with the setup.py-based build in one way or the other. We're going to throw it out soon, so you can either ignore it and keep the slow `_mmio.py` implementation for that, or integrate the C++ code in `scipy/io/setup.py`. The latter is a little nicer, but in case it's tricky to get to work then keeping the slow code is also okay. The only reason we will still keep it around for the 1.11.0 release (probably) is conda-forge on Windows, so that's a limited number of users who'd get the slow code.
I'd be curious to learn more about the build issues on conda-forge; is there a related issue? skimage just did a cross-compiled conda-forge release using the new meson machinery, but of course it's a significantly simpler setup than SciPy. Stéfan

On Fri, Apr 14, 2023 at 5:56 PM Stefan van der Walt <stefanv@berkeley.edu> wrote:
Hi Ralf,
On Fri, Apr 14, 2023, at 08:57, Ralf Gommers wrote:
- you have to (unfortunately) also still deal with the setup.py-based build in one way or the other. We're going to throw it out soon, so you can either ignore it and keep the slow `_mmio.py` implementation for that, or integrate the C++ code in `scipy/io/setup.py`. The latter is a little nicer, but in case it's tricky to get to work then keeping the slow code is also okay. The only reason we will still keep it around for the 1.11.0 release (probably) is conda-forge on Windows, so that's a limited number of users who'd get the slow code.
I'd be curious to learn more about the build issues on conda-forge; is there a related issue? skimage just did a cross-compiled conda-forge release using the new meson machinery, but of course it's a significantly simpler setup than SciPy.
Oh that's the usual "there is no suitable Fortran compiler on Windows". See https://github.com/conda-forge/scipy-feedstock/issues/213 for details. We have a custom Mingw-w64 based toolchain for our own wheels, but that doesn't translate too well to any other packaging ecosystem where the compiler itself is distributed as a package and things have to be built in as uniform a way as possible. Cheers, Ralf

I opened a PR doing just this: https://github.com/scipy/scipy/pull/18631 The PR is for the meson build only, mostly because that's the process that the wonderful docs cover. If folks are happy with it and the setup.py build is still necessary I'm happy to work on that.

On Tue, Apr 11, 2023, at 17:28, alugowski@gmail.com wrote:
The Python binding is API compatible with scipy.io.mmio, but about 25x faster. A few other handy improvements include 64-bit indices (only if necessary), direct csc/csr writes with no coo intermediary, longdouble. It passes the SciPy mmio test suite.
Would there be any interest in somehow integrating this library into SciPy?
That looks like a good upgrade without any obvious downsides; thanks for the proposal! I'm curious also to see how memory consumption compares during load. Stéfan

Stefan van der Walt wrote:
On Tue, Apr 11, 2023, at 17:28, alugowski@gmail.com wrote:
The Python binding is API compatible with scipy.io.mmio, but about 25x faster. A few other handy improvements include 64-bit indices (only if necessary), direct csc/csr writes with no coo intermediary, longdouble. It passes the SciPy mmio test suite. Would there be any interest in somehow integrating this library into SciPy? That looks like a good upgrade without any obvious downsides; thanks for the proposal! I'm curious also to see how memory consumption compares during load. Stéfan
I take the same approach as the current SciPy mmio routines, allocate the final data structure then write values directly there. The main difference is that I effectively have a larger I/O buffer. I fread() large chunks (default 1MB works well, but configurable) then parse the chunks in parallel. I keep a queue of ready chunks to make sure the parser threads don't have to wait for I/O. -Adam

On Fri, Apr 14, 2023 at 1:33 AM <alugowski@gmail.com> wrote:
What is the best way to proceed?
It seems like various scipy maintainers are in agreement that including your library is a good idea, so now the ball is in your court to open a PR (following the steps Ralf suggested). If you get stuck on this or need assistance, please reach out and we can help!

Thanks for the support everyone, we now have a PR: https://github.com/scipy/scipy/pull/18631 I included a memory benchmark for those wondering about memory usage. The measured peak RSS is the same as the current Python version, except it's lower when writing coordinate matrices. This is because _mmio copies the values array, FMM does not. Feedback welcome!

On Tue, Jul 11, 2023 at 5:20 AM <alugowski@gmail.com> wrote:
Thanks for the support everyone, we now have a PR: https://github.com/scipy/scipy/pull/18631
I included a memory benchmark for those wondering about memory usage. The measured peak RSS is the same as the current Python version, except it's lower when writing coordinate matrices. This is because _mmio copies the values array, FMM does not.
Feedback welcome!
Thanks Adam, this looks pretty good! To everyone else: this is a fairly sizeable amount of C++ code which adds a little bit to our binary size (probably ~2%) and build time (probably ~8%), but the performance gains, extra features and better maintained code are probably worth it. I'm still reviewing and would like to get build time impact a little lower still; if anyone else is interested please weigh in though. Cheers, Ralf
participants (4)
-
alugowski@gmail.com
-
CJ Carey
-
Ralf Gommers
-
Stefan van der Walt