Out of Core Sparse Matrices
Hi,
I work for Brightcloud, and part of my work required me to write Out of Core Sparse Matrices. I was thinking of submitting these to SciPy, as it currently has Sparse Matrices, but not out of core.
I was wondering if this code would be a desired addition to SciPy. Also, it currently uses the Python sqlite3 library. Is it okay to use the sqlite3 package?
Could I also get documentation about writing test cases? I notice that you use nose. Is there any specification for integrating my test cases into SciPy? I would only need a simple modification to the existing sparse test cases.
Thank you,
Aidan Macdonald
aidan@brightcloud.com
aidan.plenert.macdonald@gmail.com
Hi Aidan,
You should talk to Matthew Rocklin, author of dask. It's a very promising project implementing out of core arrays compatible with the scientific Python stack: http://dask.pydata.org
It currently doesn't do sparse matrices, but that could be a pretty straightforward extension (possibly building on your work): https://github.com/ContinuumIO/dask/issues/174
Cheers,
Stephan
_______________________________________________ SciPy-Dev mailing list SciPy-Dev@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-dev
Hi Aidan,
On Thu, May 28, 2015 at 2:48 AM, Aidan Macdonald <aidan@brightcloud.com> wrote:
> Hi,
> I work for Brightcloud and part of my work required me to write Out of Core Sparse Matrices. I was thinking of submitting these to Scipy as it currently has Sparse Matrices, but not out of core.
That's beyond the current scope of SciPy, so we'd have to seriously think about it. Is the API built on or similar to scipy.sparse? Do you have your code somewhere public already, so we can have a look at it?
> I was wondering if this code would be a desired addition to SciPy. Also, currently it uses the Python Sqlite3 library. Is it okay to use the Sqlite3 package?
Anything in the Python stdlib is OK to use in SciPy from the point of view of dependencies. So it depends only on performance and other technical aspects like maintainability/scalability.
> Could I also get documentation about writing test cases? I notice that you use NOSE. Is there any specification about integrating my test cases into SciPy? I would only need a simple modification to the existing sparse test cases.
The testing guidelines are at https://github.com/numpy/numpy/blob/master/doc/TESTS.rst.txt.
Cheers,
Ralf
On Wed, May 27, 2015 at 5:48 PM, Aidan Macdonald <aidan@brightcloud.com> wrote:
> Hi,
> I work for Brightcloud and part of my work required me to write Out of Core Sparse Matrices. I was thinking of submitting these to Scipy as it currently has Sparse Matrices, but not out of core.
> I was wondering if this code would be a desired addition to SciPy. Also, currently it uses the Python Sqlite3 library. Is it okay to use the Sqlite3 package?
Hi Aidan,
I think we'd be able to give you better advice/suggestions if you could give us a pointer to the code and/or docs, to get a sense of what kind of general approach, public API, dependencies, etc. that you're talking about?
-n
-- Nathaniel J. Smith -- http://vorpus.org
Hi,
I pushed one piece of the code here <https://github.com/aidan-plenert-macdonald/scipy/tree/master/scipy/dsparse>. It is the PYX file. Stephan Hoyer recommended that I push this into the Dask project. I looked at their code, and it looks like it is closer to what my work is. I think I will talk with him.
My company is looking at building a good distributed computing framework for big data machine learning purposes. Most of the old code is in C++, but I am porting it over into Python for better maintainability.
As seen in the code, I simply use the existing SciPy source for the dok_matrix (easier than rewriting one) and provide an out of core dictionary using Sqlite. I am looking into not using Sqlite and doing a sort of memmap interface, as that is what our C++ would do, but I am unsure of the speed/complexity/maintainability benefit.
At the end of the day, all the SciPy sparse matrix tests should work with minimal changes (adding file names). It is PYX because I was compiling with Cython. There is currently minimal speed gain from compilation, but I was going to go through and optimize later.
Thank you,
Aidan Macdonald 805 418 0174 aidan@brightcloud.com aidan.plenert.macdonald@gmail.com
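The idea in this message, a dictionary whose entries live in SQLite rather than in memory, might look roughly like the following. This is a minimal sketch and not the code from the linked repository: the class name `SqliteDict`, the table name `entries`, and the column names `i`, `j`, `v` are all assumptions made for illustration, and only the standard-library `sqlite3` module is used.

```python
import sqlite3
from collections.abc import MutableMapping


class SqliteDict(MutableMapping):
    """Hypothetical mapping of (row, col) -> float stored in a SQLite table."""

    def __init__(self, path=":memory:"):
        # Pass a file name instead of ":memory:" to keep the data on disk.
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS entries ("
            "i INTEGER NOT NULL, j INTEGER NOT NULL, v REAL NOT NULL, "
            "PRIMARY KEY (i, j))"
        )

    @staticmethod
    def _split(key):
        # Coerce indices to plain ints so other integer types also bind cleanly.
        return int(key[0]), int(key[1])

    def __getitem__(self, key):
        found = self._conn.execute(
            "SELECT v FROM entries WHERE i = ? AND j = ?", self._split(key)
        ).fetchone()
        if found is None:
            raise KeyError(key)
        return found[0]

    def __setitem__(self, key, value):
        i, j = self._split(key)
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO entries (i, j, v) VALUES (?, ?, ?)",
                (i, j, float(value)),
            )

    def __delitem__(self, key):
        with self._conn:
            cur = self._conn.execute(
                "DELETE FROM entries WHERE i = ? AND j = ?", self._split(key)
            )
        if cur.rowcount == 0:
            raise KeyError(key)

    def __iter__(self):
        # Keys are streamed from the database rather than loaded all at once.
        for i, j in self._conn.execute("SELECT i, j FROM entries"):
            yield (i, j)

    def __len__(self):
        return self._conn.execute("SELECT COUNT(*) FROM entries").fetchone()[0]

    def close(self):
        self._conn.close()


store = SqliteDict()
store[(0, 3)] = 1.5
store[(2, 1)] = -4.0
store[(0, 3)] = 2.5  # overwrites the earlier entry at (0, 3)
```

Because `MutableMapping` supplies `get`, `items`, `update`, and `in` on top of the five methods defined here, the object behaves like an ordinary dict for most purposes. Whether `dok_matrix` can be backed by such a mapping depends on its internals and is not shown here.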
On Thu, May 28, 2015 at 4:07 PM, Aidan Macdonald <aidan@brightcloud.com> wrote:
> Hi,
> I pushed one piece of the code here <https://github.com/aidan-plenert-macdonald/scipy/tree/master/scipy/dsparse>. It is the PYX file. Stephan Hoyer recommended that I push this into the Dask project. I looked at their code and it looks like it is more of what my work is. I think I will talk with him.
That sounds good; it looks like it fits better there than in scipy.sparse.
Ralf
participants (4)
- Aidan Macdonald
- Nathaniel Smith
- Ralf Gommers
- Stephan Hoyer