
Hi,
I've been looking a bit at making sparse matrices work with 64-bit indices:
https://github.com/pv/scipy-work/commits/ticket/1307
The motivation is that 32-bit indices on 64-bit machines don't allow representing sparse matrices with large nnz.
One option, A (currently there), is to allow both int32 and int64 as indices, and use the larger one only when required by nnz.
The second option, B, would be to just use intp for everything.
The problem with A is that I'm far from certain that I found all the corner cases yet, and I'm fairly certain there are some undiscovered bugs still somewhere. The test suite doesn't yet have the level of coverage on this issue I'd be comfortable with.
The problem with B is that on 64-bit systems, it increases the memory needs of sparse matrices by about 50%. However, as a solution it's more robust and elegant.
Opinions on how it should work?
Pauli
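To make option A concrete, here is a minimal sketch of the selection rule, written in plain Python so it does not depend on any SciPy internals. The name `choose_index_width` and its parameters are hypothetical and not part of scipy.sparse. The only assumption is that the wider type is needed once nnz or a dimension no longer fits in a signed 32-bit integer.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit index can hold


def choose_index_width(shape, nnz):
    """Return 32 or 64: the narrowest signed index width that is safe.

    Hypothetical helper illustrating option A. Index pointers store values
    up to nnz, and row/column indices store values up to dim - 1, so both
    have to fit.
    """
    largest = max(nnz, max(shape) - 1) if shape else nnz
    return 32 if largest <= INT32_MAX else 64


print(choose_index_width((10**6, 10**6), nnz=5 * 10**7))  # fits in 32 bits
print(choose_index_width((10**6, 10**6), nnz=3 * 10**9))  # nnz exceeds 2**31 - 1
```

Under this rule, matrices that work today keep their 32-bit indices, and only the very large ones pay for the wider type.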

On Fri, Dec 14, 2012 at 9:37 AM, Pauli Virtanen <pav@iki.fi> wrote:
Hi,
I've been looking a bit at making sparse matrices work with 64-bit indices:
https://github.com/pv/scipy-work/commits/ticket/1307
The motivation is that 32-bit indices on 64-bit machines don't allow representing sparse matrices with large nnz.
One option A (currently there) is to allow both int32 and int64 as indices, and use the larger one only when required by nnz.
The second option B would be to just use intp for everything.
The problem with A is that I'm far from certain that I found all the corner cases yet, and I'm fairly certain there are some undiscovered bugs still somewhere. The test suite doesn't yet have the level of coverage on this issue I'd be comfortable with.
The problem with B is that on 64-bit systems, it increases the memory needs of sparse matrices by about 50%. However, as a solution it's more robust and elegant.
One problem with B is if there is code out there which "knows" that sparse matrices use 32-bit indices. E.g. I can adapt scikits.sparse.cholmod to handle 64-bit indices, but it will require code changes, because you have to use different flags when calling the underlying routines and so far there was no point in it. It looks like I was paranoid enough that switching to option B would just require changing ~4 lines of code, and that if you somehow passed 64-bit indices to the current version then it will downcast and keep going (not sure if this is better than crashing or not!). But there may well be other code out there that passes scipy.sparse matrices to C/Fortran, and if indices suddenly become 64-bit, then that code may start simply returning nonsense... I'd be concerned, anyway.
I guess this is a problem with option A as well, but at least existing code working on matrices that currently work would keep working. OTOH option A also means that any future C/Fortran code has to be prepared to handle both cases. Not really a big deal when working in Cython, but I hear that some people still use other tools...
Do all the sparse matrix kernels we care about even handle 64-bit indices? CHOLMOD does, but it takes special setup, and I don't know if all kernel authors are so careful.
-n
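The worry about silent downcasting can be illustrated with a small sketch. It uses only the standard library, and `to_int32_indices` is a hypothetical helper, not something from scikits.sparse or SciPy. The idea is that a wrapper around a 32-bit-only C/Fortran kernel checks the index range explicitly and fails loudly instead of truncating.

```python
from array import array

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def to_int32_indices(indices):
    """Pack indices for a 32-bit-only kernel, refusing values that don't fit.

    Hypothetical helper: raises instead of silently wrapping around.
    """
    for i in indices:
        if not INT32_MIN <= i <= INT32_MAX:
            raise OverflowError(f"index {i} does not fit in 32 bits")
    # Typecode 'i' is a C int; 32 bits on common platforms, which is an
    # assumption about the platform rather than a guarantee.
    return array("i", indices)


to_int32_indices([0, 5, 2**31 - 1])  # accepted
try:
    to_int32_indices([0, 2**31])  # one past the 32-bit limit
except OverflowError as exc:
    print(exc)
```

Whether raising is preferable to downcasting and continuing is exactly the open question in the message above; the sketch only shows what the stricter choice looks like.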

It may be less elegant to write, but I am sort of a fan of option A.
On Fri, Dec 14, 2012 at 9:54 AM, Nathaniel Smith <njs@pobox.com> wrote:
[clip]

On Fri, Dec 14, 2012 at 3:54 PM, Nathaniel Smith <njs@pobox.com> wrote:
I guess this is a problem with option A as well, but at least existing code working on matrices that currently work, would keep working. OTOH option A also means that any future C/Fortran code has to be prepared to handle both cases. Not really a big deal when working in Cython, but I hear that some people still use other tools...
Actually on re-reading your mail, I guess the options you're suggesting are:
A: both 32- and 64-bit indices are possible, which is used depends on nnz
B: both 32- and 64-bit indices are possible, which is used depends on python's architecture
?
So I withdraw the above comment -- both options require some sort of annoying type parametrization, it isn't really a disadvantage of option A.
-n
(Also surely it's only a 33% memory overhead compared to now? But still I have the feeling people really do work with sparse matrices right up to the limit of their available memory.)
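The 50% and 33% figures can both come out of simple arithmetic, depending on the storage format and the size of the data values. As one possible reading, the sketch below assumes 8-byte float64 data, a CSR-like layout with one index per stored value (ignoring the comparatively small row-pointer array), and a COO-like layout with two indices per stored value. The function name is hypothetical.

```python
def overhead_percent(data_bytes, index_arrays, old_index_bytes=4, new_index_bytes=8):
    """Percent growth in per-nonzero storage when the index width changes."""
    old = data_bytes + index_arrays * old_index_bytes
    new = data_bytes + index_arrays * new_index_bytes
    return 100.0 * (new - old) / old


csr_float64 = overhead_percent(data_bytes=8, index_arrays=1)  # 12 -> 16 bytes
coo_float64 = overhead_percent(data_bytes=8, index_arrays=2)  # 16 -> 24 bytes
print(round(csr_float64, 1), round(coo_float64, 1))
```

With these assumptions the CSR-like case grows by about a third and the COO-like case by half, so the two estimates in the thread are not necessarily in conflict.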

Nathaniel Smith <njs <at> pobox.com> writes: [clip]
Actually on re-reading your mail, I guess the options you're suggesting are:
A: both 32- and 64-bit indices are possible, which is used depends on nnz
B: both 32- and 64-bit indices are possible, which is used depends on python's architecture
?
Precisely.
So I withdraw the above comment -- both options require some sort of annoying type parametrization, it isn't really a disadvantage of option A.
Yep, except that in B the "type parameterization" just means using the intp data type and can be done with a single typedef. But I see your point, B completely breaks backward compatibility, whereas A does not.
(Also surely it's only a 33% memory overhead compared to now? But still I have the feeling people really do work with sparse matrices right up to the limit of their available memory.)
Yeah, this might be the case. I guess I'll try to make A work then. This is mostly just a matter of ensuring the test suite coverage is good enough.
Pauli
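For comparison, under option B the index width would follow the platform rather than nnz. A rough way to see which width that implies on a given interpreter, assuming intp is pointer-sized (true on common platforms), is to ask the standard library for the pointer size. `platform_index_bits` is a hypothetical name.

```python
import struct


def platform_index_bits():
    """Width in bits of a pointer-sized integer on this Python build."""
    return struct.calcsize("P") * 8


print(platform_index_bits())  # 64 on a typical 64-bit build, 32 on a 32-bit one
```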

FYI: https://github.com/scipy/scipy/pull/442
If you're interested in sparse matrices with nnz > 2**31 and can play around with the above code, it would be appreciated.
--
Pauli Virtanen
participants (3)
- Anthony Scopatz
- Nathaniel Smith
- Pauli Virtanen