Backwards Compatibility for low level LAPACK routines

Due to their historical evolution, certain LAPACK wrappers are not standardized. Some work with minimum lwork values instead of the optimal ones, and these routines often return quite big arrays during the lwork queries. To demonstrate:

    import scipy.linalg as la
    la.lapack.dgeqrf(a=np.random.rand(400, 400), lwork=-1)

is a workspace-size query (via lwork=-1). The current default size is "3*a.shape[0] + 1", hence 1201, but the optimal workspace size is 12800 on my machine, so the mismatch is sometimes quite dramatic, especially in some other routines. Notice also that to obtain this number the routine actually returns a 400-long tau array and requires the input matrix to be transferred back and forth. Moreover, these routines can't be handled via the scipy.linalg.lapack._compute_lwork function.

There are a few routines like this, and I feel they should be fixed; I'm willing to do it. However, this means that their output signature is going to change, which implies backwards-compatibility breaks. I tried to see whether we could deprecate them in favor of new wrappers with modified names but, to be honest, that would create too many duplicates. On the other hand, I don't have a feeling for how much breakage this would mean out there in the wild.

Is this break an acceptable one or not? (Well, preferably none is acceptable, but in despair...) Any other alternatives or thoughts are most welcome.

best,
ilhan
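For reference, the minimum-vs-optimal gap Ilhan describes can be reproduced with the existing wrapper. This sketch assumes the current dgeqrf output signature (qr, tau, work, info) and, per LAPACK convention, that work[0] holds the reported optimal lwork after a lwork=-1 query; the exact optimal value is machine- and LAPACK-build-dependent.

```python
import numpy as np
from scipy.linalg import lapack

a = np.random.rand(400, 400)

# Workspace-size query: lwork=-1 makes LAPACK report, in work[0],
# the optimal workspace size instead of performing the factorization.
qr, tau, work, info = lapack.dgeqrf(a, lwork=-1)
optimal = int(work[0])

# The minimum-style default the wrapper currently uses:
minimum = 3 * a.shape[0] + 1  # 1201 for a 400x400 input

print(optimal, minimum)  # optimal is machine-dependent (e.g. 12800)
```

Note that even for the query the wrapper allocates and returns the full-size qr and tau arrays, which is the round-trip cost mentioned above.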

On Thu, Aug 2, 2018 at 2:37 PM, Ilhan Polat <ilhanpolat@gmail.com> wrote:
Is this only for the python functions or also for the cython wrappers to LAPACK?

Binary incompatibilities are pretty painful. We just got rid of the compatibility with scipy's old cython wrapper code in statsmodels. Both distributing binaries and conditional compilation depending on the installed scipy version are fragile, and there doesn't seem to be good packaging support for it.

Josef
_______________________________________________ SciPy-Dev mailing list SciPy-Dev@python.org https://mail.python.org/mailman/listinfo/scipy-dev

On Thu, Aug 2, 2018, 23:17 <josef.pktd@gmail.com> wrote:
> Is this only for the python functions or also for the cython wrappers to LAPACK?

Cython wrappers won't be affected. This would only affect people accessing LAPACK routines manually via the scipy.linalg.lapack.<funcname> API directly.

On Fri, Aug 3, 2018, 00:04 Ilhan Polat <ilhanpolat@gmail.com> wrote:
> Cython wrappers won't be affected. This would only affect people accessing LAPACK routines manually via the scipy.linalg.lapack.<funcname> API directly.

Oh, also the ones who use the get_lapack_funcs machinery.

On Thu, Aug 2, 2018 at 11:37 AM, Ilhan Polat <ilhanpolat@gmail.com> wrote:
> There are a few routines like this and I feel like they should be fixed and I'm willing to. However, this means that their output signature is going to change, which implies backwards-compatibility breaks.
What would the output change to? Currently it returns:

    qr : rank-2 array('d') with bounds (m,n) and a storage
    tau : rank-1 array('d') with bounds (MIN(m,n))
    work : rank-1 array('d') with bounds (MAX(lwork,1))
    info : int

Ralf

It will gain a dgeqrf_lwork function, as usual, to return the necessary workspace size:

    lwork, info = dgeqrf_lwork(m, n)

Then the "work" variable will be removed from the dgeqrf signature and made hidden. In the example I gave before, the optimal size is 12800, so a 12800-long work array is returned for a 400-by-400 computation.

On Sat, Aug 4, 2018 at 3:25 AM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
> What would the output change to?

On Sat, Aug 4, 2018 at 12:49 AM, Ilhan Polat <ilhanpolat@gmail.com> wrote:
> It will gain a dgeqrf_lwork function, as usual, to return the necessary workspace size:
>
>     lwork, info = dgeqrf_lwork(m, n)
>
> Then the "work" variable will be removed from the dgeqrf signature and made hidden. In the example I gave before, the optimal size is 12800, so a 12800-long work array is returned for a 400-by-400 computation.
Ah okay. Then the alternative is to just leave the work parameter, ignore it in the code if it's passed in (or give a warning/error), and document it as not being used. Right?

If you're removing "work" from both the signature and the return value, that's a bigger change indeed, which can't be handled well that way. I'm not 100% sure, but I think I agree that a backwards-incompatible change here will be better than introducing a bunch of new functions with worse names. We could introduce a Python wrapper for these to give a proper FutureWarning first.

Cheers,
Ralf
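A transitional Python wrapper of the kind Ralf suggests could look roughly like the sketch below; the decorator name warn_work_removal and the stand-in dgeqrf body are hypothetical, not actual SciPy code:

```python
import warnings

def warn_work_removal(func):
    """Hypothetical transitional shim: warn that the `work` output
    of this LAPACK wrapper will be dropped from the return value."""
    def wrapper(*args, **kwargs):
        warnings.warn(
            "the `work` output of %s will be removed in a future "
            "release" % func.__name__,
            FutureWarning, stacklevel=2)
        return func(*args, **kwargs)
    return wrapper

# Stand-in for the f2py-generated routine, for illustration only:
@warn_work_removal
def dgeqrf(a, lwork=None):
    ...  # placeholder body; the real wrapper calls LAPACK here
</ ```

Callers would then see a FutureWarning for one or two releases before the output signature actually changes.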

On Sat, 2018-08-04 at 06:06 -0700, Ralf Gommers wrote: [clip]
> We could introduce a Python wrapper for these to give a proper FutureWarning first.
Or, perhaps you can leave the `work` return variable in, but make it a 1-element array? Its value can be filled in from the `callstatement`, cf. e.g. https://github.com/scipy/scipy/blob/master/scipy/linalg/flapack_gen.pyf.src#... The actual work array is then made an intent(hide,cache) variable.

Pauli
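A rough sketch of what Pauli's suggestion might look like in an f2py signature file; the declarations and the multi-statement callstatement below are assumptions for illustration, not taken from the file he links:

```fortran
! Hypothetical .pyf fragment (sketch only): the real workspace is
! hidden and cached, while a 1-element `work` stays in the return
! value for backwards compatibility, filled in by the callstatement.
double precision dimension(MAX(lwork,1)),intent(hide,cache),depend(lwork) :: rwork
double precision dimension(1),intent(out) :: work
callstatement {
    (*f2py_func)(&m, &n, a, &m, tau, rwork, &lwork, &info);
    work[0] = rwork[0];  /* expose only the reported optimal size */
}
```

This would keep the output tuple's length unchanged while still avoiding the large workspace allocation in the returned value.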
participants (4)
- Ilhan Polat
- josef.pktd@gmail.com
- Pauli Virtanen
- Ralf Gommers