[scikit-learn] Truncated svd not working for complex matrices

Andreas Mueller t3kcit at gmail.com
Fri Aug 11 12:37:12 EDT 2017


I opened https://github.com/scikit-learn/scikit-learn/issues/9528

I suggest we first raise an error everywhere, and then add support in the
cases where it seems easy and worth it; as Joel said, probably mostly in
decomposition.

Though adding support even in a few places seems like dangerous feature 
creep.
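
A minimal sketch of what "error everywhere" could look like, using only
NumPy. The helper name reject_complex and the error message are made up
for illustration; this is not an existing scikit-learn function, just one
way such a check might be written.

```python
import numpy as np


def reject_complex(X):
    """Hypothetical helper: return X as an ndarray, refusing complex dtypes."""
    X = np.asarray(X)
    # np.iscomplexobj checks the dtype, so it is True even if all
    # imaginary parts happen to be zero.
    if np.iscomplexobj(X):
        raise ValueError("Complex data not supported (got dtype %s)" % X.dtype)
    return X


# Real input passes through unchanged.
X_real = reject_complex(np.ones((3, 2)))

# Complex input is rejected up front instead of silently giving wrong results.
try:
    reject_complex(np.ones((3, 2), dtype=complex))
    complex_rejected = False
except ValueError:
    complex_rejected = True
```

Estimators that are explicitly tested with complex input could then skip
this check, which keeps the supported surface small.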

On 08/11/2017 03:16 AM, Raphael C wrote:
> Although the first priority should be correctness (in implementation
> and documentation) and it makes sense to explicitly test for inputs
> for which code will give the wrong answer, it would be great if we
> could support complex data types, especially where it is very little
> extra work.
>
> Raphael
>
> On 11 August 2017 at 05:41, Joel Nothman <joel.nothman at gmail.com> wrote:
>> Should we be more explicitly forbidding complex data in most estimators, and
>> perhaps allow it in a few where it is tested (particularly decomposition)?
>>
>> On 11 August 2017 at 01:08, André Melo <andre.nascimento.melo at gmail.com>
>> wrote:
>>> Actually, it makes more sense to change
>>>
>>>      B = safe_sparse_dot(Q.T, M)
>>>
>>> to
>>>
>>>      B = safe_sparse_dot(Q.T.conj(), M)
>>>
>>> On 10 August 2017 at 16:56, André Melo <andre.nascimento.melo at gmail.com>
>>> wrote:
>>>> Hi Olivier,
>>>>
>>>> Thank you very much for your reply. I was convinced it couldn't be a
>>>> fundamental mathematical issue because the singular values were coming
>>>> out exactly right, so it had to be a problem with the way complex
>>>> values were being handled.
>>>>
>>>> I decided to look at the source code and it turns out the problem is
>>>> when the following transformation is applied:
>>>>
>>>> U = np.dot(Q, Uhat)
>>>>
>>>> Replacing this by
>>>>
>>>> U = np.dot(Q.conj(), Uhat)
>>>>
>>>> solves the issue! Should I report this on github?
>>>>
>>>> On 10 August 2017 at 16:13, Olivier Grisel <olivier.grisel at ensta.org>
>>>> wrote:
>>>>> I have no idea whether the randomized SVD method is supposed to work for
>>>>> complex data or not (from a mathematical point of view). I think that all
>>>>> scikit-learn estimators assume real data (or integer data for class labels)
>>>>> and our input validation utilities will cast numeric values to float64 by
>>>>> default. This might be the cause of your problem. Have a look at the source
>>>>> code to confirm. The reference to the paper can also be found in the
>>>>> docstring of those functions.
>>>>>
>>>>> --
>>>>> Olivier
>>>>>
>>>>> _______________________________________________
>>>>> scikit-learn mailing list
>>>>> scikit-learn at python.org
>>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>>>
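
For reference, the change André proposes in the quoted thread (projecting
with the conjugate transpose of Q rather than the plain transpose) can be
checked with a small NumPy experiment. This is only a sketch under
simplifying assumptions: the range finder is a single QR step with no
power iterations, plain @ stands in for safe_sparse_dot, and the names
low_rank_svd and rel_error are made up for illustration. It is not the
scikit-learn implementation.

```python
import numpy as np

rng = np.random.RandomState(0)
m, n, k = 30, 20, 5

# Build an exactly rank-k complex matrix so a rank-k basis can capture it.
A = rng.randn(m, k) + 1j * rng.randn(m, k)
C = rng.randn(k, n) + 1j * rng.randn(k, n)
M = A @ C

# Simplified range finder: orthonormal basis Q for the range of M,
# with a little oversampling (k + 5 random directions).
Omega = rng.randn(n, k + 5)
Q, _ = np.linalg.qr(M @ Omega)


def low_rank_svd(M, Q, conjugate):
    """Project M onto the basis Q, take a small SVD, and map U back."""
    Qh = Q.conj().T if conjugate else Q.T
    B = Qh @ M
    Uhat, s, Vh = np.linalg.svd(B, full_matrices=False)
    U = Q @ Uhat
    return U, s, Vh


def rel_error(M, U, s, Vh):
    """Relative Frobenius error of the reconstruction U * diag(s) * Vh."""
    return np.linalg.norm(M - (U * s) @ Vh) / np.linalg.norm(M)


err_conj = rel_error(M, *low_rank_svd(M, Q, conjugate=True))
err_plain = rel_error(M, *low_rank_svd(M, Q, conjugate=False))
print("conjugate transpose:", err_conj)
print("plain transpose:    ", err_plain)
```

With the conjugate transpose, Q @ Q.conj().T is an orthogonal projector
onto the range of M, so the reconstruction error should be at rounding
level. With the plain transpose that property is lost for complex Q, and
the factors no longer reproduce M.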


