scipy.sparse.coo_matrix a new method drop_zero_columns?

Hello,
I am writing to follow up on the discussion https://github.com/scipy/scipy/issues/6754
I would be happy to get your opinions.
Thank you for your time,
Evgeny

On Fri, Nov 4, 2016 at 11:40 PM, Evgeny Nekrasov < evgeny.nekrasov@phystech.edu> wrote:
Hello,
I am writing to follow up on the discussion https://github.com/scipy/scipy/issues/6754
I would be happy to get your opinions.
Hi Evgeny, it would be helpful to add to the issue description some links to code / examples where this method is used or would be useful. Right now there's just an assertion "this is useful", which is hard to evaluate.
Other questions I would have:
1. The issue has a method for COO, but I'd think we want this for all or none of the formats?
2. Is it only drop column, or would drop row also make sense?
Ralf

Dear Ralf,
Thank you for your response.
The most common use case for drop_zero_columns is feature selection before applying a machine learning algorithm. For example, a very similar technique, VarianceThreshold, is implemented in sklearn (http://scikit-learn.org/stable/modules/feature_selection.html). The problem with the current implementations is that they fail or use far more resources than necessary when the number of zero columns is very large. Such sparse data representations are often produced by popular techniques such as feature hashing, bag of words, bag of content_ids, or similar. These techniques are implemented in sklearn (http://scikit-learn.org/stable/modules/feature_extraction.html). Nevertheless, custom implementations are often needed, and that is where drop_zero_columns is valuable.
Other questions:
1. It would be great to have drop_zero_columns for all matrix types. I wrote about this method for COO because COO is sufficient to process data with a huge number of zero columns efficiently.
2. I don't know of popular use cases for rows.
Best regards,
Evgeny
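For readers of the archive, here is a minimal sketch of what a COO-based drop_zero_columns could look like. It is not the code proposed in the issue: the function name drop_zero_columns_coo and the choice to return the kept column indices are assumptions made for illustration. The point it tries to show is that the work depends on the number of stored entries, not on the number of columns.

```python
import numpy as np
from scipy import sparse


def drop_zero_columns_coo(m):
    """Hypothetical helper: return (matrix without all-zero columns, kept column indices)."""
    # Work on a private COO copy so the caller's matrix is not modified.
    m = sparse.coo_matrix(m, copy=True)
    # Merge repeated (row, col) entries so entries that cancel out are seen as zero.
    m.sum_duplicates()
    # Ignore explicitly stored zeros.
    nonzero = m.data != 0
    rows, cols, data = m.row[nonzero], m.col[nonzero], m.data[nonzero]
    # `kept` holds the sorted original indices of columns with at least one entry;
    # `new_cols` renumbers every entry's column into 0..len(kept)-1.
    kept, new_cols = np.unique(cols, return_inverse=True)
    out = sparse.coo_matrix((data, (rows, new_cols)),
                            shape=(m.shape[0], kept.size))
    return out, kept


# A matrix with one million columns, only three of which are populated.
m = sparse.coo_matrix(
    ([1.0, 2.0, 3.0, 4.0], ([0, 1, 2, 0], [5, 999_999, 5, 42])),
    shape=(3, 1_000_000),
)
out, kept = drop_zero_columns_coo(m)
```

Because only the row, col and data arrays are touched, no array of length n_columns is ever allocated, which is the property the COO format is argued to provide above.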
_______________________________________________ SciPy-Dev mailing list SciPy-Dev@scipy.org https://mail.scipy.org/mailman/listinfo/scipy-dev

On Mon, Nov 7, 2016 at 7:10 AM, Evgeny Nekrasov < evgeny.nekrasov@phystech.edu> wrote:
Dear Ralf,
Thank you for your response.
The most common use case for drop_zero_columns is feature selection before applying a machine learning algorithm. For example, a very similar technique, VarianceThreshold, is implemented in sklearn (http://scikit-learn.org/stable/modules/feature_selection.html). The problem with the current implementations is that they fail or use far more resources than necessary when the number of zero columns is very large. Such sparse data representations are often produced by popular techniques such as feature hashing, bag of words, bag of content_ids, or similar. These techniques are implemented in sklearn (http://scikit-learn.org/stable/modules/feature_extraction.html). Nevertheless, custom implementations are often needed, and that is where drop_zero_columns is valuable.
Other questions:
1. It would be great to have drop_zero_columns for all matrix types. I wrote about this method for COO because COO is sufficient to process data with a huge number of zero columns efficiently.
2. I don't know of popular use cases for rows.
Thanks Evgeny, clear. Looks fine to me to add drop_zero_columns for all sparse matrix types.
Cheers,
Ralf
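As a rough illustration of how the same operation might be offered for every sparse format, the sketch below routes through CSC and converts back to the input format. The helper name drop_zero_columns_any is hypothetical and nothing here reflects an agreed SciPy API. Note the trade-off: CSC keeps an indptr array of length n_columns + 1, so this route is convenient but does not have the memory profile of a COO-based approach when the number of columns is huge.

```python
import numpy as np
from scipy import sparse


def drop_zero_columns_any(m):
    """Hypothetical helper: drop all-zero columns, returning the input's sparse format."""
    # Private CSC copy; conversion from other formats already allocates new arrays.
    csc = sparse.csc_matrix(m, copy=True)
    # Remove explicitly stored zeros so they do not count as column content.
    csc.eliminate_zeros()
    # indptr[j + 1] - indptr[j] is the number of stored entries in column j.
    counts = np.diff(csc.indptr)
    keep = np.flatnonzero(counts)
    # Select the populated columns and convert back to the caller's format.
    return csc[:, keep].asformat(m.format)


dense = np.array([[0, 1, 0, 2],
                  [0, 0, 0, 3]])
from_csr = drop_zero_columns_any(sparse.csr_matrix(dense))
from_lil = drop_zero_columns_any(sparse.lil_matrix(dense))
from_coo = drop_zero_columns_any(sparse.coo_matrix(dense))
```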
Participants (2): Evgeny Nekrasov, Ralf Gommers