[scikit-learn] f_classif function confusion

Trenton Bricken trenton.bricken at duke.edu
Thu Jul 26 15:01:41 EDT 2018


I am very confused by the f_classif function, sklearn.feature_selection.f_classif().

The User Guide says this function can be used for classification tasks, and the API documentation says it computes an ANOVA F-value. However, ANOVA takes a categorical input (the grouping factor) and a continuous output (the response). Why can we give it a continuous input (the features) and a categorical output (the class labels) here?
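For illustration, here is a minimal sketch of a one-way ANOVA computed per feature with the roles read the other way round. The class label is the categorical factor that splits the rows into groups, and each continuous feature is the response whose group means are compared. It assumes NumPy, at least two classes, and more samples than classes. The helper name `anova_f_per_feature` is hypothetical; this is not scikit-learn's own code.

```python
import numpy as np


def anova_f_per_feature(X, y):
    """Hypothetical helper: one-way ANOVA F statistic for each column of X.

    Rows are grouped by their class label in y (the categorical factor);
    each column of X is treated as the continuous response variable.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n_samples, n_classes = X.shape[0], classes.size
    grand_mean = X.mean(axis=0)

    ss_between = np.zeros(X.shape[1])  # spread of class means around the grand mean
    ss_within = np.zeros(X.shape[1])   # spread of samples around their own class mean
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        ss_between += Xc.shape[0] * (mean_c - grand_mean) ** 2
        ss_within += ((Xc - mean_c) ** 2).sum(axis=0)

    # F = between-group mean square / within-group mean square
    ms_between = ss_between / (n_classes - 1)
    ms_within = ss_within / (n_samples - n_classes)
    return ms_between / ms_within


X = [[1, 1], [2, 2], [3, 3], [7, 1], [8, 2], [9, 3]]
y = [0, 0, 0, 1, 1, 1]
print(anova_f_per_feature(X, y))
```

On this toy data the first feature separates the two classes (F = 54) and the second does not (F = 0). The first array returned by `sklearn.feature_selection.f_classif(X, y)` on the same data should agree with these values up to floating-point error.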

Also, looking at the source code, there are warnings against using ANOVA if a feature is not normally distributed. I think these should be more visible, so that unaware users see them before they start using this method for feature analysis.

Thank you in advance for any help and explanation of why f_classif can be used with categorical outputs.

Trenton
