[scikit-learn] SVC data normalisation
mamunbabu2001 at gmail.com
Mon May 8 05:45:26 EDT 2017
I am testing two classifiers [ 1. Random forest 2. SVC with radial basis function (RBF) kernel ] on a data set via 5-fold cross-validation.
The feature matrix contains:
A. 80% of the features are binary [ 0 or 1 ].
B. 10% are integer values representing counts / occurrences.
C. 10% are continuous values with different ranges.
My prior understanding was that decision tree based algorithms work better on mixed data types. In this particular case, however, I am noticing
that SVC is performing much better than Random forest.
I Z-score normalise the data before I send it to the support vector classifier:
- Binary features (type A) are left as is.
- Integer and continuous features (types B and C) are Z-score normalised [ ( feat - mean(feat) ) / sd(feat) ].
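
For concreteness, the normalisation described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the exact code used: the data is synthetic, the column indices in non_binary_cols are hypothetical, and ColumnTransformer requires scikit-learn 0.20 or later. Note that StandardScaler divides by the population standard deviation (ddof=0), and that putting it inside the pipeline means the mean and sd are re-estimated on the training part of each fold.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the feature matrix (assumption: not the real data).
rng = np.random.RandomState(0)
n_samples = 200
X_binary = rng.randint(0, 2, size=(n_samples, 8)).astype(float)  # type A
X_counts = rng.poisson(5, size=(n_samples, 1)).astype(float)     # type B
X_cont = rng.normal(50.0, 10.0, size=(n_samples, 1))             # type C
X = np.hstack([X_binary, X_counts, X_cont])
# Hypothetical binary target, only there so the example runs end to end.
y = (X_binary[:, 0] + (X_cont[:, 0] > 50.0) >= 1).astype(int)

# Hypothetical indices of the count and continuous columns in X.
non_binary_cols = [8, 9]

# Z-score only the non-binary columns; binary columns pass through unchanged.
preprocess = ColumnTransformer(
    [("zscore", StandardScaler(), non_binary_cols)],
    remainder="passthrough",
)

# The scaler is fitted on the training part of each fold only.
model = make_pipeline(preprocess, SVC(kernel="rbf"))
scores = cross_val_score(model, X, y, cv=5)
print("mean 5-fold accuracy:", scores.mean())
```
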
I was wondering if anyone can tell me whether this normalisation approach is correct for an SVC run.
Thanks in advance for your help.