[scikit-learn] SVC data normalisation
Mamun Rashid
mamunbabu2001 at gmail.com
Mon May 8 05:45:26 EDT 2017
Hi All,
I am testing two classifiers [ 1. Random forest 2. SVC with radial basis kernel ] on a data set via 5-fold cross-validation.
The feature matrix contains:
A. 80% of the features are binary [ 0 or 1 ].
B. 10% are integer values representing counts / occurrences.
C. 10% are continuous values with differing ranges.
My prior understanding was that decision-tree-based algorithms work better on mixed data types. In this particular case, however, I am noticing that SVC is performing much better than Random forest.
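A minimal sketch of the comparison described above, assuming scikit-learn's RandomForestClassifier and SVC evaluated on the same 5 folds. The data here is synthetic and made up purely for illustration (it is not the data set from this post), and the hyperparameters are library defaults or arbitrary choices, not the settings actually used.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic placeholder data (assumption: not the real feature matrix).
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Use one fixed splitter so both classifiers are scored on identical folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

classifiers = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "svc_rbf": SVC(kernel="rbf"),
}

# Per-fold accuracy for each classifier.
results = {
    name: cross_val_score(clf, X, y, cv=cv)
    for name, clf in classifiers.items()
}

for name, scores in results.items():
    print(name, scores.mean(), scores.std())
```

Sharing the splitter between the two models keeps the fold-to-fold variation comparable, which makes a difference in mean score easier to interpret.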
I Z-score normalise the data before I send it to the support vector classifier:
- Binary features ( type A ) are left as is.
- Integer and continuous features ( types B and C ) are Z-score normalised [ ( feat - mean(feat) ) / sd(feat) ].
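The scheme above could be sketched as follows, assuming scikit-learn's ColumnTransformer (available from version 0.20) and StandardScaler. The data, the column indices, and the 8/1/1 column split are made up for illustration. Placing the scaler inside a pipeline means the mean and standard deviation are estimated on the training part of each fold only.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.RandomState(0)
n_samples = 200

# Synthetic stand-in for the feature matrix (all values are made up):
# 8 binary columns, 1 count column, 1 continuous column.
X_binary = rng.randint(0, 2, size=(n_samples, 8)).astype(float)
X_count = rng.poisson(lam=5, size=(n_samples, 1)).astype(float)
X_cont = rng.normal(loc=50.0, scale=10.0, size=(n_samples, 1))
X = np.hstack([X_binary, X_count, X_cont])
y = rng.randint(0, 2, size=n_samples)

# Hypothetical column indices of the count and continuous features.
scaled_columns = [8, 9]

# Z-score only the selected columns; binary columns pass through unchanged.
# Note: the output places the scaled columns first, then the remainder.
preprocess = ColumnTransformer(
    transformers=[("zscore", StandardScaler(), scaled_columns)],
    remainder="passthrough",
)

# The scaler is refit on the training portion of every fold.
model = make_pipeline(preprocess, SVC(kernel="rbf"))
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())
```

StandardScaler divides by the population standard deviation (ddof=0), so its output can differ slightly from a sample-based sd(feat) on small data sets.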
I was wondering if anyone can tell me whether this normalisation approach is correct for an SVC run.
Thanks in advance for your help.
Regards,
Mamun