[scikit-learn] SVC data normalisation

Mamun Rashid mamunbabu2001 at gmail.com
Mon May 8 05:45:26 EDT 2017

Hi All,
I am testing two classifiers [1. Random forest, 2. SVC with radial basis kernel] on a data set via 5-fold cross-validation.

The feature matrix contains:

A. 80% of the features are binary [0 or 1].
B. 10% are integer values representing counts / occurrences.
C. 10% are continuous values on different ranges.

My prior understanding was that decision-tree-based algorithms work better on mixed data types. In this particular case, however, I am noticing that SVC is performing much better than Random forest.
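
A comparison of this kind can be sketched as follows. This is only a minimal illustration, not the actual setup: the synthetic data from make_classification, the variable names, and the hyperparameters (n_estimators=100, default SVC settings) are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 200 samples, 20 features.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# The two classifiers under comparison; hyperparameters are illustrative.
classifiers = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "svc_rbf": SVC(kernel="rbf"),
}

# 5-fold cross-validated accuracy for each classifier.
scores = {
    name: cross_val_score(clf, X, y, cv=5)
    for name, clf in classifiers.items()
}

for name, fold_scores in scores.items():
    print(name, fold_scores.mean())
```

Both classifiers are scored on the same folds here because cross_val_score with an integer cv uses a deterministic, unshuffled stratified split for classification.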

I Z-score normalise the data before I send it to the support vector classifier.
- Binary features (type A) are left as is.
- Integer and continuous features (types B and C) are Z-score normalised [ (feat - mean(feat)) / sd(feat) ].
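
The scheme above can be sketched with a scikit-learn pipeline. This is a hedged illustration, assuming a scikit-learn version that provides ColumnTransformer; the toy data, the column indices, and the labels are made up for the example.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy mixed-type feature matrix (assumption, not the real data).
rng = np.random.default_rng(0)
n = 200
X_bin = rng.integers(0, 2, size=(n, 8)).astype(float)  # type A: binary
X_cnt = rng.poisson(5.0, size=(n, 1)).astype(float)    # type B: counts
X_con = rng.normal(50.0, 10.0, size=(n, 1))            # type C: continuous
X = np.hstack([X_bin, X_cnt, X_con])

# Arbitrary toy labels, only so that the example runs end to end.
y = ((X_bin[:, 0] + (X_con[:, 0] > 50.0)) > 0.5).astype(int)

# Z-score only the count and continuous columns; pass binary ones through.
scaled_cols = [8, 9]
preprocess = ColumnTransformer(
    [("zscore", StandardScaler(), scaled_cols)],
    remainder="passthrough",
)

# Inside a pipeline the scaler is refit on the training part of each fold.
model = make_pipeline(preprocess, SVC(kernel="rbf"))
scores = cross_val_score(model, X, y, cv=5)

# Inspect the transform on its own: scaled columns come first in the
# output, followed by the untouched binary columns.
Xt = preprocess.fit_transform(X)
```

Note that StandardScaler divides by the population standard deviation (ddof=0), which differs slightly from the sample standard deviation on small data sets. Putting the scaler inside the pipeline means the mean and sd are estimated from the training folds only, so no statistics from the held-out fold are used.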

I was wondering if anyone can tell me whether this normalisation approach is correct for an SVC run.

Thanks in advance for your help. 
