[scikit-learn] Retracting model from the 'blackbox' SVM

Sebastian Raschka mail at sebastianraschka.com
Fri May 4 05:51:26 EDT 2018


Dear Wouter,

for the SVM, scikit-learn wraps LIBSVM and LIBLINEAR. I think the scikit-learn class SVC uses LIBSVM for every kernel. Since you are using the linear kernel, you could use the more efficient LinearSVC scikit-learn class to get similar results. That, in turn, should be easier to handle with regard to your question:

>  Is there a way to get the underlying formula for the model out of scikit-learn, instead of having it as a 'black box' in my SVM function?

More specifically, LinearSVC uses the _fit_liblinear code available here: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/base.py
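
To make that concrete: for a linear kernel, the fitted model is nothing more than a weight vector w and a bias b, and a prediction is just the sign of w . x + b. Both are exposed on the fitted estimator as coef_ and intercept_ (this holds for LinearSVC as well as for SVC with kernel='linear'). Below is a minimal sketch with random stand-in data rather than your CSV:

import numpy as np
from sklearn.svm import LinearSVC

# Random stand-in data, just to keep the sketch self-contained.
X = np.random.rand(200, 5)          # e.g. 5 predictors instead of your 100
y = np.random.randint(0, 2, 200)    # a dichotomous outcome

clf = LinearSVC().fit(X, y)

w = clf.coef_.ravel()    # one learned weight per input variable
b = clf.intercept_[0]    # the bias term

# predict() is equivalent to thresholding w . x + b at zero:
manual = (X.dot(w) + b > 0).astype(int)
assert (manual == clf.predict(X)).all()

So the "formula" you could report in a paper is that weighted sum of the input variables, thresholded at zero.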

And more info on the LIBLINEAR library it is using can be found here: https://www.csie.ntu.edu.tw/~cjlin/liblinear/ (they have links to technical reports and implementation details there)
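
And since you mention wanting to shrink the 100 variables down to about 20: with a linear model, the absolute size of each weight gives a rough ranking of the variables, provided the inputs are on a comparable scale. Again only a sketch under that assumption (the StandardScaler step and the random data are my additions, not your pipeline):

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Random stand-in data again.
X = np.random.rand(200, 100)        # 100 predictors, as in your set-up
y = np.random.randint(0, 2, 200)

# Weights are only comparable when the inputs share a common scale.
Xs = StandardScaler().fit_transform(X)
clf = LinearSVC().fit(Xs, y)

# Rank the variables by absolute weight and keep the 20 strongest.
ranking = np.argsort(np.abs(clf.coef_.ravel()))[::-1]
X_reduced = Xs[:, ranking[:20]]

Refitting on X_reduced and re-running cross_val_score would then show what the smaller model does to the score.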

Best,
Sebastian

> On May 4, 2018, at 5:12 AM, Wouter Verduin <wouterverduin at gmail.com> wrote:
> 
> Dear developers of Scikit,
> 
> I am working on a scientific paper on a prediction model that predicts complications after major abdominal resections. I have been using scikit-learn to create that model and got good results (a score of 0.94). This makes us want to see what the model that scikit-learn builds actually looks like.
> 
> At the moment we have 100 input variables, but naturally they are not all equally useful, and we want to reduce this number to about 20 and see what the effect on the score is.
> 
> My question: Is there a way to get the underlying formula for the model out of scikit-learn, instead of having it as a 'black box' in my SVM function?
> 
> At this moment I am predicting a dichotomous variable from 100 variables (continuous, ordinal, and binary).
> 
> My code:
> 
> import numpy as np
> from numpy import *
> import pandas as pd
> from sklearn import tree, svm, linear_model, metrics, preprocessing
> import datetime
> from sklearn.model_selection import KFold, cross_val_score, ShuffleSplit, GridSearchCV
> from time import gmtime, strftime
> 
> # open and prepare the database
> file = "/home/wouter/scikit/DB_SCIKIT.csv"
> DB = pd.read_csv(file, sep=";", header=0, decimal=',').as_matrix()
> DBT = DB
> print "Shape of the DB: ", DB.shape
> 
> target = []
> for i in range(len(DB[:,-1])):
>     target.append(DB[i,-1])
> DB = delete(DB, s_[-1], 1)  # remove the last column
> 
> AantalOutcome = target.count(1)
> print "Number of outcomes:", AantalOutcome
> print "Number of patients:", len(target)
> 
> A = DB
> b = target
> 
> print len(DBT)
> 
> svc = svm.SVC(kernel='linear', cache_size=500, probability=True)
> indices = np.random.permutation(len(DBT))
> rs = ShuffleSplit(n_splits=5, test_size=.15, random_state=None)
> scores = cross_val_score(svc, A, b, cv=rs)
> A = ("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
> print A
> 
> X_train = DBT[indices[:-302]]
> y_train = []
> for i in range(len(X_train[:,-1])):
>     y_train.append(X_train[i,-1])
> X_train = delete(X_train, s_[-1], 1)  # remove the last column
> 
> X_test = DBT[indices[-302:]]
> y_test = []
> for i in range(len(X_test[:,-1])):
>     y_test.append(X_test[i,-1])
> X_test = delete(X_test, s_[-1], 1)  # remove the last column
> 
> model = svc.fit(X_train, y_train)
> print model
> 
> uitkomst = model.score(X_test, y_test)
> print uitkomst
> 
> voorspel = model.predict(X_test)
> print voorspel
> And the output:
> 
> Shape of the DB:  (2011, 101)
> Number of outcomes: 128
> Number of patients: 2011
> 2011
> Accuracy: 0.94 (+/- 0.01)
> 
> SVC(C=1.0, cache_size=500, class_weight=None, coef0=0.0,
>   decision_function_shape='ovr', degree=3, gamma='auto', kernel='linear',
>   max_iter=-1, probability=True, random_state=None, shrinking=True,
>   tol=0.001, verbose=False)
> 0.927152317881
> [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ... 0. 0.]  (302 test predictions, all 0.)
> Thanks in advance!
> 
> with kind regards,
> 
> Wouter Verduin
> 
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn


