[scikit-learn] Retracting model from the 'blackbox' SVM (Sebastian Raschka)

David Burns david.mo.burns at gmail.com
Fri May 4 12:47:20 EDT 2018


Hi Sebastian,

If you are looking to reduce the feature space for your model, I suggest 
you look at the scikit-learn documentation page on doing just that:

http://scikit-learn.org/stable/modules/feature_selection.html
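
For example, here is a minimal sketch using univariate selection to keep 
the 20 most informative of your 100 features (X and y below are 
placeholders for your own feature matrix and outcome vector):

from sklearn.feature_selection import SelectKBest, f_classif

# X, y: stand-ins for your feature matrix and outcome vector
selector = SelectKBest(score_func=f_classif, k=20)  # keep the 20 best-scoring features
X_reduced = selector.fit_transform(X, y)
print(selector.get_support())  # boolean mask of the original columns that survived

Refitting the SVC on X_reduced and re-running cross_val_score will show 
what the reduction costs you in accuracy. Recursive feature elimination 
(RFE) from the same module is another option that works well with a 
linear SVC.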

David


On 2018-05-04 12:00 PM, scikit-learn-request at python.org wrote:
>
> Date: Fri, 4 May 2018 05:51:26 -0400
> From: Sebastian Raschka <mail at sebastianraschka.com>
> To: Scikit-learn mailing list <scikit-learn at python.org>
> Subject: Re: [scikit-learn] Retracting model from the 'blackbox' SVM
>
> Dear Wouter,
>
> for the SVM, scikit-learn wraps the LIBSVM and LIBLINEAR libraries. I think the scikit-learn class SVC uses LIBSVM for every kernel. Since you are using the linear kernel, you could use the more efficient LinearSVC class and should get similar results. A linear model is in turn much easier to handle with respect to your question:
>
>>   Is there a way to get the underlying formula for the model out of scikit-learn instead of having it as a 'blackbox' in my SVM function?
> More specifically, LinearSVC uses the _fit_liblinear function available here: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/base.py
>
> More information on the LIBLINEAR library it wraps can be found here: https://www.csie.ntu.edu.tw/~cjlin/liblinear/ (they link to technical reports and implementation details there).
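>
> As a rough sketch (assuming the X_train and y_train arrays from your script below), the decision function of a fitted linear model is just f(x) = w . x + b, and both parts can be read off the estimator directly:
>
> from sklearn.svm import LinearSVC
>
> clf = LinearSVC()  # backed by LIBLINEAR, efficient for the linear case
> clf.fit(X_train, y_train)
>
> w = clf.coef_[0]       # one weight per input feature
> b = clf.intercept_[0]  # the bias term
> # the model predicts class 1 whenever w.dot(x) + b > 0
>
> Note that SVC(kernel='linear') exposes coef_ and intercept_ as well, so you can inspect the model you already trained the same way.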
>
> Best,
> Sebastian
>
>> On May 4, 2018, at 5:12 AM, Wouter Verduin <wouterverduin at gmail.com> wrote:
>>
>> Dear developers of Scikit,
>>
>> I am working on a scientific paper on a prediction model for complications after major abdominal resections. I have been using scikit-learn to create that model and got good results (score of 0.94). This makes us want to see what the model that scikit-learn has built actually looks like.
>>
>> At the moment we have 100 input variables, but naturally they are not all equally useful, and we want to reduce this number to about 20 and see what the effect on the score is.
>>
>> My question: Is there a way to get the underlying formula for the model out of scikit-learn instead of having it as a 'blackbox' in my SVM function?
>>
>> At this moment I am predicting a dichotomous variable from 100 variables (continuous, ordinal and binary).
>>
>> My code:
>>
>> import numpy as np
>> from numpy import *
>> import pandas as pd
>> from sklearn import tree, svm, linear_model, metrics, preprocessing
>> import datetime
>> from sklearn.model_selection import KFold, cross_val_score, ShuffleSplit, GridSearchCV
>> from time import gmtime, strftime
>>
>> # open and prepare the database
>> file = "/home/wouter/scikit/DB_SCIKIT.csv"
>> DB = pd.read_csv(file, sep=";", header=0, decimal=',').as_matrix()
>> DBT = DB
>> print "Vorm van de DB: ", DB.shape
>>
>> # the last column holds the outcome; collect it as the target vector
>> target = []
>> for i in range(len(DB[:,-1])):
>>     target.append(DB[i,-1])
>> DB = delete(DB, s_[-1], 1)  # remove the last column
>> AantalOutcome = target.count(1)
>> print "Aantal outcome:", AantalOutcome
>> print "Aantal patienten:", len(target)
>>
>> A = DB
>> b = target
>>
>> print len(DBT)
>>
>> # cross-validated accuracy of a linear SVM
>> svc = svm.SVC(kernel='linear', cache_size=500, probability=True)
>> indices = np.random.permutation(len(DBT))
>>
>> rs = ShuffleSplit(n_splits=5, test_size=.15, random_state=None)
>> scores = cross_val_score(svc, A, b, cv=rs)
>> A = ("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
>> print A
>>
>> # manual train/test split: the last 302 shuffled rows form the test set
>> X_train = DBT[indices[:-302]]
>> y_train = []
>> for i in range(len(X_train[:,-1])):
>>     y_train.append(X_train[i,-1])
>> X_train = delete(X_train, s_[-1], 1)  # remove the last column
>>
>> X_test = DBT[indices[-302:]]
>> y_test = []
>> for i in range(len(X_test[:,-1])):
>>     y_test.append(X_test[i,-1])
>> X_test = delete(X_test, s_[-1], 1)  # remove the last column
>>
>> model = svc.fit(X_train, y_train)
>> print model
>>
>> uitkomst = model.score(X_test, y_test)
>> print uitkomst
>>
>> voorspel = model.predict(X_test)
>> print voorspel
>> And output:
>>
>> Vorm van de DB:  (2011, 101)
>> Aantal outcome: 128
>> Aantal patienten: 2011
>> 2011
>> Accuracy: 0.94 (+/- 0.01)
>>
>> SVC(C=1.0, cache_size=500, class_weight=None, coef0=0.0,
>>     decision_function_shape='ovr', degree=3, gamma='auto', kernel='linear',
>>     max_iter=-1, probability=True, random_state=None, shrinking=True,
>>     tol=0.001, verbose=False)
>> 0.927152317881
>> [0. 0. 0. ... 0.]   (a vector of 302 zeros: every test sample was predicted as class 0)
>> Thanks in advance!
>>
>> With kind regards,
>>
>> Wouter Verduin
>>


