AUCPR of individual features using Random Forest (Error: unhashable type)
mscs15059 at itu.edu.pk
Mon Jul 10 07:47:34 EDT 2017
I have a data set of 19 features (V1 to V19) and one class label (C1). I can easily get the precision-recall value using all the variables together against the class label, but I want the AUCPR of each individual feature against the class label. The data is in this form:
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 C1
4182 4182 4182 1 2 0 0 0 4 1 1 0 5 0 1 1 24 4.4654 28.18955043 1
11396 3798.6 3825 3 1 0 1 0 0 3 3 1 0 1 1 3 5 4.452 11.90765492 0
60416 5034.66 5393.5 12 1 0 0 0 0 12 12 3 6 1 4 12 2 4.4711 35.11543135 0
34580 4940 5254 7 1 4 0 2 0 10 12 8 0 1 1 10 45 4.4689 32.44228433 1
8667 4333.5 4333.5 2 1 0 1 0 0 2 2 1 0 1 0 2 1 4.4659 28.79708384 0
4011 4011 4011 1 1 30 0 0 0 2 2 1 8 1 0 2 1 4.4634 25.75941677 0
691347 5083.43 5300 136 2 0 0 0 9 44 44 12 0 1 12 44 32 4.4693 32.92831106 1
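For comparison (this is a different technique from the Random Forest approach below, not the poster's method): the AUCPR of a single feature can be computed without training any model, because `average_precision_score` accepts any real-valued score and each raw column can rank the samples directly. A minimal sketch on the seven rows posted above, assuming that larger values point toward class 1; `per_feature_aucpr` is a hypothetical helper name. A feature that is larger for class 0 will score low here, so negating it tests the other direction.

```python
import io

import pandas as pd
from sklearn.metrics import average_precision_score

# The seven sample rows posted above (whitespace-separated, with a header row).
SAMPLE = """\
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 C1
4182 4182 4182 1 2 0 0 0 4 1 1 0 5 0 1 1 24 4.4654 28.18955043 1
11396 3798.6 3825 3 1 0 1 0 0 3 3 1 0 1 1 3 5 4.452 11.90765492 0
60416 5034.66 5393.5 12 1 0 0 0 0 12 12 3 6 1 4 12 2 4.4711 35.11543135 0
34580 4940 5254 7 1 4 0 2 0 10 12 8 0 1 1 10 45 4.4689 32.44228433 1
8667 4333.5 4333.5 2 1 0 1 0 0 2 2 1 0 1 0 2 1 4.4659 28.79708384 0
4011 4011 4011 1 1 30 0 0 0 2 2 1 8 1 0 2 1 4.4634 25.75941677 0
691347 5083.43 5300 136 2 0 0 0 9 44 44 12 0 1 12 44 32 4.4693 32.92831106 1
"""


def per_feature_aucpr(df, label="C1"):
    """Hypothetical helper: average precision of each column used as a raw score."""
    y = df[label]
    scores = {}
    for name in df.columns.drop(label):
        # Assumption: a higher feature value means "more likely class 1".
        scores[name] = average_precision_score(y, df[name])
    return scores


data = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")
ap = per_feature_aucpr(data)
for name, score in sorted(ap.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.2f}")
```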
So far I have done this:
from collections import defaultdict
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
import numpy as np
from sklearn.metrics import average_precision_score
mydata = pd.read_csv("TEST_2.csv")
y = mydata["C1"]  # assumes the CSV has a header row and the label column is named "C1"
# select all but the last column as data
X = mydata.iloc[:, :-1]
names = X.columns.tolist()
# -- Gridsearched parameters
model_rf = RandomForestClassifier(n_estimators=500,
                                  class_weight="auto",
                                  criterion='gini',
                                  bootstrap=True,
                                  max_features=10,
                                  min_samples_split=1,
                                  min_samples_leaf=6,
                                  max_depth=3,
                                  n_jobs=-1)
scores = defaultdict(list)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5,
                                                    random_state=0)
# -- Fit one model per feature (could be cross-validated)
for i in range(X_train.shape[1]):
    X_t = X_test.copy()
    rf = model_rf.fit(X_train[:, i], y_train)
    scores[names[i]] = average_precision_score(y_test, rf.predict(X_t[:, i]))
print("Features sorted by their score:")
print(sorted([(round(np.mean(score), 4), feat) for
              feat, score in scores.items()], reverse=True))
It fails with the error "unhashable type".
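The error most likely comes from `X_train[:, i]` (and `X_t[:, i]`): `train_test_split` returns pandas DataFrames when given DataFrames, and `[]` on a DataFrame looks up column labels rather than positions, so pandas tries to treat the tuple `(slice(None), i)` as a label. On pandas versions of that era this typically surfaces as `TypeError: unhashable type: 'slice'`; newer pandas raises a different exception class for the same mistake. Selecting by position with `.iloc` and a one-element list keeps the column two-dimensional, which is the shape scikit-learn expects for `X`. A minimal reproduction on a toy frame built from the first posted rows (the variable names mirror the question's code):

```python
import pandas as pd

# Toy frame: the first two columns of the first three posted rows.
X_train = pd.DataFrame({"V1": [4182, 11396, 60416],
                        "V2": [4182.0, 3798.6, 5034.66]})
i = 0

error_name = None
try:
    X_train[:, i]  # [] on a DataFrame looks up column *labels*, not positions
except Exception as exc:
    # Older pandas: TypeError("unhashable type: 'slice'"); newer pandas raises
    # a different exception class for the same mistake.
    error_name = type(exc).__name__

# Positional selection; the one-element list keeps shape (n_samples, 1),
# the 2-D layout scikit-learn estimators expect for X.
column = X_train.iloc[:, [i]]
print(error_name, column.shape)
```

Plain `X_train.iloc[:, i]` would also avoid the error, but it returns a 1-D Series, which `fit` rejects as `X`.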
The output should look something like this:
V1: 0.82
V2: 0.74
:
:
V19: 0.55
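Putting the pieces together, here is a sketch of the per-feature loop that prints scores in the format above. Assumptions, labelled plainly: `make_classification` generates hypothetical stand-in data because `TEST_2.csv` is not available, and the hyperparameters are adapted from the grid-searched ones in the question. Three of those parameters would fail once the indexing error is fixed. `max_features` cannot exceed the single column each model sees, so the question's 10 would raise a `ValueError`. Current scikit-learn releases reject `class_weight="auto"`; it is replaced here by `"balanced"`, which uses a different weighting heuristic. `min_samples_split` must be at least 2, so it is left at its default. Scores come from `predict_proba` rather than `predict`, because average precision on hard 0/1 predictions reduces the precision-recall curve to a single threshold. `n_estimators` is lowered to 100 only to keep the sketch quick.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for TEST_2.csv: 19 numeric features V1..V19 and a 0/1 label C1.
X_arr, y_arr = make_classification(n_samples=400, n_features=19,
                                   n_informative=5, random_state=0)
data = pd.DataFrame(X_arr, columns=[f"V{i}" for i in range(1, 20)])
data["C1"] = y_arr

X = data.iloc[:, :-1]  # every column except the label
y = data["C1"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y)

scores = {}
for name in X.columns:
    rf = RandomForestClassifier(
        n_estimators=100,
        class_weight="balanced",  # stands in for the rejected "auto"
        max_features=1,           # each model sees exactly one column
        min_samples_leaf=6,
        max_depth=3,
        random_state=0,
    )
    # Double brackets select a 2-D (n_samples, 1) DataFrame for fit/predict_proba.
    rf.fit(X_train[[name]], y_train)
    # Column 1 of predict_proba is P(C1 == 1), since rf.classes_ is [0, 1].
    prob_pos = rf.predict_proba(X_test[[name]])[:, 1]
    scores[name] = average_precision_score(y_test, prob_pos)

print("Features sorted by their score:")
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.2f}")
```

Because each forest sees only one feature, the score measures how well a shallow forest can rank the test samples from that feature alone. With a test set as small as the seven posted rows, such numbers would be very noisy, so cross-validation (as the question's comment suggests) is worth considering.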