[scikit-learn] Query Regarding Model Scoring using scikit learn's joblib library
Roman Yurchak
rth.yurchak at gmail.com
Tue Dec 27 04:51:39 EST 2016
Hi Debu,
On 27/12/16 08:18, Andrew Howe wrote:
> 5. I got a prediction result with True Positive Rate (TPR) as 10-12
> % on probability thresholds above 0.5
A high True Positive Rate (recall) is not by itself a sufficient
condition for a well-behaved model: you should look at the precision
at the same time (or at a combined metric such as the F1 score). That
said, a recall of 0.1 is quite low on its own.
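For instance, with sklearn.metrics (the toy arrays below just stand in
for your test labels and your thresholded predictions):

    from sklearn.metrics import precision_score, recall_score, f1_score

    # toy example -- replace with your own test labels and the
    # predictions at your chosen 0.5 threshold
    y_true = [0, 0, 1, 1, 1, 0]
    y_pred = [0, 1, 1, 0, 1, 0]
    print("precision:", precision_score(y_true, y_pred))
    print("recall:", recall_score(y_true, y_pred))
    print("f1:", f1_score(y_true, y_pred))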
> 7. I reloaded the model in a different python instance from the
> pickle file mentioned above and did my scoring, i.e., used
> joblib library load method and then instantiated prediction
> (predict_proba method) on the entire set of my original 600 K
> records
> Another question – is there an alternate model scoring
> library (apart from joblib, the one I am using)?
Joblib is not a scoring library; it is a generic persistence utility
used to save estimators to disk. Once you load a model with joblib you
get back essentially the same RandomForestClassifier estimator object
as the one you saved, with the same predict / predict_proba methods, so
there is nothing to replace it with for "scoring".
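For instance, a minimal round-trip sketch (with a toy dataset in place
of yours):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.externals import joblib  # scikit-learn's bundled joblib

    X, y = make_classification(n_samples=1000, random_state=0)
    clf = RandomForestClassifier(random_state=0).fit(X, y)
    joblib.dump(clf, 'model.pkl')

    clf2 = joblib.load('model.pkl')
    # the reloaded estimator produces the same probabilities
    assert np.allclose(clf.predict_proba(X), clf2.predict_proba(X))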
> 8. Now when I am running (scoring) my model using
> joblib.predict_proba on the entire set of original data (600 K),
> I am getting a True Positive rate of around 80%.
That sounds expected, given what you are doing. Your full 600 K set
consists of the 80 % used for training (on which a random forest's
recall will typically be close to 1.0, since it largely memorises the
training data) and the 20 % test split (recall ~ 0.1), so the
size-weighted average over the complete set is about
0.8 * 1.0 + 0.2 * 0.1 ~ 0.82. Unless I missed something.
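You can check this by scoring the two splits separately; a rough
sketch on synthetic data:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import recall_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=10000, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)
    clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

    # train recall is typically near 1.0 (the forest largely
    # memorises its training data); test recall is lower
    print("train recall:", recall_score(y_train, clf.predict(X_train)))
    print("test recall:", recall_score(y_test, clf.predict(X_test)))
    # scoring the full set mixes the two: roughly 0.8*train + 0.2*test
    print("full recall:", recall_score(y, clf.predict(X)))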
> 9. I did some further analysis and figured out that during the
> training process, when the model was predicting on the test
> sample of 120K it could only predict 10-12% of 120K data beyond
> a probability threshold of 0.5. When I am now trying to score my
> model on the entire set of 600 K records, it appears that the
> model is remembering some of it’s past behavior and data and
> accordingly throwing 80% True positive rate
It feels like your RandomForestClassifier is not properly tuned: a
recall of 0.1 on the test set is quite low. The model is not
"remembering" anything at scoring time; it is simply being evaluated
on the very data it was trained on, which inflates the metric (see
point 8 above). It could be worth tuning the hyperparameters better
(cf. https://stackoverflow.com/a/36109706), using some metric other
than recall alone to evaluate the performance.
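For instance, a quick GridSearchCV sketch optimising F1 instead of
recall (the parameter grid below is just a guess, not a recommendation
for your data):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    # toy imbalanced data standing in for yours
    X, y = make_classification(n_samples=5000, weights=[0.9, 0.1],
                               random_state=0)
    param_grid = {'n_estimators': [100, 300],
                  'max_depth': [None, 10, 20],
                  'min_samples_leaf': [1, 5, 10],
                  'class_weight': [None, 'balanced']}
    search = GridSearchCV(RandomForestClassifier(random_state=0),
                          param_grid, scoring='f1', cv=3)
    search.fit(X, y)
    print(search.best_params_)
    print(search.best_score_)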
Roman