Problem using boxplots to compare significance of model performance
Hi folks!

I'm using scikit-learn to build two neural networks using a 10% holdout, and compare their performance using precision. To compare statistical significance in the variance of precision, I'm using scikit's boxplots.

My problem is twofold:

1) The standard deviation in the precision of the two models (obtained using precision.std()) is always 0.0. I'm assuming that's a problem.
2) My boxplot is meant to display bars for the two models, but always displays only the first model (nn01).

My outcomes for this dataset are binary (0 or 1). Since the models assume average=binary by default, is that a problem?

For those who'd like to look, my source code can be seen at http://pastebin.com/yvE2T1Sw

The code produces the following plot, which is of course only ONE of the bars that I need :(

-- Best Regards, Suranga
Hi, Suranga,
1) The standard deviation in the precision of the two models (obtained using precision.std()) is always 0.0. I'm assuming that's a problem.
That’s weird. Are you sure that “precision” has more than one value? An array with a single element always has a standard deviation of zero, e.g.,

    >>> np.array([0.89]).std()
    0.0
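To make the point above concrete, here is a minimal sketch of one way to obtain several precision values so that a standard deviation is meaningful. It is not the code from the pastebin link: the synthetic dataset, the MLPClassifier settings, and the choice of 10 repeated 10% holdout splits via ShuffleSplit are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.neural_network import MLPClassifier

# A single holdout evaluation yields one score, so its spread is zero.
single = np.array([0.89])
print(single.std())

# Assumed stand-in data: a small synthetic binary classification problem.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Repeat the 10% holdout 10 times with different random splits.
cv = ShuffleSplit(n_splits=10, test_size=0.1, random_state=0)

# Hypothetical model settings, chosen only to keep the example fast.
model = MLPClassifier(hidden_layer_sizes=(10,), max_iter=500, random_state=0)

# One precision value per split, returned as a 1D array.
precisions = cross_val_score(model, X, y, cv=cv, scoring="precision")
print(precisions.shape, precisions.mean(), precisions.std())
```

With one score per split, precisions.std() summarises how much precision varies across holdout samples, which a single evaluation cannot show.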
2) My boxplot is meant to display bars for the two models, but always displays only the first model (nn01)
Here, too, your input array or list for the boxplot function may not be formatted correctly. What you want is a list with one 1D array of results per model:

    two_models = [model1_results, model2_results]  # each a 1D array of scores

    plt.boxplot(two_models,
                notch=False,  # box instead of notch shape
                sym='rs',     # red squares for outliers
                vert=True)    # vertical box alignment

PS: If you are comparing specifically 2 neural network models, have you considered McNemar’s test? E.g., see https://github.com/rasbt/mlxtend/blob/master/docs/sources/user_guide/evaluat...

Best
Sebastian
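The list-of-arrays input described in the reply above can be sketched as a self-contained script. The scores here are random placeholders, not results from the original models, and the variable names nn01_precision and nn02_precision are hypothetical. The vert argument is left at its default, which already draws vertical boxes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Placeholder scores (made up for illustration): one 1D array per model,
# e.g. one precision value per repeated holdout split.
rng = np.random.default_rng(0)
nn01_precision = rng.uniform(0.80, 0.90, size=10)
nn02_precision = rng.uniform(0.75, 0.95, size=10)

# A list of two 1D arrays produces two boxes, one per model.
two_models = [nn01_precision, nn02_precision]

fig, ax = plt.subplots()
result = ax.boxplot(two_models,
                    notch=False,  # box instead of notch shape
                    sym="rs")     # red squares for outliers
ax.set_xticks([1, 2])
ax.set_xticklabels(["nn01", "nn02"])
ax.set_ylabel("precision")

print(len(result["boxes"]))  # number of boxes drawn
```

In an interactive session, dropping the matplotlib.use("Agg") line and calling plt.show() displays the figure; fig.savefig with a filename of your choice writes it to disk.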
On Oct 30, 2016, at 3:24 PM, Suranga Kasthurirathne <surangakas@gmail.com> wrote:
Hi folks!
I'm using scikit-learn to build two neural networks using a 10% holdout, and compare their performance using precision. To compare statistical significance in the variance of precision, I'm using scikit's boxplots.
My problem is twofold:
1) The standard deviation in the precision of the two models (obtained using precision.std()) is always 0.0. I'm assuming that's a problem.
2) My boxplot is meant to display bars for the two models, but always displays only the first model (nn01).
My outcomes for this dataset are binary (0 or 1). Since the models assume average=binary by default, is that a problem?
For those who'd like to look, my source code can be seen at http://pastebin.com/yvE2T1Sw
The code produces the following plot - which is of course only ONE of the bars that I need :(
<Screen Shot 2016-10-30 at 12.17.22 PM.png>
-- Best Regards, Suranga