<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    Hi Andy,<br>

    <br>

    I now have results for LinearDiscriminantAnalysis and the

    SGDClassifier. I updated the results online [1]. <br>

    <br>

    The LinearDiscriminantAnalysis had<br>

    <ul>

      <li>an infinity or NaN for data that approaches MAXDOUBLE and<br>

      </li>

      <li>problems with an internal array size computation in several

        tests, i.e., for data that is very close to zero and cannot be

        represented by 32-bit floats, as well as for data that is all

        zero. <br>

      </li>

    </ul>
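    <p>For illustration, the following plain-Python sketch shows how
      values of this kind can turn into infinity, NaN, or zero. It is
      not taken from the LinearDiscriminantAnalysis implementation; the
      concrete values (0.9 and 0.8 times MAXDOUBLE, 1e-300) and the
      helper as_float32 are assumptions chosen for the example.</p>
<pre>
```python
import math
import struct
import sys

MAXDOUBLE = sys.float_info.max  # largest finite 64-bit float

# Two values close to MAXDOUBLE: their sum already overflows to infinity.
values = [0.9 * MAXDOUBLE, 0.8 * MAXDOUBLE]
total = values[0] + values[1]
mean = total / len(values)

# Centering on an infinite mean gives -inf, and inf - inf gives NaN.
centered = [v - mean for v in values]
difference = total - total


def as_float32(x):
    """Round-trip a Python float through a 32-bit float (hypothetical helper)."""
    return struct.unpack("f", struct.pack("f", x))[0]


# A value that a 64-bit float can hold but a 32-bit float cannot.
tiny = 1e-300
tiny32 = as_float32(tiny)

print(math.isinf(total), math.isinf(mean), math.isnan(difference))
print(tiny != 0.0, tiny32 == 0.0)
```
</pre>
    <p>Any intermediate result of this kind (a sum, a mean, or a cast
      to 32 bit) may be enough to propagate an infinity, a NaN, or an
      unexpected zero into later computations.</p>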

    <p>The SGD had</p>

    <ul>

      <li>an over/underflow for data that approaches MAXDOUBLE</li>

      <li>differences in the classifications if we added one to the

        numeric features</li>

      <li>differences in the classification if we reordered the

        instances. <br>

      </li>

    </ul>
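    <p>The two metamorphic relations behind the last two items can be
      sketched roughly as follows. This is a minimal illustration, not
      the code generated by atoml: the toy NearestCentroid class, the
      helper count_differences, and the data are assumptions made for
      the example, and the toy class merely stands in for an estimator
      with fit and predict methods.</p>
<pre>
```python
import random


class NearestCentroid:
    """Toy classifier with a fit/predict interface (a stand-in, not scikit-learn)."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                acc[j] += value
            counts[label] = counts.get(label, 0) + 1
        self.centroids_ = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, X):
        def squared_distance(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        labels = sorted(self.centroids_)
        return [
            min(labels, key=lambda c: squared_distance(row, self.centroids_[c]))
            for row in X
        ]


def count_differences(make_classifier, X, y, X_morphed, y_morphed, X_test, X_test_morphed):
    """Train on original and morphed data and count differing predictions."""
    expected = make_classifier().fit(X, y).predict(X_test)
    actual = make_classifier().fit(X_morphed, y_morphed).predict(X_test_morphed)
    return sum(1 for e, a in zip(expected, actual) if e != a)


rng = random.Random(42)
# Two well-separated classes: features in [0, 1] and in [3, 4].
X = [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(20)]
X += [[rng.uniform(3, 4), rng.uniform(3, 4)] for _ in range(20)]
y = [0] * 20 + [1] * 20

# Morph 1: add one to every numeric feature (training and test data alike).
X_plus_one = [[value + 1.0 for value in row] for row in X]
diff_add_one = count_differences(NearestCentroid, X, y, X_plus_one, y, X, X_plus_one)

# Morph 2: reorder the training instances, keep the test data unchanged.
order = list(range(len(X)))
rng.shuffle(order)
X_shuffled = [X[i] for i in order]
y_shuffled = [y[i] for i in order]
diff_reorder = count_differences(NearestCentroid, X, y, X_shuffled, y_shuffled, X, X)

print(diff_add_one, diff_reorder)
```
</pre>
    <p>In principle, any estimator with fit and predict methods could
      be passed as make_classifier, for example a function that returns
      an SGDClassifier with a fixed random_state; that substitution is
      untested here. A non-zero count would correspond to the
      differences listed above.</p>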

    Best,<br>

    Steffen<br>

    <br>

    [1]

<a class="moz-txt-link-freetext" href="http://user.informatik.uni-goettingen.de/~sherbold/atoml-results/test-export-scikit.xml">http://user.informatik.uni-goettingen.de/~sherbold/atoml-results/test-export-scikit.xml</a><br>

    <br>

    <div class="moz-cite-prefix">On 23.08.2018 at 13:39, Steffen

      Herbold wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:47fe57e8-d139-844e-b303-79e20adfb584@cs.uni-goettingen.de">Hi

      Andy,

      <br>

      <br>

      thanks for your detailed feedback.

      <br>

      <br>

      The random states are fixed, and set immediately before calling

      the fit function. Here is a gist with the code for one smoke test

      and one metamorphic test [1].

      <br>

      <br>

      I will run the tests for LinearDiscriminantAnalysis and the

      SGDClassifier. I somehow missed them when I scanned the

      documentation.

      <br>

      <br>

      I know that these problems should sometimes be expected. However,

      I was actually not sure what to expect, especially after I started

      to look at the results for different ML libraries in comparison.

      The random forests you brought up are a good example. I also expected

      them to be dependent on feature/instance order. However, they are

      not in Weka, only in scikit-learn and Spark MLlib. There are more

      such examples, like logistic regression, which exhibits different

      behavior in all three libraries.

      <br>

      <br>

      I already have a comparison regarding expected differences between

      machine learning frameworks planned as a topic for future work.

      <br>

      <br>

      Best,

      <br>

      Steffen

      <br>

      <br>

      [1]

      <a class="moz-txt-link-freetext" href="https://gist.github.com/sherbold/570c9399e9bc39dd980d6c2bdbf3b64a">https://gist.github.com/sherbold/570c9399e9bc39dd980d6c2bdbf3b64a</a>

      <br>

      <br>

      On 22.08.2018 at 17:49, Andreas Mueller wrote:

      <br>

      <blockquote type="cite">Hi Steffen.

        <br>

        <br>

        Thanks for sharing your analysis. We really need more work in

        this direction.

        <br>

        I assume you fixed the random states everywhere?

        <br>

        <br>

        I consider these tests helpful, but not all of your expectations

        are warranted; that depends on the model.

        <br>

        <br>

        If you add one to each feature, there is no expectation that

        results will be the same, except for the tree models.

        <br>

        For tree-based models with fixed random states, however, it's

        expected that reordering features will change the result.

        <br>

        For non-convex optimization it's expected that results are not

        symmetric (i.e. the MLPClassifier will not flip

        <br>

        the decision function because the optimization is initialized in

        an asymmetric way), and reordering features will

        <br>

        also change the result. If using mini-batches (the default) the

        results will also change when instances are reordered.

        <br>

        I assume you didn't test SGDClassifier or any of its

        derivatives because it doesn't show up here. Did you test

        LinearDiscriminantAnalysis?

        <br>

        <br>

        For the invariance tests it would be interesting to know if the

        failures are due to tie-breaking or numerical issues.

        <br>

        There are some numerical issues that are very hard to control,

        and I'm pretty sure we have asymmetric tie-breaking

        <br>

        (multiclass libsvm is "always predict the first class"

        <a class="moz-txt-link-freetext" href="https://github.com/scikit-learn/scikit-learn/issues/8276">https://github.com/scikit-learn/scikit-learn/issues/8276</a> )
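        <p>How asymmetric tie-breaking can make a label-flip test fail
          may be seen in a small sketch. It uses a toy k-nearest-neighbour
          vote on one-dimensional data, not libsvm or scikit-learn; the
          function predict_knn and its rule "ties go to the smallest
          label" are assumptions that only mimic an "always predict the
          first class" behaviour.</p>
<pre>
```python
def predict_knn(X_train, y_train, x, k):
    """k-nearest-neighbour vote on 1-D data; ties go to the smallest label."""
    neighbours = sorted(zip(X_train, y_train), key=lambda pair: abs(pair[0] - x))[:k]
    votes = {}
    for _, label in neighbours:
        votes[label] = votes.get(label, 0) + 1
    best = max(votes.values())
    return min(label for label, count in votes.items() if count == best)


X_train = [0.0, 1.0, 5.0]
y_train = [0, 1, 1]
y_flipped = [1 - label for label in y_train]

# Query at 0.5 with k=2: one neighbour from each class, so the vote is tied.
tied = (predict_knn(X_train, y_train, 0.5, 2), predict_knn(X_train, y_flipped, 0.5, 2))
# Query at 4.0 with k=1: no tie, so flipping the labels inverts the prediction.
clear = (predict_knn(X_train, y_train, 4.0, 1), predict_knn(X_train, y_flipped, 4.0, 1))

print(tied, clear)
```
</pre>
        <p>In the tied case the prediction is the same before and after
          flipping the labels, so the expected inversion does not happen
          even though nothing is numerically wrong.</p>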

        <br>

        <br>

        I would look at QuadraticDiscriminantAnalysis a bit more

        closely as a consequence of your tests.

        <br>

        Maybe check if the SVM, RF and KNN issues are due to

        tie-breaking.

        <br>

        <br>

        We could try and document all the cases where the result will

        not fulfill these invariances, but I think that might be too

        much.

        <br>

        At some point we need the users to understand what's going on.

        If you look at the random forest algorithm and you fix

        <br>

        the random state it's obvious that feature order matters.
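        <p>A rough sketch of the reason, assuming that candidate features
          are sampled by column index from a seeded generator: the same
          seed yields the same indices, but after reordering those
          indices point at different features. The function
          sampled_features and the feature names are hypothetical, and
          this is not the actual random forest code.</p>
<pre>
```python
import random


def sampled_features(feature_names, k, seed):
    """Pick k features by index with a freshly seeded generator."""
    rng = random.Random(seed)
    indices = rng.sample(range(len(feature_names)), k)
    return sorted(feature_names[i] for i in indices)


original = ["a", "b", "c", "d", "e", "f"]
# Rotate the columns by one position; every feature moves to a new index.
reordered = original[1:] + original[:1]

picked_original = sampled_features(original, 2, seed=0)
picked_reordered = sampled_features(reordered, 2, seed=0)

print(picked_original, picked_reordered)
```
</pre>
        <p>Because the sampled indices are identical while the columns
          behind them have moved, the two calls select different feature
          subsets, and a model built from them can differ as well.</p>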

        <br>

        <br>

        A big question here is how big the differences are. Some

        algorithms are randomized (I think the coordinate descent in

        <br>

        some of the linear models uses random orders), but the results

        are expected to be near-identical, independent of the ordering.

        <br>

        <br>

        Cheers,

        <br>

        <br>

        Andy

        <br>

        <br>

        <br>

        On 8/22/18 7:12 AM, Steffen Herbold wrote:

        <br>

        <blockquote type="cite">Dear developers,

          <br>

          <br>

          I am writing to you because I applied an approach for the

          automated testing of classification algorithms to scikit-learn

          and would like to forward the results to you.

          <br>

          <br>

          The approach is a combination of smoke testing and metamorphic

          testing. The smoke tests try to find problems by executing the

          training and prediction functions of classifiers with

          different data. These smoke tests should ensure the basic

          functioning of classifiers. I defined 20 different data sets,

          some very simple (uniform features in [0,1]), some with

          extreme distributions, e.g., data close to machine precision.

          The metamorphic tests determine if classification results

          change as expected if the training data is modified, e.g., by

          reordering features, flipping class labels, or reordering

          instances.
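          <p>As a minimal sketch of how one smoke test and one
            metamorphic test could be written with Python's unittest
            module: the MajorityClassifier below is a toy stand-in for a
            classifier with fit and predict methods, and the class and
            test names are assumptions for illustration, not the
            generated atoml tests.</p>
<pre>
```python
import random
import unittest


class MajorityClassifier:
    """Toy classifier (a stand-in, not scikit-learn): predicts the most frequent label."""

    def fit(self, X, y):
        self.label_ = max(sorted(set(y)), key=list(y).count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class SmokeAndMetamorphicTests(unittest.TestCase):
    def setUp(self):
        rng = random.Random(0)
        # Simple data set: uniform features in [0, 1], binary labels.
        self.X = [[rng.uniform(0, 1) for _ in range(3)] for _ in range(30)]
        self.y = [0] * 10 + [1] * 20

    def test_smoke_uniform(self):
        # Smoke test: training and prediction must run without an exception.
        predictions = MajorityClassifier().fit(self.X, self.y).predict(self.X)
        self.assertEqual(len(predictions), len(self.X))

    def test_metamorphic_flip_labels(self):
        # Metamorphic test: flipping binary labels should invert predictions.
        flipped = [1 - label for label in self.y]
        before = MajorityClassifier().fit(self.X, self.y).predict(self.X)
        after = MajorityClassifier().fit(self.X, flipped).predict(self.X)
        self.assertEqual([1 - label for label in before], after)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(SmokeAndMetamorphicTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```
</pre>
          <p>The smoke test only checks that training and prediction
            complete, while the metamorphic test compares the predictions
            obtained before and after a modification of the training
            data.</p>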

          <br>

          <br>

          I generated 70 different Python unittest tests for eleven

          different scikit-learn classifiers. In summary, I found the

          following potential problems:

          <br>

          - Two errors due to possibly infinite loops for the

          LogisticRegression for data that approaches

          MAXDOUBLE.

          <br>

          - The classification of LogisticRegression, MLPClassifier,

          QuadraticDiscriminantAnalysis, and SVM with a polynomial

          kernel changed if one is added to each feature value.

          <br>

          - The classification of DecisionTreeClassifier,

          LogisticRegression, MLPClassifier,

          QuadraticDiscriminantAnalysis, RandomForestClassifier, and SVM

          with a linear and a polynomial kernel was not inverted when

          all binary class labels are flipped.

          <br>

          - The classification of LogisticRegression, MLPClassifier,

          QuadraticDiscriminantAnalysis, and RandomForestClassifier

          sometimes changed when the features are reordered.

          <br>

          - The classification of KNeighborsClassifier, MLPClassifier,

          QuadraticDiscriminantAnalysis, RandomForestClassifier, and SVM

          with a linear kernel sometimes changed when the instances are

          reordered.

          <br>

          <br>

          You can find details of our results online [1]. The provided

          resources include the current draft of the paper, which

          describes the tests and the results in detail.

          Moreover, we provide an executable test suite with all tests

          we executed, as well as the export of our test results as an XML

          file that contains all details of the test execution,

          including stack traces in case of exceptions. The preprint and

          online materials also contain the results for two other

          machine learning libraries, i.e., Weka and Spark MLlib.

          Additionally, you can find the atoml tool used to generate the

          tests on GitHub [2].

          <br>

          <br>

          I hope that these tests may help with the future development

          of scikit-learn. You could help me a lot by answering the

          following questions:

          <br>

          - Do you consider the tests helpful?

          <br>

          - Do you consider any source code or documentation changes due

          to our findings?

          <br>

          - Would you be interested in a pull request or any other type

          of integration of (a subset of) the tests into your project?

          <br>

          - Would you be interested in more such tests, e.g., for the

          consideration of hyperparameters, other algorithm types like

          clustering, or more complex algorithm specific metamorphic

          tests?

          <br>

          <br>

          I am looking forward to your feedback.

          <br>

          <br>

          Best regards,

          <br>

          Steffen Herbold

          <br>

          <br>

          [1]

          <a class="moz-txt-link-freetext" href="http://user.informatik.uni-goettingen.de/~sherbold/atoml-results/">http://user.informatik.uni-goettingen.de/~sherbold/atoml-results/</a>

          <br>

          [2] <a class="moz-txt-link-freetext" href="https://github.com/sherbold/atoml">https://github.com/sherbold/atoml</a>

          <br>

          <br>

        </blockquote>


      </blockquote>

      <br>

    </blockquote>

    <br>

    <pre class="moz-signature" cols="72">-- 

Dr. Steffen Herbold

Institute of Computer Science

University of Goettingen

Goldschmidtstraße 7

37077 Göttingen, Germany

mailto. <a class="moz-txt-link-abbreviated" href="mailto:herbold@cs.uni-goettingen.de">herbold@cs.uni-goettingen.de</a>

tel. +49 551 39-172037</pre>

  </body>

</html>