[scikit-learn] Cross-validation & cross-testing

Sebastian Raschka se.raschka at gmail.com
Sun Jun 4 23:27:16 EDT 2017


> Is it possible for somebody to take a look and give any feedback?

Just looked over your repo and have some feedback:

Definitely cite the original research paper that your implementation is based on. Right now it just says "The cross-validation & cross-testing method was developed by Korjus et al." (year, journal, title, ... are missing).
Like Joel mentioned, I'd add unit tests and also consider a CI service like Travis to check that the code indeed works (produces the same results) on package versions newer than the ones you listed, since you pin dependencies with ">=".
Maybe a good, explanatory figure would help -- often, a good figure can make things much more clear and intuitive for a user. For new algorithms, it is also helpful to explain them in a procedural way using a numeric list of steps. In addition to describing the package, also consider stating the problem this approach is going to address.
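To make the testing point concrete, here is a minimal sketch (mine, built from plain scikit-learn pieces, not code from your repo) of the kind of regression test that would catch a package upgrade silently changing results: with everything seeded, re-running the procedure must reproduce the same scores, so CI fails if a new version behaves differently.

```python
# Sketch of a reproducibility test (hypothetical -- uses stock
# scikit-learn objects in place of the repo's own functions).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

def test_scores_are_reproducible():
    X, y = load_iris(return_X_y=True)
    # Fix every source of randomness: the CV splitter and the estimator.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    clf = DecisionTreeClassifier(random_state=0)
    first = cross_val_score(clf, X, y, cv=cv)
    second = cross_val_score(clf, X, y, cv=cv)
    # Identical seeds must give identical fold scores on every run.
    assert np.allclose(first, second)

test_scores_are_reproducible()
```

On Travis you would simply run the test suite (e.g. via pytest) against several dependency versions in the build matrix.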


Just a few general comments on the paper (which, I have to admit, I only skimmed). I am not sure what to think of this; it might be an interesting idea, but showing empirical results on only 2 datasets plus a simulated one does not convince me that this is useful in practice, yet. Also, a discussion/analysis of bias and variance seems to be missing from the paper. Another thing is that in practice, one would also consider LOOCV or bootstrap approaches for "very" small datasets, which this paper does not even mention. While I think there might be some interesting idea here, additional research is needed to judge whether this approach should be used in practice or not -- so I would say it's a bit too early to include something like this in scikit-learn?
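For comparison, the LOOCV baseline I mean is already a few lines in scikit-learn (a toy sketch on a subsampled iris, not related to the repo or the paper): every sample is held out exactly once, so you get one fit and one score per sample.

```python
# Hypothetical example: LOOCV on a deliberately small dataset.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)
# Subsample to 30 points (10 per class) to mimic a "very" small dataset.
X, y = X[::5], y[::5]

# LeaveOneOut yields n_samples train/test splits, each holding out one row.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=LeaveOneOut())
print(len(scores), scores.mean())  # one 0/1 score per held-out sample
```

A bootstrap estimate would look similar, just resampling the training indices with replacement instead of leaving one out.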

Best,
Sebastian


> On Jun 4, 2017, at 9:53 PM, Joel Nothman <joel.nothman at gmail.com> wrote:
> 
> And when I say testing it, I mean writing tests that live with the code so that they can be re-executed, and so that someone else can see what your tests assert about your code's correctness.
> 
> On 5 June 2017 at 11:52, Joel Nothman <joel.nothman at gmail.com> wrote:
> Hi Rain,
> 
> I would suggest that you start by documenting what your code is meant to do (the structure of the Korjus et al. paper makes it pretty difficult to even determine what this technique is, so for you then not to describe it in your own words in your repository makes it harder still), testing it with diverse inputs, and ensuring that it is correct. At a glance I can see at least two sources of bugs, and some API design choices which I think could be improved.
> 
> Cheers,
> 
> Joel
> 
> On 5 June 2017 at 07:04, Rain Vagel <rain.vagel at gmail.com> wrote:
> Hey,
> 
> I am a bachelor’s student and for my thesis I implemented a cross-testing function in a scikit-learn compatible way and published it on Github. The paper on which I based my own thesis can be found here: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0161788.
> 
> My project can be found here: https://github.com/RainVagel/cross-val-cross-test.
> 
> Our original plan was to try and get the algorithm into scikit-learn, but it doesn’t meet the requirements yet. So instead we thought about maybe having it listed in the “Related Projects” page. Is it possible for somebody to take a look and give any feedback?
> 
> Sincerely,
> Rain
> 
> 
> 
> 
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
> 


