[scikit-learn] Scikit-learn porting strategy

Wed Feb 6 00:22:21 EST 2019

I haven’t looked at Ruby in a long time. I do wonder what people mean by PORTING to another language or environment that already has its own way of doing things.

I did most of my recent work in native R enhanced by packages, and have been learning how to do similar things in modules on top of modules … on top of native Python.

R chose lots of built-in functionality up front that Python did not, and vice versa. If someone wanted to port some machine learning tools from Python to R, there would not necessarily be much point in porting numpy or pandas as a whole. If you did, there would be even more duplication than there is now. On the other hand, I have seen people port things to R like a dict datatype, which is not quite the same as the environment objects R uses.

So if Ruby already has available much of what is needed, it could make sense to rewrite algorithms around that and only add what is missing. For efficiency, sure, you might want to link in C/C++/Fortran libraries.

As mentioned, there are already ways to run some languages within/from others. R and Python can be run with either one being the initiator. If you want Ruby to completely have the new functionality, do you want to slavishly copy entire packages or have your own new one designed eclectically? There are many ways to do these things, and each time I compare a few, I see differences that make some easier or more intuitive than others, and other times the reverse.

And how far do you expect to port? What does Ruby provide for graphics, for example? R had base graphics and added lattice and then ggplot. I use them all, depending on the task and how much detail I want to tweak. They are quite different, as is the matplotlib that seems to be used quite a bit in Python. Making plots is definitely a part of the process, but if a function expects certain data structures, would your version of the numpy and pandas data structures interface well with it?

As Andreas says (and I am coincidentally in the middle of the book he wrote with a Guido, albeit that is her last name, unlike the Python founder), you may find that part of what you would do is create wrappers that accept one function interface and massage things to call a different interface. Calling a graphics program that expects a list using an array won’t work unless you quietly convert first …
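
To make the wrapper idea concrete, here is a minimal sketch in Python using only the standard library. The names plot_points and accept_any_sequence are made up for illustration and do not belong to any real graphics package; plot_points just stands in for a routine that insists on a plain list.

```python
from array import array
from functools import wraps


def plot_points(values):
    """Hypothetical stand-in for a graphics routine that only accepts a list."""
    if not isinstance(values, list):
        raise TypeError("plot_points expects a list")
    return "plot(" + ", ".join(str(v) for v in values) + ")"


def accept_any_sequence(func):
    """Wrap func so callers may pass any iterable; it is quietly converted to a list."""
    @wraps(func)
    def wrapper(values):
        return func(list(values))
    return wrapper


# The wrapped version accepts an array; the original would raise TypeError.
plot_any = accept_any_sequence(plot_points)
data = array("d", [1.0, 2.5, 4.0])
result = plot_any(data)
```

The caller keeps its own data structure and the wrapped routine keeps its own interface; only the thin layer in between knows about both.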

From: scikit-learn <scikit-learn-bounces+avigross=verizon.net at python.org> On Behalf Of Andreas Mueller
Sent: Tuesday, February 5, 2019 11:40 AM
To: scikit-learn at python.org
Subject: Re: [scikit-learn] Scikit-learn porting strategy

There's some stuff already:
https://github.com/SciRuby/

And in terms of strategy:
No, you can go estimator by estimator and at some point implement cross-validation and grid-search and pipelines and metrics pretty independently.
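
A rough sketch of why the pieces can be built independently, in plain Python with no dependencies. MajorityClassifier, accuracy and k_fold_scores are hypothetical names invented for this example, not scikit-learn's API; the only assumption is that every estimator exposes fit and predict.

```python
from collections import Counter


class MajorityClassifier:
    """Toy estimator: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def k_fold_scores(make_estimator, X, y, k=3):
    """Simple k-fold cross-validation that relies only on fit/predict."""
    scores = []
    for fold in range(k):
        # Sample i goes to the test set of fold (i mod k).
        test_idx = [i for i in range(len(X)) if i % k == fold]
        train_idx = [i for i in range(len(X)) if i % k != fold]
        est = make_estimator().fit(
            [X[i] for i in train_idx], [y[i] for i in train_idx]
        )
        preds = est.predict([X[i] for i in test_idx])
        scores.append(accuracy([y[i] for i in test_idx], preds))
    return scores


X = [[0], [1], [2], [3], [4], [5]]
y = ["a", "a", "a", "a", "b", "a"]
scores = k_fold_scores(MajorityClassifier, X, y, k=3)
```

Because k_fold_scores never looks inside the estimator, any later estimator with the same two methods can be dropped in without touching the cross-validation code.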

It looks like daru is written in Ruby, which I expect to be too slow.
nmatrix is written in C++, so I guess you'd have to write many of the algorithms in C++.

At that point it might be easier to wrap an existing C++ library like mlpack or shogun.

On 2/5/19 6:12 AM, Joel Nothman wrote:

If you count things in SciPy and NumPy (and Joblib and Cython?) that scikit-learn depends on and which may be lacking or hard to find in SciRuby, it's much, much more than 39 years. PyCall, and potentially some scikit-learn-specific wrappers around it, seems a much more sensible approach.

