[scikit-learn] How to create and execute machine learning models in a Java/JVM-based application (in production) using Python

Gaurav gupta gupta.gaurav0125 at gmail.com
Sun Jun 12 13:24:40 EDT 2016


Hi All,



Could you please guide me on how to *create and execute* machine
learning/statistical models (regression, decision trees, k-means clustering,
naive Bayes, scorecards, linear/logistic regression, GBM, GLM, etc.) in a
*Java/JVM-based application* (in production)?



We have an ETL-style, Java-based product in which one can do most of the
data preparation steps for machine learning, such as data ingestion from
JDBC, files, HDFS, NoSQL, etc., as well as joins and aggregations (which are
required for feature engineering). We now want to add analytics capabilities
using machine learning/statistical modeling.



Right now, we are using JPMML-Evaluator
<https://github.com/jpmml/jpmml-evaluator> to score models created in
PMML format using R and Python (and KNIME), but this needs three separate
and unconnected steps:

 1- Prepare the data in our Java/JVM application and save the sample data
(training and test) to a CSV file or a DB. *<JAVA/JVM BASED application>*

 *2- Create a machine learning model in R or Python (or KNIME) and
export it in PMML 4.2 format. <NON JAVA BASED>*

 3- Import/deploy the PMML in our Java-based application and use
JPMML-Evaluator to execute it in production. *<JAVA BASED>*



I am sure this is a common problem in machine learning, since Java is
generally preferred over Python or R in production. Could you suggest the
better approach(es) to *create* as well as *execute* a Python/scikit-learn
based machine learning model in a JVM-based application?



What are your thoughts on the following options for achieving steps #2 and
#3 more seamlessly in a JVM-based application, without compromising
*performance and usability*?



1-     Call a Java program which internally calls a Python scikit-learn script
<http://stackoverflow.com/questions/12738827/how-can-i-call-scikit-learn-classifiers-from-java>
(under the hood) to create a model in PMML
<https://github.com/jpmml/jpmml-sklearn> and then use JPMML-Evaluator. To
the user it will look like a single JVM-based application (better
usability). I am not sure what the limitations and shortcomings of using
PMML are, as not all features are supported in jpmml-sklearn
<https://github.com/jpmml/jpmml-sklearn>.

2-     Call a Java program which internally calls a Python script, do the
model creation as well as execution in an external Python environment, and
serialize the model and the results to a file/CSV or an in-memory DB (or a
cache such as Hazelcast), from where the parent Java application will fetch
the results. From what I have read, I can't use Jython to execute
scikit-learn models
<http://stackoverflow.com/questions/12738827/how-can-i-call-scikit-learn-classifiers-from-java>.

3-     Can I use Jep <https://github.com/mrj0/jep> (Embed Python in Java)
to embed CPython in the JVM? Has anybody tried it with scikit-learn models?
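
To make option 2 more concrete, here is a minimal sketch of what the Python
side of such a hand-off might look like. It assumes the parent Java
application launches the script as a child process, passes feature rows as
CSV (first column an id, first line a header), and reads back a CSV of
scores. The names score_stream and placeholder_model and the column layout
are hypothetical; the placeholder just averages the features and stands in
for a fitted scikit-learn estimator, which is not shown here.

```python
import csv


def placeholder_model(features):
    """Hypothetical stand-in for a fitted estimator: mean of the features."""
    return sum(features) / len(features)


def score_stream(reader, writer, model=placeholder_model):
    """Read CSV rows (id, f1, f2, ...) and write CSV rows (id, score).

    `reader` and `writer` are text file-like objects, e.g. stdin/stdout
    of a child process started by the Java application (an assumption).
    Returns the number of rows scored.
    """
    rows_in = csv.reader(reader)
    rows_out = csv.writer(writer, lineterminator="\n")

    header = next(rows_in, None)  # first line is assumed to be a header
    if header is None:
        return 0  # empty input: nothing to score
    rows_out.writerow([header[0], "score"])

    count = 0
    for row in rows_in:
        if not row:
            continue  # skip blank lines
        features = [float(value) for value in row[1:]]
        rows_out.writerow([row[0], model(features)])
        count += 1
    return count
```

As a script, this would be called as score_stream(sys.stdin, sys.stdout), so
the Java side only needs to write to and read from the process streams. The
same function also works on files if the exchange goes through CSV on disk.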



Alternatively, I could explore using Mahout or Weka, Java-based machine
learning libraries, in my JVM-based application. (I need to support both
Windows and non-Windows platforms.)



I am also exploring H2O.ai, which is Java-based. Has anybody tried it?


Regards

Gaurav