[scikit-learn] Example of a scikit-learn compatible classifier with C++ implementation of the algorithms

drh at aiwerkstatt.com
Wed May 15 15:18:13 EDT 2019


I use a Python-based ecosystem (scikit-learn, …) for prototyping, and
I have a C++-based production system. A scikit-learn compatible
interface allows me to take advantage of scikit-learn’s ecosystem.
Implementing the algorithms in C++ lets me develop and test the same
code that will run in production while I am still prototyping.

I started with scikit-learn’s project template to roll my own decision  
tree and forest classifier and implemented the algorithms in a C++  
library, using Cython to create the Python bindings.
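
As a rough illustration of what "scikit-learn compatible" means on the
Python side, here is a minimal sketch. It is not the koho code: the class
names StumpClassifier and _NativeStump and the min_samples_leaf parameter
are hypothetical, and _NativeStump is a pure-Python stand-in for the object
that Cython would expose from the C++ library. The sketch only follows the
usual estimator conventions: __init__ stores hyperparameters and nothing
else, fit returns self, and fitted attributes end in an underscore.

```python
import math


class _NativeStump:
    """Hypothetical stand-in for a Cython-wrapped C++ object (one-split tree)."""

    def build(self, X, y, n_classes, min_samples_leaf):
        counts = [0] * n_classes
        for label in y:
            counts[label] += 1
        majority = counts.index(max(counts))
        # Fallback when no split helps: everything goes left to the majority class.
        self.feature, self.threshold = 0, math.inf
        self.left_class = self.right_class = majority
        best_errors = len(y) - max(counts)
        for f in range(len(X[0])):
            values = sorted({row[f] for row in X})
            for lo, hi in zip(values, values[1:]):
                t = (lo + hi) / 2.0
                left, right = [0] * n_classes, [0] * n_classes
                for row, label in zip(X, y):
                    (left if row[f] <= t else right)[label] += 1
                if min(sum(left), sum(right)) < min_samples_leaf:
                    continue
                # Misclassified samples if each side predicts its majority class.
                errors = len(y) - max(left) - max(right)
                if errors < best_errors:
                    best_errors = errors
                    self.feature, self.threshold = f, t
                    self.left_class = left.index(max(left))
                    self.right_class = right.index(max(right))

    def predict(self, X):
        return [self.left_class if row[self.feature] <= self.threshold
                else self.right_class for row in X]


class StumpClassifier:
    """Thin Python wrapper following scikit-learn's estimator conventions."""

    def __init__(self, min_samples_leaf=1):
        # Only store hyperparameters here; no validation, no learning.
        self.min_samples_leaf = min_samples_leaf

    def get_params(self, deep=True):
        return {"min_samples_leaf": self.min_samples_leaf}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        # Map arbitrary class labels to 0..n-1 before handing data to native code.
        self.classes_ = sorted(set(y))
        index = {c: i for i, c in enumerate(self.classes_)}
        self._native = _NativeStump()
        self._native.build(X, [index[c] for c in y],
                           len(self.classes_), self.min_samples_leaf)
        return self

    def predict(self, X):
        return [self.classes_[i] for i in self._native.predict(X)]
```

In a real project the wrapper would normally derive from
sklearn.base.BaseEstimator and sklearn.base.ClassifierMixin, which provide
get_params, set_params and score, so only fit and predict need to be
written by hand.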

I started out with a Python implementation and experimented a little
with implementing the algorithms in Cython. But I found that, if you
are proficient in both Python and C++, implementing the algorithms
directly in C++ is much faster than writing them in Cython.

I made this project available to everybody, because I think it could  
serve as an example or template for anybody who would like to roll  
their own scikit-learn compatible classifier with a C++ based  
implementation of the algorithms to be re-used in a production system.  
Version 1.0.0, at least, should be useful for that; later versions
might become too complex to serve as an example.

Check it out:

Read the Docs: https://koho.readthedocs.io

GitHub: https://github.com/AIWerkstatt/koho

I tried to be consistent with scikit-learn’s decision tree and
ensemble modules. The basic concepts behind the implementation of the
classifiers (a stack, a samples look-up table (LUT) with in-place
partitioning, and incremental histogram updates) are based on:
G. Louppe, Understanding Random Forests, PhD Thesis, 2014. Thanks a
lot, Gilles, for that comprehensive work on random forests!
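
To make the three concepts concrete, here is a simplified sketch in plain
Python. It is neither the koho nor the scikit-learn implementation; the
function names, the dict-based node layout and the Gini criterion are
choices made for illustration. A single list of sample indices (the LUT) is
shared by all nodes, each node owns a contiguous slice of it, a split
reorders that slice in place, and the split search moves one sample at a
time from the right class histogram to the left one instead of recounting.

```python
def gini(hist, n):
    return 1.0 - sum((c / n) ** 2 for c in hist)


def best_split(X, y, samples, start, end, n_classes):
    """Sweep each feature once, updating the class histograms incrementally."""
    n = end - start
    best = (None, None, float("inf"))  # feature, threshold, weighted impurity
    for f in range(len(X[0])):
        # Sort only this node's slice of the LUT by the current feature.
        samples[start:end] = sorted(samples[start:end], key=lambda i: X[i][f])
        left, right = [0] * n_classes, [0] * n_classes
        for i in samples[start:end]:
            right[y[i]] += 1
        for k in range(start, end - 1):
            i = samples[k]
            left[y[i]] += 1   # incremental update: move one sample
            right[y[i]] -= 1  # from the right histogram to the left
            lo, hi = X[i][f], X[samples[k + 1]][f]
            if lo == hi:
                continue  # cannot split between equal feature values
            n_left = k - start + 1
            score = (n_left * gini(left, n_left)
                     + (n - n_left) * gini(right, n - n_left)) / n
            if score < best[2]:
                best = (f, (lo + hi) / 2.0, score)
    return best


def partition(X, samples, start, end, feature, threshold):
    """Reorder samples[start:end] in place; return the start of the right part."""
    i, j = start, end - 1
    while i <= j:
        if X[samples[i]][feature] <= threshold:
            i += 1
        else:
            samples[i], samples[j] = samples[j], samples[i]
            j -= 1
    return i


def build_tree(X, y, n_classes, max_depth=3):
    samples = list(range(len(X)))             # the samples LUT
    nodes = []
    stack = [(0, len(X), 0, None, None)]      # start, end, depth, parent, side
    while stack:                              # explicit stack, no recursion
        start, end, depth, parent, side = stack.pop()
        hist = [0] * n_classes
        for i in samples[start:end]:
            hist[y[i]] += 1
        node = {"hist": hist, "feature": None, "threshold": None,
                "left": None, "right": None}
        nodes.append(node)
        node_id = len(nodes) - 1
        if parent is not None:
            nodes[parent][side] = node_id
        if depth >= max_depth or max(hist) == end - start:
            continue                          # depth limit or pure node: leaf
        feature, threshold, _ = best_split(X, y, samples, start, end, n_classes)
        if feature is None:
            continue                          # all feature values equal: leaf
        mid = partition(X, samples, start, end, feature, threshold)
        node["feature"], node["threshold"] = feature, threshold
        stack.append((mid, end, depth + 1, node_id, "right"))
        stack.append((start, mid, depth + 1, node_id, "left"))
    return nodes


def predict_one(nodes, row):
    node = nodes[0]
    while node["feature"] is not None:
        side = "left" if row[node["feature"]] <= node["threshold"] else "right"
        node = nodes[node[side]]
    return node["hist"].index(max(node["hist"]))
```

Because every node is just a (start, end) range into the same list, a split
never copies sample data, which is the main point of combining the LUT with
in-place partitioning.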