[scikit-learn] Pipegraph example: KMeans + LDA

Manuel Castejón Limas manuel.castejon at gmail.com
Wed Oct 24 04:11:06 EDT 2018

Dear all,
as a way of improving the documentation of PipeGraph, we intend to provide
more examples of its usage. Application cases that motivate its use were a
popular request, so here is a very simple case with two steps: a KMeans
followed by an LDA.


This short example points out the following challenges:
- KMeans is used here as an estimator, not as a transformer: the next step
needs its predicted labels rather than its transform output
- The LDA score function requires the y parameter, but here y does not come
from a known set of labels; it comes from the preceding KMeans step
- Moreover, the GridSearchCV.fit call would also require a 'y' parameter
- It would be nice to have access to the output of the KMeans step as well.

PipeGraph is capable of addressing these challenges.
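
To make the challenges above concrete, here is a minimal sketch in plain
scikit-learn (it does not use the PipeGraph API) that wires the two steps by
hand. The synthetic data from make_blobs and the choice of 3 clusters are
assumptions made only for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Illustrative data only: the true labels are discarded on purpose.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)

# What LDA needs as y: one cluster label per sample.
labels = kmeans.fit_predict(X)

# What a standard Pipeline would pass downstream instead:
# the distances from each sample to each cluster centre.
distances = kmeans.transform(X)

# LDA is fitted and scored against the KMeans labels, not known classes.
lda = LinearDiscriminantAnalysis()
lda.fit(X, labels)
accuracy = lda.score(X, labels)

print(distances.shape, labels.shape, accuracy)
```

Doing this by hand works for a single fit, but the intermediate labels have
to be routed manually, which is the wiring that a standard Pipeline or
GridSearchCV does not do for you.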

The rationale for this example lies in the identification-reconstruction
realm. In a scenario where the class labels are unknown, we might want to
tie the quality of the clustering structure to the ability of a later model
to reconstruct that structure. The basic idea is that if LDA achieves good
results, it is because the information provided by KMeans was good enough
for that purpose, which hints at the discovery of a good structure.
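
The idea can be sketched without PipeGraph as a plain loop over candidate
numbers of clusters, scoring each by how well LDA reconstructs the KMeans
labels. The helper name reconstruction_score, the candidate range 2 to 5 and
the synthetic data are assumptions for illustration. Note one simplification:
the labels are computed once on the full data set, not refitted inside each
cross-validation fold.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# Illustrative data only: the true labels are discarded on purpose.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)


def reconstruction_score(X, n_clusters, cv=5):
    """Mean cross-validated accuracy of LDA at predicting KMeans labels."""
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = kmeans.fit_predict(X)
    lda = LinearDiscriminantAnalysis()
    return cross_val_score(lda, X, labels, cv=cv).mean()


# Hypothetical candidate range for the number of clusters.
scores = {k: reconstruction_score(X, k) for k in range(2, 6)}
best_k = max(scores, key=scores.get)

for k, score in scores.items():
    print(k, round(score, 3))
print("best n_clusters by this criterion:", best_k)
```

A high score only says that the clusters are linearly separable in the input
space, so it is a hint about structure and not a proof of it.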