[scikit-learn] plan to add the association rule classification algorithm in scikit learn

Tue Dec 18 03:17:00 EST 2018

Hi All,

Just a short comment on "If you had an alternative algorithm for frequent
itemset generation in mind (I am not sure if others exist, to be honest). I
would also be happy about that one, too." There are many other techniques,
as well as modifications of them for related problems such as sequence
mining; see e.g. http://www.philippe-fournier-viger.com/spmf/. In my
opinion, an important difference in practice is the one between frequent
itemsets and closed (frequent) itemsets: the latter can reduce the output
drastically. However, combinatorial explosion in the number of produced
patterns can still be an issue.
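
To illustrate the frequent vs. closed distinction on a toy example, here is
a minimal pure-Python sketch. It is brute force and only meant to show the
definitions; it is not how SPMF or mlxtend implement anything, and the
function names and the toy transactions are made up for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration; returns {itemset: support}."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            # Support = fraction of transactions containing the itemset.
            support = sum(itemset <= t for t in transactions) / n
            if support >= min_support:
                result[itemset] = support
                found = True
        if not found:
            # No frequent k-itemset implies no frequent (k+1)-itemset.
            break
    return result


def closed_itemsets(frequent):
    """Keep itemsets that have no proper superset with the same support."""
    return {
        s: sup
        for s, sup in frequent.items()
        if not any(s < other and sup == other_sup
                   for other, other_sup in frequent.items())
    }


# Hypothetical toy data: four transactions over items a, b, c, d.
transactions = [frozenset(t) for t in (
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"d"},
)]

frequent = frequent_itemsets(transactions, min_support=0.5)
closed = closed_itemsets(frequent)
print(len(frequent), "frequent itemsets,", len(closed), "closed itemsets")
```

On this toy data the closed itemsets are a small subset of the frequent
ones, and the support of every frequent itemset can still be recovered from
them (it equals the largest support among the closed itemsets that contain
it).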

Best,
Dmitry

On Mon, 17 Dec 2018 at 10:12, Sebastian Raschka <mail at sebastianraschka.com> wrote:

> Hi Rui,
>
> I agree with Joel that association rule mining could be a bit tricky to
> fit nicely within the scikit-learn API. Maybe this could be some
> transformer class? I thought about that a few years ago but remember that I
> couldn't come up with a good solution at that point.
>
> In any case, I have an association rule implementation in mlxtend (
> http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/association_rules/),
> which is based on the apriori algorithm. Some users were asking about Eclat
> and FP-Growth algorithms, instead of apriori. If you are interested in such
> a contribution, i.e., implementing Eclat or FP-Growth such that instead of
>
> frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)
> association_rules(frequent_itemsets, metric="confidence",
> min_threshold=0.7)
>
> one could use
>
> frequent_itemsets = eclat(df, min_support=0.6, use_colnames=True)
>
> or
>
> frequent_itemsets = fpgrowth(df, min_support=0.6, use_colnames=True)
> association_rules(frequent_itemsets, metric="confidence",
> min_threshold=0.7)
>
> I would be very happy about such a contribution (see issue tracker at
> https://github.com/rasbt/mlxtend/issues/248)
>
> If you had an alternative algorithm for frequent itemset generation in
> mind (I am not sure if others exist, to be honest). I would also be happy
> about that one, too.
>
> Best,
> Sebastian
>
> > On Dec 17, 2018, at 12:26 AM, Joel Nothman <joel.nothman at gmail.com> wrote:
> >
> > Hi Rui,
> >
> > This has been discussed several times on the mailing list and issue
> > tracker. We are not interested in association rule mining in Scikit-learn
> > for its own purposes. We would be interested in association rule mining
> > only as part of a classification algorithm. Are there such algorithms which
> > are mature and popular enough to meet our inclusion criteria (see our FAQ)?
> >
> > Cheers,
> >
> > Joel
> >
> > On Mon, 17 Dec 2018 at 09:24, rui min <minminmail at hotmail.com> wrote:
> > Dear scikit-learn developers,
> >
> >    I am Rui from the University of Granada, Spain. I am currently planning
> > to write an association rule algorithm for scikit-learn.
> >    I don't know whether anyone is already working on that, so to avoid
> > duplicating work, I would like to ask here.
> >
> > Hope to hear from you soon.
> >
> >
> > Best Regards
> >
> >
> > Rui
> >
> >
> >
> >
> > _______________________________________________
> > scikit-learn mailing list
> > scikit-learn at python.org
> > https://mail.python.org/mailman/listinfo/scikit-learn