[scikit-learn] Multi Armed Bandit Algorithms in Scikit-learn

Touqir Sajed touqir at ualberta.ca
Tue Sep 4 13:23:37 EDT 2018


This email is intended to initiate a discussion on whether it is worth
adding Multi-Armed Bandit (MAB) algorithms to Scikit-learn. For those of
you who have not heard of MAB algorithms: they are the simplest form of
decision-making algorithm, applicable whenever labeled data is not
given beforehand and the objective is to try out different decisions,
one each time a sample is seen, and learn which decision is best in the
long run. They are the simplest form of Reinforcement Learning
algorithms. While they are not applicable to every decision-making
task, they fit naturally into a number of problem settings where they
are more sample-efficient and simpler than more advanced RL algorithms.
If you want to know more about their applications, how they work, or
their advantages, feel free to let me know!
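To make the idea concrete, here is a minimal sketch of one of the classic MAB strategies, epsilon-greedy. This is not an existing Scikit-learn API; the class name, arm count, epsilon value, and reward probabilities below are all illustrative assumptions.

```python
import random

class EpsilonGreedyBandit:
    """Illustrative epsilon-greedy multi-armed bandit (not a Scikit-learn API)."""

    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_arms    # number of pulls per arm
        self.values = [0.0] * n_arms  # running mean reward per arm

    def select_arm(self):
        # Explore a random arm with probability epsilon,
        # otherwise exploit the arm with the best estimated reward.
        if random.random() < self.epsilon:
            return random.randrange(len(self.counts))
        return max(range(len(self.counts)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Incremental mean update for the chosen arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Usage: two arms with hidden success probabilities. No labels are given
# up front; the bandit tries decisions as samples arrive and learns which
# arm pays off more in the long run.
random.seed(0)
true_probs = [0.3, 0.7]  # assumed, hidden from the learner
bandit = EpsilonGreedyBandit(n_arms=2, epsilon=0.1)
for _ in range(2000):
    arm = bandit.select_arm()
    reward = 1 if random.random() < true_probs[arm] else 0
    bandit.update(arm, reward)
best = max(range(2), key=lambda a: bandit.values[a])
```

After enough rounds the estimated values approach the hidden success probabilities, so the bandit settles on the better arm; more refined strategies (UCB, Thompson sampling) replace the fixed exploration rate with smarter uncertainty handling.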

I do feel that MAB algorithms should be a part of Scikit-learn, since
many of the interesting learning problems we face are about decision
making. There are quite a few GitHub repos with MAB implementations,
but their coverage is extremely limited, and I do not know of any
dedicated library on MABs. Companies like Yahoo, Microsoft, and Google
use MABs for ad recommendation and search engine optimization, but
their code is not made public.


Computing Science Master's student at the University of Alberta, Canada,
specializing in Machine Learning. Website :
