Multi Armed Bandit Algorithms in Scikit-learn
Hi,

This email is intended to initiate a discussion on whether it is worth adding Multi-Armed Bandit (MAB) algorithms to Scikit-learn. For those of you who have not heard of MAB algorithms, they are the simplest form of decision-making algorithm, applicable whenever labeled data is not given beforehand and the objective is to try out different decisions each time a sample is seen and learn which decision is best in the long run. They are the simplest form of Reinforcement Learning (RL) algorithm. While they are not applicable to every decision-making task, they fit naturally into a number of problem settings where they are more sample-efficient and simpler than more advanced RL algorithms. For a number of applications: https://www.quora.com/In-what-kind-of-real-life-situations-can-we-use-a-mult.... If you want to know more about their usage, how they work, or their advantages, feel free to let me know!

I do feel that MAB algorithms should be a part of Scikit-learn, since many of the interesting learning problems we face are about decision making. There are quite a few GitHub repos with MAB implementations, but their coverage is extremely limited, and I do not know of any dedicated library for MABs. Companies like Yahoo, Microsoft, and Google use MABs for ad recommendation and search engine optimization, but their code is not public.

Cheers,
Touqir

--
Computing Science Master's student at University of Alberta, Canada, specializing in Machine Learning.
Website: https://ca.linkedin.com/in/touqir-sajed-6a95b1126
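To make the "try out different decisions and learn which is best in the long run" idea concrete, here is a minimal sketch of one common bandit strategy, epsilon-greedy, on simulated arms. The function name `run_epsilon_greedy`, the Bernoulli reward model, and the arm probabilities are assumptions chosen for illustration; none of this comes from the email or from scikit-learn.

```python
import random


def run_epsilon_greedy(arm_probs, n_rounds=5000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy on Bernoulli arms (illustrative sketch only).

    arm_probs holds the hidden success probability of each arm; the
    learner never reads it directly and only sees sampled rewards.
    """
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    counts = [0] * n_arms        # number of times each arm was pulled
    estimates = [0.0] * n_arms   # running mean reward of each arm

    for _ in range(n_rounds):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Feedback arrives only for the chosen arm; there are no labels
        # telling the learner what the other arms would have paid.
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0

        # Incremental update of the running mean for the chosen arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return counts, estimates


counts, estimates = run_epsilon_greedy([0.2, 0.5, 0.8])
print("pulls per arm:", counts)
print("estimated mean reward per arm:", [round(e, 3) for e in estimates])
```

With a fixed exploration rate the learner keeps sampling every arm occasionally, so its estimates for the weaker arms stay usable while most pulls go to the arm that currently looks best.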
The corrected link: https://www.quora.com/In-what-kind-of-real-life-situations-can-we-use-a-mult...

On Tue, Sep 4, 2018 at 11:23 AM Touqir Sajed <touqir@ualberta.ca> wrote:
> For a number of applications : https://www.quora.com/In-what-kind-of-real-life-situations-can-we-use-a-mult....
-- Computing Science Master's student at University of Alberta, Canada, specializing in Machine Learning. Website : https://ca.linkedin.com/in/touqir-sajed-6a95b1126
See http://scikit-learn.org/dev/faq.html#what-are-the-inclusion-criteria-for-new... and http://scikit-learn.org/dev/faq.html#why-is-there-no-support-for-deep-or-rei...

Bandit algorithms require a fundamentally different kind of interface from what is in scikit-learn right now, as they are sequential decision-making algorithms.

On 09/04/2018 01:23 PM, Touqir Sajed wrote:
> This email is intended to initiate a discussion on whether it is worth adding Multi-Armed Bandit (MAB) algorithms to Scikit-learn.
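The interface point can be illustrated with a rough sketch. scikit-learn estimators are typically trained on a dataset supplied up front through `fit(X, y)` and then queried with `predict(X)`. A bandit learner instead has to commit to a decision first and only afterwards receives a reward for that one decision. The class `UCB1Bandit` and its methods `select_arm` and `update` below are hypothetical names invented for this example and are not part of scikit-learn; UCB1 is used simply as one well-known bandit rule, and the simulated environment is an assumption.

```python
import math
import random


class UCB1Bandit:
    """Hypothetical sequential interface; NOT a scikit-learn API."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms    # pulls per arm
        self.means = [0.0] * n_arms   # running mean reward per arm
        self.total = 0                # total number of pulls so far

    def select_arm(self):
        # Pull every arm once before relying on confidence bounds.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        log_t = math.log(self.total)
        # UCB1 index: empirical mean plus an exploration bonus that
        # shrinks as an arm is pulled more often.
        return max(
            range(len(self.counts)),
            key=lambda a: self.means[a]
            + math.sqrt(2.0 * log_t / self.counts[a]),
        )

    def update(self, arm, reward):
        # Called once per round with the reward of the chosen arm only.
        self.counts[arm] += 1
        self.total += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


# Simulated environment (assumed Bernoulli rewards, for illustration).
rng = random.Random(0)
true_probs = [0.2, 0.5, 0.8]
bandit = UCB1Bandit(n_arms=len(true_probs))
for _ in range(2000):
    arm = bandit.select_arm()                              # decide first
    reward = 1.0 if rng.random() < true_probs[arm] else 0.0
    bandit.update(arm, reward)                             # then learn
print("pulls per arm:", bandit.counts)
```

The data the learner sees depends on its own earlier choices, so there is no fixed `(X, y)` pair to hand to a single `fit` call. That is one way to read the "fundamentally different kind of interface" remark above.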
_______________________________________________ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn
participants (2)
- Andreas Mueller
- Touqir Sajed