[scikit-learn] Adding BM25 relevance function to sklearn.feature_extraction.text
Basil Beirouti
basilbeirouti at gmail.com
Mon Jun 13 22:44:36 EDT 2016
Hello all,
You can use sklearn.feature_extraction.text.TfidfVectorizer to fit a
corpus of documents and then rank those documents by relevance to a new,
previously unseen query.
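For concreteness, here is a minimal sketch of that workflow. The corpus and
query strings are made up for illustration. Because TfidfVectorizer
L2-normalizes each row by default, the plain dot product computed by
linear_kernel is equal to cosine similarity here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Toy corpus, invented for this example.
corpus = [
    "the cat sat on the mat",
    "dogs chase cats in the yard",
    "stock markets fell sharply today",
]

# Learn the vocabulary and IDF weights, and vectorize the corpus.
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(corpus)

# Vectorize an unseen query with the same vocabulary and weights.
query_vector = vectorizer.transform(["cat on a mat"])

# Rows are L2-normalized, so the dot product is the cosine similarity.
scores = linear_kernel(query_vector, doc_vectors).ravel()

# Document indices sorted from most to least relevant.
ranking = scores.argsort()[::-1]
```

Documents that share no terms with the query score exactly zero, so only
the first document is a real match for this query.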
BM25 weights terms in a similar manner to TfidfVectorizer, but is more
complex: it also saturates term frequency and normalizes by document length.
It is widely considered one of the most successful ranking functions in
information retrieval.
I currently have code that implements BM25 quite efficiently on a corpus
of documents, and I want to modify/port it to align with the fit/transform
framework of sklearn. I think it could fit neatly into the current codebase.
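To make the proposal concrete, here is a rough sketch of what a
fit/transform-style BM25 could look like. This is not the code mentioned
above and not an existing sklearn class: the name BM25Transformer, the
defaults k1=1.5 and b=0.75, and the non-negative IDF variant
log(1 + (N - df + 0.5) / (df + 0.5)) are all assumptions chosen for
illustration. It takes the term-count matrix produced by CountVectorizer.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer


class BM25Transformer(BaseEstimator, TransformerMixin):
    """Hypothetical sketch: reweight a term-count matrix with BM25."""

    def __init__(self, k1=1.5, b=0.75):
        self.k1 = k1  # term-frequency saturation (assumed default)
        self.b = b    # strength of length normalization (assumed default)

    def fit(self, X, y=None):
        X = sp.csr_matrix(X)
        n_docs = X.shape[0]
        # Document frequency: number of documents containing each term.
        df = np.bincount(X.indices, minlength=X.shape[1])
        # Non-negative IDF variant (an assumption; other variants exist).
        self.idf_ = np.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
        # Average document length, measured in counted tokens.
        self.avgdl_ = X.sum() / n_docs
        return self

    def transform(self, X):
        X = sp.csr_matrix(X, dtype=np.float64, copy=True)
        doc_len = np.asarray(X.sum(axis=1)).ravel()
        # Length-normalization term, one value per document.
        norm = self.k1 * (1.0 - self.b + self.b * doc_len / self.avgdl_)
        # Expand to one value per stored (document, term) entry.
        row_norm = np.repeat(norm, np.diff(X.indptr))
        tf = X.data
        X.data = self.idf_[X.indices] * tf * (self.k1 + 1.0) / (tf + row_norm)
        return X


# Toy corpus, invented for this example.
corpus = [
    "the cat sat on the mat",
    "dogs chase cats in the yard",
    "stock markets fell sharply today",
]

counter = CountVectorizer()
counts = counter.fit_transform(corpus)
weights = BM25Transformer().fit(counts).transform(counts)

# A query's BM25 score is the sum of the weights of its terms in each
# document, so each query term is counted once.
query = counter.transform(["cat on a mat"])
query.data[:] = 1
scores = weights.dot(query.T).toarray().ravel()
```

Splitting the counting (CountVectorizer) from the weighting mirrors how
TfidfTransformer relates to TfidfVectorizer, which is one possible way to
align BM25 with the existing text feature extraction code.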
Questions:
1.) Would this be a desirable feature?
2.) Any advice for how to proceed with this? Things to watch out for?
Any and all advice is welcome.
Thanks!
Basil