[scikit-learn] combining datasets from different sources

Thomas Evangelidis tevang3 at gmail.com
Tue Sep 5 09:39:03 EDT 2017


I am working on a problem that involves predicting the binding affinity of
small molecules to a receptor structure (it is a regression problem, not
classification). I have multiple small datasets of molecules with measured
binding affinities for a receptor, but each dataset was measured under
different experimental conditions, and therefore I cannot use them all
together as a training set. So, instead of using them individually, I was
wondering whether there is a method to combine them all into a super
training set. The first way I could think of is to convert the binding
affinities to Z-scores and then combine all the small datasets of
molecules. But this would be inaccurate because, firstly, the datasets
are very small (10-50 molecules each), and secondly, the range of binding
affinities differs in each experiment (some datasets contain really strong
binders, while others do not, etc.). Is there any other approach to combine
datasets with values coming from different sources? Maybe if someone points
me to the right reference I could read and understand if it is applicable
to my case.
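
To make the Z-score idea above concrete, here is a minimal sketch of what I
mean. It is only an illustration: the dataset names and affinity values are
made up, and the helper zscore_per_dataset is a hypothetical function of my
own, not part of scikit-learn.

```python
import numpy as np

def zscore_per_dataset(datasets):
    """Standardize affinities within each dataset, then pool them.

    `datasets` maps a (hypothetical) dataset name to a 1-D sequence of
    measured binding affinities. Returns the pooled Z-scores and an
    array of dataset labels of the same length.
    """
    pooled, labels = [], []
    for name, values in datasets.items():
        y = np.asarray(values, dtype=float)
        # Sample standard deviation (ddof=1), since each set is small.
        z = (y - y.mean()) / y.std(ddof=1)
        pooled.append(z)
        labels.extend([name] * len(y))
    return np.concatenate(pooled), np.array(labels)

# Made-up example values, for illustration only.
datasets = {
    "assay_A": [5.1, 6.3, 7.8, 6.0],  # mostly weak binders
    "assay_B": [8.2, 9.1, 9.9, 8.7],  # mostly strong binders
}
z, source = zscore_per_dataset(datasets)
```

After this transformation every dataset has zero mean and unit variance, so
a weak binder from one assay and a strong binder from another can end up
with the same Z-score. That is the second limitation I described above.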




Dr Thomas Evangelidis

Post-doctoral Researcher
CEITEC - Central European Institute of Technology
Masaryk University
Kamenice 5/A35/2S049,
62500 Brno, Czech Republic

email: tevang at pharm.uoa.gr
       tevang3 at gmail.com

website: https://sites.google.com/site/thomasevangelidishomepage/