[scikit-learn] Is there a model for truncated regression in sklearn?
Francois Berenger
mlists at ligand.eu
Tue Jun 8 03:22:14 EDT 2021
Hello,
https://en.wikipedia.org/wiki/Truncated_regression_model
Sometimes a dataset is missing all samples whose target variable
is above or below a threshold value.
This is very often the case for biochemical data (e.g. the target
variable falls outside the detection range of some lab equipment).
I strongly suspect that models designed for truncated data could
handle such datasets better than generic regression methods (i.e.
yield more accurate models).
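
To make the idea concrete, here is a minimal sketch of what such a
model could look like, written with NumPy and SciPy only. It is not an
existing scikit-learn estimator: the helper name
fit_truncated_regression, the Gaussian noise model, the known
truncation bounds, and the simulated data are all assumptions made for
illustration. The fit maximizes the usual truncated-normal likelihood,
i.e. the normal density of each observed target divided by the
probability that a target lands inside the observable window.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def fit_truncated_regression(X, y, lower=-np.inf, upper=np.inf):
    """Hypothetical helper: maximum-likelihood fit of a linear model with
    Gaussian noise when samples are only observed if lower < y < upper.
    Returns (coefficients with the intercept first, sigma)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xb = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column

    def neg_log_likelihood(params):
        beta, sigma = params[:-1], np.exp(params[-1])  # sigma kept positive
        mu = Xb @ beta
        log_density = norm.logpdf(y, loc=mu, scale=sigma)
        # Probability that a sample with mean mu is observed at all.
        window = (norm.cdf(upper, loc=mu, scale=sigma)
                  - norm.cdf(lower, loc=mu, scale=sigma))
        log_window = np.log(np.clip(window, 1e-300, None))
        return -np.sum(log_density - log_window)

    # Start the optimizer from the ordinary least-squares solution.
    beta0, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    sigma0 = np.std(y - Xb @ beta0)
    start = np.append(beta0, np.log(sigma0))
    result = minimize(neg_log_likelihood, start, method="BFGS")
    return result.x[:-1], float(np.exp(result.x[-1]))


# Simulated example (assumed data): true intercept 1, true slope 2,
# and every sample with y >= 3 is missing from the dataset.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 2.0, size=20000)
y_full = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, size=x.size)
keep = y_full < 3.0
x_obs, y_obs = x[keep], y_full[keep]

beta_hat, sigma_hat = fit_truncated_regression(
    x_obs.reshape(-1, 1), y_obs, upper=3.0
)
ols_slope = np.polyfit(x_obs, y_obs, 1)[0]  # generic fit, ignores truncation

print("OLS slope:", ols_slope)
print("truncated fit [intercept, slope]:", beta_hat, "sigma:", sigma_hat)
```

On data simulated this way, the generic least-squares slope should come
out flatter than the true slope, while the truncation-aware fit should
land closer to it. Real assay data may well violate the Gaussian
assumption, so this is only meant as a starting point.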
Some points of entry, if that might help:
- R has a truncreg package
https://cran.r-project.org/web/packages/truncreg/index.html
- a related paper from the Wikipedia page:
"Local likelihood estimation of truncated regression and its
partial derivatives: Theory and application"
https://hal.archives-ouvertes.fr/hal-00520650/file/PEER_stage2_10.1016%252Fj.jeconom.2008.08.007.pdf
I can provide a cleaned public regression dataset for tests, if
someone is interested (there are many such datasets in ChEMBL and
PubChem, by the way, but you need to know how to "featurize"/encode
molecules).
Regards,
F.