[scikit-learn] Is there a model for truncated regression in sklearn?

Francois Berenger mlists at ligand.eu
Tue Jun 8 03:22:14 EDT 2021


Hello,

Is there a model for truncated regression in scikit-learn?
For background, see:
https://en.wikipedia.org/wiki/Truncated_regression_model

Sometimes a dataset is missing the samples whose target value
lies above or below some threshold.
This is very often the case for biochemical data (e.g. the target
variable falls outside the detection range of some lab equipment).

I strongly suspect that models designed for this situation could
handle such datasets better than generic methods (i.e. give better
trained models).

Some entry points, in case they help:

- R has a truncreg package:
   https://cran.r-project.org/web/packages/truncreg/index.html
- a related paper cited on the Wikipedia page:
   "Local likelihood estimation of truncated regression and
   its partial derivatives: Theory and application"
   https://hal.archives-ouvertes.fr/hal-00520650/file/PEER_stage2_10.1016%252Fj.jeconom.2008.08.007.pdf
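
To make the request concrete, here is a minimal sketch of what such a
model could look like. It assumes a linear model with Gaussian noise and
a known lower threshold below which samples are simply absent. This is
not an existing scikit-learn estimator and it is not how truncreg is
implemented; the function names (neg_log_likelihood,
fit_truncated_regression) and the synthetic data are hypothetical and
only serve as illustration. It relies on NumPy and SciPy only.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def neg_log_likelihood(params, X, y, lower):
    """Negative log-likelihood of a linear model with Gaussian noise,
    where a sample is only observed when y > lower (left truncation)."""
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)  # optimize log(sigma) so sigma stays positive
    mu = X @ beta
    # Gaussian density of each observed y ...
    log_pdf = norm.logpdf(y, loc=mu, scale=sigma)
    # ... renormalized by the probability of being observed, P(y > lower | x)
    log_keep = norm.logsf(lower, loc=mu, scale=sigma)
    return -np.sum(log_pdf - log_keep)


def fit_truncated_regression(X, y, lower):
    """Hypothetical helper: maximum-likelihood fit, returns (beta, sigma).
    beta[0] is the intercept."""
    X1 = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    # Start from ordinary least squares (biased under truncation,
    # but a reasonable initial guess).
    beta0, *_ = np.linalg.lstsq(X1, y, rcond=None)
    sigma0 = np.std(y - X1 @ beta0)
    x0 = np.append(beta0, np.log(sigma0))
    res = minimize(neg_log_likelihood, x0, args=(X1, y, lower), method="BFGS")
    return res.x[:-1], np.exp(res.x[-1])


# Synthetic check (assumed data): y = 1 + 2*x + noise, and every sample
# with y <= 0 is dropped, as if it fell below a detection limit.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(5000, 1))
y_full = 1.0 + 2.0 * X[:, 0] + rng.normal(scale=1.0, size=5000)
keep = y_full > 0.0
beta, sigma = fit_truncated_regression(X[keep], y_full[keep], lower=0.0)
print("intercept, slope:", beta, "sigma:", sigma)
```

The only difference from ordinary least squares is the log_keep term,
which accounts for the fact that low targets could never have been
observed. An upper threshold, or both at once, would change that term
accordingly.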

I can provide a cleaned public regression dataset for tests, if
someone is interested (there are many such datasets in ChEMBL and
PubChem, by the way, but you need to know how to "featurize"/encode
molecules).

Regards,
F.

