[scikit-learn] Is there a model for truncated regression in sklearn?

Gael Varoquaux gael.varoquaux at normalesup.org
Tue Jun 8 03:31:03 EDT 2021


Hi,

Scikit-learn does not cover this problem.

I think that it relates to what is called survival analysis. You'll find
a survival analysis package in Python at
https://lifelines.readthedocs.io/en/latest/
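
As a rough illustration of the truncated regression model itself, here is
a minimal sketch that fits it by maximum likelihood with SciPy. It assumes
Gaussian noise and a known lower threshold below which samples are never
recorded. The function name fit_truncated_regression and the synthetic
data are hypothetical and only serve the example; this is not an API of
scikit-learn or lifelines.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def fit_truncated_regression(X, y, lower):
    """ML fit of y = b0 + X @ b + N(0, sigma^2), observed only when y > lower."""
    X = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    n_coef = X.shape[1]

    def neg_log_likelihood(params):
        beta, log_sigma = params[:n_coef], params[n_coef]
        sigma = np.exp(log_sigma)  # optimise log(sigma) so that sigma stays > 0
        mu = X @ beta
        # Gaussian density of y, renormalised by P(y > lower | x)
        log_pdf = norm.logpdf(y, loc=mu, scale=sigma)
        log_tail = norm.logsf(lower, loc=mu, scale=sigma)
        return -np.sum(log_pdf - log_tail)

    # Start the optimiser from the ordinary least-squares solution
    beta0, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma0 = np.std(y - X @ beta0)
    start = np.append(beta0, np.log(sigma0))
    result = minimize(neg_log_likelihood, start, method="BFGS")
    return result.x[:n_coef], np.exp(result.x[n_coef])


# Synthetic data (assumed for illustration): intercept 1.0, slope 2.0, sigma 1.0.
# Samples with y <= 1.0 are dropped entirely, which is what truncation means.
rng = np.random.default_rng(0)
x_all = rng.normal(size=5000)
y_all = 1.0 + 2.0 * x_all + rng.normal(size=5000)
keep = y_all > 1.0
x_obs, y_obs = x_all[keep], y_all[keep]

beta_hat, sigma_hat = fit_truncated_regression(x_obs, y_obs, lower=1.0)
ols_slope = np.polyfit(x_obs, y_obs, 1)[0]  # naive fit on the truncated sample
```

Comparing beta_hat[1] with ols_slope shows how far a plain least-squares
fit on the truncated sample drifts from the slope used to generate the
data. An upper threshold would need the other tail (norm.logcdf) instead.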

Best,

Gaël

On Tue, Jun 08, 2021 at 04:22:14PM +0900, Francois Berenger wrote:
> Hello,

> https://en.wikipedia.org/wiki/Truncated_regression_model

> Sometimes samples are missing from a dataset because their target
> variable is above or below a threshold value.
> This is very often the case for biochemical data (e.g. target
> variable outside detection range of some lab equipment).

> I strongly suspect that dedicated models could handle such datasets
> better than generic methods (i.e. yield better-trained models).

> Some points of entry, if that might help:

> - R has a truncreg package
>   https://cran.r-project.org/web/packages/truncreg/index.html
> - a related paper from the Wikipedia page:
>   "Local likelihood estimation of truncated regression and
>   its partial derivatives: Theory and application"
>   https://hal.archives-ouvertes.fr/hal-00520650/file/PEER_stage2_10.1016%252Fj.jeconom.2008.08.007.pdf

> I can provide a cleaned public regression dataset for tests, if someone
> is interested (there are many such datasets in ChEMBL and PubChem by the
> way, but you need to know how to "featurize"/encode molecules).

> Regards,
> F.
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

-- 
    Gael Varoquaux
    Research Director, INRIA		  Visiting professor, McGill 
    http://gael-varoquaux.info            http://twitter.com/GaelVaroquaux
