[scikit-learn] A custom loss function for GradientBoostingRegressor

Zygmunt Zając zajac.zygmunt at gmail.com
Mon Mar 20 13:45:43 EDT 2017


Hello,

I would like to add a custom loss function for gradient boosting 
regression. The loss is similar to least squares, except that for each 
example it is acceptable to miss the target on one side - the loss is 
zero in that case. A per-example binary indicator called "under" says 
which side is acceptable: undershooting is free when under == 1, and 
overshooting is free when under == 0. For example:

y    under    p    loss
5    1        4    0
5    0        4    1
5    1        6    1
5    0        6    0

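To make the definition concrete, here is a tiny standalone snippet 
(the variable names are mine) that reproduces the table above:

    import numpy as np

    y = np.array([5.0, 5.0, 5.0, 5.0])
    under = np.array([1, 0, 1, 0])
    pred = np.array([4.0, 4.0, 6.0, 6.0])

    squares = (y - pred) ** 2.0
    # zero out the errors that are on the "allowed" side
    overshoot_ok = (pred > y) & (under == 0)
    undershoot_ok = (pred < y) & (under == 1)
    squares[overshoot_ok] = 0.0
    squares[undershoot_ok] = 0.0
    print(squares)  # [ 0.  1.  1.  0.] - matches the "loss" column
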
Below is my attempt at an implementation. I have three questions:

1. Is it correct?
2. How would you pass "under" to the loss function? (A tentative idea 
is sketched below the code.)
3. Loss functions other than LeastSquaresError() seem to implement 
_update_terminal_region(). Is this necessary in this case, and if so, 
how would I do it?


     # Methods of the custom loss class; "under" is assumed to be
     # available here somehow (that is question 2 above).
     def __call__(self, y, pred, sample_weight=None):
         if sample_weight is None:
             pred = pred.ravel()
             squares = (y - pred) ** 2.0

             # the custom part: errors on the allowed side cost nothing
             overshoot_ok = (pred > y) & (under == 0)
             undershoot_ok = (pred < y) & (under == 1)
             squares[overshoot_ok] = 0.0
             squares[undershoot_ok] = 0.0

             return np.mean(squares)
         else:
             (...)

     def negative_gradient(self, y, pred, **kargs):
         pred = pred.ravel()
         diffs = y - pred

         # same masking as in __call__: no gradient where the error is allowed
         overshoot_ok = (pred > y) & (under == 0)
         undershoot_ok = (pred < y) & (under == 1)
         diffs[overshoot_ok] = 0.0
         diffs[undershoot_ok] = 0.0

         return diffs
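
Regarding question 2, the best idea I have so far is to keep "under" on 
the loss object itself, set at construction time, so that __call__ and 
negative_gradient can use it without changing their signatures. A rough 
standalone sketch follows (the class name is mine, and I have not yet 
worked out how to plug such an object into GradientBoostingRegressor, 
which is part of what I am asking):

    import numpy as np

    class AsymmetricSquaredError(object):
        """Squared error that ignores errors on the allowed side.

        "under" is a binary array aligned with the training targets:
        under == 1 means undershooting is free, under == 0 means
        overshooting is free.
        """

        def __init__(self, under):
            self.under = np.asarray(under)

        def _allowed(self, y, pred):
            # boolean mask of examples whose error is in the allowed direction
            overshoot_ok = (pred > y) & (self.under == 0)
            undershoot_ok = (pred < y) & (self.under == 1)
            return overshoot_ok | undershoot_ok

        def __call__(self, y, pred, sample_weight=None):
            pred = pred.ravel()
            squares = (y - pred) ** 2.0
            squares[self._allowed(y, pred)] = 0.0
            if sample_weight is None:
                return np.mean(squares)
            return np.average(squares, weights=sample_weight)

        def negative_gradient(self, y, pred, **kargs):
            pred = pred.ravel()
            diffs = y - pred
            diffs[self._allowed(y, pred)] = 0.0
            return diffs

Whether this is a sensible way to get "under" into the boosting loop is 
exactly what I am unsure about.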