[scikit-learn] XGboost Classifier error
Startup Hire
blrstartuphire at gmail.com
Thu Apr 20 00:21:37 EDT 2017
Hi Olivier,
Thanks for the info. I will follow it from now on. The full traceback is
given below:
---------- Full traceback ----------
Fitting 3 folds for each of 10 candidates, totalling 30 fits
C:\Users\ssampathkumar\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\grid_search.py:43:
DeprecationWarning: This module was deprecated in version 0.18 in
favor of the model_selection module into which all the refactored
classes and functions are moved. This module will be removed in 0.20.
  DeprecationWarning)
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-19-321b410b10ad> in <module>()
     18
     19
---> 20 random_search_sg.fit(scaled_data, labels)
     21
     22 print("RandomizedSearchCV took %.2f seconds for %d candidates"

C:\Users\ssampathkumar\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\grid_search.py
in fit(self, X, y)
   1023                                           self.n_iter,
   1024                                           random_state=self.random_state)
-> 1025         return self._fit(X, y, sampled_params)

C:\Users\ssampathkumar\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\grid_search.py
in _fit(self, X, y, parameter_iterable)
    571                                     self.fit_params, return_parameters=True,
    572                                     error_score=self.error_score)
--> 573                 for parameters in parameter_iterable
    574                 for train, test in cv)
    575

C:\Users\ssampathkumar\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py
in __call__(self, iterable)
    756             # was dispatched. In particular this covers the edge
    757             # case of Parallel used with an exhausted iterator.
--> 758             while self.dispatch_one_batch(iterator):
    759                 self._iterating = True
    760             else:

C:\Users\ssampathkumar\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py
in dispatch_one_batch(self, iterator)
    601
    602         with self._lock:
--> 603             tasks = BatchedCalls(itertools.islice(iterator, batch_size))
    604             if len(tasks) == 0:
    605                 # No more tasks available in the iterator: tell caller to stop.

C:\Users\ssampathkumar\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py
in __init__(self, iterator_slice)
    125
    126     def __init__(self, iterator_slice):
--> 127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129

C:\Users\ssampathkumar\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\grid_search.py
in <genexpr>(.0)
    567             pre_dispatch=pre_dispatch
    568         )(
--> 569             delayed(_fit_and_score)(clone(base_estimator), X, y, self.scorer_,
    570                                     train, test, self.verbose, parameters,
    571                                     self.fit_params, return_parameters=True,

C:\Users\ssampathkumar\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\grid_search.py
in __iter__(self)
    250                     + " For exhaustive searches, use GridSearchCV.")
    251             for i in sample_without_replacement(grid_size, self.n_iter,
--> 252                                                 random_state=rnd):
    253                 yield param_grid[i]
    254

sklearn\utils\_random.pyx in
sklearn.utils._random.sample_without_replacement
(sklearn\utils\_random.c:3975)()

OverflowError: Python int too large to convert to C long
---------- End of traceback ----------
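A possible reading of the last frames, offered as a sketch and not a confirmed diagnosis: `grid_size` is handed to the compiled `sample_without_replacement`, and the error says a Python int did not fit in a C long. On Windows a C long is 32 bits, so a grid with more than 2**31 - 1 combinations would not fit. The snippet below counts the combinations of a search space given as plain lists. The `example_space` dictionary and the `grid_size` helper are hypothetical, since the actual parameter distributions are not shown in this message.

```python
from functools import reduce
from operator import mul

# Largest value a 32-bit signed C long can hold (C long is 32 bits on
# Windows, even with a 64-bit Python build).
C_LONG_MAX_32 = 2**31 - 1


def grid_size(param_lists):
    """Count the combinations when every parameter is a finite list."""
    return reduce(mul, (len(values) for values in param_lists.values()), 1)


# Hypothetical search space; the real one is not part of this message.
example_space = {
    "n_estimators": list(range(100, 1100)),              # 1000 values
    "max_depth": list(range(1, 11)),                     # 10 values
    "min_child_weight": list(range(1, 11)),              # 10 values
    "learning_rate": [i / 1000 for i in range(1, 301)],  # 300 values
    "max_delta_step": list(range(0, 20)),                # 20 values
    "gamma": list(range(0, 10)),                         # 10 values
}

size = grid_size(example_space)
print("combinations:", size)
print("exceeds 32-bit C long:", size > C_LONG_MAX_32)
```

If the count for the real search space turns out to be above that limit, trimming the lists would be one thing to try.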
The shapes of scaled_data and labels are (772330, 15) and (772330,). (I tried
scaled_data both as a CSR matrix and as a NumPy array.)
By the way, when I fit the classifier on its own (without
*RandomizedSearchCV*), it works fine with the same dataset:
---------- Code below runs fine ----------
import xgboost as xgb
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Fixed hyperparameters, no search involved.
params_c = {
    'n_estimators': 310,
    'learning_rate': 0.1,
    'min_child_weight': 5,
    # 'max_depth' appeared twice in my dict (10, then 5); Python keeps the last.
    'max_depth': 5,
    'gamma': 0,
    'max_delta_step': 14,
    'subsample': 1,
    'colsample_bytree': 1,
    'colsample_bylevel': 1,
    'reg_lambda': 1,
    'reg_alpha': 0,
    'scale_pos_weight': 1,
    'objective': 'binary:logistic',
    'silent': False,
}
c = xgb.XGBClassifier(**params_c)

# Hold out a test split, fit, and print the confusion matrix.
X_train, X_test, y_train, y_test = train_test_split(scaled_data, labels)
c.fit(X_train, y_train)
y_pred = c.predict(X_test)
cm3 = confusion_matrix(y_test, y_pred)
print(cm3)
---------- End of code that runs fine ----------
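Since a single fit with fixed parameters works, one way to narrow the problem down might be to draw candidate settings by hand, picking each parameter value independently so that the total number of combinations is never needed. This is only a minimal sketch: `sample_settings` and `example_space` are hypothetical names, the lists are made-up values, and the draw is with replacement, which differs from sampling a grid without replacement.

```python
import random


def sample_settings(param_lists, n_iter, seed=0):
    """Draw n_iter settings, picking each parameter value independently.

    Sampling is with replacement, so duplicate settings are possible.
    """
    rng = random.Random(seed)
    names = sorted(param_lists)  # fixed order keeps the draw reproducible
    return [
        {name: rng.choice(param_lists[name]) for name in names}
        for _ in range(n_iter)
    ]


# Hypothetical search space; the real one is not part of this message.
example_space = {
    "n_estimators": list(range(100, 1100)),
    "max_depth": list(range(1, 11)),
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
}

candidates = sample_settings(example_space, n_iter=10)
for settings in candidates:
    # Each dict could be passed on as xgb.XGBClassifier(**settings).
    print(settings)
```

Each sampled dictionary can then be fitted and scored one at a time, which also shows whether any single setting fails on its own.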
On Wed, Apr 19, 2017 at 4:45 PM, Olivier Grisel <olivier.grisel at ensta.org>
wrote:
> Please provide the full traceback. Without it it's impossible to tell
> whether the problem is in scikit-learn or xgboost.
>
> Also, please provide a minimal reproduction script as explained in:
>
> http://scikit-learn.org/stable/faq.html#what-s-the-best-way-to-get-help-on-scikit-learn-usage
>
> --
> Olivier