<div dir="ltr">Hi Sebastian,<div><br></div><div>Thanks for the advice! Luckily, the model works fine on its own in Python, so I don't think that is exactly the issue. I have tried rolling my own estimator that wraps the pipeline and has predict call the predict_proba method so that it returns a dense array. However, I then ran into the problem that this custom estimator would also have to be defined on the Cloud ML side, and I'm unsure how to do that.</div><div><br></div><div>Thanks,</div><div>Liam</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Apr 10, 2019 at 2:06 PM Sebastian Raschka <<a href="mailto:mail@sebastianraschka.com">mail@sebastianraschka.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi Liam,<br>
<br>
Not sure what your exact error message is, but could it also be that XGBClassifier only accepts dense arrays? TfidfVectorizer returns a SciPy sparse matrix. You could probably fix your issue by inserting a "DenseTransformer" into your pipeline (a simple class that just converts an array from sparse to dense format). I've implemented something like that; you can import it or copy and paste it from here:<br>
<br>
<a href="https://github.com/rasbt/mlxtend/blob/master/mlxtend/preprocessing/dense_transformer.py" rel="noreferrer" target="_blank">https://github.com/rasbt/mlxtend/blob/master/mlxtend/preprocessing/dense_transformer.py</a><br>
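For illustration, a minimal transformer along those lines might look like this. It is a simplified stand-in written for this example, not the code from the linked mlxtend module, and it assumes scikit-learn, NumPy and SciPy are installed; the class name just mirrors the one used in the pipeline.<br>

```python
import numpy as np
from scipy import sparse
from sklearn.base import BaseEstimator, TransformerMixin


class DenseTransformer(TransformerMixin, BaseEstimator):
    """Simplified stand-in (not mlxtend's code): turns SciPy sparse input into a dense array."""

    def fit(self, X, y=None):
        # Nothing to learn; fit only exists so the class can sit inside a Pipeline.
        return self

    def transform(self, X):
        # Sparse input is densified with toarray(); dense input is passed through as an ndarray.
        return X.toarray() if sparse.issparse(X) else np.asarray(X)


# TransformerMixin supplies fit_transform as fit(X).transform(X).
X_sparse = sparse.csr_matrix([[0.0, 1.5], [2.0, 0.0]])
X_dense = DenseTransformer().fit_transform(X_sparse)
```

Keep in mind that densifying a TF-IDF matrix stores every zero explicitly, so memory grows with n_samples * n_features, which can get large for a big vocabulary.<br>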
<br>
The usage would then basically be:<br>
<br>
model = Pipeline([('tfidf', TfidfVectorizer()), ('to_dense', DenseTransformer()), ('clf', OneVsRestClassifier(XGBClassifier()))])<br>
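A runnable sketch of that pipeline, under two swaps made only so it runs without mlxtend or xgboost installed: scikit-learn's FunctionTransformer stands in for DenseTransformer, and LogisticRegression stands in for XGBClassifier. The toy documents, the two-column label matrix and the helper name to_dense are hypothetical.<br>

```python
import numpy as np
from scipy import sparse
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def to_dense(X):
    """Hypothetical helper: convert SciPy sparse input to a dense NumPy array."""
    return X.toarray() if sparse.issparse(X) else np.asarray(X)


model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    # Stand-in for DenseTransformer; accept_sparse=True tells FunctionTransformer
    # that to_dense can handle sparse input if input validation is enabled.
    ("to_dense", FunctionTransformer(to_dense, accept_sparse=True)),
    # Stand-in for XGBClassifier so the sketch runs without xgboost.
    ("clf", OneVsRestClassifier(LogisticRegression())),
])

# Hypothetical toy data: a dense multi-label indicator matrix with two labels.
docs = ["refund my order", "where is my package", "cancel my order", "package arrived late"]
Y = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])
model.fit(docs, Y)

# Features as the classifier sees them: dense after the to_dense step.
tfidf_step = model.named_steps["tfidf"]
dense_features = model.named_steps["to_dense"].transform(tfidf_step.transform(docs))

pred = model.predict(["where is my order"])
```

The helper is a module-level function rather than a lambda because the standard pickle module cannot serialize lambdas; like any custom class, it still has to be importable wherever the pickled pipeline is loaded. In this sketch, where Y is a dense array, predict returns a dense NumPy array.<br>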
<br>
Best,<br>
Sebastian<br>
<br>
<br>
<br>
<br>
> On Apr 10, 2019, at 12:25 PM, Liam Geron <<a href="mailto:liam@chatdesk.com" target="_blank">liam@chatdesk.com</a>> wrote:<br>
> <br>
> Hi all,<br>
> <br>
> I was hoping to get some guidance on changing the output of OneVsRestClassifier's predict method to a dense array rather than a sparse array, given that Google Cloud ML only accepts dense NumPy arrays as the result of a model's predict method. Right now my model architecture looks like:<br>
> <br>
> model = Pipeline([('tfidf', TfidfVectorizer()), ('clf', OneVsRestClassifier(XGBClassifier()))])<br>
> <br>
> Its predict method returns a sparse array. I saw the Stack Overflow post here: <a href="https://stackoverflow.com/questions/52151548/google-cloud-ml-engine-scikit-learn-prediction-probability-predict-proba" rel="noreferrer" target="_blank">https://stackoverflow.com/questions/52151548/google-cloud-ml-engine-scikit-learn-prediction-probability-predict-proba</a><br>
> <br>
> which recommends overwriting the predict method with the predict_proba method; however, I found that I can't serialize the model after doing so. I have also posted a Stack Overflow question here: <a href="https://stackoverflow.com/questions/55366454/how-to-convert-scikit-learn-onevsrestclassifier-predict-method-output-to-dense-a" rel="noreferrer" target="_blank">https://stackoverflow.com/questions/55366454/how-to-convert-scikit-learn-onevsrestclassifier-predict-method-output-to-dense-a</a>, which details the specific pickling error.<br>
> <br>
> Is this a known issue? Is there an accepted way to convert this into a dense array?<br>
> <br>
> Thanks,<br>
> Liam Geron<br>
> _______________________________________________<br>
> scikit-learn mailing list<br>
> <a href="mailto:scikit-learn@python.org" target="_blank">scikit-learn@python.org</a><br>
> <a href="https://mail.python.org/mailman/listinfo/scikit-learn" rel="noreferrer" target="_blank">https://mail.python.org/mailman/listinfo/scikit-learn</a><br>
<br>
</blockquote></div>