[scikit-learn] Help With Text Classification

Joel Nothman joel.nothman at gmail.com
Wed Aug 2 22:38:34 EDT 2017


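Use a Pipeline to help avoid this kind of issue (and others): it fits
each transformer once on the training data and reuses the fitted
transformers at prediction time. You might also want to combine the text
and categorical features with a FeatureUnion, as in
http://scikit-learn.org/stable/auto_examples/hetero_feature_union.html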
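
A minimal sketch of the Pipeline idea, under some assumptions: the toy
data, the column positions and the step names are made up, and it uses
ColumnTransformer (scikit-learn 0.20 or later) rather than the
FeatureUnion approach from the linked example. TfidfVectorizer stands in
for CountVectorizer followed by TfidfTransformer. Because the
transformers are fitted only inside model.fit, the test data always gets
the same number of columns as the training data.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy data (made up): column 0 is the complaint text, column 1 the part #.
X_train = np.array([
    ["engine stalls at idle", "P100"],
    ["brake pedal feels soft", "P200"],
    ["engine makes knocking noise", "P100"],
    ["radio works fine", "P300"],
], dtype=object)
y_train = [1, 1, 1, 0]

X_test = np.array([
    ["engine stalls on highway", "P100"],
    ["paint is chipped", "P999"],  # part # never seen during fit
], dtype=object)

features = ColumnTransformer([
    # Scalar column index: the vectorizer receives a 1-D sequence of strings.
    ("complaint", TfidfVectorizer(), 0),
    # List of columns: the encoder receives a 2-D array. Unknown
    # categories are encoded as all zeros instead of raising an error.
    ("part", OneHotEncoder(handle_unknown="ignore"), [1]),
])

model = Pipeline([("features", features), ("clf", MultinomialNB())])

# fit() learns the vocabulary and the part # categories from X_train only;
# predict() reuses them, so the feature dimensions always match.
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```

With a pandas DataFrame, the column names can be passed in place of the
integer positions (a string for the text column, a list of strings for
the categorical one).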

On 3 August 2017 at 12:01, pybokeh <pybokeh at gmail.com> wrote:

> Hello,
> I am studying this example from scikit-learn's site:
> http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html
>
> The problem that I need to solve is very similar to this example,
> except that I have one additional feature column (part #), which is a
> categorical column of type string. My label or target values consist
> of just 2 values: 0 or 1.
>
> I am transforming that additional feature column with a LabelEncoder
> and then encoding it with the OneHotEncoder.
>
> Then I am concatenating that one-hot encoded column (part #) to the
> text/document feature column (complaint), to which I had applied the
> CountVectorizer and TfidfTransformer transformations.
>
> Then I chose the MultinomialNB model and fit it to my concatenated
> training data.
>
> The problem I run into is that when I invoke the prediction, I get a
> dimension mismatch error.
>
> Here's my Jupyter notebook gist:
> http://nbviewer.jupyter.org/gist/anonymous/59ba930a783571c85ef86ba41424b311
>
> I would greatly appreciate it if someone could show me where I went
> wrong. Thanks!
>
> - Daniel