[scikit-learn] Help With Text Classification
pybokeh at gmail.com
Wed Aug 2 22:01:36 EDT 2017
I am studying this example from scikit-learn's site:
The problem that I need to solve is very similar to this example, except I
have an additional feature column (part #) that is categorical, of type
string. My label (target) takes just 2 values: 0 or 1.
With that additional feature column, I am transforming it with a
then I am encoding it with the OneHotEncoder.
Then I am concatenating that one-hot encoded column (part #) to the
feature column (complaint), which I had applied the CountVectorizer and
Then I chose the MultinomialNB model to fit my concatenated training data
The problem I run into is when I invoke the prediction, I get a dimension
Here's my jupyter notebook gist:
I would greatly appreciate it if someone could point out where I went wrong.
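
For reference, here is a minimal sketch of the workflow described above, not the notebook's actual code. The toy data, the variable names, and the part numbers are all made up for illustration, and it assumes a recent scikit-learn in which OneHotEncoder accepts string categories directly. It shows one common way to keep the column count identical between training and prediction: fit each transformer once on the training data, then call only transform() on the data to be predicted.

```python
import numpy as np
from scipy.sparse import hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import OneHotEncoder

# Toy data, invented purely for illustration.
train_complaints = [
    "engine makes noise",
    "brake pedal soft",
    "engine stalls at idle",
    "brake squeals",
]
train_parts = np.array([["P100"], ["P200"], ["P100"], ["P200"]])
y_train = [1, 0, 1, 0]

test_complaints = ["engine noise when cold", "soft brake"]
test_parts = np.array([["P100"], ["P300"]])  # "P300" never appears in training

# Fit the transformers on the training data only.
vectorizer = CountVectorizer()
# handle_unknown="ignore" encodes unseen categories as an all-zero row
# instead of raising an error at transform time.
encoder = OneHotEncoder(handle_unknown="ignore")

X_train = hstack([
    vectorizer.fit_transform(train_complaints),
    encoder.fit_transform(train_parts),
]).tocsr()

model = MultinomialNB().fit(X_train, y_train)

# At prediction time, reuse the fitted transformers with transform(),
# never fit_transform(), so the columns line up with the training matrix.
X_test = hstack([
    vectorizer.transform(test_complaints),
    encoder.transform(test_parts),
]).tocsr()

predictions = model.predict(X_test)
print(X_train.shape, X_test.shape, predictions)
```

If the vectorizer or encoder were re-fitted on the new data instead, the vocabulary and the set of categories could differ from training, which would produce a matrix with a different number of columns than the model expects.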