[scikit-learn] Help With Text Classification
pybokeh at gmail.com
Wed Aug 2 23:12:36 EDT 2017
Thanks, Joel, for recommending FeatureUnion. I did run across it, but for
just 2 features I thought it might be overkill. I am also aware of Pipeline,
which the scikit-learn example explains very well, and I plan to use it
once I finalize my script. I did not want to abstract away too much early
on, since I am just beginning to learn machine learning and scikit-learn.
On Wed, Aug 2, 2017 at 10:38 PM, Joel Nothman <joel.nothman at gmail.com> wrote:
> Use a Pipeline to help avoid this kind of issue (and others). You might
> also want to do something like http://scikit-learn.org/
> On 3 August 2017 at 12:01, pybokeh <pybokeh at gmail.com> wrote:
>> I am studying this example from scikit-learn's site:
>> The problem that I need to solve is very similar to this example, except
>> I have one additional feature column (part #) that is categorical, of type
>> string. My label or target values consist of just 2 values: 0 or 1.
>> I transform that additional feature column with a LabelEncoder and then
>> encode it with the OneHotEncoder.
>> Then I concatenate that one-hot encoded column (part #) with the feature
>> column (complaint), to which I had applied the CountVectorizer and
>> TfidfTransformer transformations.
>> Then I chose the MultinomialNB model to fit my concatenated training data.
>> The problem I run into is that when I invoke the prediction, I get a
>> dimension mismatch error.
>> Here's my jupyter notebook gist:
>> I would greatly appreciate it if someone could point out where I went wrong.
>> - Daniel
>> scikit-learn mailing list
>> scikit-learn at python.org