[scikit-learn] Help With Text Classification
joel.nothman at gmail.com
Thu Aug 3 18:29:10 EDT 2017
Pipeline helps at prediction time too.
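Here is a minimal sketch of what this means, using a text-only pipeline on made-up complaints (the data and labels are hypothetical). Once the pipeline is fitted, `Pipeline.predict` calls `transform`, not `fit_transform`, on each intermediate step. A single raw string is therefore mapped onto the vocabulary learned during training.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical training complaints with 0/1 labels.
train_text = [
    "engine makes grinding noise",
    "door handle broke off",
    "engine stalls at idle",
    "door will not latch",
]
train_labels = [1, 0, 1, 0]

text_clf = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", MultinomialNB()),
])
text_clf.fit(train_text, train_labels)

# At prediction time each step only transforms the input, so the new row
# has the same number of columns the classifier saw during fit.
prediction = text_clf.predict(["grinding noise from engine"])
print(prediction)
```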
On 4 Aug 2017 7:49 am, "pybokeh" <pybokeh at gmail.com> wrote:
> I found my problem. When I one-hot encoded my test part #, it resulted in
> a 1x1 matrix, when I needed a 1x153. This happened because I
> used the default setting ('auto') for n_values, when I needed to set it to
> 153. Now when I horizontally stack it with my other feature matrix, the
> resulting total # of columns correctly comes to 1294 instead of
> 1142. Looking back, I'm not sure whether using Pipeline or FeatureUnion
> would have helped or prevented this, since the error occurred
> on the prediction side, not on the training or modeling side.
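The following sketch shows the width problem, with three hypothetical part numbers standing in for the real 153. It assumes a recent scikit-learn. In those versions OneHotEncoder accepts strings directly, and `n_values` no longer exists (it was deprecated in 0.20 and has since been removed). The number of columns is fixed instead by fitting the encoder once on the training data, or by listing every value through the `categories` parameter.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical part numbers; the real data had 153 distinct values.
train_parts = np.array([["P-100"], ["P-200"], ["P-300"], ["P-200"]])
test_part = np.array([["P-200"]])

# Wrong: a new encoder fitted on the single test row sees only one
# category, so it produces a 1 x 1 matrix.
wrong = OneHotEncoder().fit_transform(test_part)
print(wrong.shape)  # (1, 1)

# Right: fit once on the training data, then only transform the test row.
# handle_unknown="ignore" maps a part # unseen in training to all zeros.
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(train_parts)
right = encoder.transform(test_part)
print(right.shape)  # (1, 3)
```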
> On Wed, Aug 2, 2017 at 10:38 PM, Joel Nothman <joel.nothman at gmail.com> wrote:
>> Use a Pipeline to help avoid this kind of issue (and others). You might
>> also want to do something like http://scikit-learn.org/stable
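One way to follow this advice today is sketched below, under some assumptions. `ColumnTransformer` was added in scikit-learn 0.20, after this thread; in 2017 the equivalent was a FeatureUnion with column-selecting transformers. Here it routes the `complaint` and `part` columns (hypothetical names) to their own transformers inside a single Pipeline. `TfidfVectorizer` performs CountVectorizer followed by TfidfTransformer in one step.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training data: free-text complaint plus a categorical part #.
train = pd.DataFrame({
    "complaint": ["engine makes grinding noise", "door handle broke off",
                  "engine stalls at idle", "door will not latch"],
    "part": ["P-100", "P-200", "P-100", "P-300"],
})
y = [1, 0, 1, 0]

features = ColumnTransformer([
    # A single column name passes 1-D text, which the vectorizer expects.
    ("text", TfidfVectorizer(), "complaint"),
    # A list of names passes a 2-D frame, which OneHotEncoder expects.
    ("part", OneHotEncoder(handle_unknown="ignore"), ["part"]),
])
model = Pipeline([("features", features), ("clf", MultinomialNB())])
model.fit(train, y)

# A single new row: every step only transforms it, so its width matches training.
new = pd.DataFrame({"complaint": ["grinding noise"], "part": ["P-100"]})
pred = model.predict(new)
print(pred)
```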
>> On 3 August 2017 at 12:01, pybokeh <pybokeh at gmail.com> wrote:
>>> I am studying this example from scikit-learn's site:
>>> The problem I need to solve is very similar to this example, except
>>> that I have one additional feature column (part #), which is a
>>> categorical string. My label or target values consist of just two
>>> values: 0 or 1.
>>> I transform that additional feature column with a LabelEncoder and
>>> then encode it with the OneHotEncoder.
>>> Then I concatenate that one-hot encoded column (part #) with the
>>> feature column (complaint), to which I had applied the CountVectorizer
>>> and TfidfTransformer transformations.
>>> Then I fit a MultinomialNB model to my concatenated training data.
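The manual route described above can also work, as long as every transformer is fit on the training data only and the test rows pass through `transform`. The sketch below uses made-up data, and the `featurize` helper is hypothetical. The LabelEncoder step mirrors the original workflow, although current versions of OneHotEncoder no longer need it.

```python
from scipy.sparse import hstack
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical training data.
train_text = ["engine makes grinding noise", "door handle broke off",
              "engine stalls at idle", "door will not latch"]
train_parts = ["P-100", "P-200", "P-100", "P-300"]
y = [1, 0, 1, 0]

# Fit every transformer once, on the training data only.
vect = CountVectorizer().fit(train_text)
tfidf = TfidfTransformer().fit(vect.transform(train_text))
label_enc = LabelEncoder().fit(train_parts)
onehot = OneHotEncoder().fit(label_enc.transform(train_parts).reshape(-1, 1))

def featurize(texts, parts):
    """Hypothetical helper: apply the fitted transformers (transform only)."""
    text_features = tfidf.transform(vect.transform(texts))
    part_codes = label_enc.transform(parts).reshape(-1, 1)
    # Horizontally stack text and one-hot part # columns into one CSR matrix.
    return hstack([text_features, onehot.transform(part_codes)]).tocsr()

X_train = featurize(train_text, train_parts)
clf = MultinomialNB().fit(X_train, y)

X_test = featurize(["grinding noise"], ["P-100"])
print(X_train.shape, X_test.shape)  # same number of columns
print(clf.predict(X_test))
```

Note that `LabelEncoder.transform` raises a ValueError for a part # it did not see during training.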
>>> The problem I run into is that when I call predict, I get a
>>> dimension mismatch error.
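For reference, the sketch below reproduces the error itself with made-up numbers. A model fit on four columns rejects a three-column matrix with a ValueError. The exact message depends on the scikit-learn and SciPy versions; older releases could surface SciPy's "dimension mismatch".

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.naive_bayes import MultinomialNB

# Train on 4 feature columns (e.g. 3 text features + 1 category column).
X_train = csr_matrix(np.array([[1, 0, 2, 1], [0, 3, 0, 1], [2, 0, 1, 0]]))
y = [1, 0, 1]
clf = MultinomialNB().fit(X_train, y)

# A test row built with a separately fitted encoder comes out too narrow.
X_bad = csr_matrix(np.array([[1, 0, 1]]))
raised = False
try:
    clf.predict(X_bad)
except ValueError as err:
    raised = True
    print("ValueError:", err)
```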
>>> Here's my Jupyter notebook gist:
>>> I would greatly appreciate it if someone could show me where I went
>>> wrong. Thanks!
>>> - Daniel
>>> scikit-learn mailing list
>>> scikit-learn at python.org