Sentiment analysis using sklearn
qrious
mittra at juno.com
Sun Jan 28 01:59:00 EST 2018
On Saturday, January 27, 2018 at 5:21:15 PM UTC-8, Dan Stromberg wrote:
> On Sat, Jan 27, 2018 at 1:05 PM, qrious wrote:
> > I am attempting to understand how scikit-learn works for sentiment analysis and came across this blog post:
> >
> > https://marcobonzanini.wordpress.com/2015/01/19/sentiment-analysis-with-python-and-scikit-learn
> >
> > The corresponding code is at this location:
> >
> > https://gist.github.com/bonzanini/c9248a239bbab0e0d42e
> >
> > My question is: when predicting, why does curr_class in Line 44 of the code need a classification (pos or neg) for the test data? After all, isn't that what I'm trying to predict? Without an initial value for curr_class, the program fails with a runtime error.
>
> I'm a real neophyte when it comes to modern AI, but I believe the
> intent is to divide your inputs into "training data", "test data"
> and "real world data".
>
> So you create your models using training data including correct
> classifications as part of the input.
>
> And you check how well your models do on inputs they haven't
> seen before by using test data, which is also classified in advance,
> so you can compare the model's answers with the known ones.
>
> And then you use real world, as-yet-unclassified data in production,
> after you've selected your best model, to derive a classification from
> what your model has seen in the past.
>
> So both the training data and test data need accurate labels in
> advance, but the real world data trusts the model to do pretty well
> without further labeling.
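The training/test split described above can be sketched with scikit-learn roughly as follows. This is a minimal sketch, not the blog post's code: the six-review corpus is hypothetical, and TfidfVectorizer plus LinearSVC is an assumed choice of model. It also bears on the original question: in a workflow like this, the test labels are never passed to predict(); they are only compared with its output afterwards, which is why the test data needs a class at all.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.svm import LinearSVC

# Hypothetical labelled training data (stand-in for real review files).
train_texts = [
    "a wonderful, moving film",
    "great acting and a clever plot",
    "i loved every minute of it",
    "a dull, boring mess",
    "terrible script and awful acting",
    "i hated every minute of it",
]
train_labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

# Hypothetical held-out test data: labelled too, but only for scoring.
test_texts = ["a clever, moving film", "a boring, awful mess"]
test_labels = ["pos", "neg"]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)  # learn vocabulary and IDF from training data
X_test = vectorizer.transform(test_texts)        # reuse that vocabulary for the test data

classifier = LinearSVC()
classifier.fit(X_train, train_labels)  # fitting is where the labels are required

# predict() receives only the features; test_labels are not used here.
prediction = classifier.predict(X_test)

# The test labels are used afterwards, to measure how good the predictions were.
print(classification_report(test_labels, prediction))
```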
Dan,
Thanks, I was thinking along the same lines: 'So both the training data and test data need accurate labels in advance'. That makes sense to me.
As for this part: 'the real world data trusts the model to do pretty well without further labeling', my question is how to do that with sklearn library functions. Is there a code example that runs the model on actual data that still needs a prediction?
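One possible way to handle unlabelled "real world" data in scikit-learn is sketched below, under the same assumptions as before: a hypothetical toy corpus and an assumed TfidfVectorizer + LinearSVC model. Wrapping both steps in a pipeline with make_pipeline lets raw strings go straight into predict(), and no labels are needed for the new data. Refitting the chosen model on all available labelled data before deployment is a common practice, not something the blog post prescribes.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical labelled data used to fit the final, already-chosen model.
labelled_texts = [
    "a wonderful, moving film",
    "great acting and a clever plot",
    "i loved every minute of it",
    "a dull, boring mess",
    "terrible script and awful acting",
    "i hated every minute of it",
]
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

# Bundle vectorizer and classifier so the pipeline accepts raw text.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(labelled_texts, labels)

# "Real world" input: plain strings with no labels attached.
new_reviews = ["what a wonderful, clever film", "a dull, boring mess"]

# predict() needs only the inputs and returns one label per review.
predicted = model.predict(new_reviews)
for review, label in zip(new_reviews, predicted):
    print(f"{label}: {review}")
```

Because the vectorizer lives inside the pipeline, the new reviews are transformed with the vocabulary learned during fit(). Fitting a fresh vectorizer on the new data instead would produce a different feature space than the one the classifier was trained on.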