[Tutor] Sentiment analysis read from a file

Peter Otten __peter__ at web.de
Wed Mar 28 13:14:59 EDT 2018


Alan Gauld via Tutor wrote:

> On 28/03/18 11:07, theano orf wrote:
>> I am new to Python and I am having problems working out how to read a txt
>> file and insert the data into a list,
> 
> Just a quick response, but your data is more than a text file, it's a CSV
> file, so the rules change slightly. Especially since you are using the csv
> module.
> 
> Your data file is not a CSV file - it is just space separated and the
> string is not quoted, so the CSV default mode of operation won't
> work on this data as you seem to expect it to. You will need to
> specify the separator (as what? A space will split on each word...).
> CSV might not be the best option here; a simple string split combined
> with slicing might be better.

>>> next(open("training.txt"))
'1\tThe Da Vinci Code book is just awesome.\n'

So the delimiter would be TAB:

>>> import csv
>>> next(csv.reader(open("training.txt"), delimiter="\t"))
['1', 'The Da Vinci Code book is just awesome.']
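If you would rather follow Alan's suggestion and skip the csv module, a plain string split also works now that the separator is known to be a TAB. This is a minimal sketch that assumes every non-blank line has the form label<TAB>text; io.StringIO stands in for the real file, and the second sample line (with "0" as a negative label) is invented for illustration.

```python
import io

# Stand-in for training.txt. The first line is the one shown above; the
# second line (and "0" as the negative label) is invented for illustration.
sample = io.StringIO(
    "1\tThe Da Vinci Code book is just awesome.\n"
    "0\tThe Da Vinci Code was a terrible read.\n"
)

reviews = []
for line in sample:
    line = line.rstrip("\n")
    if not line:
        continue  # skip blank lines, e.g. a trailing empty line
    # Split on the first TAB only, so a TAB inside the text would survive.
    label, text = line.split("\t", 1)
    reviews.append((label, text))

positive_review = [text for label, text in reviews if label == "1"]
print(positive_review)  # ['The Da Vinci Code book is just awesome.']
```

With the real data, replace sample with open("training.txt").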

>> with open("training.txt", 'r') as file:
> 
> The CSV module prefers binary files so open it with mode 'rb' not 'r'

That's no longer true for Python 3:

>>> next(csv.reader(open("training.txt", "rb"), delimiter="\t"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

However, as csv still does its own newline handling it's a good idea to get 
into the habit of opening the file with newline="" as explained here:

https://docs.python.org/dev/library/csv.html#id3
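A minimal sketch of reading the whole file that way, combining newline="" with the TAB delimiter. The temporary file and its one-line contents are only stand-ins for the real training.txt, so the snippet runs on its own; with the real file, open its path directly.

```python
import csv
import os
import tempfile

# Write a small stand-in for training.txt so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "training.txt")
with open(path, "w", newline="") as f:
    f.write("1\tThe Da Vinci Code book is just awesome.\n")

# Text mode (not "rb") for Python 3; newline="" leaves line-ending handling
# to the csv module, and delimiter="\t" matches the TAB-separated data.
with open(path, newline="") as f:
    rows = list(csv.reader(f, delimiter="\t"))

print(rows)  # [['1', 'The Da Vinci Code book is just awesome.']]
```

If some review texts may start with a double quote, also passing quoting=csv.QUOTE_NONE stops the reader from treating that quote as the start of a quoted field.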

>> reviews = list(csv.reader(file))
> 
> Try printing the first 2 lines of reviews to check what you have.
> I suspect it's not what you think.
> 
>>    positive_review = [r[1] for r in reviews if r[0] == str(1)]
> 
> str(1) is just '1' so you might as well just use that.
> 
>> After the print I only get an empty array. Why is this happening? I am
>> also attaching the training.txt file
> 
> See the comments above about your data format.
> 
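The empty list most likely comes from the default delimiter: without delimiter="\t", csv.reader splits on commas, so each row holds the whole line as a single field, r[0] is never just '1', and the comprehension filters everything out. The sketch below reproduces that with the first line of training.txt held in an io.StringIO, then runs the same comprehension with the TAB delimiter.

```python
import csv
import io

# First line of training.txt, as shown earlier in this thread.
data = "1\tThe Da Vinci Code book is just awesome.\n"

# Default reader: the delimiter is ",", so the whole line becomes ONE field.
default_rows = list(csv.reader(io.StringIO(data)))
print(default_rows[:2])  # [['1\tThe Da Vinci Code book is just awesome.']]

# r[0] is the entire line, never just "1", so nothing passes the filter
# (and r[1] is never evaluated, so no IndexError is raised either).
print([r[1] for r in default_rows if r[0] == "1"])  # []

# With the TAB delimiter the label and the text are separate fields.
tab_rows = list(csv.reader(io.StringIO(data), delimiter="\t"))
positive_review = [r[1] for r in tab_rows if r[0] == "1"]
print(positive_review)  # ['The Da Vinci Code book is just awesome.']
```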



