[Tutor] Sentiment analysis read from a file
Peter Otten
__peter__ at web.de
Wed Mar 28 13:14:59 EDT 2018
Alan Gauld via Tutor wrote:
> On 28/03/18 11:07, theano orf wrote:
>> I am new to Python and I am having trouble reading a txt file and
>> inserting the data into a list,
>
> Just a quick response, but your data is more than a text file, it's a CSV
> file, so the rules change slightly. Especially since you are using the csv
> module.
>
> Your data file is not a CSV file - it is just space separated and the
> string is not quoted, so the CSV default mode of operation won't
> work on this data as you seem to expect it to. You will need to
> specify the separator (as what? A space will split on each word...)
> CSV might not be the best option here; a simple string split combined
> with slicing might be better.
>>> next(open("training.txt"))
'1\tThe Da Vinci Code book is just awesome.\n'
So the delimiter would be TAB:
>>> import csv
>>> next(csv.reader(open("training.txt"), delimiter="\t"))
['1', 'The Da Vinci Code book is just awesome.']
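For comparison, the plain string split suggested above could look roughly like this. It is only a sketch: it assumes every line is a label, one TAB, then the review text, and it uses in-memory sample lines (the second one is made up) instead of the real file.

```python
# Sample lines in the layout shown above: label, one TAB, review text.
# The second line is invented for illustration.
lines = [
    "1\tThe Da Vinci Code book is just awesome.\n",
    "0\tA made-up negative review.\n",
]

rows = []
for line in lines:
    # Drop the trailing newline, then split at the first TAB only,
    # so any later TAB would stay inside the review text.
    label, text = line.rstrip("\n").split("\t", 1)
    rows.append([label, text])

print(rows[0])
```

A line without any TAB would make the unpacking fail with a ValueError, so real data may need a check first.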
>> with open("training.txt", 'r') as file:
>
> The CSV module prefers binary files so open it with mode 'rb' not 'r'
That's no longer true for Python 3:
>>> next(csv.reader(open("training.txt", "rb"), delimiter="\t"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
However, as csv still does its own newline handling, it's a good idea to get
into the habit of opening the file with newline="" as explained here:
https://docs.python.org/dev/library/csv.html#id3
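Put together, reading the file in Python 3 might look like the sketch below. The helper name read_rows is my own invention, not something from the csv module, and the function assumes the file is TAB-separated like the sample line shown earlier.

```python
import csv


def read_rows(path):
    """Return all rows of a TAB-separated file as lists of strings.

    read_rows is a hypothetical helper name used only for this example.
    """
    # Text mode with newline="" leaves line-ending handling to the csv module.
    with open(path, newline="") as f:
        return list(csv.reader(f, delimiter="\t"))
```

With the file from the original post this would be called as reviews = read_rows("training.txt").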
>> reviews = list(csv.reader(file))
>
> Try printing the first 2 lines of reviews to check what you have.
> I suspect it's not what you think.
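To see what Alan means without touching the file, here is a small sketch that feeds the first line of training.txt to csv.reader with its default comma delimiter. The io.StringIO object only stands in for the open file.

```python
import csv
import io

# One line in the layout of training.txt, held in memory for the demo.
sample = io.StringIO("1\tThe Da Vinci Code book is just awesome.\n", newline="")

# The default delimiter is a comma, so the TAB is not treated as a separator
# and the whole line ends up in a single field.
reviews = list(csv.reader(sample))
print(reviews[:2])

# r[0] is the complete line here, never just '1', so nothing passes the
# filter and r[1] is never evaluated.
positive_review = [r[1] for r in reviews if r[0] == "1"]
print(positive_review)
```

That is why the result is an empty list rather than an IndexError.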
>
>> positive_review = [r[1] for r in reviews if r[0] == str(1)]
>
> str(1) is just '1' so you might as well just use that.
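Once the rows are split correctly, the filter can compare against the string directly. A minimal sketch, assuming every row has exactly two fields; the second row is invented for the example.

```python
# Rows as csv.reader would return them with delimiter="\t".
# The second row is invented for illustration.
reviews = [
    ["1", "The Da Vinci Code book is just awesome."],
    ["0", "An invented negative review."],
]

# The csv module yields strings, so compare the label against '1'.
positive_review = [text for label, text in reviews if label == "1"]
print(positive_review)
```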
>
>> After the print I only get an empty list. Why is this happening? I am
>> also attaching the training.txt file
>
> See the comments above about your data format.
>