<div dir="ltr"><div>Hi Bob, Thanks very much for your quick and detailed reply. This is just a utility script that reads some sentiment analysis data and combines the positive and negative sentiments of multiple people into a single sentiment per line. The data came from a public source that I have no control over. What worked was Steve's suggestion to ignore the errors (I made sure that my results are not messed up when I ignore them). <br>
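For the archive, here is a minimal sketch of how ignoring the errors can be combined with the simplifications Bob suggested below. It is not the exact script: the function name classify_rows, the threshold parameter and the use of fields 1 to 7 are assumptions based on the code quoted in this thread.<br>

```python
def classify_rows(path, threshold=4):
    """Return (line number, total, label) for each data row of the CSV.

    Hypothetical helper for illustration; the name and the threshold
    parameter are assumptions, not part of the original script.
    """
    results = []
    # errors="ignore" silently drops bytes that are not valid UTF-8,
    # so check that the dropped characters do not matter for your data.
    with open(path, "r", encoding="utf-8", errors="ignore") as ifd:
        ifd.readline()  # discard the header line
        for linenum, line in enumerate(ifd, 1):
            fields = line.split(",")
            total = 0
            for field in fields[1:8]:
                if field.strip():  # skip empty fields
                    total += int(field)
            label = "POSITIVE" if total >= threshold else "Negative"
            results.append((linenum, total, label))
    return results
```

The with statement closes the file automatically, so no explicit close() call is needed.<br>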

</div>Thanks for the other suggestions. I haven't done much file I/O in Python, hence the crude method I used.<br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Oct 28, 2013 at 7:31 PM, bob gailer <span dir="ltr"><<a href="mailto:bgailer@gmail.com" target="_blank">bgailer@gmail.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 10/28/2013 6:13 PM, SM wrote:<br>

> Hello,<br>

Hi, welcome to the Tutor list.<div class="im"><br>

<br>

> I have an extremely simple piece of code<br>

<br></div>

which could be even simpler - see my comments below<div class="im"><br>

<br>

> which reads a .csv file, which has 1000 lines of fixed fields, one line at a time, and tries to print some values.<br>

><br>

>   1 #!/usr/bin/python3<br>

>   2 #<br>

>   3 import sys, time, re, os<br>

>   4<br>

>   5 if __name__=="__main__":<br>

>   6<br>

>   7     ifd = open("infile.csv", 'r')<br>

<br></div>

The simplest way to discard the first line is to follow the open with<br>

8     ifd.readline()<br>

<br>

The simplest way to track line number is<br>

<br>

10     for linenum, line in enumerate(ifd, 1):<br>

<br>

>  11         line1 = line.split(",")<br>

<br>

FWIW you don't need re to do this split<br>

<br>

>  12         total = 0<div class="im"><br>

>  19         print("LINE: ", linenum, line1[1])<br>

>  20         for i in range(1,8):<br>

>  21             if line1[i].strip():<br>

>  22                 print("line[i] ", int(line1[i]))<br>

>  23                 total = total + int(line1[i])<br>

>  24         print("Total: ", total)<br>

>  25<br>

>  26         if total >= 4:<br>

>  27             print("POSITIVE")<br>

>  28         else:<br>

>  29             print("Negative")<br></div>

>  31     ifd.close()<br>

<br>

That should have () after it, since it is a method call.<div class="im"><br>

><br>

> It works fine till  it parses the 1st 126 lines in the input file. For the 127th line (irrespective of the contents of the actual line), it prints the following error:<br>

> Traceback (most recent call last):<br>

>   File "p1.py", line 10, in <module><br>

>     for line in ifd:<br>

>   File "/usr/lib/python3.2/codecs.py", line 300, in decode<br>

>     (result, consumed) = self._buffer_decode(data, self.errors, final)<br>

> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 1173: invalid continuation byte<br></div>

Do you get exactly the same message irrespective of the contents of the actual line?<br>

<br>

"Code points larger than 127 are represented by multi-byte sequences, composed of a leading byte and one or more continuation bytes. The leading byte has two or more high-order 1s followed by a 0, while continuation bytes all have '10' in the high-order position."<br>


<br>

This suggests that the 0xe9 (binary 11101001) was read as a "leading byte", so a continuation byte was expected right after it, but the byte that follows the 0xe9 is not one.<br>
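A small sketch that reproduces the same kind of error on a made-up byte string (the sample text is an assumption, not taken from the actual file). It also shows that the same bytes decode cleanly as Latin-1, which is one possible explanation for a stray 0xe9, though the real encoding of the file is not known here.<br>

```python
# Made-up sample: 0xe9 is the letter 'é' in Latin-1.
raw = b"caf\xe9 au lait"

# Bit pattern of 0xe9: the high-order bits 1110 mark a UTF-8 leading
# byte that must be followed by two continuation bytes (10xxxxxx).
lead_bits = format(0xE9, "08b")

try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    # exc.start is the offset of the offending byte within the data
    # handed to the decoder.
    utf8_error = (exc.start, exc.reason)

# The same bytes are valid Latin-1 (an assumption about the source
# encoding, used only for illustration).
latin1_text = raw.decode("latin-1")

print(lead_bits)
print(utf8_error)
print(latin1_text)
```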

<br>

BTW, when I divide 1173 by 126 I get something close to 9 characters per line. That is not possible, as there would have to be at least 16 characters in each line.<br>

<br>

Best you send us at least the first 130 lines so we can play with the file.<span class="HOEnZb"><font color="#888888"><br>

<br>

-- <br>

Bob Gailer<br>

<a href="tel:919-636-4239" value="+19196364239" target="_blank">919-636-4239</a><br>

Chapel Hill NC<br>

<br>

</font></span></blockquote></div><br></div>