<div class="gmail_quote">On Thu, Dec 22, 2011 at 15:25, Stan Iverson <span dir="ltr">&lt;<a href="mailto:iversonstan@gmail.com">iversonstan@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="gmail_quote"><div class="im">On Thu, Dec 22, 2011 at 10:58 AM, Chris Angelico <span dir="ltr">&lt;<a href="mailto:rosuav@gmail.com" target="_blank">rosuav@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div><div>Firstly, are you using Python 2 or Python 3? Things will be slightly</div></div>
different, since the default 'str' object in Py3 is Unicode.<br></blockquote><div><br></div></div><div>2 </div><div class="im"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
I would guess that your page is being output as UTF-8; you may find<br>
that the solution is as easy as declaring the encoding of your text<br>
file when you read it in.<br></blockquote><div><br></div></div><div>So I tried this:</div><div><br></div><div><div>file = open(p + "2.txt")</div><div>for line in file:</div><div> print unicode(line, 'utf-8')</div>
</div></div></blockquote><div><br></div><div>Could you try using the 'open' function from the 'codecs' module?</div><div><br></div><div>import codecs</div><div>file = codecs.open(p + "2.txt", "r", "utf-8") # mode comes second, then the encoding; use whatever encoding your file is written in</div>
<div>for line in file:</div><div> print line</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="gmail_quote"><div>
<div><br></div><div>and got this error:</div><div><br></div><div><table border="0" cellpadding="0" cellspacing="0" width="100%"><tbody><tr><td bgcolor="#ffccee"><tt><small> 142</small> print unicode(line, 'utf-8')<br>
</tt></td></tr>
<tr><td><font color="#909090"><tt><small> 143</small> <br>
</tt></font></td></tr>
<tr><td><font color="#909090"><tt><small> 144</small> print '''&lt;br /&gt;&lt;br /&gt;&lt;form id="signup" action="<a href="http://13gems.com/Sign_Up.py" target="_blank">http://13gems.com/Sign_Up.py</a>" method="post" target="_blank"&gt;<br>
</tt></font></td></tr>
<tr><td><small><font color="#909090"><em>builtin</em> <strong>unicode</strong> = &lt;type 'unicode'&gt;, <strong>line</strong> = '<span class="text"><font color="#c040c0">\r\n</font></span>'</font></small></td>
</tr></tbody></table>
<table border="0" cellpadding="0" cellspacing="0" width="100%">
<tbody><tr><td bgcolor="#d8bbff"><big> </big><a>/usr/lib64/python2.4/encodings/utf_8.py</a> in <strong>decode</strong>(input=&lt;read-only buffer ptr 0x2b197e378454, size 21&gt;, errors='strict')</td>
</tr>
<tr><td><font color="#909090"><tt><small> 14</small> <br>
</tt></font></td></tr>
<tr><td><font color="#909090"><tt><small> 15</small> def decode(input, errors='strict'):<br>
</tt></font></td></tr>
<tr><td bgcolor="#ffccee"><tt><small> 16</small> return codecs.utf_16_decode(input, errors, True)<br>
</tt></td></tr>
<tr><td><font color="#909090"><tt><small> 17</small> <br>
</tt></font></td></tr>
<tr><td><font color="#909090"><tt><small> 18</small> class StreamWriter(codecs.StreamWriter):<br>
</tt></font></td></tr>
<tr><td><small><font color="#909090"><em>global</em> <strong>codecs</strong> = &lt;module 'codecs' from '/usr/lib64/python2.4/codecs.pyc'&gt;, codecs.<strong>utf_16_decode</strong> = &lt;built-in function utf_16_decode&gt;, <strong>input</strong> = &lt;read-only buffer ptr 0x2b197e378454, size 21&gt;, <strong>errors</strong> = 'strict', <em>builtin</em> <strong>True</strong> = True</font></small></td>
</tr></tbody></table><p><strong>UnicodeDecodeError</strong>: 'utf16' codec can't decode byte 0x0a in position 20: truncated data
<br><tt><small> </small> </tt>args =
('utf16', '<span class="text"><font color="#c040c0">\r\n</font></span>', 20, 21, 'truncated data')
<br><tt><small> </small> </tt>encoding =
'utf16'
<br><tt><small> </small> </tt>end =
21
<br><tt><small> </small> </tt>object =
'<span class="text"><font color="#c040c0">\r\n</font></span>'
<br><tt><small> </small> </tt>reason =
'truncated data'
<br><tt><small> </small> </tt>start =
20 </p><p>Tried it with utf-16 with same results.</p><p>TIA,</p><p>Stan</p></div></div></div>
<br>--<br>
<a href="http://mail.python.org/mailman/listinfo/python-list" target="_blank">http://mail.python.org/mailman/listinfo/python-list</a><br>
<br></blockquote></div><br><br clear="all"><div><br></div>-- <br>Rami Chowdhury<br>"Never assume malice when stupidity will suffice." -- Hanlon's Razor<br>+44-7581-430-517 / +1-408-597-7068 / +88-0189-245544<br>
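<div><br></div><div>A minimal sketch of the codecs.open suggestion above, and of one plausible reason the line-by-line unicode(line, ...) attempt reports 'truncated data': a binary read splits on the single byte 0x0a, which in UTF-16 is only half of a two-byte code unit. The sample text, the temporary file and the helper names (write_sample, read_lines, decode_raw_first_line) are hypothetical, and UTF-16-LE is only an assumption about how the real file is encoded. The code avoids print and with so it stays close to the Python 2 in this thread while also running on current Python.</div>
<pre>
```python
import codecs
import os
import tempfile

# Hypothetical sample text standing in for the contents of "2.txt".
SAMPLE = u"caf\u00e9\r\nna\u00efve\r\n"


def write_sample(encoding):
    """Write SAMPLE to a temporary file in the given encoding; return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        os.write(fd, SAMPLE.encode(encoding))
    finally:
        os.close(fd)
    return path


def read_lines(path, encoding):
    """Decode while reading, so every line is already Unicode text."""
    f = codecs.open(path, "r", encoding)  # mode second, encoding third
    try:
        return [line for line in f]
    finally:
        f.close()


def decode_raw_first_line(path, encoding):
    """Read one raw line of bytes, then decode it (the approach that failed)."""
    f = open(path, "rb")
    try:
        first = f.readline()  # stops right after the first 0x0a byte
    finally:
        f.close()
    return first.decode(encoding)


path = write_sample("utf-16-le")
try:
    # Decoding happens before the text is split into lines.
    lines = read_lines(path, "utf-16-le")
    # The raw line ends in the middle of a two-byte code unit.
    try:
        decode_raw_first_line(path, "utf-16-le")
        raw_attempt_failed = False
    except UnicodeDecodeError:
        raw_attempt_failed = True
finally:
    os.remove(path)
```
</pre>
<div>With this UTF-16-LE sample, lines holds the decoded text and raw_attempt_failed ends up True. If the real file is in some other encoding, pass that name to codecs.open instead.</div>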