[Tutor] converting encoded symbols from rss feed?
Serdar Tumgoren
zstumgoren at gmail.com
Fri Jun 19 15:29:50 CEST 2009
> OK, so newline is unicode, outfile.write() wants a plain string. What
> encoding do you want outfile to be in? Try something like
> outfile.write(newline.encode('utf-8'))
> or use the codecs module to create an output that knows how to encode.
Aha!! The second of the two options above did the trick! It appears I
needed to open my "outfile" with utf-8 encoding. After that, I was
able to write out cleaned lines without any hitches.
Below is the working code. And of course, many thanks for the help!!
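
import codecs

# strip_html() and translate_code() are my helper functions (not shown in this message)
infile = open('test.txt', 'rb')
#infile = codecs.open('test.txt','rb','utf-8')
# codecs.open() returns a file object that encodes unicode to utf-8 on write
outfile = codecs.open('test_cleaned.txt', 'wb', 'utf-8')

for line in infile:
    cleanline = strip_html(translate_code(line)).strip()
    if cleanline:
        outline = cleanline + '\n'
        outfile.write(outline)
    else:
        continue

infile.close()
outfile.close()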
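
For anyone who wants to try the idea without my helper functions, here is a minimal self-contained sketch of the same approach. The names clean_line() and write_clean_lines(), the sample strings, and the temporary output path are all made up for illustration; clean_line() is only a stand-in for strip_html(translate_code(line)).strip(), which is not reproduced here.

```python
import codecs
import os
import tempfile


def clean_line(line):
    # Hypothetical stand-in for strip_html(translate_code(line)).strip()
    return line.strip()


def write_clean_lines(lines, path):
    # codecs.open() wraps the file so that unicode text is encoded
    # to UTF-8 each time write() is called
    outfile = codecs.open(path, 'w', 'utf-8')
    try:
        for line in lines:
            cleaned = clean_line(line)
            if cleaned:  # skip lines that are empty after cleaning
                outfile.write(cleaned + u'\n')
    finally:
        outfile.close()


# Sample unicode input with non-ASCII characters and one blank line
path = os.path.join(tempfile.mkdtemp(), 'test_cleaned.txt')
write_clean_lines([u'  caf\xe9  ', u'   ', u'na\xefve\n'], path)
```

The other option from the quoted advice, calling .encode('utf-8') on each line before writing it to a file opened in binary mode, should put the same bytes on disk. Letting codecs.open() handle it just keeps the encoding step in one place.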