[Tutor] converting encoded symbols from rss feed?

Serdar Tumgoren zstumgoren at gmail.com
Fri Jun 19 15:29:50 CEST 2009


> OK, so newline is unicode, outfile.write() wants a plain string. What
> encoding do you want outfile to be in? Try something like
> outfile.write(newline.encode('utf-8'))
> or use the codecs module to create an output that knows how to encode.

Aha!! The second of the two options above did the trick! It appears I
needed to open my "outfile" with utf-8 encoding. After that, I was
able to write out cleaned lines without any hitches.
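To make the two suggestions concrete, here is a minimal sketch of both approaches side by side. It assumes Python 3 (it also runs on 2.7). The sample string and the temporary file names are made up for illustration.

```python
import codecs
import os
import tempfile

# A line with non-ASCII characters, as might come out of an RSS feed after
# entity decoding (sample text invented for this illustration).
newline = u"Caf\u00e9 \u2013 \u201cquoted\u201d\n"

tmpdir = tempfile.mkdtemp()
path1 = os.path.join(tmpdir, "option1.txt")
path2 = os.path.join(tmpdir, "option2.txt")

# Option 1: open a plain byte-oriented file and encode each line yourself.
with open(path1, "wb") as outfile:
    outfile.write(newline.encode("utf-8"))

# Option 2: let a codecs writer encode to UTF-8 on every write().
with codecs.open(path2, "w", "utf-8") as outfile:
    outfile.write(newline)

# Both files should now contain exactly the same UTF-8 bytes.
with open(path1, "rb") as f1, open(path2, "rb") as f2:
    same = f1.read() == f2.read()
print(same)  # True
```

Option 1 keeps the encoding step explicit at every write. Option 2, the one that worked here, sets the encoding once when the file is opened, so it is harder to forget on a later write.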

Below is the working code. And of course, many thanks for the help!!


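    import codecs

    # strip_html() and translate_code() are my own helper functions,
    # defined elsewhere in the script (not shown here).

    infile = open('test.txt', 'rb')
    #infile = codecs.open('test.txt', 'rb', 'utf-8')

    # The codecs writer encodes each unicode line to UTF-8 on write().
    outfile = codecs.open('test_cleaned.txt', 'wb', 'utf-8')

    for line in infile:
        cleanline = strip_html(translate_code(line)).strip()
        if cleanline:  # skip lines that are empty after cleaning
            outfile.write(cleanline + '\n')

    infile.close()
    outfile.close()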
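For anyone reading this later on Python 3, a self-contained sketch of the same loop is below. The poster's real translate_code() and strip_html() are not shown in the thread, so the versions here are hypothetical stand-ins: translate_code() is assumed to turn HTML/numeric entities into Unicode characters (done here with html.unescape, Python 3.4+), and strip_html() is assumed to drop tags (done here with a simple regex). The sample input file is invented so the sketch runs on its own.

```python
import html
import os
import re
import tempfile

# Hypothetical stand-in: decode entities such as "&#8217;" or "&eacute;".
def translate_code(text):
    return html.unescape(text)

# Hypothetical stand-in: remove anything that looks like an HTML tag.
TAG_RE = re.compile(r"<[^>]+>")

def strip_html(text):
    return TAG_RE.sub("", text)

def clean_file(in_path, out_path):
    # Decode on the way in and encode on the way out, so the loop
    # only ever handles str (Unicode) objects.
    with open(in_path, encoding="utf-8") as infile, \
         open(out_path, "w", encoding="utf-8") as outfile:
        for line in infile:
            cleanline = strip_html(translate_code(line)).strip()
            if cleanline:  # drop lines that are empty after cleaning
                outfile.write(cleanline + "\n")

# Build a small sample input file.
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "test.txt")
dst = os.path.join(tmpdir, "test_cleaned.txt")
with open(src, "w", encoding="utf-8") as f:
    f.write("<p>It&#8217;s a caf&eacute;</p>\n")
    f.write("<br/>\n")
    f.write("Plain line &amp; more\n")

clean_file(src, dst)
with open(dst, encoding="utf-8") as f:
    result = f.read()
```

With this sample input, the cleaned file should hold two lines, "It’s a café" and "Plain line & more"; the "<br/>" line is dropped because nothing is left after stripping. Passing encoding= to open() plays the same role that codecs.open() plays in the code above.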

