[Tutor] string encoding

Lie Ryan lie.1296 at gmail.com
Fri Jun 18 12:35:44 CEST 2010

On 06/18/10 14:21, Rick Pasotto wrote:
>> Remember, even if your terminal display is restricted to ASCII, you can
>> still use Beautiful Soup to parse, process, and write documents in UTF-8
>> and other encodings. You just can't print certain strings with print.
> I can print the string fine. It's f.write(string_with_unicode) that fails with:
> UnicodeEncodeError: 'ascii' codec can't encode characters in position 31-32: ordinal not in range(128)
> Shouldn't I be able to f.write() *any* 8bit byte(s)?
> repr() gives: u"Realtors\xc2\xae"
> BTW, I'm running python 2.5.5 on debian linux.

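The FAQ explains half of it. In your case, read "file object" wherever it
says "terminal". A unicode string is a sequence of characters, not bytes,
so it has to be encoded before it can go into a file. Python plays it safe
here: when you write a unicode string to an ordinary file object, the only
implicit encoding it tries is the default 'ascii' codec. That codec fails
on any non-ASCII character, which is the UnicodeEncodeError you are seeing,
and Python won't guess a wider encoding for you. If you have a unicode
string and you want to .write() it to a file, you need to .encode() the
string first, so: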

f.write(string_with_unicode.encode("utf-8"))
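Here is a minimal, self-contained sketch of the explicit-encode approach. The helper names write_unicode and read_bytes, the temporary file, and the choice of UTF-8 are all just assumptions for illustration. The sketch encodes first, writes the resulting bytes to a file opened in binary mode, then reads the raw bytes back. It uses try/finally instead of `with` because Python 2.5 only enables `with` via `from __future__ import with_statement`. The intent is for it to run unchanged on 2.5 and on recent Python 3.

```python
import os
import tempfile

def write_unicode(path, text, encoding="utf-8"):
    """Hypothetical helper: encode text explicitly, then write the bytes."""
    f = open(path, "wb")  # binary mode: the file receives bytes, not unicode
    try:
        f.write(text.encode(encoding))
    finally:
        f.close()

def read_bytes(path):
    """Hypothetical helper: return the raw bytes stored in the file."""
    f = open(path, "rb")
    try:
        return f.read()
    finally:
        f.close()

# Example data: U+00AE is the registered-trademark sign
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
text = u"Realtors\u00ae"
write_unicode(path, text)
data = read_bytes(path)  # UTF-8 needs two bytes for U+00AE
os.remove(path)
```

The file ends up holding UTF-8 bytes. Decoding them with the same codec gives back the original unicode string.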

Alternatively, you can use the codecs module to wrap the file object:

import codecs
f = codecs.open('filename.txt', 'w', encoding="utf-8")
f.write(string_with_unicode) # the wrapper encodes the unicode string as UTF-8
f.close()
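A short round-trip sketch of the codecs approach follows. The helper names, the temporary file, and UTF-8 are assumptions chosen for illustration. codecs.open() always opens the underlying file in binary mode, so the wrapper's write() encodes and its read() decodes, and your code only ever handles unicode. As in plain Python 2.5 code, it uses try/finally rather than `with`.

```python
import codecs
import os
import tempfile

def write_via_codecs(path, text, encoding="utf-8"):
    """Hypothetical helper: let the codecs wrapper encode on write()."""
    f = codecs.open(path, "w", encoding=encoding)
    try:
        f.write(text)  # unicode in; encoded bytes go to the file
    finally:
        f.close()

def read_via_codecs(path, encoding="utf-8"):
    """Hypothetical helper: let the codecs wrapper decode on read()."""
    f = codecs.open(path, "r", encoding=encoding)
    try:
        return f.read()  # bytes from the file; unicode out
    finally:
        f.close()

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
text = u"Realtors\u00ae"
write_via_codecs(path, text)
round_trip = read_via_codecs(path)

# Peek at the raw bytes to confirm what actually landed on disk
f = open(path, "rb")
try:
    raw = f.read()
finally:
    f.close()
os.remove(path)
```

The design trade-off is where the encoding decision lives. With .encode() you choose it at every write. With codecs.open() you choose it once when the file is opened.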
