[Tutor] string encoding

Lie Ryan lie.1296 at gmail.com
Fri Jun 18 12:35:44 CEST 2010


On 06/18/10 14:21, Rick Pasotto wrote:
>> Remember, even if your terminal display is restricted to ASCII, you can
>> still use Beautiful Soup to parse, process, and write documents in UTF-8
>> and other encodings. You just can't print certain strings with print.
> 
> I can print the string fine. It's f.write(string_with_unicode) that fails with:
> 
> UnicodeEncodeError: 'ascii' codec can't encode characters in position 31-32: ordinal not in range(128)
> 
> Shouldn't I be able to f.write() *any* 8bit byte(s)?
> 
> repr() gives: u"Realtors\\xc2\\xae"
> 
> BTW, I'm running python 2.5.5 on debian linux.
> 

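The FAQ explains half of it; in your case, read "file object" wherever
it says "terminal". A file holds bytes, but a unicode string is a
sequence of characters, not 8-bit bytes, so it has to be encoded before
it can be written. Python plays it safe: when you .write() a unicode
string to a file it only tries the default 'ascii' codec and will not
guess any other encoding for you, hence the UnicodeEncodeError. If you
have a unicode string and you want to .write() it to a file, you need to
.encode() the string first, so: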

string_with_unicode = u"Realtors\xc2\xae"
# f is your already-open file object; give it encoded bytes, not unicode
f.write(string_with_unicode.encode('utf-8'))
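
To see the difference between the two codecs without touching a file,
here is a rough sketch. The sample string u"Realtors\xae" and the names
text, ascii_ok and utf8_bytes are my own stand-ins, not anything from
your program:

```python
# Illustrative sample only: U+00AE is the registered-trademark sign.
text = u"Realtors\xae"

# Roughly what Python 2 attempts on its own when handed unicode
# in f.write(): an encode with the default 'ascii' codec.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Picking the codec yourself gives well-defined bytes.
utf8_bytes = text.encode('utf-8')

print(ascii_ok)         # False
print(len(utf8_bytes))  # 10 (8 ASCII bytes plus 2 bytes for U+00AE)
```

The resulting byte string is what you can safely pass to f.write().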

Alternatively, you can use the codecs module to open the file so that it
does the encoding for you:

import codecs
f = codecs.open('filename.txt', 'w', encoding="utf-8")
f.write(string_with_unicode) # now you can send unicode strings to f
f.close()
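
If you want to convince yourself that the codecs approach really puts
UTF-8 on disk, a small round trip along these lines should do. It is
only a sketch: the temporary file, the sample string and the variable
names are assumptions of mine, not part of your code:

```python
import codecs
import os
import tempfile

# Illustrative sample only: U+00AE is the registered-trademark sign.
text = u"Realtors\xae"

# Use a throwaway temporary file so the sketch touches nothing real.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

# codecs.open() returns a file object that encodes unicode on write.
f = codecs.open(path, 'w', encoding='utf-8')
try:
    f.write(text)
finally:
    f.close()

# Read the raw bytes back to see what actually landed on disk.
raw = open(path, 'rb')
try:
    data = raw.read()
finally:
    raw.close()

# Reading through codecs.open() decodes the bytes back to unicode.
g = codecs.open(path, 'r', encoding='utf-8')
try:
    round_trip = g.read()
finally:
    g.close()

os.remove(path)
print(round_trip == text)  # True
```

On Python 3 the built-in open() accepts an encoding argument directly,
so the codecs wrapper is mainly useful on Python 2.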



