how to write a unicode string to a file ?

Stef Mientki stef.mientki at gmail.com
Fri Oct 16 20:07:57 EDT 2009


Stephen Hansen wrote:
> On Thu, Oct 15, 2009 at 4:43 PM, Stef Mientki <stef.mientki at gmail.com> wrote:
>
>     hello,
>
>     When I write the following unicode string (I hope it can be sent on
>     this mailing list)
>
>       Bücken
>
>     to a file
>
>        fh.write ( line )
>
>     I get the following error:
>
>      UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc'
>     in position 9: ordinal not in range(128)
>
>     How should I write such a string to a file ?
>
>
> First, you have to understand that a file never really contains 
> unicode-- not in the way that it exists in memory / in python when you 
> type line = u'Bücken'. It contains a series of bytes that are an 
> encoded form of that abstract unicode data.
>
> There are various encodings you can use; UTF-8 and UTF-16 are, in my 
> experience, the most common. UTF-8 is an ASCII superset, and it's the 
> one I see most often.
>
> So, you can do:
>
>   import codecs
>   f = codecs.open('filepath', 'w', 'utf-8')
>   f.write(line)
>   f.close()  # make sure the encoded bytes reach the disk
>
> To read such a file, you'd use codecs.open as well, just with an 'r' 
> mode instead of a 'w' mode.
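
A minimal round-trip sketch of the codecs.open approach described above.
The scratch path and the variable names are made up for illustration, and
the string is written with an escape so the example does not depend on the
source file's own encoding:

```python
import codecs
import os
import tempfile

line = u'B\xfccken'  # u'Bücken'

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# Writing: the codecs file object encodes unicode to UTF-8 bytes.
f = codecs.open(path, 'w', 'utf-8')
f.write(line)
f.close()

# Reading: the same call with 'r' decodes the bytes back to unicode.
f = codecs.open(path, 'r', 'utf-8')
round_tripped = f.read()
f.close()

# On disk, the u-umlaut takes two bytes (0xC3 0xBC) in UTF-8.
with open(path, 'rb') as raw_file:
    raw = raw_file.read()
```

Here round_tripped compares equal to line, while raw holds seven bytes for
the six characters.
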
Thanks guys,
I didn't know the codecs module,
and it seems to be a good solution;
at least it can safely write a file.
But now I have to open that file in Excel 2000 ... 2007,
and what I get is completely wrong.
After changing the encoding to latin-1 or windows-1252,
everything works fine.

Which of the two should I use, latin-1 or windows-1252?
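
To make the difference between the two concrete, here is a small sketch.
It uses 'cp1252', which is Python's name for windows-1252; the euro sign is
just one convenient character on which the two codecs disagree, and the
variable names are illustrative only:

```python
text = u'B\xfccken'   # u'Bücken'
euro = u'\u20ac'      # the euro sign

# For the u-umlaut both codecs produce the same single byte (0xFC).
same = text.encode('latin-1') == text.encode('cp1252')

# cp1252 maps the euro sign to byte 0x80 ...
euro_cp1252 = euro.encode('cp1252')

# ... while latin-1 has no euro sign and refuses to encode it.
try:
    euro.encode('latin-1')
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
```

So for text like the example both give identical files. They only differ for
characters such as the euro sign or curly quotes, which windows-1252 can
represent and latin-1 cannot.
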

And a more general question: how should I organize my Python programs?
In general I have data coming from Excel, Delphi, and SQLite.
In Python I always use wxPython, so I'm forced to use unicode.
My output often needs to be exported to Excel, SPSS, or SQLite.
So would this be a good design?

Excel    |      convert        wxPython      convert        Excel
Delphi   |===>    to      ===>   in     ===>   to      ===> SQLite
SQLite   |      unicode        unicode       latin-1        SPSS
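
The design in the diagram could be sketched roughly as follows: decode at
the input boundary, work on unicode only, encode at the output boundary.
The helper names to_unicode and to_export are hypothetical, and the choice
of 'cp1252' and of replacing unencodable characters with '?' are
assumptions, not requirements:

```python
def to_unicode(raw_bytes, source_encoding):
    """Input boundary: decode bytes from an external source into unicode."""
    return raw_bytes.decode(source_encoding)


def to_export(text, target_encoding='cp1252'):
    """Output boundary: encode unicode for the consuming program.

    Characters the target encoding cannot represent become '?'.
    """
    return text.encode(target_encoding, 'replace')


incoming = b'B\xfccken'                # bytes as a cp1252 source might deliver them
text = to_unicode(incoming, 'cp1252')  # unicode inside the program
shown = text.upper()                   # all processing happens on unicode
outgoing = to_export(shown)            # bytes again, only at the very end
```

Python's sqlite3 module accepts unicode strings directly, so the SQLite leg
may not need an explicit conversion at all.
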

thanks,
Stef Mientki

>
> Now, that uses a file object created with the "codecs" module which 
> operates with theoretical unicode streams. It will automatically take 
> any passed in unicode strings, encode them in the specified encoding 
> (utf8), and write the resulting bytes out.
>
> You can also do that manually with a regular file object, via:
>
>   f.write(line.encode("utf8"))
>
> If you are reading such a file later with a normal file object (i.e., 
> not one created with codecs.open), you would do:
>
>   f = open('filepath', 'rb')
>   byte_data = f.read()
>   uni_data = byte_data.decode("utf8")
>
> That will convert the byte-encoded data back to real unicode strings. 
> If the file contains encoded unicode data (a thing you can only know 
> from the documentation of whatever produced that file), be sure to do 
> this even if it doesn't seem necessary. For example, a UTF-8 encoded 
> file might look and work like a completely normal ASCII file, but if 
> it's really UTF-8, your code will eventually break the first time 
> someone puts in a non-ASCII character. Since UTF-8 is an ASCII 
> superset, it's indistinguishable from ASCII until it contains a 
> non-ASCII character.
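
The point about UTF-8 being an ASCII superset can be checked with a short
sketch; the sample strings and variable names are chosen only for
illustration:

```python
ascii_only = u'Bucken'
with_umlaut = u'B\xfccken'  # u'Bücken'

# Pure ASCII text yields identical bytes under both codecs ...
identical = ascii_only.encode('utf-8') == ascii_only.encode('ascii')

# ... so treating the file as ASCII appears to work,
# until a non-ASCII character shows up in the data.
data = with_umlaut.encode('utf-8')
try:
    data.decode('ascii')
    ascii_decode_ok = True
except UnicodeDecodeError:
    ascii_decode_ok = False

# Decoding with the encoding the file was written in always works.
decoded = data.decode('utf-8')
```
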

>
> HTH,
>
> --S



