how to write a unicode string to a file ?
Stef Mientki
stef.mientki at gmail.com
Fri Oct 16 20:07:57 EDT 2009
Stephen Hansen wrote:
> On Thu, Oct 15, 2009 at 4:43 PM, Stef Mientki <stef.mientki at gmail.com
> <mailto:stef.mientki at gmail.com>> wrote:
>
> hello,
>
> By writing the following unicode string (I hope it can be sent on
> this mailing list)
>
> Bücken
>
> to a file
>
> fh.write ( line )
>
> I get the following error:
>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc'
> in position 9: ordinal not in range(128)
>
> How should I write such a string to a file ?
>
>
> First, you have to understand that a file never really contains
> unicode-- not in the way that it exists in memory / in python when you
> type line = u'Bücken'. It contains a series of bytes that are an
> encoded form of that abstract unicode data.
>
> There are various encodings you can use -- UTF-8 and UTF-16 are in my
> experience the most common. UTF-8 is an ASCII superset, and it's the
> one I see most often.
>
> So, you can do:
>
> import codecs
> f = codecs.open('filepath', 'w', 'utf-8')
> f.write(line)
>
> To read such a file, you'd do codecs.open as well, just with a 'r'
> mode and not a 'w' mode.
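To make the read side concrete, here is a minimal sketch of the same round trip with codecs.open (the file name and use of tempfile are just for illustration):

```python
import codecs
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), 'buecken.txt')  # throwaway file

with codecs.open(path, 'w', 'utf-8') as f:   # encodes on the way out
    f.write(u'B\xfccken')                    # u'Bücken'

with codecs.open(path, 'r', 'utf-8') as f:   # same call, 'r' instead of 'w'
    line = f.read()                          # decoded back to a unicode string

assert line == u'B\xfccken'
```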
Thanks guys,
I didn't know about the codecs module,
and it seems to be a good solution;
at least it can safely write a file.
But when I open that file in Excel 2000 ... 2007,
the contents come out completely wrong.
After changing the codec to latin-1 or windows-1252,
everything works fine.
Which of the two should I use, latin-1 or windows-1252?
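For what it's worth, windows-1252 (cp1252) agrees with latin-1 on every character latin-1 can encode, and additionally assigns printable characters (the euro sign, curly quotes, and so on) to the 0x80-0x9F range. A small sketch of the difference:

```python
# cp1252 (windows-1252) matches latin-1 for everything latin-1 can encode:
assert u'B\xfccken'.encode('cp1252') == u'B\xfccken'.encode('latin-1')

# ...but it also covers characters latin-1 lacks, such as the euro sign:
euro = u'\u20ac'
assert euro.encode('cp1252') == b'\x80'
try:
    euro.encode('latin-1')
except UnicodeEncodeError:
    print('latin-1 has no euro sign')
```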
And a more general question: how should I organize my Python programs?
In general my data comes from Excel, Delphi, or SQLite.
In Python I always use wxPython, so I'm forced to use unicode.
My output often needs to be exported to Excel, SPSS, or SQLite.
So would this be a good design?
Excel  |      convert      wxPython      convert      Excel
Delphi | ===>   to    ===>    in    ===>   to    ===> SQLite
SQLite |      unicode      unicode       latin-1      SPSS
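That layout matches the usual "decode early, encode late" pattern: turn every input into unicode the moment it is read, keep the whole program unicode-only, and encode only at the output boundary. A rough sketch (the helper names and the particular encodings here are just assumptions):

```python
# "Decode early, encode late": bytes -> unicode at every input,
# unicode -> bytes only at every output.

def read_input(raw_bytes, source_encoding):
    # e.g. data arriving from Excel, Delphi, or SQLite
    return raw_bytes.decode(source_encoding)

def write_output(text, target_encoding):
    # e.g. data heading for Excel, SPSS, or SQLite
    return text.encode(target_encoding)

inside = read_input(b'B\xfccken', 'latin-1')   # everything inside is unicode
out = write_output(inside, 'cp1252')           # encode only when exporting
```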
thanks,
Stef Mientki
>
> Now, that uses a file object created with the "codecs" module which
> operates with theoretical unicode streams. It will automatically take
> any passed in unicode strings, encode them in the specified encoding
> (utf8), and write the resulting bytes out.
>
> You can also do that manually with a regular file object, via:
>
> f.write(line.encode("utf8"))
>
> If you are reading such a file later with a normal file object (i.e.,
> not one created with codecs.open), you would do:
>
> f = open('filepath', 'rb')
> byte_data = f.read()
> uni_data = byte_data.decode("utf8")
>
> That will convert the byte-encoded data back to real unicode strings.
> Be sure to do this even if it doesn't seem necessary, as long as the
> file contains encoded unicode data (a thing you can only know from the
> documentation of whatever produced that file)... for example, a UTF-8
> encoded file might look and work like a completely normal ASCII file,
> but if it's really UTF-8, eventually your code will break that one
> time someone puts in a non-ASCII character. Since UTF-8 is an ASCII
> superset, it's indistinguishable from ASCII until it contains a
> non-ASCII character.
>
> HTH,
>
> --S
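A tiny demonstration of that last point, that UTF-8 bytes are indistinguishable from ASCII until a non-ASCII character shows up:

```python
# All-ASCII text encodes to the same bytes under ASCII and UTF-8:
assert u'hello'.encode('utf8') == u'hello'.encode('ascii')

# But one non-ASCII character makes the bytes invalid as ASCII:
data = u'B\xfccken'.encode('utf8')     # b'B\xc3\xbccken'
try:
    data.decode('ascii')
except UnicodeDecodeError:
    print('not plain ASCII after all')
```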