[Pythonmac-SIG] read content from latin-1 file, write it to utf8 file

Piet van Oostrum piet at cs.uu.nl
Tue Apr 18 10:27:52 CEST 2006


>>>>> "frank h." <frank.hoffsummer at gmail.com> (FH) wrote:

>FH> Hello,
>FH> I am using Mac Python 2.4.1 on Mac OS X 10.4, and I cannot seem to read
>FH> from a latin-1 file and then write to a UTF-8 file correctly.

>FH> Using Textwrangler on OS X, I create a latin-1 file with some special
>FH> characters in it and save it as "test.txt"

>FH> I am reading the textfile as such:

>FH>    import codecs
>FH>    f = codecs.open('test.txt', 'r', 'latin-1')
>FH>    content = f.read()
>FH>    f.close()

>FH>    type(content)
>FH>    <type 'unicode'>

>FH> all good. I can even

>FH>    print content.encode('utf8')
>FH>    äöåäöäööåäöäöå

>FH> (having set the default encoding to 'utf8' with sys.setdefaultencoding() in sitecustomize.py).
>FH> Now I want to create a new utf8 file and write "content" into it. I do the
>FH> following:

>FH>    f=codecs.open('newtest.txt','w','utf-8')
>FH>    f.write(content)
>FH>    f.close()

>FH> My problem is that when I open "newtest.txt" in Textwrangler again,
>FH> Textwrangler recognizes the file as "MacRoman" encoded and the content is
>FH> garbled.

Then that is Textwrangler's fault. Interpreting a utf-8 file as MacRoman
will indeed give garbage: each non-ASCII character such as 'ä' is stored as
two bytes in utf-8, and MacRoman displays each of those bytes as a separate
character. Maybe you can configure Textwrangler to recognize utf-8 files.
Otherwise, use an editor that detects utf-8 reliably. This is not a Python
problem, as the file should be (and probably is) generated in utf-8.
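One way to check that the output really is utf-8 is to look at its raw bytes. The following is a minimal sketch, written for Python 3 rather than the 2.4.1 in the question: it uses the built-in open() with an encoding argument in place of codecs.open(), and it creates its own stand-in files in a temporary directory (the sample string, the helper names and the paths are illustrative assumptions). It converts a Latin-1 file to UTF-8, confirms that the result decodes cleanly as UTF-8, and shows what the same bytes turn into when decoded as MacRoman.

```python
import os
import tempfile

# Illustrative sample: a-umlaut, o-umlaut, a-ring (all outside ASCII).
SAMPLE = u"\u00e4\u00f6\u00e5"


def latin1_to_utf8(src_path, dst_path):
    """Read src_path as Latin-1 and write the same text to dst_path as UTF-8."""
    with open(src_path, "r", encoding="latin-1") as f:
        content = f.read()
    with open(dst_path, "w", encoding="utf-8") as f:
        f.write(content)
    return content


def is_valid_utf8(path):
    """Return True if the file's raw bytes decode cleanly as UTF-8."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "test.txt")      # hypothetical stand-ins for the
dst = os.path.join(workdir, "newtest.txt")   # files named in the question

# Create a Latin-1 input file: one byte per character.
with open(src, "wb") as f:
    f.write(SAMPLE.encode("latin-1"))

content = latin1_to_utf8(src, dst)

with open(dst, "rb") as f:
    raw = f.read()

misread = raw.decode("mac_roman")  # what an editor that guesses MacRoman shows

print(is_valid_utf8(dst))          # the output really is UTF-8
print(len(SAMPLE), len(raw))       # each of these characters takes two bytes
print(ascii(misread))              # escaped so it prints on any terminal;
                                   # 'ä' comes out as SQUARE ROOT + SECTION SIGN
```

If is_valid_utf8() returns True for the real newtest.txt, the file is fine and only the editor's encoding guess is wrong.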
-- 
Piet van Oostrum <piet at cs.uu.nl>
URL: http://www.cs.uu.nl/~piet [PGP 8DAE142BE17999C4]
Private email: piet at vanoostrum.org

