[Pythonmac-SIG] read content from latin-1 file, write it to utf8 file
Piet van Oostrum
piet at cs.uu.nl
Tue Apr 18 10:27:52 CEST 2006
>>>>> "frank h." <frank.hoffsummer at gmail.com> (FH) wrote:
>FH> Hello,
>FH> I am using Mac Python 2.4.1 on Mac OS X 10.4 and I cannot seem to be able to
>FH> read from a latin-1 file and then write to a UTF8 file correctly
>FH> Using Textwrangler on OS X, I create a latin-1 file with some special
>FH> characters in it and save it as "test.txt"
>FH> I am reading the textfile as such:
>FH> f = codecs.open('test.txt', 'r', 'latin-1')
>FH> content = f.read()
>FH> f.close()
>FH> type(content)
>FH> <type 'unicode'>
>FH> all good. I can even
>FH> print content.encode('utf8')
>FH> äöåäöäööåäöäöå
>FH> (having set sys.defaultencoding to 'utf8' in siteconfig.py).
>FH> Now I want to create a new utf8 file and write "content" into it. I do the
>FH> following:
>FH> f=codecs.open('newtest.txt','w','utf-8')
>FH> f.write(content)
>FH> f.close()
>FH> my problem is, that when I open "newtest.txt" in Textwrangler again,
>FH> Textwrangler recognizes the file as "MacRoman" encoded and the content is
>FH> garbled.
Then that is TextWrangler's fault. Interpreting a UTF-8 file as MacRoman
will indeed give garbage. Maybe you can configure TextWrangler to recognize
UTF-8 files; otherwise, use an editor that handles this well. This is not a
Python problem: the code above should (and probably does) write the file
as UTF-8.
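
To check from Python that the output really is UTF-8, independent of any
editor, something like the following rough sketch could be used. The
helper names (looks_like_utf8, transcode_latin1_to_utf8) and the temporary
demo files are made up for illustration. The optional byte order mark is
only a hint that many editors use to detect UTF-8; whether TextWrangler
honors it is an assumption here, not something confirmed in this thread.

```python
import codecs
import os
import tempfile


def looks_like_utf8(path):
    """Return True if the raw bytes of the file decode cleanly as UTF-8."""
    f = open(path, 'rb')
    try:
        raw = f.read()
    finally:
        f.close()
    try:
        raw.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True


def transcode_latin1_to_utf8(src, dst, with_bom=False):
    """Read src as Latin-1 and write it to dst as UTF-8.

    with_bom=True writes U+FEFF first, which encodes to the bytes
    EF BB BF (a hint for editors; an assumption, not a requirement).
    """
    f = codecs.open(src, 'r', 'latin-1')
    try:
        content = f.read()
    finally:
        f.close()
    out = codecs.open(dst, 'w', 'utf-8')
    try:
        if with_bom:
            out.write(u'\ufeff')
        out.write(content)
    finally:
        out.close()


# Demo with hypothetical files in a temporary directory.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'test.txt')
dst = os.path.join(workdir, 'newtest.txt')

# The Latin-1 bytes for a-umlaut, o-umlaut, a-ring.
f = open(src, 'wb')
f.write(b'\xe4\xf6\xe5')
f.close()

transcode_latin1_to_utf8(src, dst, with_bom=True)

f = open(dst, 'rb')
raw = f.read()
f.close()
print(repr(raw))
print(looks_like_utf8(dst))
```

If looks_like_utf8 returns True for the new file, the bytes on disk are
fine and the garbling comes from how the editor guesses the encoding.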
--
Piet van Oostrum <piet at cs.uu.nl>
URL: http://www.cs.uu.nl/~piet [PGP 8DAE142BE17999C4]
Private email: piet at vanoostrum.org