Puzzled by code pages
Adam Tauno Williams
awilliam at whitemice.org
Fri May 14 20:27:18 EDT 2010
I'm trying to process OpenStep plist files in Python. I have a parser
that works, but only for strict ASCII. However, plist files may contain
accented characters, equivalent to ISO-8859-2 (I believe). For example,
I read in this line:
>>> handle = open('file.txt', 'rb')
>>> data = handle.read()
>>> handle.close()
>>> data
' "skyp4_filelist_10201/localit\xc3\xa0 termali_sortfield" = NSFileName;\n'
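The two bytes `\xc3\xa0` in that output are exactly the UTF-8 encoding of U+00E0 (à), which suggests the file may actually be UTF-8 rather than ISO-8859-2. A minimal Python 3 sketch that checks this on the line from the session (the variable names are illustrative only):

```python
import unicodedata

# Python 3: raw file contents are bytes; decoded text is str.
raw = b' "skyp4_filelist_10201/localit\xc3\xa0 termali_sortfield" = NSFileName;\n'

# Decode as UTF-8; this raises UnicodeDecodeError if the bytes are not valid UTF-8.
text = raw.decode('utf-8')

# The two bytes \xc3\xa0 collapse into a single character after "localit".
accented = text[text.index('localit') + len('localit')]
print(hex(ord(accented)), unicodedata.name(accented))
# prints: 0xe0 LATIN SMALL LETTER A WITH GRAVE
```

If the decode succeeds and yields "località", UTF-8 is the likely encoding of the file.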
What is the correct way to re-encode this data into UTF-8 so I can use
unicode strings, and then write the output back to ISO8859-?
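A common pattern is to decode bytes to Unicode once when reading, work only with Unicode strings in between, and encode once when writing. The sketch below is written for Python 3 (on Python 2, `io.open` accepts the same `encoding` and `newline` arguments). It assumes the input is UTF-8 and picks ISO-8859-1 as the output code page; both are assumptions, and `convert_plist_encoding` is a hypothetical helper. Note that à has no code point in ISO-8859-2, so encoding this text to ISO-8859-2 raises `UnicodeEncodeError`.

```python
import os
import tempfile


def convert_plist_encoding(src_path, dst_path,
                           src_encoding='utf-8', dst_encoding='iso8859-1'):
    """Hypothetical helper: decode a file to str, then re-encode it.

    Both encodings are assumptions; pass whatever the files really use.
    """
    # Decode at the input boundary: bytes -> str. newline='' keeps line endings as-is.
    with open(src_path, 'r', encoding=src_encoding, newline='') as src:
        text = src.read()
    # ... parse or modify `text` here as ordinary Unicode ...
    # Encode at the output boundary: str -> bytes. The default errors='strict'
    # raises UnicodeEncodeError for characters the target code page lacks.
    with open(dst_path, 'w', encoding=dst_encoding, newline='') as dst:
        dst.write(text)
    return text


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'file.txt')
    dst = os.path.join(tmp, 'file-latin1.txt')
    with open(src, 'wb') as f:
        f.write(b'"localit\xc3\xa0 termali" = NSFileName;\n')  # UTF-8 sample
    text = convert_plist_encoding(src, dst)
    with open(dst, 'rb') as f:
        latin1_bytes = f.read()  # à is the single byte 0xE0 in ISO-8859-1

# à cannot be represented in ISO-8859-2 at all.
try:
    '\u00e0'.encode('iso8859-2')
except UnicodeEncodeError:
    latin2_ok = False
else:
    latin2_ok = True
```

Keeping the encode/decode steps at the file boundaries means the parser itself never has to deal with code pages.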
I can read the file using codecs as ISO8859-2, but the result still doesn't
look correct:
>>> import codecs
>>> handle = codecs.open('file.txt', 'rb', encoding='iso8859-2')
>>> data = handle.read()
>>> handle.close()
>>> data
u' "skyp4_filelist_10201/localit\u0102\xa0 termali_sortfield" = NSFileName;\n'
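The `\u0102\xa0` pair is what the UTF-8 bytes `\xc3\xa0` turn into when decoded as ISO-8859-2: in that code page 0xC3 is Ă (U+0102) and 0xA0 is a no-break space. A minimal Python 3 sketch that reproduces this, undoes it, and then reads the file with UTF-8 named up front (the temporary file and variable names are illustrative; in the Python 2 session, `codecs.open(..., encoding='utf-8')` would serve the same purpose):

```python
import os
import tempfile

raw = b'localit\xc3\xa0 termali'

# Reproduce the session: UTF-8 bytes decoded as ISO-8859-2 give two wrong characters.
misread = raw.decode('iso8859-2')

# Undo the mistake: encoding with the wrong codec restores the original bytes,
# which can then be decoded with the right one.
repaired = misread.encode('iso8859-2').decode('utf-8')

# Simpler: name the right encoding when opening the file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'file.txt')
    with open(path, 'wb') as f:
        f.write(raw)
    with open(path, 'r', encoding='utf-8') as handle:
        via_open = handle.read()
```

The repair trick only works because ISO-8859-2 round-trips these particular bytes; reading with the correct encoding in the first place is the more reliable fix.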