not quite 1252
Anton Vredegoor
anton.vredegoor at gmail.com
Sat Apr 29 06:08:12 EDT 2006
Martin v. Löwis wrote:
> Well, if the document is UTF-8, you should decode it as UTF-8, of
> course.
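A small Python 3 sketch of why the codec matters (the sample string is made up for illustration): UTF-8 stores "é" as two bytes, so decoding those bytes as cp1252 turns one correct character into two wrong ones.

```python
# Encode a sample string containing one non-ASCII character.
text = "caf\u00e9"                  # "café"
raw = text.encode("utf-8")          # b'caf\xc3\xa9': "é" becomes two bytes

# Decoding with the codec the bytes were written in round-trips cleanly.
right = raw.decode("utf-8")

# Decoding the same bytes as cp1252 maps each byte to its own character,
# so 0xC3 0xA9 comes out as "Ã©" (mojibake).
wrong = raw.decode("cp1252")

print(right, wrong)
```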
Thanks. This, together with http://en.wikipedia.org/wiki/UTF-8, solved my problem understanding the encoding.
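The central idea of the Wikipedia article, that UTF-8 is variable-length and the first byte of each character announces how many bytes follow, can be sketched in Python 3 as below. The function name utf8_sequence_length is hypothetical, not a standard library API, and it classifies bytes by bit pattern only (it does not reject overlong lead bytes such as 0xC0).

```python
def utf8_sequence_length(lead):
    """Return the length of the UTF-8 sequence announced by byte `lead`.

    Classifies by bit pattern only: returns 0 for a continuation byte
    (10xxxxxx), which never starts a sequence, and raises ValueError for
    0xF8..0xFF, which have no meaning in UTF-8.
    """
    if lead < 0x80:      # 0xxxxxxx: ASCII, a single byte
        return 1
    if lead < 0xC0:      # 10xxxxxx: continuation byte
        return 0
    if lead < 0xE0:      # 110xxxxx: first byte of a 2-byte sequence
        return 2
    if lead < 0xF0:      # 1110xxxx: first byte of a 3-byte sequence
        return 3
    if lead < 0xF8:      # 11110xxx: first byte of a 4-byte sequence
        return 4
    raise ValueError("0x%02X never appears in UTF-8" % lead)


# Check the classification against Python's own UTF-8 encoder.
for ch in "A\u00e9\u20ac\U0001D11E":    # A, e-acute, euro sign, G clef
    encoded = ch.encode("utf-8")
    assert utf8_sequence_length(encoded[0]) == len(encoded)
    assert all(utf8_sequence_length(b) == 0 for b in encoded[1:])
    print("U+%04X" % ord(ch), encoded.hex(), len(encoded))
```

The loop shows the four sequence lengths side by side: one byte for ASCII, two for "é", three for the euro sign and four for a character outside the Basic Multilingual Plane.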
Anton
Proof that I understand it now (please, anyone, prove me wrong if you can):
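from itertools import islice
from zipfile import ZipFile

def by80(seq):
    # Yield chunks of at most 80 characters. An iterator is always true,
    # so "while it:" never ends; stop on the first empty chunk instead.
    it = iter(seq)
    while True:
        chunk = ''.join(islice(it, 80))
        if not chunk:
            break
        yield chunk

def utfCheck(infn):
    # An .sxw file is a zip archive whose content.xml is UTF-8 encoded.
    zin = ZipFile(infn, 'r')
    data = zin.read('content.xml').decode('utf-8')
    zin.close()
    for line in by80(data):
        # Raises UnicodeEncodeError for characters not in cp1252.
        print(line.encode('1252'))

def test():
    infn = "xxx.sxw"
    utfCheck(infn)

if __name__ == '__main__':
    test()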
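If content.xml contains any character that cp1252 lacks, `line.encode('1252')` in the script above stops at the first one with UnicodeEncodeError. A minimal Python 3 sketch that lists all such characters instead, assuming (as the script does) a zip archive with a UTF-8 encoded content.xml member. The helper names non_cp1252_chars and check_content_xml are hypothetical, and the small in-memory archive stands in for a real .sxw file.

```python
import io
import zipfile


def non_cp1252_chars(text):
    """Return the sorted characters of text that cp1252 cannot encode."""
    bad = set()
    for ch in set(text):
        try:
            ch.encode("cp1252")
        except UnicodeEncodeError:
            bad.add(ch)
    return sorted(bad)


def check_content_xml(archive):
    """Decode content.xml from a zip archive as UTF-8; report non-cp1252 chars."""
    with zipfile.ZipFile(archive) as zin:
        data = zin.read("content.xml").decode("utf-8")
    return non_cp1252_chars(data)


# Build a tiny in-memory archive as a stand-in for a real .sxw file.
# "é" and "€" exist in cp1252; the arrow and the Greek alpha do not.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zout:
    zout.writestr("content.xml", "<p>caf\u00e9 \u20ac \u2192 \u03b1</p>".encode("utf-8"))

print(check_content_xml(buf))
```

If a lossy conversion is acceptable, `text.encode('cp1252', errors='replace')` substitutes '?' for such characters instead of raising.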
More information about the Python-list mailing list