not quite 1252

Sat Apr 29 06:08:12 EDT 2006

Martin v. Löwis wrote:

> Well, if the document is UTF-8, you should decode it as UTF-8, of
> course.

Thanks. This and:

http://en.wikipedia.org/wiki/UTF-8

solved my problem with understanding the encoding.
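
A minimal sketch of the confusion, for anyone reading along. It uses a made-up sample string rather than the real document (the string and the variable names are assumptions for illustration only). UTF-8 bytes read as Windows-1252 still decode without error, which is why the text looks like "not quite 1252":

```python
# Sketch only: the sample string is hypothetical, not taken from the .sxw file.
text = u"L\u00f6wis"                 # contains one non-ASCII character (o-umlaut)
raw = text.encode("utf-8")           # the bytes as a UTF-8 document stores them

wrong = raw.decode("cp1252")         # misreading the bytes as Windows-1252
right = raw.decode("utf-8")          # decoding with the encoding actually used

print(len(text), len(raw))           # the umlaut takes two bytes in UTF-8
print(repr(wrong))                   # two junk characters where the umlaut was
print(right == text)                 # the UTF-8 decode round-trips cleanly
```

The wrong decode turns the single o-umlaut into two characters (A-tilde followed by a pilcrow), because each of its two UTF-8 bytes happens to be a valid cp1252 character on its own.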

Anton

Proof that I understand it now (anyone, please prove me wrong if you can):

from zipfile import ZipFile, ZIP_DEFLATED

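def by80(seq):
     # Yield successive chunks of at most 80 characters. The original
     # "while it:" test was always true, so the generator never stopped.
     for start in range(0, len(seq), 80):
         yield seq[start:start + 80]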

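def utfCheck(infn):
     zin = ZipFile(infn, 'r', ZIP_DEFLATED)
     # content.xml is stored as UTF-8 bytes; decode it to unicode first
     data = zin.read('content.xml').decode('utf-8')
     zin.close()
     for line in by80(data):
         # re-encode for a Windows-1252 console; this raises
         # UnicodeEncodeError for characters cp1252 cannot represent
         print line.encode('cp1252')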

def test():
     infn = "xxx.sxw"
     utfCheck(infn)

if __name__=='__main__':
     test()
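
For anyone who wants to try to prove it wrong without an .sxw file at hand, here is a hedged sketch of the same round trip against a zip archive built in memory. The helper names `make_sample_zip` and `read_member_text` and the sample text are assumptions made up for this example, and it is written for a Python version where `ZipFile` works as a context manager (2.7 or 3.x), not the interpreter the code above targets:

```python
import io
from zipfile import ZipFile

def make_sample_zip(text, member="content.xml"):
    # Hypothetical helper: build a zip in memory holding `text` as UTF-8.
    buf = io.BytesIO()
    with ZipFile(buf, "w") as zout:
        zout.writestr(member, text.encode("utf-8"))
    return buf.getvalue()

def read_member_text(zip_bytes, member="content.xml"):
    # Read the member's raw bytes and decode them as UTF-8.
    with ZipFile(io.BytesIO(zip_bytes)) as zin:
        return zin.read(member).decode("utf-8")

# Made-up sample: o-umlaut and the euro sign both exist in cp1252.
sample = u"<p>L\u00f6wis \u20ac</p>"
decoded = read_member_text(make_sample_zip(sample))

print(decoded == sample)                  # UTF-8 round trip is lossless
print(repr(decoded.encode("cp1252")))     # one byte per character in cp1252
```

Both sample characters have a cp1252 equivalent, so the final encode succeeds. A character with no cp1252 equivalent would raise UnicodeEncodeError at that step unless an error handler such as 'replace' is passed to encode().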