distinction between unzipping bytes and unzipping a file

webcomm ryandw at gmail.com
Sat Jan 10 20:15:45 CET 2009

On Jan 9, 6:07 pm, John Machin <sjmac... at lexicon.net> wrote:
> Yup, it looks like it's encoded in utf_16_le, i.e. no BOM as
> God^H^H^HGates intended:
> >>> buff = open('data', 'rb').read()
> >>> buff[:100]
> '<\x00R\x00e\x00g\x00i\x00s\x00t\x00r\x00a\x00t\x00i\x00o\x00n\x00>\x00<\x00B\x00a\x00l\x00a\x00n\x00c\x00e\x00D\x00u\x00e\x00>\x000\x00.\x000\x000\x000\x000\x00<\x00/\x00B\x00a\x00l\x00a\x00n\x00c\x00e\x00D\x00u\x00e\x00>\x00<\x00S\x00t\x00a\x00t\x00'
> >>> buff[:100].decode('utf_16_le')

There it is.  Thanks.

> u'<Registration><BalanceDue>0.0000</BalanceDue><Stat'
> >  But if I return it to my browser with python+django,
> > there are bad characters every other character
> Please consider that we might have difficulty guessing what "return it
> to my browser with python+django" means. Show actual code.

I did stop and consider what code to show.  I tried to show only the
code that seemed relevant, as there are sometimes complaints on this
and other groups when someone shows more than the relevant code.  You
solved my problem with decode('utf_16_le').  I can't find any
description of that encoding on the WWW... and I thought *everything*
was on the WWW.  :)
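
For the archive, a minimal sketch of that fix, assuming the service's
bytes look like the sample quoted above.  The variable names are
illustrative only, and the syntax is Python 3 rather than the Python 2
of the quoted session:

```python
# Bytes as they might arrive from the service: UTF-16 little-endian, no BOM.
# (Built here from a literal so the example is self-contained.)
raw = '<Registration><BalanceDue>0.0000</BalanceDue>'.encode('utf_16_le')

# Each ASCII character is followed by a NUL byte; sent to a browser
# unchanged, those NULs are the "bad characters every other character".
assert raw[:4] == b'<\x00R\x00'

# Decode the bytes to text, then re-encode as UTF-8 for the response body.
text = raw.decode('utf_16_le')
utf8_body = text.encode('utf-8')

print(text)
```

Whatever then writes the response should be given the decoded text or
the UTF-8 bytes, not the original buffer.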

I didn't know the data was utf_16_le-encoded because I'm getting it
from a service.  I don't even know if *they* know what encoding they
used.  I'm not sure how you knew what the encoding was.
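
One plausible way to spot it, judging from the dump quoted above: every
second byte is \x00, which is what mostly-ASCII text looks like in
UTF-16 with the low byte first.  The sketch below is a rough heuristic
under that assumption, not a general detector; guess_utf16 and its 80%
and 20% thresholds are made up for illustration (Python 3 syntax):

```python
import codecs

def guess_utf16(sample):
    """Guess a UTF-16 codec name from raw bytes, or return None."""
    # A byte order mark, if present, lets the generic codec pick the order.
    if sample.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return 'utf_16'
    half = len(sample) // 2
    if half == 0:
        return None
    # Count NUL bytes at even and odd offsets separately.
    even_nuls = sample[0::2].count(b'\x00')
    odd_nuls = sample[1::2].count(b'\x00')
    # Mostly-ASCII text: little-endian puts the NUL high byte second,
    # big-endian puts it first.
    if odd_nuls > 0.8 * half and even_nuls < 0.2 * half:
        return 'utf_16_le'
    if even_nuls > 0.8 * half and odd_nuls < 0.2 * half:
        return 'utf_16_be'
    return None

sample = '<Registration><BalanceDue>'.encode('utf_16_le')
print(guess_utf16(sample))
```

This only works for text that is mostly ASCII; data in other scripts
will not show the alternating-NUL pattern.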

> Please consider reading the Unicode HOWTO at http://docs.python.org/howto/unicode.html

Probably wouldn't hurt, though reading that HOWTO wouldn't have given
me the encoding, I don't think.


> Cheers,
> John
