distinction between unzipping bytes and unzipping a file
John Machin
sjmachin at lexicon.net
Sat Jan 10 16:18:14 EST 2009
On Jan 11, 6:15 am, webcomm <rya... at gmail.com> wrote:
> On Jan 9, 6:07 pm, John Machin <sjmac... at lexicon.net> wrote:
>
> > Yup, it looks like it's encoded in utf_16_le, i.e. no BOM as
> > God^H^H^HGates intended:
>
> > >>> buff = open('data', 'rb').read()
> > >>> buff[:100]
>
> > '<\x00R\x00e\x00g\x00i\x00s\x00t\x00r\x00a\x00t\x00i\x00o\x00n\x00>
> > \x00<\x00B\x0
> > 0a\x00l\x00a\x00n\x00c\x00e\x00D\x00u\x00e\x00>
> > \x000\x00.\x000\x000\x000\x000\x0
> > 0<\x00/\x00B\x00a\x00l\x00a\x00n\x00c\x00e\x00D\x00u\x00e\x00>\x00<
> > \x00S\x00t\x0
> > 0a\x00t\x00'
> > >>> buff[:100].decode('utf_16_le')
>
> There it is. Thanks.
>
> > u'<Registration><BalanceDue>0.0000</BalanceDue><Stat'
>
> > > But if I return it to my browser with python+django,
> > > there are bad characters every other character
>
> > Please consider that we might have difficulty guessing what "return it
> > to my browser with python+django" means. Show actual code.
>
> I did stop and consider what code to show. I tried to show only the
> code that seemed relevant, as there are sometimes complaints on this
> and other groups when someone shows more than the relevant code. You
> solved my problem with decode('utf_16_le'). I can't find any
> description of that encoding on the WWW... and I thought *everything*
> was on the WWW. :)
Try searching using the official name UTF-16LE ... looks like a blind
spot in the approximate matching algorithm(s) used by the search
engine(s) that you tried :-(
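As a small illustration (a sketch in Python 3 syntax, not part of the original exchange): Python's codec registry treats the official spelling and the underscore spelling as the same codec, so either name works in decode(). The sample bytes below are made up for the example.

```python
import codecs

# Both spellings resolve to the same registered codec.
official = codecs.lookup('UTF-16LE')
pythonic = codecs.lookup('utf_16_le')
same_codec = official.name == pythonic.name

# Hypothetical sample: '<R' encoded as UTF-16LE without a BOM.
sample = b'<\x00R\x00'
decoded = sample.decode('UTF-16LE')
print(same_codec, decoded)
```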
> I didn't know the data was utf_16_le-encoded because I'm getting it
> from a service. I don't even know if *they* know what encoding they
> used. I'm not sure how you knew what the encoding was.
Actually looked at the raw data. Pattern appeared to be an alternation
of one "meaningful" byte and one zero ('\x00') byte: => UTF-16*. No BOM
('\xFE\xFF' or '\xFF\xFE') at start of file: => UTF-16?E. First byte
is meaningful: => UTF-16LE.
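The eyeball test above can be sketched as code. This is only a rough heuristic in Python 3 syntax, and it assumes the text is mostly ASCII (so that every other byte is zero); the function name guess_utf16_codec and the sample_size parameter are made up for this illustration and are not from any library.

```python
import codecs


def guess_utf16_codec(data, sample_size=100):
    """Guess a UTF-16 codec name for mostly-ASCII data, or return None."""
    # A BOM at the start settles it; the plain 'utf_16' codec consumes the BOM.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return 'utf_16'
    sample = data[:sample_size]
    # Keep only whole 2-byte code units.
    sample = sample[:len(sample) - len(sample) % 2]
    if not sample:
        return None
    first_bytes = sample[0::2]   # byte 0 of each 2-byte unit
    second_bytes = sample[1::2]  # byte 1 of each 2-byte unit
    zero = b'\x00'
    # Meaningful byte first, zero byte second: little-endian.
    if first_bytes.count(zero) == 0 and second_bytes.count(zero) == len(second_bytes):
        return 'utf_16_le'
    # Zero byte first, meaningful byte second: big-endian.
    if second_bytes.count(zero) == 0 and first_bytes.count(zero) == len(first_bytes):
        return 'utf_16_be'
    return None


raw = '<Registration><BalanceDue>0.0000</BalanceDue>'.encode('utf_16_le')
codec = guess_utf16_codec(raw)
print(codec, raw.decode(codec))
```

Anything with non-ASCII characters in the first 100 bytes will defeat this check and return None, so treat it as a quick look at the data, not a detector.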
> > Please consider reading the Unicode HOWTO at http://docs.python.org/howto/unicode.html
>
> Probably wouldn't hurt,
Definitely won't hurt. Could even help.
> though reading that HOWTO wouldn't have given
> me the encoding, I don't think.
It wasn't intended to give you the encoding. Just read it.
Cheers,
John