Bug with win32 open and utf-16 file

Christos TZOTZIOY Georgiou tzot at sil-tec.gr
Sun Aug 24 04:40:56 CEST 2003


On Sun, 24 Aug 2003 00:29:27 GMT, rumours say that derek / nul
<abuseonly at sgrail.org> might have written:

>>[snip opening 'rb' a UTF-16 file]
>>
>>>The original file has line terminator characters of 00 0d 00 0a.
>>>After being read into a variable or a list the line termination characters have
>>>been changed to 00 0a 00 0a

I don't think you have explained your situation completely, or maybe I am
missing something.  The code you posted, which I quote right here:

>#!c:/program files/python/python.exe
># win32 python 2.3
>
>import sys, string, codecs
>#eng_file = list (open("c:/program files/microsoft games/train simulator/trains/trainset/dash9/dash9.eng", "rb").read())	# read the whole file
>eng_file = open("c:/program files/microsoft games/train simulator/trains/trainset/dash9/dash9.eng", "rb").read()	# read the whole file
>
>print hexdump (eng_file)						# ok

opens the file in 'read binary' mode and, without further processing,
passes the data just read to the function hexdump (presumably written by
you).  The output of the hex dump shows only \u000a and no \u000d.
I tried it myself: I downloaded your file and ran:

>>> data = file("c:/dash9.eng", "rb").read()
>>> data[-10:]
'\r\x00\n\x00)\x00\r\x00\n\x00'

which seems correct to me (\r is 0x0d, \n is 0x0a), and the file is
indeed UTF-16 LE (little endian).  Which open command are you using?  If
you just enter open at the Python prompt, does it show:

>>> open
<type 'file'>

or what?
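
For comparison, here is a minimal, self-contained sketch of the same check
that does not depend on your file.  It is written for Python 3 (not the 2.3
used above) so that it runs on a current interpreter; the sample text, the
helper names and this hexdump() are my own stand-ins, not your code.  It
writes a short UTF-16 LE file with CR LF line ends, reads it back in 'rb'
mode and looks at the last ten bytes:

```python
import os
import tempfile

# Hypothetical sample text; the real dash9.eng content is not reproduced here.
SAMPLE_TEXT = "line one\r\n)\r\n"


def write_utf16le_sample(path, text=SAMPLE_TEXT):
    """Write text as UTF-16 LE preceded by a BOM, in binary mode."""
    with open(path, "wb") as f:
        f.write(b"\xff\xfe" + text.encode("utf-16-le"))


def read_raw(path):
    """Read the whole file in binary mode, so no newline translation occurs."""
    with open(path, "rb") as f:
        return f.read()


def hexdump(data, width=16):
    """Stand-in for the poster's hexdump(): offset, then hex bytes per row."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        rows.append("%08x  %s" % (offset, " ".join("%02x" % b for b in chunk)))
    return "\n".join(rows)


# Use a temporary file so the sketch leaves nothing behind.
fd, sample_path = tempfile.mkstemp(suffix=".eng")
os.close(fd)
try:
    write_utf16le_sample(sample_path)
    raw = read_raw(sample_path)
finally:
    os.remove(sample_path)

tail = raw[-10:]
print(repr(tail))        # b'\r\x00\n\x00)\x00\r\x00\n\x00'
print(hexdump(tail))     # 00000000  0d 00 0a 00 29 00 0d 00 0a 00

# Decoding the raw bytes (the BOM selects little endian) keeps \r\n intact.
decoded = raw.decode("utf-16")
print(repr(decoded))     # 'line one\r\n)\r\n'
```

Binary mode hands the bytes back unchanged, so both \r (0d 00) and \n
(0a 00) are still present in the tail, as in my session above.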
-- 
TZOTZIOY, I speak England very best,
Microsoft Security Alert: the Matrix began as open source.



