Bug with win32 open and utf-16 file
derek / nul
abuseonly at sgrail.org
Sun Aug 24 06:13:44 CEST 2003
On Sun, 24 Aug 2003 05:40:56 +0300, Christos "TZOTZIOY" Georgiou
<tzot at sil-tec.gr> wrote:
>On Sun, 24 Aug 2003 00:29:27 GMT, rumours say that derek / nul
><abuseonly at sgrail.org> might have written:
>>>[snip opening 'rb' a UTF-16 file]
>>>>The original file has line terminator characters of 00 0d 00 0a.
>>>>After being read into a variable or a list the line termination characters have
>>>>been changed to 00 0a 00 0a
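The posted script imports codecs but reads the file as raw bytes. Here is a minimal Python 3 sketch of decoding UTF-16 data into text instead. It assumes an in-memory stand-in for the .eng file: UTF-16 LE with a BOM and CRLF line endings. Once the data is decoded, CR and LF are single characters again.

```python
import codecs

# Hypothetical stand-in for the .eng file: UTF-16 LE text with a BOM
# and Windows CRLF line endings.
raw = codecs.BOM_UTF16_LE + "Wagon ( Dash9\r\nEngine\r\n".encode("utf-16-le")

# The "utf-16" codec reads the BOM to pick the byte order, then drops it.
text = raw.decode("utf-16")

# Each "\r\n" is now two characters, and splitlines() strips them.
lines = text.splitlines()
print(lines)
```

Decoding first avoids having to reason about byte pairs when looking for line terminators.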
>I believe you haven't fully explained your situation, or maybe I am
>missing something. The code you posted, which I will quote right here:
>># win32 python 2.3
>>import sys, string, codecs
>>#eng_file = list (open("c:/program files/microsoft games/train simulator/trains/trainset/dash9/dash9.eng", "rb").read()) # read the whole file
>>eng_file = open("c:/program files/microsoft games/train simulator/trains/trainset/dash9/dash9.eng", "rb").read() # read the whole file
>>print hexdump (eng_file) # ok
>opens the file in 'read binary' mode, and without further processing you
>pass the data just read to the function hexdump (presumably written by
>you). The output of the hex dump shows only \u000a and not any \u000d.
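Opening a file in 'rb' mode returns its bytes unchanged, with no newline translation. The following minimal Python 3 sketch checks this by round-tripping a hypothetical UTF-16 LE payload through a temporary file and counting the CR code units that come back.

```python
import os
import tempfile

# Hypothetical payload: "A\r\nB\r\n" encoded as UTF-16 LE (CR is 0d 00).
payload = "A\r\nB\r\n".encode("utf-16-le")

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    path = tmp.name

try:
    with open(path, "rb") as f:  # binary mode: bytes come back as written
        data = f.read()
finally:
    os.remove(path)

# Count the little-endian CR code units that survived the read.
cr_count = data.count(b"\r\x00")
print(cr_count)
```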
>I tried it by downloading your file and ran:
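Here is the hexdump routine (not mine):

import cStringIO

def hexdump(data):
    addr, bytes, ascii = "$00000000", [' '] * 20, [' '] * 20
    result = cStringIO.StringIO()
    print >>result, "Dump of %d Bytes (type %s)" % (len(data), type(data))
    for i in range(len(data)):
        byte = ord(data[i])
        # hex digits for the byte, plus its character (or '.') for the ASCII column
        bytes[i%20], ascii[i%20] = "%02X" % byte, printable_chars[byte]
        if i%4 == 3: bytes[i%20] += " "   # space after every 4 bytes
        if (i % 20) == 19:                # row of 20 bytes is full: emit it
            print >>result, addr, ''.join(bytes), ''.join(ascii)
            addr, bytes, ascii = "$%08X" % (i+1), [' '] * 20, [' '] * 20
    i = len(data)
    if i and (i % 20) != 0:               # emit the final partial row
        # the rest of this line, and the end of the function, were cut off in the post
        print >>result, addr, "%-45s" % ''.join(bytes).strip(),

printable_chars = ['.'] * 256
# the iterable on the next line was cut off in the post
for __c in
    printable_chars[ord(__c)] = __c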
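For comparison, here is a minimal Python 3 sketch of the same kind of dump, with 20 bytes per row and a space after every 4 bytes. The hex column is computed directly from each byte's value, so a routine of this shape cannot turn 0x0D into 0x0A. The set of printable characters is an assumption, because the list in the post above was cut off.

```python
import string

# Assumption: letters, digits, punctuation and space count as printable.
# Everything else, including \r and \n, shows as '.' in the ASCII column.
PRINTABLE = set((string.ascii_letters + string.digits + string.punctuation + " ").encode("ascii"))
WIDTH = 20  # bytes per row, as in the routine above


def hexdump(data: bytes) -> str:
    """Return a hex dump of data, WIDTH bytes per row, a space after every 4 bytes."""
    rows = ["Dump of %d Bytes" % len(data)]
    for offset in range(0, len(data), WIDTH):
        chunk = data[offset:offset + WIDTH]
        hex_part = ""
        for i, byte in enumerate(chunk):
            hex_part += "%02X" % byte  # byte is an int when iterating bytes
            if i % 4 == 3:
                hex_part += " "
        ascii_part = "".join(chr(b) if b in PRINTABLE else "." for b in chunk)
        # 45 = 20 bytes * 2 hex digits + 5 group separators
        rows.append("$%08X %-45s %s" % (offset, hex_part.rstrip(), ascii_part))
    return "\n".join(rows)


dump = hexdump("A\r\n".encode("utf-16-le"))
print(dump)
```

Because CR and LF both appear as '.' in the ASCII column, only the hex column can tell them apart.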
>>>> data = file("c:/dash9.eng", "rb").read()
Interesting. I have just done the same, and it would appear that the hexdump
routine is changing 0d to 0a???
Output from my screen:
W\x00a\x00g\x00o\x00n\x00 \x00(\x00 \x00D\x00a\x00s\x00h\x009\x00
This sequence is correct.
This will teach me to have faith in code I have not tested!!
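One way to check a dump without trusting hand-written code is to view the bytes with standard tools. This minimal Python 3 sketch uses `repr()` and `binascii.hexlify()` on a hypothetical UTF-16 LE sample shaped like the start of the file. `repr()` produces the same `W\x00a\x00...` pattern as the screen output above and escapes `\r` instead of acting on it.

```python
import binascii

# Hypothetical sample resembling the start of the .eng file, UTF-16 LE encoded.
sample = "Wagon ( Dash9\r\n".encode("utf-16-le")

# repr() escapes control bytes, so the CR shows up as \r rather than moving the cursor.
shown = repr(sample)

# hexlify() gives an unambiguous hex string of the same bytes.
hexed = binascii.hexlify(sample).decode("ascii")

print(shown)
print(hexed)
```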
>which seems correct to me (\r is 0x0d, \n is 0x0a), and the file is
>indeed UTF-16 LE (little endian). Which open command are you using? If
>you just enter open at the Python prompt, does it show:
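To make the byte-order point concrete, here is a short Python 3 sketch that encodes CR LF both ways. In little-endian order CR LF is `0d 00 0a 00`. The `00 0d 00 0a` sequence from the original report is the big-endian layout, which is also what little-endian data looks like when it is read one byte out of alignment.

```python
crlf = "\r\n"

# The code points: CR is 0x0D and LF is 0x0A.
codes = [hex(ord(c)) for c in crlf]

# Little-endian stores the low byte first; big-endian stores it last.
le = crlf.encode("utf-16-le")
be = crlf.encode("utf-16-be")

print(codes, le.hex(" "), be.hex(" "))
```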
I am not running in immediate (interactive) mode; I am running 'python apply_physics.pl'.
I don't understand what data[-2:] is doing. Could you please explain it, or point me
to some notes on this?
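`data[-2:]` is ordinary slice notation. A negative index counts back from the end, so the slice runs from two positions before the end up to the end and returns the last two bytes. Slicing a Python 2.3 str read in 'rb' mode works the same way. In a UTF-16 LE file that ends with CR LF, those two bytes would be the LF code unit. Here is a sketch on hypothetical bytes:

```python
# Hypothetical UTF-16 LE bytes: BOM, then "A\r\n".
data = b"\xff\xfeA\x00\r\x00\n\x00"

first_two = data[:2]       # start up to (not including) index 2: the BOM
last_two = data[-2:]       # the final two bytes: LF as 0a 00
before_that = data[-4:-2]  # the two bytes before those: CR as 0d 00

print(first_two, last_two, before_that)
```

The slice notation is covered in the strings section of the Python tutorial.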