Bug with win32 open and utf-16 file
derek / nul
abuseonly at sgrail.org
Sun Aug 24 00:13:44 EDT 2003
On Sun, 24 Aug 2003 05:40:56 +0300, Christos "TZOTZIOY" Georgiou
<tzot at sil-tec.gr> wrote:
>On Sun, 24 Aug 2003 00:29:27 GMT, rumours say that derek / nul
><abuseonly at sgrail.org> might have written:
>
>>>[snip opening 'rb' a UTF-16 file]
>>>
>>>>The original file has line terminator characters of 00 0d 00 0a.
>>>>After being read into a variable or a list the line termination characters have
>>>>been changed to 00 0a 00 0a
>
>I believe you don't explain completely your situation, or maybe I am
>missing something. The code you posted, which I will quote right here:
>
>>#!c:/program files/python/python.exe
>># win32 python 2.3
>>
>>import sys, string, codecs
>>#eng_file = list (open("c:/program files/microsoft games/train simulator/trains/trainset/dash9/dash9.eng", "rb").read()) # read the whole file
>>eng_file = open("c:/program files/microsoft games/train simulator/trains/trainset/dash9/dash9.eng", "rb").read() # read the whole file
>>
>>print hexdump (eng_file) # ok
>
>opens the file in 'read binary' mode, and without further processing you
>pass the data just read to the function hexdump (presumably written by
>you). The output of the hex dump shows only \u000a and not any \u000d.
>I tried it by downloading your file and ran:
hexdump (not mine)
import cStringIO

def hexdump(data):
    global printable_chars
    addr, bytes, ascii = "$00000000", [' '] * 20, [' '] * 20
    result = cStringIO.StringIO()
    print >>result, "Dump of %d Bytes (type %s)" % (len(data), type(data))
    for i in range(len(data)):
        byte = ord(data[i])
        bytes[i%20], ascii[i%20] = "%02X" % byte, printable_chars[byte]
        if i%4 == 3: bytes[i%20] += " "
        if (i % 20) == 19:
            print >>result, addr, ''.join(bytes), ''.join(ascii)
            addr, bytes, ascii = "$%08X" % (i+1), [' '] * 20, [' '] * 20
    i = len(data)
    if i and (i % 20) <> 0:
        print >>result, addr, "%-45s" % ''.join(bytes).strip(), ''.join(ascii)
    return result.getvalue()

printable_chars = ['.'] * 256
for __c in "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"$:;-#+*%&/()=?'":
    printable_chars[ord(__c)] = __c
>>>> data = file("c:/dash9.eng", "rb").read()
>>>> data[-10:]
>'\r\x00\n\x00)\x00\r\x00\n\x00'
Interesting. I have just done the same, and it would appear that the hexdump
routine is changing 0d to 0a???
Output from my screen:
>>> data[2:]
'S\x00I\x00M\x00I\x00S\x00A\x00@\x00@\x00@\x00@\x00@\x00@\x00@\x00@\x00@\x00@\x00
J\x00I\x00N\x00X\x000\x00D\x000\x00t\x00_\x00_\x00_\x00_\x00_\x00_\x00
\r\x00\n\x00
\r\x00\n\x00
W\x00a\x00g\x00o\x00n\x00 \x00(\x00 \x00D\x00a\x00s\x00h\x009\x00
This sequence is correct
This will teach me to have faith in code I have not tested!!
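One way to cross-check an untested dump routine is to format the same bytes with nothing but built-in string formatting. The following is only a minimal sketch, written for Python 3 rather than the Python 2.3 used in this thread; the name hex_pairs is hypothetical, and the sample bytes are copied from the data[-10:] output quoted above.

```python
# Sample bytes copied from the data[-10:] output quoted in this thread.
tail = b'\r\x00\n\x00)\x00\r\x00\n\x00'

def hex_pairs(data):
    """Return the bytes as space-separated two-digit hex values (hypothetical helper)."""
    # Iterating over a bytes object in Python 3 yields integers 0..255.
    return ' '.join('%02x' % b for b in data)

dump = hex_pairs(tail)
print(dump)  # 0d 00 0a 00 29 00 0d 00 0a 00
```

If both 0d and 0a show up here, the data as read is intact and any discrepancy lies in the dump routine or in how its output was interpreted.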
>which seems correct to me (\r is 0x0d, \n is 0x0a, and the file is
>indeed utf-16 le (little endian). Which open command are you using? If
>you just enter open in the python prompt, does it show:
>
>>>> open
><type 'file'>
>
>or what?
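To see why '\r\x00\n\x00' is the expected byte pattern for a little-endian UTF-16 file, the bytes can be decoded explicitly. This is a sketch in Python 3 (not the 2.3 interpreter used in the thread), and it assumes the two bytes skipped by data[2:] are a UTF-16 LE byte order mark, which the thread does not show.

```python
import codecs

# Sample bytes copied from the data[-10:] output quoted in this thread.
tail = b'\r\x00\n\x00)\x00\r\x00\n\x00'

# Decode as little-endian UTF-16: each character occupies two bytes,
# low byte first, so '\r' is stored as 0d 00 and '\n' as 0a 00.
text = tail.decode('utf-16-le')
print(repr(text))  # '\r\n)\r\n'

# Assumption: the file starts with a UTF-16 LE byte order mark (ff fe).
# The generic 'utf-16' codec reads the mark, picks the byte order, and drops it.
with_bom = codecs.BOM_UTF16_LE + tail
text_from_bom = with_bom.decode('utf-16')
print(text_from_bom == text)  # True
```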
I am not running in immediate mode; I am running 'python apply_physics.pl'.
I don't understand what the data[-2:] is doing. Could you explain, please, or
point to some notes on this?
thanks Derek
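For reference, a short sketch of how slice expressions such as data[-10:], data[-2:] and data[2:] behave. It is written for Python 3 and uses the sample bytes quoted earlier in this thread; the variable names are only illustrative.

```python
# Sample bytes copied from the data[-10:] output quoted in this thread.
data = b'\r\x00\n\x00)\x00\r\x00\n\x00'

# A negative start index counts from the end of the sequence.
last_two = data[-2:]    # the final two bytes
last_ten = data[-10:]   # the final ten bytes (here, the whole sample)

# A positive start index counts from the beginning.
skip_two = data[2:]     # everything except the first two bytes

print(repr(last_two))   # b'\n\x00'
print(repr(skip_two))   # b'\n\x00)\x00\r\x00\n\x00'
```

Slicing never modifies data itself; each expression returns a new object holding the selected bytes.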