[Tutor] reading binary file on windows and linux
spir ☣
denis.spir at gmail.com
Sun May 9 20:16:44 CEST 2010
On Sun, 9 May 2010 19:33:51 +0200
Jan Jansen <knacktus at googlemail.com> wrote:
> Hello,
>
> I've got some trouble reading binary files with struct.unpack on windows.
> According to the documentation of the binary file's content, at the
> beginning there're some simple bytes (labeled as 'UChar: 8-bit unsigned
> byte'). Within those bytes there's a sequence to check the file's sanity.
> The sequence is (in ascii C-Notation):
> " "
> "\n"
> "\r"
> "\n"
> " "
> I've downloaded the file from the same website from two machines. One is a
> Windows 7 64-Bit, the other one is a virtual Linux machine. Now the trouble
> is while on linux everything is fine, on windows the carriage return does
> not appear when reading the file with struct.unpack.
>
> The file sizes on Linux and Windows are exactly the same, and also my script
> determines the file sizes correctly on both platforms (according to the
> OS). When I open the file on Windows in an editor and display the
> whitespace, the linefeed and carriage return are shown as expected.
>
> The code I'm using to check the first 80 bytes of the file is:
>
> import struct
> import sys
>
> with open(sys.argv[1]) as source:
>     size = struct.calcsize("80B")
>     raw_data = struct.unpack("80B", source.read(size))
>     for i, data in enumerate(raw_data):
>         print i, data, chr(data)
>     source.seek(0, 2)
>     print source.tell()
I guess (but am not 100% sure, because I never use 'b') that the issue will be solved by using:
with open(sys.argv[1], 'rb') as source:
The reason is that, by default, files are opened in read ('r') and text mode. In text mode, whatever character sequence a given OS uses as its line separator ("\r\n" under Windows) is silently converted by Python to the canonical single '\n' (character 0x0a). So in your case, the '\r' + '\n' pair in the header sequence loses its '\r'.
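To make the loss of '\r' visible on any machine, here is a minimal sketch. It is an emulation, not the exact mechanism behind the built-in open() on Windows: it assumes io.open() with newline=None, which turns on newline translation on every platform. The temporary file stands in for your real file and contains only the five sanity bytes from your description.

```python
import io
import os
import tempfile

# The sanity sequence from the file description: " ", "\n", "\r", "\n", " "
HEADER = b" \n\r\n "

# Write a throwaway file in binary mode so the bytes land on disk unchanged.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as target:
        target.write(HEADER)
    # newline=None enables newline translation regardless of the OS,
    # so the "\r\n" pair is collapsed into a single "\n" while reading.
    with io.open(path, "r", encoding="ascii", newline=None) as source:
        text = source.read()
finally:
    os.remove(path)

codes = [ord(ch) for ch in text]
print(codes)  # [32, 10, 10, 32] -- five bytes on disk, four characters read
```

The carriage return (13) is gone from the result, which matches the symptom you describe on Windows.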
In so-called binary mode ('b'), by contrast, Python no longer performs this replacement, so you get the raw byte sequence.
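As a rough check of the binary-mode behaviour, the sketch below writes the five sanity bytes to a temporary file and reads them back with 'rb' before handing them to struct.unpack. The helper name read_header_bytes and the temporary file are only illustrative; with your real file you would pass sys.argv[1] and a count of 80.

```python
import os
import struct
import tempfile

# The sanity sequence from the file description: " ", "\n", "\r", "\n", " "
HEADER = b" \n\r\n "

def read_header_bytes(path, count):
    """Return the first `count` bytes of `path` as a tuple of ints."""
    # 'rb' disables newline translation, so '\r' survives on every platform.
    with open(path, "rb") as source:
        chunk = source.read(count)
    return struct.unpack("%dB" % count, chunk)

# Write a throwaway file in binary mode so the bytes land on disk unchanged.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as target:
        target.write(HEADER + b"payload")
    values = read_header_bytes(path, len(HEADER))
finally:
    os.remove(path)

print(values)  # (32, 10, 13, 10, 32)
```

All five values come back, including 13 for the carriage return, and struct.unpack receives exactly as many bytes as its format string asks for.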
Hope I'm right on this and it helps.
Denis
________________________________
vit esse estrany ☣
spir.wikidot.com