[Tutor] reading binary file on windows and linux

Hugo Arts hugo.yoshi at gmail.com
Sun May 9 20:00:06 CEST 2010


On Sun, May 9, 2010 at 7:33 PM, Jan Jansen <knacktus at googlemail.com> wrote:
> Hello,
>
> I've got some trouble reading binary files with struct.unpack on windows.
> According to the documentation of the binary file's content, at the
> beginning there're some simple bytes (labeled as 'UChar: 8-bit unsigned
> byte'). Within those bytes there's a sequence to check the file's sanity.
> The sequence is (in ascii C-Notation):
> " "
> "\n"
> "\r"
> "\n"
> " "
> I've downloaded the file from the same website on two machines. One is a
> Windows 7 64-bit machine, the other is a virtual Linux machine. The trouble
> is that while everything is fine on Linux, on Windows the carriage return
> does not appear when reading the file with struct.unpack.
>
> The file sizes on Linux and Windows are exactly the same, and my script
> also determines the file sizes correctly on both platforms (according to
> the OS). When I open the file on Windows in an editor and display the
> whitespace, the linefeed and carriage return are shown as expected.
>
> The code I'm using to check the first 80 bytes of the file is:
>
> import struct
> import sys
>
> with open(sys.argv[1]) as source:
>     size = struct.calcsize("80B")
>     raw_data = struct.unpack("80B", source.read(size))
>     for i, data in enumerate(raw_data):
>         print i, data, chr(data)
>     source.seek(0, 2)
>     print source.tell()
>

Since the file is binary, you should use the "b" mode when opening it:

with open(sys.argv[1], "rb") as source:

Otherwise the file is opened in text mode, which converts newline
characters to and from a platform-specific representation when reading
or writing. On Windows that representation is \r\n, so every \r\n
sequence in the file is converted to a plain \n when you read it. That
is why the carriage return disappears.
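
To see the effect without the original file, here is a minimal sketch
that writes the five-byte sanity sequence to a temporary file and reads
it back in binary mode. It is written for Python 3 (the script above
uses the Python 2 print statement), and the names MARKER and
read_header are made up for this illustration, not taken from the
file's documentation.

```python
import os
import struct
import tempfile

# The sanity sequence from the question: space, LF, CR, LF, space.
# MARKER and read_header are hypothetical names used only for this sketch.
MARKER = b" \n\r\n "


def read_header(path, count):
    """Return the first `count` bytes of `path` as a tuple of unsigned ints."""
    fmt = "%dB" % count
    # "rb" opens the file in binary mode, so no newline translation happens.
    with open(path, "rb") as source:
        return struct.unpack(fmt, source.read(struct.calcsize(fmt)))


# Write a small sample file (also in binary mode) and read it back.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as sample:
        sample.write(MARKER)
    header = read_header(path, len(MARKER))
    print(header)  # (32, 10, 13, 10, 32)
finally:
    os.remove(path)
```

The 13 in the middle is the carriage return; with "rb" it survives on
Windows as well as on Linux.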

Hugo

