[Tutor] reading binary file on windows and linux

spir ☣ denis.spir at gmail.com
Sun May 9 20:16:44 CEST 2010


On Sun, 9 May 2010 19:33:51 +0200
Jan Jansen <knacktus at googlemail.com> wrote:

> Hello,
> 
> I've got some trouble reading binary files with struct.unpack on windows.
> According to the documentation of the binary file's content, at the
> beginning there're some simple bytes (labeled as 'UChar: 8-bit unsigned
> byte'). Within those bytes there's a sequence to check the file's sanity.
> The sequence is (in ascii C-Notation):
> " "
> "\n"
> "\r"
> "\n"
> " "
> I've downloaded the file from the same website from two machines. One is a
> Windows 7 64-Bit, the other one is a virtual Linux machine. Now the trouble
> is while on linux everything is fine, on windows the carriage return does
> not appear when reading the file with struct.unpack.
> 
> The file sizes on Linux and Windows are exactly the same, and also my script
> determines the file sizes correctly on both platforms (according to the
> OS). When I open the file on Windows in an editor and display the
> whitespace, the linefeed and carriage return are shown as expected.
> 
> The code I'm using to check the first 80 bytes of the file is:
> 
> import struct
> import sys
> 
> with open(sys.argv[1]) as source:
>     size = struct.calcsize("80B")
>     raw_data = struct.unpack("80B", source.read(size))
>     for i, data in enumerate(raw_data):
>         print i, data, chr(data)
>     source.seek(0, 2)
>     print source.tell()

I guess (though I'm not 100% sure, because I never use 'b') that the issue will be solved by using:

   with open(sys.argv[1], 'rb') as source:

The reason is that, by default, files are opened for reading ('r') in text mode. In text mode, whatever character sequence a given OS uses as its line separator ("\r\n" under Windows) is silently converted by Python to the canonical single "\n" (char 0x0a). So, in your case, the "\r" is lost from the "\r" + "\n" part of the header sequence. On Linux the line separator is already "\n", so nothing is replaced there, which would explain why the same script works on that machine.
In so-called binary mode ('b'), Python does not perform this replacement, so you get the raw byte sequence.
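
To make the binary-mode point concrete, here is a minimal sketch rather than a tested fix for your actual file. It writes the five-byte sanity sequence from your mail to a throwaway temporary file and reads it back with 'rb'. The names HEADER and read_header are made up for this illustration, and I am assuming the sequence is exactly those five bytes.

```python
import os
import struct
import tempfile

# Assumed sanity sequence from the question: " ", "\n", "\r", "\n", " "
HEADER = b" \n\r\n "


def read_header(path, count):
    """Return the first `count` bytes of the file as a tuple of integers."""
    fmt = "%dB" % count
    # 'rb' switches off newline translation, so "\r" is kept on Windows too.
    with open(path, "rb") as source:
        return struct.unpack(fmt, source.read(struct.calcsize(fmt)))


# Write the bytes in binary mode as well, so they reach the disk unchanged.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as sink:
        sink.write(HEADER)
    values = read_header(path, len(HEADER))
finally:
    os.remove(path)

print(values)  # (32, 10, 13, 10, 32): the 13 is the carriage return
```

If the 13 shows up in the output on your Windows machine as well, the same 'rb' change should apply to your 80-byte check.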

Hope I'm right on this and it helps.


Denis
________________________________

vit esse estrany ☣

spir.wikidot.com

