[Tutor] checking if data files are good, readable, and exist

Alan Gauld alan.gauld at btinternet.com
Tue Jul 22 23:01:30 CEST 2008


"W W" <srilyk at gmail.com> wrote

> Am I wrong in thinking that /all/ files are stored as binary?

No, that's quite right.

> python opens them, it automagically opens them in a
> more readable format,

But that isn't. Python just reads the data and interprets it
as text if you open the file in text mode (the default) or as raw
data if you open it with mode "rb".
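To make the text/binary distinction concrete, here is a minimal sketch in Python 3. (In the Python 2 of 2008 both modes returned a str, so the difference was less visible.) It writes four bytes to a scratch file and reads them back both ways. The file contents and the explicit ASCII encoding are assumptions chosen for illustration.

```python
import os
import tempfile

# Four raw bytes: the ASCII codes for "Hi!\n" (72, 105, 33, 10).
data = bytes([72, 105, 33, 10])

# Write them to a scratch file created by tempfile.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(data)

# Text mode (the default): the bytes are decoded into a str.
with open(path, "r", encoding="ascii") as f:
    as_text = f.read()

# Binary mode: the same bytes come back untouched as a bytes object.
with open(path, "rb") as f:
    as_bytes = f.read()

os.remove(path)

print(repr(as_text))   # 'Hi!\n'
print(repr(as_bytes))  # b'Hi!\n'
```

Nothing in the file says "text" or "binary"; only the mode passed to open() decides which kind of object you get back.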

Python doesn't alter the data in any way; it simply assumes
that it's text and interprets the bytes according to the current
character set. Thus it reads the value 65 and interprets it as 'A'
(assuming ASCII) in text mode, or just as the bit pattern
01000001 in binary mode. The application must then interpret the
bits in whatever way it considers appropriate: as an integer,
a bitmask, part of a graphic image, etc.
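A small Python 3 sketch of the 65 / 'A' / 01000001 example: the same byte value is shown as a number, as a character and as a bit pattern, and is then read as a bitmask and (together with three more bytes) as a little-endian integer. The bitmask test and the four-byte value b"ABCD" are made-up illustrations, not part of any real file format.

```python
import struct

raw = b"A"       # a single byte whose value is 65
value = raw[0]   # indexing a bytes object gives an int in Python 3

print(value)                 # prints 65
print(chr(value))            # prints A  (the byte read as an ASCII character)
print(format(value, "08b"))  # prints 01000001  (the raw bit pattern)

# The application decides what the bits mean. Two hypothetical readings:
flags = value & 0b00000001   # as a bitmask: is the lowest bit set? -> 1

# Four bytes read as one little-endian unsigned 32-bit integer.
as_int = struct.unpack("<I", b"ABCD")[0]
print(hex(as_int))           # prints 0x44434241
```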

The important point is that there is no distinction between
binary data and text data in the file itself; it's just how it is
interpreted that distinguishes them. (This is not completely
true on some operating systems where text files always have an
EOF marker, but that marker is itself just a binary value!)

None of which helps the OP, other than to highlight the difficulty
of determining whether a file is binary or not. We can sometimes
tell that a file is not text (if the text is expected to be ASCII)
by looking at the range of byte values, but that's sloooooowww...
and we can never be sure that a file is non-text. (We can also
check for common file headers such as PostScript, GIF, MP3, JPEG,
MIDI, etc., but even these can be misleading if the data just
coincidentally looks valid.)
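The two heuristics mentioned above (checking the range of byte values and checking for known file headers) might be sketched as follows. This is a rough sketch under stated assumptions: the helper names looks_like_ascii_text and guess_format are hypothetical, "text" is taken to mean printable ASCII plus tab, newline and carriage return, only the first 8 KB is scanned to keep it from being too slow, and the signature table covers just a handful of formats. As the post says, either check can be fooled.

```python
import os
import tempfile

# A few well-known file signatures ("magic numbers"); far from complete.
SIGNATURES = {
    b"%!PS": "PostScript",
    b"GIF87a": "GIF",
    b"GIF89a": "GIF",
    b"ID3": "MP3 (with ID3 tag)",
    b"\xff\xd8\xff": "JPEG",
    b"MThd": "MIDI",
}

# Assumed definition of "text": printable ASCII plus tab, LF and CR.
TEXT_BYTES = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def looks_like_ascii_text(path, sample_size=8192):
    """Hypothetical helper: guess True if the first bytes are all text-like ASCII."""
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    return all(b in TEXT_BYTES for b in sample)


def guess_format(path):
    """Hypothetical helper: return a format name if a known signature matches, else None."""
    with open(path, "rb") as f:
        head = f.read(16)
    for magic, name in SIGNATURES.items():
        if head.startswith(magic):
            return name
    return None


# Quick demo on two scratch files.
tmpdir = tempfile.mkdtemp()
text_path = os.path.join(tmpdir, "notes.txt")
gif_path = os.path.join(tmpdir, "picture.gif")
with open(text_path, "wb") as f:
    f.write(b"hello\nworld\n")
with open(gif_path, "wb") as f:
    f.write(b"GIF89a\x01\x00\x01\x00\x00\x00")

print(looks_like_ascii_text(text_path), guess_format(text_path))  # True None
print(looks_like_ascii_text(gif_path), guess_format(gif_path))    # False GIF
```

Sampling only the start of the file is the obvious trade-off: a file that begins with text but has binary data further on will be misclassified. Note also that an empty file counts as text here, because all() over an empty sequence is True.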

HTH,

Alan G. 