[ python-Bugs-816627 ] file read() forgets some bytes

SourceForge.net noreply at sourceforge.net
Tue Nov 11 19:18:46 EST 2003


Bugs item #816627, was opened at 2003-10-02 10:54
Message generated for change (Comment added) made by jfdutcher
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=816627&group_id=5470

Category: Python Library
Group: Not a Bug
Status: Closed
Resolution: Invalid
Priority: 5
Submitted By: Philippe Fremy (pfremy)
Assigned to: Nobody/Anonymous (nobody)
Summary: file read() forgets some bytes

Initial Comment:
It seems that Python miscalculates the size of a file.
This is on Windows, with Python 2.2.2 from ActiveState.

E:\work\jayacard\inkit\demo>python
ActivePython 2.2.2 Build 224 (ActiveState Corp.) based on
Python 2.2.2 (#37, Nov 26 2002, 10:24:37) [MSC 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for
more information.
>>> import array, os
>>> fname = "current_img.jpg"
>>> print len(open(fname).read())
41
>>> print os.stat(fname).st_size
693

As you can see, there is a mismatch here. 693 is indeed
the size reported by Windows Explorer.

>>> a = array.array('B')
>>> a.fromfile( open(fname), 693)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
EOFError: not enough items in file

>>> a.fromfile(open(fname), 42)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
EOFError: not enough items in file
>>> a.fromfile(open(fname), 41)
>>> a = array.array('B')
>>> a.fromfile(open(fname), 41)
>>> len(a)
41


I attach the nasty file

This is a very serious bug for me.

----------------------------------------------------------------------

Comment By: JohnF Dutcher (jfdutcher)
Date: 2003-11-11 19:18

Message:
Logged In: YES 
user_id=906889

Despite tim_one's assertion to the contrary, my own
experience confirms that fromfile and array do NOT work
properly when importing (fromfile) and exporting (tofile),
even with binary files (I use no other kind). The tofile call
(or even a plain write) appears to advance the file pointer
properly. A subsequent fromfile (or read) will NOT read from
the byte location in the file where the previous write
ended, thus destroying file integrity and also leading to
spurious EOF errors.
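
For reference, a minimal sketch of a tofile/fromfile round trip on a single file object. It assumes a current Python 3 interpreter (not the 2.2 series discussed in this report), a temporary file rather than any file from the report, and an explicit seek() between the write and the read. Whether a missing seek or flush is what caused the behaviour described above is an assumption; the thread does not confirm it.

```python
import array
import os
import tempfile

# Hypothetical sample data, including bytes that text mode could alter.
out = array.array("B", [0, 10, 13, 26, 255])

fd, fname = tempfile.mkstemp(suffix=".bin")
os.close(fd)
try:
    # "w+b": read/write access in binary mode.
    with open(fname, "w+b") as f:
        out.tofile(f)
        # Reposition explicitly before switching from writing to reading.
        f.seek(0)
        back = array.array("B")
        back.fromfile(f, len(out))
finally:
    os.remove(fname)

print(back == out)
```

The explicit seek() marks the point where the stream switches from output to input. C stdio requires a flush or a reposition at that point, which is why the sketch includes it.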

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2003-10-05 14:11

Message:
Logged In: YES 
user_id=31435

Text files must be opened in text mode in order to (at least) 
hide platform differences in how line endings are represented.  
For example, on Linux a line ends with \n, on Windows with 
\r\n, and on classic Macintoshes with \r.  Python has no 
control over what the native operating system conventions 
are.  But when a text file is opened in text mode, Python 
hides the platform differences, making all lines *appear* as if 
they ended with \n on input.  On output to a file opened in 
text mode, Python changes \n to whatever the platform 
convention is for ending a line.  If it didn't do this, text files 
created with Python would be unusable by non-Python 
programs on your system, and text files created by non-
Python programs on your system would be unusable by 
Python.
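
The translation described above can be sketched as follows. This assumes a current Python 3 interpreter, whose open() accepts a newline argument that did not exist in Python 2.2. Passing newline="\r\n" is used here only to force the Windows convention on any host, and the temporary file name is arbitrary.

```python
import os
import tempfile

fd, fname = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    # Text-mode output: each "\n" is written as "\r\n" (forced here;
    # on Windows this is what the default text mode does anyway).
    with open(fname, "w", newline="\r\n") as f:
        f.write("one\ntwo\n")

    # Binary mode shows the bytes actually stored on disk.
    with open(fname, "rb") as f:
        raw = f.read()

    # Text-mode input: line endings appear as "\n" again.
    with open(fname, "r") as f:
        text = f.read()
finally:
    os.remove(fname)

print(repr(raw))
print(repr(text))
```

The file holds two more bytes than the string that was written, which is the kind of length mismatch that appears when binary data passes through text mode.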

The distinction between text mode and binary mode didn't 
originate with Python, and all you can do is try to live with it.  
The distinction is simply a reality on most non-Unix systems 
(Unix treats text and binary mode exactly the same).

Note that it's not possible either to guess whether a file is 
*intended* to be text mode or binary mode by staring at its 
contents.  The distinction is in the mind of the programmer 
who created the file.  So it goes.

Portability isn't hard despite all this:  always open binary files 
in binary mode, and always open text files in text mode.  
That's it.

----------------------------------------------------------------------

Comment By: Philippe Fremy (pfremy)
Date: 2003-10-05 13:31

Message:
Logged In: YES 
user_id=233844

Thanks a lot for the answer. 
 
But is there any advantage in opening a file in ascii mode? Why
not open every Windows file in binary mode by default, and treat
it internally as an ascii file later on, if it turns out to be one?
 
With Python always returning strings, the distinction makes very
little sense.
 
This is the kind of thing that will unfortunately make a program 
more difficult to port. I guess it is too late to change this 
behaviour. 

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2003-10-02 11:02

Message:
Logged In: YES 
user_id=31435

Windows makes a distinction between binary-mode and text-
mode files.  open() defaults to text-mode.  A .jpg is certainly 
a binary file, so on Windows you *must* open it in binary 
mode.  That means changing your

    open(fname)

to

    open(fname, 'rb')

Your problem will go away then.  It's unfortunate, but that's 
the way Windows works -- it's not a Python bug.
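
A minimal sketch of the check from the original report, redone in binary mode. It assumes a current Python 3 interpreter and uses a small made-up payload in a temporary file instead of the attached current_img.jpg. The payload includes a \r\n pair and a Ctrl-Z byte (0x1A), which Windows text mode has traditionally treated as end-of-file. Whether that byte is what cut the reporter's read off at 41 bytes is an assumption, since the attachment is not reproduced here.

```python
import os
import tempfile

# Hypothetical stand-in for the JPEG contents.
payload = b"\xff\xd8\r\n\x1a" + b"\x00" * 10

fd, fname = tempfile.mkstemp(suffix=".jpg")
os.close(fd)
try:
    with open(fname, "wb") as f:
        f.write(payload)

    # 'rb' returns every byte untouched, on every platform.
    with open(fname, "rb") as f:
        data = f.read()

    size_on_disk = os.stat(fname).st_size
finally:
    os.remove(fname)

print(len(data) == size_on_disk)
```

In binary mode the length of the data read and st_size agree, which is the comparison that failed in the report.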

----------------------------------------------------------------------
