[Python-bugs-list] [ python-Bugs-816627 ] file read() forgets some bytes

SourceForge.net noreply at sourceforge.net
Sun Oct 5 15:11:46 EDT 2003


Bugs item #816627, was opened at 2003-10-02 11:54
Message generated for change (Comment added) made by tim_one
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=816627&group_id=5470

Category: Python Library
Group: Not a Bug
Status: Closed
Resolution: Invalid
Priority: 5
Submitted By: Philippe Fremy (pfremy)
Assigned to: Nobody/Anonymous (nobody)
Summary: file read() forgets some bytes

Initial Comment:
It seems that Python miscalculates the size of a file.

This is on Windows, with Python 2.2.2 from ActiveState.



E:\work\jayacard\inkit\demo>python
ActivePython 2.2.2 Build 224 (ActiveState Corp.) based on
Python 2.2.2 (#37, Nov 26 2002, 10:24:37) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import array, os
>>> fname = "current_img.jpg"
>>> print len(open(fname).read())
41
>>> print os.stat(fname).st_size
693



As you can see, there is a mismatch here. 693 is indeed
the size reported by Windows Explorer.



>>> a = array.array('B')
>>> a.fromfile( open(fname), 693)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
EOFError: not enough items in file

>>> a.fromfile(open(fname), 42)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
EOFError: not enough items in file
>>> a.fromfile(open(fname), 41)
>>> a = array.array('B')
>>> a.fromfile(open(fname), 41)
>>> len(a)
41





I attach the nasty file.

This is a very serious bug for me.

----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2003-10-05 15:11

Message:
Logged In: YES 
user_id=31435

Text files must be opened in text mode in order to (at least)
hide platform differences in how line endings are represented.
For example, on Linux a line ends with \n, on Windows with
\r\n, and on classic Macintoshes with \r.  Python has no
control over what the native operating system conventions
are.  But when a text file is opened in text mode, Python
hides the platform differences, making all lines *appear* as if
they ended with \n on input.  On output to a file opened in
text mode, Python changes \n to whatever the platform
convention is for ending a line.  If it didn't do this, text files
created with Python would be unusable by non-Python
programs on your system, and text files created by non-Python
programs on your system would be unusable by Python.
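
A minimal sketch of that translation, written for a current Python 3
rather than the 2.2.2 in the report. The scratch file name is
hypothetical, and newline="\r\n" is passed explicitly only to imitate
the Windows convention on any platform.

```python
import os
import tempfile

# Hypothetical scratch file used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "lines.txt")

# The program writes plain \n; text mode stores \r\n on disk because
# newline="\r\n" imitates the Windows line-ending convention.
with open(path, "w", newline="\r\n") as f:
    f.write("first\nsecond\n")

# Binary mode returns the bytes exactly as they are stored.
with open(path, "rb") as f:
    raw = f.read()

# Text mode maps the stored \r\n back to \n on input.
with open(path, "r") as f:
    text = f.read()

print(repr(raw))
print(repr(text))
```

The two reads differ by one byte per line, which is why a length
taken from a text-mode read need not match the size on disk.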



The distinction between text mode and binary mode didn't
originate with Python, and all you can do is try to live with it.
The distinction is simply a reality on most non-Unix systems
(Unix treats text and binary mode exactly the same).



Note that it's not possible either to guess whether a file is
*intended* to be text mode or binary mode by staring at its
contents.  The distinction is in the mind of the programmer
who created the file.  So it goes.



Portability isn't hard despite all this:  always open binary files
in binary mode, and always open text files in text mode.
That's it.

----------------------------------------------------------------------

Comment By: Philippe Fremy (pfremy)
Date: 2003-10-05 14:31

Message:
Logged In: YES 
user_id=233844

Thanks a lot for the answer.

But is there any advantage in opening a file in ASCII (text) mode? Why
not open every Windows file in binary mode by default, and treat
it internally as an ASCII file later on, if it turns out to be one?

With Python always returning strings, the distinction makes very
little sense.

This is the kind of thing that will unfortunately make a program
more difficult to port. I guess it is too late to change this
behaviour.
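
The approach asked about here, reading bytes first and converting to
text afterwards, can be sketched as follows in a current Python 3.
This is only an illustration: the sample bytes are made up, and the
ASCII encoding and the \r\n convention are assumptions the caller has
to supply, since the bytes themselves do not say how they should be
read.

```python
# Made-up bytes, laid out as a Windows text file would store them.
raw = b"alpha\r\nbeta\r\n"

# The caller asserts the encoding and the line-ending convention;
# nothing in the bytes identifies them as text.
text = raw.decode("ascii").replace("\r\n", "\n")

lines = text.splitlines()
print(lines)
```

The conversion works, but every caller has to repeat it and has to
know the platform convention, which is the work text mode does once.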

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2003-10-02 12:02

Message:
Logged In: YES 
user_id=31435

Windows makes a distinction between binary-mode and text-mode
files.  open() defaults to text mode.  A .jpg is certainly
a binary file, so on Windows you *must* open it in binary
mode.  That means changing your

    open(fname)

to

    open(fname, 'rb')

Your problem will go away then.  It's unfortunate, but that's
the way Windows works -- it's not a Python bug.
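
A minimal sketch of the fix applied to both calls from the report,
written for a current Python 3. The file is a hypothetical 693-byte
stand-in for current_img.jpg, not the attached image. Placing a 0x1A
byte at offset 41 is an assumption: a read that stops after 41 bytes
is consistent with the Windows C runtime treating 0x1A (Ctrl-Z) as
end-of-file in text mode, but the report does not show the file's
contents.

```python
import array
import os
import tempfile

# Hypothetical stand-in for the image: 693 bytes, with 0x1A at offset 41
# followed by \r\n (both are assumptions, see the paragraph above).
payload = b"\xff\xd8" + bytes(39) + b"\x1a\r\n" + bytes(649)
path = os.path.join(tempfile.mkdtemp(), "sample.jpg")
with open(path, "wb") as f:
    f.write(payload)

size = os.stat(path).st_size

# read() in binary mode returns every byte in the file.
with open(path, "rb") as f:
    data = f.read()

# array.fromfile() on a binary-mode file can read all `size` items.
a = array.array("B")
with open(path, "rb") as f:
    a.fromfile(f, size)

print(size, len(data), len(a))
```

All three numbers printed are 693, matching what os.stat reports.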

----------------------------------------------------------------------
