[ python-Bugs-816627 ] file read() forgets some bytes
SourceForge.net
noreply at sourceforge.net
Tue Nov 11 19:29:48 EST 2003
Bugs item #816627, was opened at 2003-10-02 11:54
Message generated for change (Comment added) made by tim_one
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=816627&group_id=5470
Category: Python Library
Group: Not a Bug
Status: Closed
Resolution: Invalid
Priority: 5
Submitted By: Philippe Fremy (pfremy)
Assigned to: Nobody/Anonymous (nobody)
Summary: file read() forgets some bytes
Initial Comment:
It seems that Python miscalculates the size of a file.
This is on Windows, with Python 2.2.2 from ActiveState.
E:\work\jayacard\inkit\demo>python
ActivePython 2.2.2 Build 224 (ActiveState Corp.) based on
Python 2.2.2 (#37, Nov 26 2002, 10:24:37) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import array, os
>>> fname = "current_img.jpg"
>>> print len(open(fname).read())
41
>>> print os.stat(fname).st_size
693
As you can see, there is a mismatch here. 693 is indeed
the size reported by Windows Explorer.
>>> a = array.array('B')
>>> a.fromfile( open(fname), 693)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
EOFError: not enough items in file
>>> a.fromfile(open(fname), 42)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
EOFError: not enough items in file
>>> a.fromfile(open(fname), 41)
>>> a = array.array('B')
>>> a.fromfile(open(fname), 41)
>>> len(a)
41
I have attached the offending file.
This is a very serious bug for me.
----------------------------------------------------------------------
>Comment By: Tim Peters (tim_one)
Date: 2003-11-11 19:29
Message:
Logged In: YES
user_id=31435
Why are you attaching your complaint to a closed bug report?
Nobody is going to see it here. If the OP wasn't happy with
the resolution, he's had 5 weeks to say so.
If you have a new problem, open a new bug report. Also
include basic information: which version of Python, which OS,
and a minimal executable program that shows the bad
behavior. Because the OP did give that information, it was
obvious he was opening a binary file on Windows in text
mode. Your problem may be just as shallow.
Let me guess: you cannot switch between reading and
writing without doing a seek in between. That's the way C's
standard I/O works on all platforms. No exception is raised if
you neglect to seek, but the results are undefined. Python
inherits that behavior from the platform C library. If you're
not seeking when switching between reading and writing,
that's the cause of your problem.
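The seek-before-switching rule above can be sketched as follows. This is an illustration only, written for a current Python 3 interpreter rather than the 2.2 series discussed in this report; the helper name write_then_read_back and the sample byte values are made up for the example. Python 3's io layer no longer sits on C stdio, but an explicit seek is still the portable way to say where the next read should start.

```python
import array
import os
import tempfile


def write_then_read_back(path, values):
    """Write unsigned bytes with tofile, then read them back on the same handle."""
    out = array.array("B", values)
    with open(path, "w+b") as f:      # binary mode, open for both writing and reading
        out.tofile(f)
        f.seek(0)                     # reposition before switching from writing to reading
        back = array.array("B")
        back.fromfile(f, len(out))
    return back


# Hypothetical sample data; includes CR, LF and 0x1A on purpose.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
try:
    roundtrip = write_then_read_back(demo_path, [0, 13, 10, 26, 255])
finally:
    os.remove(demo_path)

print(list(roundtrip))
```

Without the seek, the read would start at the current position, which after tofile is the end of the data just written.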
----------------------------------------------------------------------
Comment By: JohnF Dutcher (jfdutcher)
Date: 2003-11-11 19:18
Message:
Logged In: YES
user_id=906889
Despite tim_one's assertion to the contrary, my own
experience confirms that fromfile and tofile on array do NOT
work properly when importing (fromfile) from and exporting
(tofile) to even binary files (I use no other kind than binary).
tofile (or even write) appears to advance the file pointer
properly after it is issued. A subsequent fromfile (or read)
will NOT read from the byte location in the file where the
previous write terminated, thus destroying file integrity and
also leading to inappropriate EOF messages.
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2003-10-05 15:11
Message:
Logged In: YES
user_id=31435
Text files must be opened in text mode in order to (at least)
hide platform differences in how line endings are represented.
For example, on Linux a line ends with \n, on Windows with
\r\n, and on classic Macintoshes with \r. Python has no
control over what the native operating system conventions
are. But when a text file is opened in text mode, Python
hides the platform differences, making all lines *appear* as if
they ended with \n on input. On output to a file opened in
text mode, Python changes \n to whatever the platform
convention is for ending a line. If it didn't do this, text files
created with Python would be unusable by non-Python
programs on your system, and text files created by non-
Python programs on your system would be unusable by
Python.
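A small sketch of the translation described above, written for a current Python 3 interpreter (not the 2.2 series this report is about). The newline="\r\n" argument is used here only to force Windows-style line endings on any platform, so the effect is visible without a Windows machine; the sample text is made up.

```python
import os
import tempfile

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    # Text mode on output: each "\n" is written as "\r\n".
    with open(path, "w", encoding="ascii", newline="\r\n") as f:
        f.write("one\ntwo\n")

    # Binary mode shows the bytes that are really in the file.
    with open(path, "rb") as f:
        raw = f.read()

    # Text mode on input: "\r\n" appears as "\n" again.
    with open(path, "r", encoding="ascii") as f:
        text = f.read()
finally:
    os.remove(path)

print(repr(raw))
print(repr(text))
```

The file holds two more bytes than the string that was written, which is exactly the kind of size mismatch that appears when binary data is pushed through text mode.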
The distinction between text mode and binary mode didn't
originate with Python, and all you can do is try to live with it.
The distinction is simply a reality on most non-Unix systems
(Unix treats text and binary mode exactly the same).
Note that it's not possible either to guess whether a file is
*intended* to be text mode or binary mode by staring at its
contents. The distinction is in the mind of the programmer
who created the file. So it goes.
Portability isn't hard despite all this: always open binary files
in binary mode, and always open text files in text mode.
That's it.
----------------------------------------------------------------------
Comment By: Philippe Fremy (pfremy)
Date: 2003-10-05 14:31
Message:
Logged In: YES
user_id=233844
Thanks a lot for the answer.
But is there any advantage in opening a file in ASCII mode? Why
not open every Windows file in binary mode by default, and treat
it internally as an ASCII file later on, if it turns out to be one?
With Python always returning strings, the distinction makes very
little sense.
This is the kind of thing that will unfortunately make a program
more difficult to port. I guess it is too late to change this
behaviour.
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2003-10-02 12:02
Message:
Logged In: YES
user_id=31435
Windows makes a distinction between binary-mode and
text-mode files. open() defaults to text-mode. A .jpg is certainly
a binary file, so on Windows you *must* open it in binary
mode. That means changing your
open(fname)
to
open(fname, 'rb')
Your problem will go away then. It's unfortunate, but that's
the way Windows works -- it's not a Python bug.
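A minimal sketch of the fix, written for a current Python 3 interpreter. The payload is made up and is not the reporter's JPEG. It deliberately contains CR, LF and 0x1A, the bytes that the Windows C runtime treats specially in text mode (0x1A, Ctrl-Z, is taken as end of file); whether that is what cut the original read short at 41 bytes is an assumption, since the attachment is not reproduced here.

```python
import array
import os
import tempfile

# Hypothetical binary payload standing in for the image file.
payload = bytes([0xFF, 0xD8, 0x0D, 0x0A, 0x1A, 0x00, 0xFF, 0xD9])

fd, path = tempfile.mkstemp(suffix=".bin")
os.close(fd)
try:
    with open(path, "wb") as f:           # binary mode for writing
        f.write(payload)

    size_on_disk = os.stat(path).st_size

    with open(path, "rb") as f:           # binary mode for reading
        data = f.read()

    a = array.array("B")
    with open(path, "rb") as f:
        a.fromfile(f, size_on_disk)       # reads every byte, no EOFError
finally:
    os.remove(path)

print(len(data), size_on_disk, len(a))
```

With 'rb', the length returned by read(), the st_size reported by os.stat() and the number of items loaded by fromfile() all agree.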
----------------------------------------------------------------------