[ python-Bugs-1735418 ] file.read() truncating strings under Windows
SourceForge.net
noreply at sourceforge.net
Thu Jun 14 19:59:50 CEST 2007
Bugs item #1735418, was opened at 2007-06-12 00:19
Message generated for change (Comment added) made by cgkanchi
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1735418&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: cgkanchi (cgkanchi)
Assigned to: Nobody/Anonymous (nobody)
Summary: file.read() truncating strings under Windows
Initial Comment:
On Python 2.4.4 and 2.5.1 under Windows, file.read() fails to read a varying number of characters from the last line(s) of text files when asked to read more than 800 characters from near the end of the file. For example, if the last word of a 500 KB file is "superlative", file.read() might output "erlative". At this stage the file pointer is very close (a few words at most) to the end of the file. I ran into this problem while writing a program to split .txt ebooks into smaller files so that my ancient iPod could handle them. The behaviour is identical on both 2.4.4 and 2.5.1 under Windows, but does not appear under Mac OS X. I was unable to test it under Linux. To test the bug, I used various books from http://gutenberg.org; the one I primarily used was Pride and Prejudice by Jane Austen.
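The thread never pins down the mechanism, but one plausible contributor to a Windows-only symptom is newline translation. In text mode the Windows C runtime turns each "\r\n" on disk into a single "\n", so read() returns fewer characters than the bytes it consumed, and any position arithmetic that mixes the two drifts by one per line ending. Below is a minimal sketch, assuming Python 3, whose text mode applies the same translation on every platform (universal newlines, newline=None by default). The helper name text_vs_byte_length is hypothetical.

```python
import os
import tempfile


def text_vs_byte_length(data):
    """Write `data` to a temporary file and return
    (bytes on disk, characters returned by a text-mode read())."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(data)
        # Text mode with the default newline=None translates every b"\r\n"
        # into a single "\n", much as the Windows C runtime does in text mode.
        with open(path, "r", encoding="ascii") as f:
            text = f.read()
        return os.path.getsize(path), len(text)
    finally:
        os.remove(path)


print(text_vs_byte_length(b"one\r\ntwo\r\nthree"))  # (15, 13)
```

Two CRLF pairs on disk produce a gap of two between the byte count and the character count. On Mac OS X, Python 2's text mode does no such translation, which is consistent with the report that the problem does not appear there.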
----------------------------------------------------------------------
>Comment By: cgkanchi (cgkanchi)
Date: 2007-06-14 17:59
Message:
Logged In: YES
user_id=1814873
Originator: YES
>(e) To have tell() on the same level with read(), try the unbuffered mode
>by specifying bufsize=0 in open(),
>
>http://docs.python.org/lib/built-in-funcs.html
This does not work either. There is no change in the behaviour of the
program.
>(a) I would not trust tell(). Calculate the absolute position and use
>seek().
That defeats the purpose of having native string handling in Python: it
means I have to do things the C way. Therefore, it is a bug in the
implementation.
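Suggestion (a) can be sketched without tell(). Open the file in binary mode, where every byte read() returns is exactly one byte of the file, and keep the absolute offset yourself. The sketch below assumes Python 3 and replays the "current position - unget size + 1" arithmetic on a computed offset instead of on tell(). The names read_at and second_chunk are hypothetical, and the sample text is made up.

```python
import os
import tempfile


def read_at(f, pos, size):
    """Seek to absolute byte offset `pos` and read up to `size` bytes.

    Returns (chunk, end). In binary mode len(chunk) is exactly the number of
    bytes consumed, so `end` is plain arithmetic and tell() is never needed.
    """
    f.seek(pos)
    chunk = f.read(size)
    return chunk, pos + len(chunk)


def second_chunk(data, size):
    """Hypothetical demo: read one chunk, back up to its last space, read again."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(data)
        with open(path, "rb") as f:
            chunk, end = read_at(f, 0, size)
            # Assumes the chunk contains a space; rfind() == -1 needs its own branch.
            unget = len(chunk) - chunk.rfind(b" ")  # bytes from the last space on
            pos = end - unget + 1                   # resume just after that space
            chunk, _ = read_at(f, pos, size)
            return chunk
    finally:
        os.remove(path)


print(second_chunk(b"It is a truth universally acknowledged", 16))
# b'universally ackn'
```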
>(b) Just from the documentation to Python's file-like objects I can assume
>that read() and tell() belong to different levels of API. The read()
>function has this in its documentation:
>"Note that this method may call the underlying C function fread() more
>than once in an effort to acquire as close to size bytes as possible".
>http://docs.python.org/lib/bltin-file-objects.html
That should not make any difference whatsoever.
>The tell() function's documentation refers to stdio's ftell(). This hints
>that tell() will return the position of the fread() buffer's end, not the
>read()'s end.
Again, irrelevant.
>(c) It also appears that by adding 1 to the "current position - unget
>size" you are skipping the space character itself.
This is by design. I didn't want the space. Functionally, it makes no
difference.
>(d) The rfind() might return -1 if the search fails.
This is by design as well: when there are no spaces left in the file
(i.e., the file pointer is on the last word), a return value of -1 causes
read() to read until EOF.
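The handling of rfind() == -1 described here relies on read() with a negative size reading to EOF. Below is a minimal sketch of a full splitting loop on an in-memory binary stream (io.BytesIO), assuming Python 3. The name word_aligned_pieces is hypothetical. Dropping the space at each cut follows the design described under (c).

```python
import io


def word_aligned_pieces(f, size):
    """Yield pieces from the binary stream `f`, each normally at most `size`
    bytes and cut at the last space of its chunk (the space itself is dropped).

    If a full chunk contains no space, rfind() returns -1 and the rest of the
    stream is taken in one go with read(-1), i.e. read to EOF.
    """
    pos = 0
    while True:
        f.seek(pos)
        chunk = f.read(size)
        if not chunk:
            return
        cut = chunk.rfind(b" ")
        if cut == -1 or len(chunk) < size:
            # No space left, or a short final chunk: emit everything to EOF.
            yield chunk + f.read(-1)
            return
        if cut > 0:
            yield chunk[:cut]
        pos += cut + 1  # resume just after the space


text = io.BytesIO(b"It is a truth universally acknowledged")
print(list(word_aligned_pieces(text, 16)))
# [b'It is a truth', b'universally', b'acknowledged']
```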
I did, however, find the solution in the Python docs, although it is a
workaround rather than a fix for a very obvious bug:
"tell()
Return the file's current position, like stdio's ftell().
Note: On Windows, tell() can return illegal values (after an fgets())
when reading files with Unix-style line-endings. Use binary mode ('rb') to
circumvent this problem. "
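This is what the documented workaround looks like in practice. In 'rb' mode there is no newline translation, so tell() after each read() equals the running count of bytes returned, CRLF line endings included. Below is a minimal check, assuming Python 3. The name tell_tracks_bytes is hypothetical, and the 800-byte chunk size only echoes the read size mentioned in the report.

```python
import os
import tempfile


def tell_tracks_bytes(data, size):
    """Write `data` to a temporary file, read it back in 'rb' mode in `size`-byte
    chunks, and report whether tell() always equals the bytes read so far."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(data)
        total = 0
        with open(path, "rb") as f:  # binary mode: no newline translation
            while True:
                chunk = f.read(size)
                if not chunk:
                    return True
                total += len(chunk)
                if f.tell() != total:
                    return False
    finally:
        os.remove(path)


print(tell_tracks_bytes(b"It is a truth\r\nuniversally acknowledged\r\n" * 50, 800))  # True
```

The pieces come back as bytes, not text. If each piece is cut at a space, it can be decoded on its own with piece.decode("utf-8"), assuming the ebook is ASCII or UTF-8. This works because the byte 0x20 never occurs inside a multi-byte UTF-8 sequence.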
Cheers,
cgkanchi
----------------------------------------------------------------------
Comment By: Ilguiz Latypov (ilgiz)
Date: 2007-06-12 15:51
Message:
Logged In: YES
user_id=281701
Originator: NO
(e) To have tell() on the same level with read(), try the unbuffered mode
by specifying bufsize=0 in open(),
http://docs.python.org/lib/built-in-funcs.html
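For comparison with modern Python, as opposed to the 2.4/2.5 file object in this report: Python 3's open() spells the parameter buffering and accepts buffering=0 only in binary mode. Asking for unbuffered text mode raises ValueError. Below is a minimal sketch, assuming Python 3. The name unbuffered_demo is hypothetical.

```python
import os
import tempfile


def unbuffered_demo(data, n):
    """Return (error text from unbuffered text mode, whether tell() matches
    the bytes returned by an n-byte unbuffered binary read)."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(data)
        error = None
        try:
            open(path, "r", buffering=0).close()
        except ValueError as exc:  # Python 3 rejects unbuffered text I/O
            error = str(exc)
        with open(path, "rb", buffering=0) as f:  # returns a raw io.FileIO
            chunk = f.read(n)
            return error, f.tell() == len(chunk)
    finally:
        os.remove(path)


print(unbuffered_demo(b"It is a truth universally acknowledged\r\n", 10))
```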
----------------------------------------------------------------------
Comment By: Ilguiz Latypov (ilgiz)
Date: 2007-06-12 15:47
Message:
Logged In: YES
user_id=281701
Originator: NO
This is your coding bug.
(a) I would not trust tell(). Calculate the absolute position and use
seek().
(b) Just from the documentation to Python's file-like objects I can assume
that read() and tell() belong to different levels of API. The read()
function has this in its documentation:
"Note that this method may call the underlying C function fread() more
than once in an effort to acquire as close to size bytes as possible".
http://docs.python.org/lib/bltin-file-objects.html
The tell() function's documentation refers to stdio's ftell(). This hints
that tell() will return the position of the fread() buffer's end, not the
read()'s end.
(c) It also appears that by adding 1 to the "current position - unget
size" you are skipping the space character itself.
(d) The rfind() might return -1 if the search fails.
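Point (b) is that read() and tell() live at different layers. Python 3's io module makes those layers visible as separate objects, although this differs from the 2.x stdio-based file object discussed in this report. With open(..., "rb") you get a buffered reader whose .raw attribute is the underlying FileIO. The minimal sketch below assumes Python 3. It shows that the buffered tell() reports the position read() has reached, while the raw stream has typically advanced further to fill the buffer. The name buffered_vs_raw_position is hypothetical.

```python
import os
import tempfile


def buffered_vs_raw_position(data, n):
    """Read n bytes through Python 3's default buffered binary reader and
    return (buffered tell(), tell() of the underlying raw FileIO)."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(data)
        with open(path, "rb") as f:  # io.BufferedReader wrapping f.raw
            f.read(n)
            return f.tell(), f.raw.tell()
    finally:
        os.remove(path)


logical, raw = buffered_vs_raw_position(b"It is a truth universally acknowledged", 5)
print(logical, raw)  # logical is 5; raw has typically moved further ahead
```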
----------------------------------------------------------------------