Binary vs. Text mode

Ben Mitchell ben at semio.com
Thu Nov 30 01:14:36 CET 2000


Hello,

I'm a little confused about why, exactly, I'm seeing the following behavior,
and I'm hoping someone can clarify it for me.

First, assume I have a large XML-like file ("myfile") in which the tag
</document> appears alone on a number of lines throughout the file.

I ran the following against that file:

-----
import os
import sys
import string

fp = open("myfile", "r+")
while 1:
    justbefore = fp.tell()
    line = fp.readline()
    if not line :
        break
    line = string.strip(line)
    if line == "</document>":
        fp.seek(justbefore, 0)
        reread = fp.readline()
        print string.strip(reread)
----

Now, my expectation was that this would print out a whole bunch of lines
that looked like:
</document>
</document>
</document>
</document>
</document>
</document>
</document>
...

Instead, I got:
5D
15E
ocument>
>
nt>
xt>
ocument>
cument>

ocument>
ument>
ument>
</url>
<url>
l>
ument>
ment>
ocument>
16F
ment>
ument>
nt>

document>
t>
cument>
/url>
nt>
ment>
url>
ument>
cument>
l>
nt>
cument>
url>
url>
t>

ent>
cument>
cument>
cument>
cument>
cument>
cument>
...

When I then changed to opening in "rb+" mode instead of "r+", everything
worked fine.
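For reference, here is a minimal sketch of the same scan in binary mode, written for modern Python 3 rather than the Python 2 used above. In binary mode tell() and seek() deal in raw byte offsets with no newline translation, so seeking back to the recorded offset lands exactly at the start of the line. The function name reread_tag_lines and the sample data are made up for illustration. It uses "rb" instead of "rb+" because nothing is written.

```python
import os
import tempfile

TAG = b"</document>"


def reread_tag_lines(path):
    """Return each line equal to TAG, re-read after seeking back to its start."""
    found = []
    # Binary mode: positions are plain byte offsets, no "\r\n" -> "\n" translation.
    with open(path, "rb") as fp:
        while True:
            justbefore = fp.tell()   # byte offset of the line about to be read
            line = fp.readline()
            if not line:
                break
            if line.strip() == TAG:
                fp.seek(justbefore)  # jump back to the start of this line
                reread = fp.readline()
                found.append(reread.strip())
    return found


if __name__ == "__main__":
    # Hypothetical sample file with Windows-style CRLF line endings.
    data = b"<document>\r\n<url>x</url>\r\n</document>\r\n<document>\r\n</document>\r\n"
    fd, path = tempfile.mkstemp()
    os.write(fd, data)
    os.close(fd)
    try:
        for tag in reread_tag_lines(path):
            print(tag.decode("ascii"))
    finally:
        os.remove(path)
```

Because readline() after the seek consumes the same line again, the loop continues from the end of that line just as it would have without the re-read.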

This raises two questions.  The first is why the byte offsets returned by
fp.tell() vary depending on the mode in which I've opened the file.
Isn't it just returning the number of bytes from the head of the file?  That
doesn't vary, regardless of how the system interprets the data.
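To make the byte-versus-character distinction concrete, here is a small sketch in modern Python 3 that reads the same CRLF-terminated data (made-up sample content) in binary mode and in text mode. Each "\r\n" on disk becomes a single "\n" in text mode. As a result, any position counted in translated characters falls behind the true byte offset by one for every line ending already passed.

```python
import os
import tempfile

# Hypothetical sample: three lines with Windows-style CRLF endings.
data = b"<url>\r\n</document>\r\n<url>\r\n"

fd, path = tempfile.mkstemp()
os.write(fd, data)
os.close(fd)

try:
    with open(path, "rb") as f:
        raw = f.read()                # bytes exactly as stored on disk
    with open(path, "r", newline=None) as f:
        text = f.read()               # universal newlines: "\r\n" becomes "\n"
finally:
    os.remove(path)

print(len(raw))                       # 27 bytes on disk
print(len(text))                      # 24 characters after translation
print(raw.index(b"</document>"))      # 7: byte offset of the tag
print(text.index("</document>"))      # 6: character offset of the same tag
```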

The second question is the more perplexing one for me.  It looks like, in
enough instances that it can't be random, the tell operation on a file
opened in text mode returned a location very close to where I expected it
to be.  (That's all those "ument>" and similar strings.)  If text mode is in
fact going to screw up the tell operation, shouldn't it *really* screw it up
instead of getting it consistently close?

In sum, I'd like to better understand the implications of opening in binary
versus text mode.
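On the broader question of implications, here is a sketch of how modern Python 3 behaves. This postdates the thread and differs from the Python 2 file objects used above. Text mode translates newlines in both directions. On write, each "\n" becomes the string given as the newline argument. On read, "\r\n" becomes "\n". On a text file, tell() returns an opaque cookie that is only meaningful when passed back to seek(), so it should not be used as a byte count. The helper name demo and the file content are illustrative only.

```python
import os
import tempfile


def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Text mode on write: every "\n" is translated to the newline argument.
        with open(path, "w", newline="\r\n") as f:
            f.write("<url>\n</document>\n")
        with open(path, "rb") as f:
            on_disk = f.read()       # the raw bytes, CRLF endings included

        # Text mode on read: tell() gives an opaque cookie, valid only for seek().
        with open(path, "r") as f:
            f.readline()             # skip the "<url>" line
            cookie = f.tell()
            first = f.readline()
            f.seek(cookie)           # seek back with the cookie, not byte arithmetic
            second = f.readline()
        return on_disk, first, second
    finally:
        os.remove(path)


if __name__ == "__main__":
    on_disk, first, second = demo()
    print(on_disk)                   # b'<url>\r\n</document>\r\n'
    print(first == second)           # True
```

The design point is that in text mode, positions are only round-tripped through tell()/seek(). Code that needs real byte offsets opens the file in binary mode instead.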

This is on a Windows box, btw.

Thanks in advance for any clarification you can provide!

Best,

-Ben





More information about the Python-list mailing list