[Q] File Object -- function 'read'

Peter Hansen peter at engcorp.com
Fri Jul 20 04:11:36 CEST 2001

jackyci wrote:
> In Windows. "\n" equal "\x0d\x0a"
> But in function "read" of File Object, "\x0d\x0a" equal 1 byte.
> file = open('test.txt')
> str = file.read(22)    # I want to read the first two line
> # but str = "This is a test\ntest\nte"
> How to solve the problem ?

(I'm surprised there's no answer yet to your question -- at least
none visible on my system, yet.)

There is a concept of "text files" versus "binary files".  The
primary way in which these differ is that the data written to a
"text" file may be converted in ways which depend on the
current platform.  "Binary" files, on the other hand, contain
precisely the bytes which were written out.

In practice, this shows up on Windows
(and DOS, and anything else in that ugly realm), where "text" files
have 'newlines' (represented as \n but really just LF or
linefeed characters, or \x0a) converted to pairs of bytes,
specifically a Carriage Return (CR, or \x0d) followed by a
Line Feed (LF, or \x0a).

When you read in a "text" file again, CR followed by LF is 
converted back into a single byte, the \n character (which,
again, is just a LF character).

On a Unix platform, "binary" and "text" files are the same 
thing.  Windows is "special". :-)
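
The round trip described above can be sketched as follows.  This
is only an illustration, and it assumes a current Python 3, where
open() takes a "newline" argument; passing newline="\r\n" on write
is used here to imitate the Windows text-mode conversion on any
platform.  The helper name roundtrip() is made up for the example.

```python
import os
import tempfile


def roundtrip(text):
    """Write text with CR LF translation, then read it back two ways."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # newline="\r\n" turns every "\n" written into CR LF,
        # imitating what a Windows text-mode write does.
        with open(path, "w", newline="\r\n") as f:
            f.write(text)
        # Binary mode: exactly the bytes that are on disk.
        with open(path, "rb") as f:
            raw = f.read()
        # Text mode: CR LF is converted back to a single "\n".
        with open(path, "r") as f:
            cooked = f.read()
    finally:
        os.remove(path)
    return raw, cooked


raw, cooked = roundtrip("This is a test\ntest\n")
print(repr(raw))     # b'This is a test\r\ntest\r\n'
print(repr(cooked))  # 'This is a test\ntest\n'
```

Each line costs one extra byte on disk, which is why byte counts
and character counts disagree.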

To solve your problem in one way: on Windows, pass the
"b" option in the mode argument of open() to force the file to
be read as a "binary" file.  (You need to explicitly
include the "r" option as well, for read mode.)

>>> f = open('test.txt', 'rb')
>>> data = f.read()
>>> data
'This is a test\r\ntest\r\ntest\r\n'
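
To see why read(22) gave you the result in your comment, here is
a small sketch comparing the two modes.  It assumes a current
Python 3 (where binary mode returns bytes objects) and assumes
the file on disk holds the contents shown above; the temporary
file stands in for your test.txt.

```python
import os
import tempfile

# Assumed on-disk contents of the file from the question.
DATA = b"This is a test\r\ntest\r\ntest\r\n"

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with open(path, "wb") as f:
        f.write(DATA)
    with open(path, "rb") as f:
        binary_chunk = f.read(22)  # counts bytes as stored on disk
    with open(path, "r") as f:
        text_chunk = f.read(22)    # counts characters after CR LF -> LF
finally:
    os.remove(path)

print(repr(binary_chunk))  # b'This is a test\r\ntest\r\n'
print(repr(text_chunk))    # 'This is a test\ntest\nte'
```

In binary mode the 22 bytes cover exactly two lines (16 + 6); in
text mode each line is one character shorter, so the same count
runs two characters into the third line.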

Note that your comment suggests you really would be
better off interpreting the file as a "text" file,
since the entire concept of "lines" is exclusive
to "text" files.  "Binary" files should generally be
thought of as having "raw" data, without "lines"
(this is not a hard and fast rule of course).

If your application doesn't really require the
CR LF combination to be read in, don't open the
file in binary mode.
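
If what you want is "the first two lines", a sketch of the text
mode approach is to ask for lines rather than a byte count.  The
helper first_lines() is hypothetical, and the temporary file again
stands in for test.txt; this assumes a current Python 3.

```python
import os
import tempfile
from itertools import islice


def first_lines(path, count):
    """Return the first `count` lines of a text file."""
    with open(path, "r") as f:
        # Iterating a text-mode file yields one line at a time,
        # each ending in "\n" however it is stored on disk.
        return list(islice(f, count))


# Demo file with CR LF line endings, as in the question.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with open(path, "wb") as f:
        f.write(b"This is a test\r\ntest\r\ntest\r\n")
    lines = first_lines(path, 2)
finally:
    os.remove(path)

print(lines)  # ['This is a test\n', 'test\n']
```

This way you never need to know how many bytes a line ending
occupies on the current platform.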

(I use "text" and "binary" in the above to emphasize
the fact that the distinction is one of convention
or interpretation.  Or to put it another way, 
files are just series of bytes; whether you want 
to treat them as text or binary is up to you...
it's not an inherent property of the file.)

Peter Hansen, P.Eng.
peter at engcorp.com
