[Tutor] file.read() doesn't give full contents of compressed files

Barton David David.Barton at nottingham.ac.uk
Tue Feb 20 14:18:05 CET 2007

Oh... of course. Thanks and sorry for missing the bleeding obvious.

Mind you, when reading in text mode ("r") rather than binary, len() actually
gives a much *smaller* size than getsize(). Does the conversion to text
happen to introduce some sort of terminator character that stops
file.read() from going to the end?
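One way to check, assuming the files are being read on Windows: the C runtime behind Python 2's file() treats a Ctrl-Z byte (0x1A) as end-of-file in text mode, and it collapses each CR LF pair into a single LF. Compressed data is close to random, so a 0x1A byte tends to turn up early. The sketch below (Python 3; the function name and sample bytes are hypothetical) reads nothing itself but takes the bytes from a binary read and gives a rough estimate of what a text-mode read would return under those two assumptions:

```python
def estimate_text_mode_len(data):
    """Roughly estimate len(read()) for a Windows text-mode read of `data`.

    Assumptions: reading stops at the first 0x1A (Ctrl-Z) byte, and each
    CR LF pair in the part that is read comes back as a single LF.
    `data` should be the whole file as bytes, from a binary ("rb") read.
    """
    end = data.find(b"\x1a")               # -1 means no Ctrl-Z byte at all
    prefix = data if end == -1 else data[:end]
    return len(prefix) - prefix.count(b"\r\n")

# Hypothetical stand-in for compressed bytes (not a real archive).
sample = b"\x1f\x8b\x08\x00\r\n\x1amore bytes"
print(len(sample), estimate_text_mode_len(sample))   # prints: 17 5
```

If the estimate for a real file lands close to what the text-mode len() reported, the Ctrl-Z byte is the likely culprit.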


-----Original Message-----
From: Kent Johnson [mailto:kent37 at tds.net] 
Sent: 20 February 2007 12:53
To: Barton David
Cc: tutor at python.org
Subject: Re: [Tutor] file.read() doesn't give full contents of compressed files

Barton David wrote:
> Hi,
> I'm really confused, and I hope somebody can explain this for me...
> I've been playing with compression and archives, and have some .zip, 
> .tar, .gz and .tgz example files to test my code on.
> I can read them using either zipfile, tarfile, gzip or zlib, and 
> that's fine. But just reading them in 'raw' doesn't give me the whole
> string of (compressed) bytes.
> i.e...
> len( file("mytestfile","r").read() ) != os.path.getsize("mytestfile")
> Not even close, in fact. It seems like file.read() just stops after
> reading a small portion of each example file, but why would that be?
> And what could I do if I wanted to read in the entire (compressed) 
> contents as a string?

Why do you think it stops reading? len() should be giving a bigger
number than getsize() because you are reading the file in text mode
which will convert \n to \r\n. Try file("mytestfile","rb").
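For comparison, a minimal Python 3 sketch of the binary-mode approach. The thread uses Python 2's file(); open() is its modern equivalent. The read_raw name and the sample bytes are hypothetical, and a temporary file stands in for a real archive:

```python
import os
import tempfile

def read_raw(path):
    # "rb" = binary mode: no newline translation and no special
    # end-of-file byte, so every byte on disk comes back unchanged.
    with open(path, "rb") as f:
        return f.read()

# Hypothetical data standing in for a compressed file: every byte value,
# plus CR LF pairs and a 0x1A (Ctrl-Z) byte.
sample = bytes(range(256)) + b"\r\n\x1a\r\n"

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

data = read_raw(path)
size = os.path.getsize(path)
print(len(data), size)   # prints: 261 261
os.remove(path)
```

Because nothing is translated in binary mode, len() of the result and os.path.getsize() agree, and the bytes can be handed straight to zlib or gzip.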

