File holes in Linux

Ned Deily nad at acm.org
Wed Sep 29 16:12:49 EDT 2010


In article 
<AANLkTinPUYzL5LaQBV-B3BUX6OzYd6+UMPXRptqH7Wcz at mail.gmail.com>,
 Tom Potts <karaken12 at gmail.com> wrote:
> Hi, all.  I'm not sure if this is a bug report, a feature request or what,
> so I'm posting it here first to see what people make of it.  I was copying
> over a large number of files using shutil, and I noticed that the final
> files were taking up a lot more space than the originals; a bit more
> investigation showed that files with a positive nominal filesize which
> originally took up 0 blocks were now taking up the full amount.  It seems
> that Python does not write back file holes as it should; here is a simple
> program to illustrate:
>   data = '\0' * 1000000
>   file = open('filehole.test', 'wb')
>   file.write(data)
>   file.close()
> A quick `ls -sl filehole.test' will show that the created file actually
> takes up about 980k, rather than the 0 bytes expected.

I would expect the file to occupy about 980k on disk in that case.  
AFAIK, simply writing null bytes doesn't automatically create a sparse 
file on Unix-y systems.  Generally, on file systems that support it, a 
file becomes sparse when you don't write to certain parts of it, e.g. 
by using lseek(2) to position forward past the end of the file before 
writing, thereby implying that the intermediate blocks should be read 
back as zeros.  Only certain file systems on certain platforms support 
holes like that.  Python makes no claim to do that 
optimization in either its lower-level i/o routines or in the shutil 
module.  The latter's copyfile just copies bytes from input to output.  
If you want to always preserve sparse files, you could use the GNU cp 
utility with --sparse=always.  If you look at its code, you see that it 
checks for all-zero blocks when copying and then uses lseek to skip over 
them when writing.  Something like that could be added to shutil, with 
the necessary tests for which platforms support it.  If you are 
interested in adding that feature, you could write a patch and open a 
feature request on the Python bug tracker (http://bugs.python.org/).  
It's not likely to progress without a supplied patch, and even then it 
may not.
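
To make the two ideas above concrete, here is a rough sketch, assuming 
Python 3 and a file system that supports holes.  The names write_sparse, 
copy_sparse and BLOCK_SIZE are made up for illustration; none of this is 
part of shutil, and it is not how GNU cp is actually implemented, only 
the same general approach of seeking over all-zero blocks.

```python
import os

BLOCK_SIZE = 4096  # assumption: a typical block size; real file systems vary


def write_sparse(path, size):
    """Create a file of `size` bytes by seeking past the end before writing."""
    with open(path, 'wb') as f:
        if size > 0:
            f.seek(size - 1)  # position forward past the end of the empty file
            f.write(b'\0')    # one real write fixes the file's length


def copy_sparse(src, dst, block_size=BLOCK_SIZE):
    """Copy src to dst, seeking over all-zero blocks instead of writing them."""
    zeros = b'\0' * block_size
    with open(src, 'rb') as fin, open(dst, 'wb') as fout:
        while True:
            block = fin.read(block_size)
            if not block:
                break
            if block == zeros[:len(block)]:
                # All zeros: skip ahead so the file system may leave a hole.
                fout.seek(len(block), os.SEEK_CUR)
            else:
                fout.write(block)
        # If the source ended in zeros, nothing was written at the tail,
        # so set the final length explicitly.
        fout.truncate(fout.tell())
```

On Unix, comparing os.stat(path).st_blocks for the source and the copy 
shows whether holes were actually created.  Whether they are depends on 
the file system, so the sketch only guarantees that the copied contents 
and length match the original.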

-- 
 Ned Deily,
 nad at acm.org



