File holes in Linux
Ned Deily
nad at acm.org
Wed Sep 29 16:12:49 EDT 2010
In article
<AANLkTinPUYzL5LaQBV-B3BUX6OzYd6+UMPXRptqH7Wcz at mail.gmail.com>,
Tom Potts <karaken12 at gmail.com> wrote:
> Hi, all. I'm not sure if this is a bug report, a feature request or what,
> so I'm posting it here first to see what people make of it. I was copying
> over a large number of files using shutil, and I noticed that the final
> files were taking up a lot more space than the originals; a bit more
> investigation showed that files with a positive nominal filesize which
> originally took up 0 blocks were now taking up the full amount. It seems
> that Python does not write back file holes as it should; here is a simple
> program to illustrate:
> data = '\0' * 1000000
> file = open('filehole.test', 'wb')
> file.write(data)
> file.close()
> A quick `ls -sl filehole.test' will show that the created file actually
> takes up about 980k, rather than the 0 bytes expected.
I would expect the file to take up about 980k in that case. AFAIK, simply
writing null bytes doesn't automatically create a sparse file on Unix-y
systems. Generally, on file systems that support it, a file becomes
sparse when you leave parts of it unwritten, e.g. by using
lseek(2) to position forward past the end of the file when writing,
thereby implying that the intermediate blocks should be treated as zero
when reading. Only files on certain file systems on certain platforms
support operations like that. Python makes no claim to do that
optimization in either its lower-level i/o routines or in the shutil
module. The latter's copyfile just copies bytes from input to output.
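To illustrate the lseek approach, here is a minimal sketch (Python 3
assumed; the helper names make_sparse_file and allocated_bytes are just
made up for this example). It seeks forward past the end of the file
and writes a single byte, so the earlier blocks are never written.
Whether the file really ends up occupying fewer blocks depends on the
file system.

```python
import os

def make_sparse_file(path, size):
    """Create a file of `size` bytes, writing only its final byte."""
    with open(path, 'wb') as f:
        if size > 0:
            # Position past the current end of file; nothing is written
            # for the skipped range.
            f.seek(size - 1)
            # Writing one byte fixes the file's nominal size.
            f.write(b'\0')

def allocated_bytes(path):
    """Return the space actually allocated, as reported by stat(2).

    Assumes a platform where st_blocks exists and counts 512-byte
    units (true on Linux; st_blocks is not available on Windows).
    """
    return os.stat(path).st_blocks * 512
```

Comparing os.path.getsize(path) with allocated_bytes(path) afterwards
should show whether the file system left a hole.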
If you want to always preserve sparse files, you could use the GNU cp
utility with --sparse=always. If you look at its code, you see that it
checks for all-zero blocks when copying and then uses lseek to skip over
them when writing. Something like that could be added to shutil, with
the necessary tests for which platforms support it. If you are
interested in adding that feature, you could write a patch and open a
feature request on the Python bug tracker (http://bugs.python.org/).
It's not likely to progress without a supplied patch, and even with one
it may not.
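As a starting point, a rough sketch of that zero-block technique might
look like the following. This is not what shutil does and not a copy of
the cp code; copy_sparse and its blocksize default are hypothetical,
Python 3 is assumed, and there is no check for whether the destination
file system supports holes.

```python
import os

def copy_sparse(src, dst, blocksize=4096):
    """Copy src to dst, seeking over all-zero blocks instead of writing."""
    zero_block = b'\0' * blocksize
    with open(src, 'rb') as fin, open(dst, 'wb') as fout:
        while True:
            buf = fin.read(blocksize)
            if not buf:
                break
            if buf == zero_block[:len(buf)]:
                # All zeros: move the output position forward without
                # writing, which leaves a hole where supported.
                fout.seek(len(buf), os.SEEK_CUR)
            else:
                fout.write(buf)
        # If the file ends in a skipped block, nothing was written at
        # the final offset; truncate() at the current position sets the
        # file length so the copy has the same size as the source.
        fout.truncate()
```

On a file system without sparse file support the copy should still be
byte-for-byte identical; it just won't save any space.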
--
Ned Deily,
nad at acm.org