securely overwrite files with Python

Thomas Bellman bellman at lysator.liu.se
Fri Mar 5 21:41:02 EST 2004


Skip Montanaro <skip at pobox.com> writes:

> I'm not sure I understand how that can work.  Suppose I have multiple (hard)
> links to a small file named "small".  If the OS moves it around to reduce
> fragmentation (implying it will have a different inode next time it's
> opened) how does it efficiently track down and change all inode references
> to it?  In theory it could keep a cache mapping inode numbers back to the
> directories which reference them, but that could consume a fairly large
> chunk of memory to maintain.

I think you have misunderstood how Unix file systems work.

A directory is a list of directory entries, each entry consisting
of a name and an inode number.  There may be several directory
entries in a file system that point to the same inode, and the
entries can be in different directories, and need not have the
same name.  The names are also called "hard links".  All names
for a file are equal in status -- none is worth more than any of
the others.

The inode is the central point of information for a file.  It
holds information like:

 - file type (regular file, directory, device file, ...)
 - file permissions
 - file ownership
 - timestamps (data modification, inode modification, last access)
 - number of names (hard links) the file has
 - file size
 - list of data blocks for the file
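
From Python, most of these inode fields are visible through os.stat(). The sketch below assumes a Unix-like system, because st_blocks is not available everywhere. It maps the result onto the list above. The list of data block addresses itself is not exposed. st_blocks only reports how many 512-byte units are allocated. The function name describe_inode is hypothetical.

```python
import os
import stat
import time

def describe_inode(path):
    """Summarize the inode fields that os.stat() exposes for path."""
    st = os.stat(path)
    if stat.S_ISREG(st.st_mode):
        kind = "regular file"
    elif stat.S_ISDIR(st.st_mode):
        kind = "directory"
    elif stat.S_ISCHR(st.st_mode) or stat.S_ISBLK(st.st_mode):
        kind = "device file"
    else:
        kind = "other"
    return {
        "inode": st.st_ino,
        "type": kind,
        "permissions": stat.filemode(st.st_mode),  # e.g. "-rw-r--r--"
        "owner": (st.st_uid, st.st_gid),
        "mtime": time.ctime(st.st_mtime),  # data modification
        "ctime": time.ctime(st.st_ctime),  # inode modification
        "atime": time.ctime(st.st_atime),  # last access
        "links": st.st_nlink,              # number of names
        "size": st.st_size,
        "blocks_512": st.st_blocks,        # allocated 512-byte units (Unix only)
    }

print(describe_inode("."))
```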

The actual location of the inode on the storage device can
typically be calculated from the inode number, and from a small
index of inode clusters in the file system.
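
In ext2 this works roughly as follows. Inodes are numbered from 1 and split into block groups of a fixed size (s_inodes_per_group in the superblock). The "small index" is the block group descriptor table, which records where each group's inode table starts. The sketch below assumes that ext2-style layout. The parameter values in the example are made up for illustration and are not taken from a real file system.

```python
from typing import Sequence

def inode_offset(ino: int, inodes_per_group: int, inode_size: int,
                 block_size: int, inode_table_blocks: Sequence[int]) -> int:
    """Byte offset of inode `ino` on the device, assuming an ext2-like layout.

    inode_table_blocks[g] is the first block of block group g's inode
    table, as recorded in that group's descriptor.
    """
    if ino < 1:
        raise ValueError("inode numbers start at 1")
    group, index = divmod(ino - 1, inodes_per_group)
    return inode_table_blocks[group] * block_size + index * inode_size

# Hypothetical geometry: 2048 inodes per group, 128-byte inodes, 1 KiB blocks.
print(inode_offset(2050, 2048, 128, 1024, [5, 8197]))  # 8393856
```

No searching is involved. One division and one table lookup locate any inode.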

Finally there are the actual data blocks for the file.  They are
*not* part of the inode, and they do not need to be placed near
the inode -- they can be scattered around at random places on the
storage device.  The list of data blocks in the inode holds only
pointers to the data blocks.
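
A toy model shows the separation between the inode and its data. In the sketch below the "device" is a Python list of fixed-size blocks. The inode holds only a size and a list of block numbers, and reading the file means following those pointers in order. The tiny block size and the names Inode and read_file are invented for illustration. Real file systems use blocks of 1 KiB or more.

```python
from dataclasses import dataclass, field
from typing import List

BLOCK_SIZE = 4  # deliberately tiny, for illustration only

@dataclass
class Inode:
    size: int                                        # file length in bytes
    blocks: List[int] = field(default_factory=list)  # pointers into the device

def read_file(device: List[bytes], inode: Inode) -> bytes:
    """Follow the inode's block pointers and concatenate the data."""
    data = b"".join(device[b] for b in inode.blocks)
    return data[:inode.size]  # the last block may be only partly used

# The data blocks are scattered; only the pointer list gives their order.
device = [b"\x00" * BLOCK_SIZE] * 8
device[6], device[2], device[5] = b"hell", b"o wo", b"rld\x00"
inode = Inode(size=11, blocks=[6, 2, 5])
print(read_file(device, inode))  # b'hello world'
```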

In typical Unix file systems, like the Fast File System of BSD
ancestry (in common use in many Unices; it is called UFS in
SunOS, for example), or the 2nd and 3rd Extended File Systems in
Linux (ext2 and ext3), the inode only contains pointers to the
first few data blocks (10 or 12 are common numbers; ext2 uses 12).
There is also a pointer to a single indirect block, which in turn
holds pointers to the next batch of data blocks (blocks 10-1033,
assuming 10 direct pointers and 1024 pointers per indirect block).
And there is a pointer to a double indirect block, containing
pointers to indirect blocks, which in turn contain pointers to
the actual data blocks.  Depending on the implementation, the
inode may also contain a pointer to a triple indirect block.
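
The following sketch works out which pointer covers a given logical block of a file. It assumes the scheme just described: n_direct pointers in the inode, then single, double (indirect-indirect), and triple indirect blocks, each holding per_block pointers. The defaults of 10 and 1024 match the example numbers above. Actual values depend on the file system and its block size. The function name locate_block is hypothetical.

```python
def locate_block(n: int, n_direct: int = 10, per_block: int = 1024):
    """Return (level, path) telling where logical block n's pointer lives.

    level is "direct", "single", "double" or "triple"; path lists the slot
    index to follow at each level below the inode.
    """
    if n < 0:
        raise ValueError("block index must be non-negative")
    if n < n_direct:
        return "direct", [n]
    n -= n_direct
    if n < per_block:
        return "single", [n]
    n -= per_block
    if n < per_block ** 2:
        return "double", [n // per_block, n % per_block]
    n -= per_block ** 2
    if n < per_block ** 3:
        return "triple", [n // per_block ** 2, (n // per_block) % per_block,
                          n % per_block]
    raise ValueError("block index beyond the triple indirect range")

for n in (9, 10, 1033, 1034):
    print(n, locate_block(n))
# 9 ('direct', [9])
# 10 ('single', [0])
# 1033 ('single', [1023])
# 1034 ('double', [0, 0])
```

Small files never need the indirect blocks. Only large files pay for the extra lookups.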

A file system that moves files around when you overwrite them
will only move the data blocks, not the inode.  The inode will
stay the same, and in the same position on the storage device.
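
For the original question about overwriting from Python, the sketch below contrasts two approaches. The first opens the existing file with "r+b" and writes zeros through it, so the same inode is reused. The second writes a new file and renames it over the old name, which produces a different inode. Keeping the inode is a necessary condition for overwriting in place, but it is not sufficient. As noted above, a file system that relocates data may put the new bytes in fresh blocks and leave the old blocks untouched on the device. The function names are hypothetical, and a Unix-like system is assumed.

```python
import os
import tempfile

def overwrite_in_place(path: str) -> None:
    """Overwrite every byte of path with zeros through its existing inode."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:      # r+b: no truncation, same inode
        remaining = size
        while remaining:
            n = min(remaining, 64 * 1024)
            f.write(b"\0" * n)
            remaining -= n
        f.flush()
        os.fsync(f.fileno())          # ask the OS to push the data to the device

def replace_with_new_file(path: str) -> None:
    """Write zeros to a new file and rename it over path (new inode)."""
    size = os.path.getsize(path)
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * size)
    os.replace(tmp, path)             # old inode loses this name; data untouched

def demo():
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "secret")
        with open(p, "wb") as f:
            f.write(b"attack at dawn")
        before = os.stat(p).st_ino
        overwrite_in_place(p)
        kept = os.stat(p).st_ino == before
        with open(p, "rb") as f:
            zeroed = f.read() == b"\0" * 14
        replace_with_new_file(p)
        replaced = os.stat(p).st_ino != before
        return kept, zeroed, replaced

print(demo())  # (True, True, True)
```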


-- 
Thomas Bellman,   Lysator Computer Club,   Linköping University,  Sweden
"Adde parvum parvo magnus acervus erit"       ! bellman @ lysator.liu.se
          (From The Mythical Man-Month)       ! Make Love -- Nicht Wahr!


