emulating du with os.walk

Kirk Job-Sluder kirk at eyegor.jobsluder.net
Tue Sep 28 11:04:14 EDT 2004


On 2004-09-28, Gerrit <gerrit at nl.linux.org> wrote:
> "Martin v. Löwis" wrote:
>> Kirk Job-Sluder wrote:
>> >There should be an easy way to get around this, or perhaps I'm better
>> >off just parsing the output of du.
>> 
>> I suggest that you don't use os.path.walk, but write a recursive
>> function yourself. You should find that the entire problem can
>> be solved in 12 lines of Python code.
>
> There are some nasty little problems which make it difficult.
>
> First, what do you do with hardlinks? Suppose directory a/a, a/b and a/c
> all contain the same 100 MiB file. Directory a/ only has 100 MiB, but a
> naive script will report 300 MiB.

Well, that is a good question.  The primary goal of this script is to
construct lists of files that can be passed to cpio in order to make
multiple volumes of a certain size.  (In my case, to pack CD-ROM or
CD-RW disks efficiently.)  The other goal is to minimize splitting of
directory hierarchies between volumes where possible.  So, for example,
given a list of directories:

foo 500M
bar 400M
baz 100M
rab 200M

the script should construct file lists for two volumes:
volume1: foo baz
volume2: bar rab

(Of course, the actual volumes will be larger than 600M to allow for
compression.)
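
A minimal sketch of how the grouping above could be computed, assuming a
greedy "first-fit decreasing" heuristic.  That is an illustrative choice
here, not necessarily what the final script does.  The function name
pack_volumes and the 600M capacity are hypothetical.

```python
def pack_volumes(sizes, capacity):
    """Greedy first-fit-decreasing packing of directories into volumes.

    sizes: dict mapping directory name -> size (any consistent unit).
    capacity: maximum size of one volume, in the same unit.
    Returns a list of (names, total) tuples, one per volume.
    """
    volumes = []  # each entry is [list_of_names, used_size]
    # Place the largest directories first; ties are broken by name.
    for name, size in sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        for vol in volumes:
            if vol[1] + size <= capacity:
                vol[0].append(name)
                vol[1] += size
                break
        else:
            # Nothing has room (or the directory alone exceeds the
            # capacity): start a new volume for it.
            volumes.append([[name], size])
    return [(names, used) for names, used in volumes]


example = {"foo": 500, "bar": 400, "baz": 100, "rab": 200}
for number, (names, used) in enumerate(pack_volumes(example, 600), 1):
    print("volume%d: %s (%dM)" % (number, " ".join(names), used))
# prints:
# volume1: foo baz (600M)
# volume2: bar rab (600M)
```

A directory larger than one volume still gets a volume of its own in this
sketch; splitting such a directory across volumes would need extra logic.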

Since each volume should be independent of the other volumes, it makes
sense to treat hard links as regular files: even though foo/a.txt and
bar/b.txt point to the same file, a full copy of both a.txt and b.txt
is required.
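
A rough sketch of that counting rule, written as the kind of short
recursive function suggested earlier in the thread.  The name tree_size
and the count_links_once switch are hypothetical.  It sums st_size of
everything that is not a directory and ignores the size of the directory
entries themselves.

```python
import os
import stat


def tree_size(path, count_links_once=False, _seen=None):
    """Return the total size in bytes of the non-directories under path.

    By default every name is charged in full, so a file with two hard
    links is counted twice.  With count_links_once=True each
    (device, inode) pair is counted only the first time it is seen.
    """
    if _seen is None:
        _seen = set()
    info = os.lstat(path)  # lstat: symlinks are sized, not followed
    if stat.S_ISDIR(info.st_mode):
        return sum(tree_size(os.path.join(path, name), count_links_once, _seen)
                   for name in os.listdir(path))
    if count_links_once and info.st_nlink > 1:
        key = (info.st_dev, info.st_ino)
        if key in _seen:
            return 0  # another name for a file that was already counted
        _seen.add(key)
    return info.st_size
```

For the cpio file lists the default behaviour is the relevant one, since
each volume needs its own full copy of the data.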

> Most of the time, you'll want to stay in one filesystem.
>
> You don't want to get stuck in recursive symlinks. If a/b is a symlink
> to a/, you quickly get into an infinite loop.

Good point.  I should check for that.
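
One possible way to do that check, sketched under the assumption of a
POSIX system where (st_dev, st_ino) identifies a directory: remember
every directory already visited and skip it when it turns up again.  The
same st_dev field can be used to stay on one filesystem.  The name
walk_dirs and its one_filesystem flag are hypothetical.

```python
import os


def walk_dirs(top, one_filesystem=True):
    """Yield each directory under top once, following symlinks safely."""
    root_dev = os.stat(top).st_dev
    seen = set()
    stack = [top]
    while stack:
        path = stack.pop()
        try:
            info = os.stat(path)  # stat follows symlinks
        except OSError:
            continue  # dangling link or entry that vanished
        key = (info.st_dev, info.st_ino)
        if key in seen:
            continue  # already visited: a symlink loop or a second route
        if one_filesystem and info.st_dev != root_dev:
            continue  # crossed onto another filesystem
        seen.add(key)
        yield path
        try:
            names = os.listdir(path)
        except OSError:
            continue  # unreadable directory: nothing to descend into
        for name in names:
            child = os.path.join(path, name)
            if os.path.isdir(child):  # isdir follows symlinks too
                stack.append(child)
```

The simpler alternative is never to descend into symlinked directories at
all, which is what an os.lstat-based walk does.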

> Directories have a size too.
>
> What do we do with files we can't read?

At the moment, the script reports an error and moves on.
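
A minimal sketch of "report and move on" with os.walk, assuming the
onerror callback for directories that cannot be listed and a try/except
around os.lstat for individual files.  The name total_file_size is
hypothetical.

```python
import os
import sys


def total_file_size(top):
    """Sum file sizes under top; report unreadable entries and continue.

    Returns (total_bytes, errors), where errors is the list of OSError
    instances that were skipped.
    """
    errors = []

    def report(err):
        # os.walk passes the OSError it got while listing a directory.
        errors.append(err)
        sys.stderr.write("skipping %s: %s\n" % (err.filename, err.strerror))

    total = 0
    for dirpath, dirnames, filenames in os.walk(top, onerror=report):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError as err:
                report(err)  # e.g. the file vanished after being listed
    return total, errors
```

Returning the collected errors lets the caller decide whether a volume
list built from partial sizes is still good enough.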

> In /proc, even stranger subtleties exist which I don't understand -
> ENOENT although listed by listdir() and that sort of thing.
>
> Together with more options, human-readable file sizes and documentation,
> it took me ~200 LOC at
> http://topjaklont.student.utwente.nl/creaties/dkus.py

Thanks!

> Note that du doesn't solve these problems either.

True, but I'm willing to sacrifice some precision for the sake of getting
it done.  Getting volume sizes in the ballpark is good enough.  


> yours,
> Gerrit.
>
> -- 
> Weather in Twenthe, Netherlands 28/09 08:55:
> 	15.0°C mist overcast wind 4.0 m/s SW (57 m above NAP)


-- 
Kirk Job-Sluder
"The square-jawed homunculi of Tommy Hilfinger ads make every day an
existential holocaust."  --Scary Go Round


