walking a directory with very many files

Mike Kazantsev mk.fraggod at gmail.com
Tue Jun 16 23:18:58 EDT 2009


On Wed, 17 Jun 2009 14:52:28 +1200
Lawrence D'Oliveiro <ldo at geek-central.gen.new_zealand> wrote:

> In message 
> <234b19ac-7baf-4356-9fe5-37d00146d982 at z9g2000yqi.googlegroups.com>,
> thebjorn wrote:
> 
> > Not proud of this, but...:
> > 
> > [django] www4:~/datakortet/media$ ls bfpbilder|wc -l
> >  174197
> > 
> > all .jpg files between 40 and 250KB with the path stored in a
> > database field... *sigh*
> 
> Why not put the images themselves into database fields?
> 
> > Oddly enough, I'm relieved that others have had similar folder
> > sizes ...
> 
> One of my past projects had 400000-odd files in a single folder. They
> were movie frames, to allow assembly of movie sequences on demand.

For both scenarios:
Why not use the hex representation of an md5/sha1 hash of the id as the
path, arranging the files like /path/f/9/e/95ea4926a4 ?

That way you won't have to deal with the many-files-in-one-path problem,
and, since there are thousands of them anyway, name readability
shouldn't matter.
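
A rough sketch of that layout, in case it helps. The function name
sharded_path and the "levels" parameter are made up for illustration,
and sha1 is just one of the two hashes mentioned above:

```python
import hashlib
import os.path


def sharded_path(root, file_id, levels=3):
    """Map an id to a path like root/f/9/e/95ea4926a4...

    The first `levels` hex digits of the hash each become one directory
    level; the remaining digits become the file name.
    (Hypothetical helper, not taken from any library.)
    """
    digest = hashlib.sha1(str(file_id).encode("utf-8")).hexdigest()
    parts = list(digest[:levels]) + [digest[levels:]]
    return os.path.join(root, *parts)
```

With three levels of one hex digit each there are 16**3 = 4096 leaf
directories, so a few hundred thousand files would average well under a
hundred per directory.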

In fact, on modern filesystems it doesn't matter whether you are
accessing /path/f9e95ea4926a4 with a million files in /path or
/path/f/9/e/95ea with only a hundred of them in each path. The former
case (all-in-one-path) would even outperform the latter on ext3 or
reiserfs by a small margin.
Sadly, that's not the case with filesystems like FreeBSD ufs2 (at least
in the sixth branch), so if the app might be run on different machines
it's safer to create subdirs than to keep everything in one path.
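
If you go the play-safe route, writing a file could look something like
the sketch below. store_blob and its parameters are names I made up, and
os.makedirs(..., exist_ok=True) assumes Python 3.2 or later:

```python
import hashlib
import os


def store_blob(root, file_id, data, levels=3):
    """Write `data` (bytes) under root/<h>/<h>/<h>/<rest-of-hash>.

    Intermediate directories are created on demand.
    (Hypothetical helper for illustration only.)
    """
    digest = hashlib.md5(str(file_id).encode("utf-8")).hexdigest()
    # One directory level per leading hex digit of the hash.
    directory = os.path.join(root, *digest[:levels])
    os.makedirs(directory, exist_ok=True)  # no error if it already exists
    path = os.path.join(directory, digest[levels:])
    with open(path, "wb") as fh:
        fh.write(data)
    return path
```

Since the path is derived from the id alone, reading the file back only
needs the same hash computation, with no directory listing involved.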

-- 
Mike Kazantsev // fraggod.net