walking a directory with very many files

Lie Ryan lie.1296 at gmail.com
Tue Jun 16 23:42:02 EDT 2009


Mike Kazantsev wrote:
> On Wed, 17 Jun 2009 14:52:28 +1200
> Lawrence D'Oliveiro <ldo at geek-central.gen.new_zealand> wrote:
> 
>> In message 
>> <234b19ac-7baf-4356-9fe5-37d00146d982 at z9g2000yqi.googlegroups.com>,
>> thebjorn wrote:
>>
>>> Not proud of this, but...:
>>>
>>> [django] www4:~/datakortet/media$ ls bfpbilder|wc -l
>>>  174197
>>>
>>> all .jpg files between 40 and 250KB with the path stored in a
>>> database field... *sigh*
>> Why not put the images themselves into database fields?
>>
>>> Oddly enough, I'm relieved that others have had similar folder
>>> sizes ...
>> One of my past projects had 400000-odd files in a single folder. They
>> were movie frames, to allow assembly of movie sequences on demand.
> 
> For both scenarios:
> Why not use hex representation of md5/sha1-hashed id as a path,
> arranging them like /path/f/9/e/95ea4926a4 ?
> 
> That way, you won't have to deal with the many-files-in-path problem,
> and, since there are thousands of them anyway, name readability
> shouldn't matter.
> 
> In fact, on modern filesystems it doesn't matter whether you're
> accessing /path/f9e95ea4926a4 with a million files in /path or
> /path/f/9/e/95ea with only a hundred of them in each path. The former
> case (all-in-one-path) would even outperform the latter with ext3 or
> reiserfs by a small margin.
> Sadly, that's not the case with filesystems like FreeBSD ufs2 (at least
> in the sixth branch), so it's better to play it safe and create subdirs
> if the app might be run on different machines, rather than keeping
> everything in one path.
> 

It might not matter to the filesystem, but the file explorer (and ls)
would still suffer. A subfolder structure would be much better, and much
easier to navigate manually when you need to.
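
For reference, here is a rough sketch of the hashed-path layout Mike
describes (/path/f/9/e/95ea4926a4). The function names sharded_path and
store, the choice of sha1, and the default of three one-character levels
are my own assumptions for illustration, not anything from the posts
above:

```python
import hashlib
import os


def sharded_path(root, file_id, levels=3):
    """Map an id to root/<h0>/<h1>/<h2>/<rest of hex digest>.

    Hypothetical helper: uses the sha1 hex digest of the id, taking one
    hex character per directory level.
    """
    digest = hashlib.sha1(str(file_id).encode("utf-8")).hexdigest()
    # The first `levels` characters become nested directories,
    # the remainder of the digest becomes the file name.
    parts = list(digest[:levels]) + [digest[levels:]]
    return os.path.join(root, *parts)


def store(root, file_id, data, levels=3):
    """Write bytes under the sharded path, creating subdirs as needed."""
    path = sharded_path(root, file_id, levels)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return path
```

With three hex levels there are 16**3 = 4096 leaf directories, so the
174197 files mentioned earlier would average roughly 40 per directory,
assuming the hash spreads them evenly.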


