At 9:52 AM +1000 2005-10-25, AE Somerville wrote:
The problem manifests as the create-list process being unable to make the archiving directories. It appears when the directory count approaches 32,000 separate directories.
Most *nix OSes have problems with too many files (or subdirectories) within a given directory structure. Frequently, you start seeing problems at much lower numbers, like 1,000 or 10,000.
My temp solution:
I have altered Site.py line 52 to add the list name again into the path for the archives. This halved the number of directories at the /var/mailman/archives/private/ level and pushed the extra directories into their own named subdirectory. Now we can create new lists again (in our situation the list population is updated daily, and the lists themselves are added and deleted as required).
This just pushes the horizon out; it doesn't solve the fundamental problem. IMO, you're better off doing a quick MD5 hash of the list name, slicing off the first (or last) few characters of the hash, and incorporating those characters into the path name.
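
A minimal sketch of that idea in Python. The function name, the default prefix length, and the lowercasing step are my own assumptions for illustration, not anything taken from Mailman's Site.py:

```python
import hashlib
import os

# Assumption: the archive root from the quoted message; adjust for your site.
ARCHIVE_ROOT = "/var/mailman/archives/private"

def hashed_archive_dir(listname, root=ARCHIVE_ROOT, prefix_len=3):
    """Return root/<hash prefix>/<listname> (hypothetical helper)."""
    # Assumption: list names are treated case-insensitively, so normalise
    # before hashing to keep the same list in the same bucket.
    digest = hashlib.md5(listname.lower().encode("utf-8")).hexdigest()
    # The first few hex characters of the digest select the bucket directory.
    return os.path.join(root, digest[:prefix_len], listname)

print(hashed_archive_dir("test-list"))
```

The hash is deterministic, so any code that needs the archive path can recompute it from the list name alone; no lookup table is required.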
If you use hex characters instead of some other base, that's roughly a factor of sixteen reduction in the number of subdirectories/files for each character of hash. In practice, you'll get birthday collisions more frequently than you'd like, so count it as something closer to a factor of four to eight reduction per character.
With this technique, it doesn't take too many hash characters to reduce the problem to a much more manageable size. Just three characters of a reasonably well distributed hash will result in no more than 4096 hash subdirectories at the parent, and probably something close to a factor of 64 to 512 reduction in the number of grandchild subdirectories/files within each hash subdirectory.
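
If you want to check the spread yourself, here is a rough simulation. The list names (list0 through list31999) are invented purely for the test, and real list names may distribute differently:

```python
import hashlib
from collections import Counter

def bucket_counts(names, prefix_len=3):
    """Count how many names land in each hex-prefix bucket (hypothetical helper)."""
    return Counter(
        hashlib.md5(name.encode("utf-8")).hexdigest()[:prefix_len]
        for name in names
    )

# Made-up list names, roughly as many as the 32,000 mentioned above.
names = ["list%d" % i for i in range(32000)]
counts = bucket_counts(names)

print("buckets used:", len(counts))
print("largest bucket:", max(counts.values()))
print("average per bucket: %.1f" % (len(names) / len(counts)))
```

Comparing the largest bucket with the average gives a feel for how far a real distribution falls short of the ideal factor of sixteen per character.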
If you go with base-32 instead, two base-32 characters would be no more than 1024 files in a single directory, and probably close to a factor of six to 32 reduction in the number of grandchild subdirectories/files per hash subdirectory.
Base-64 would let you get two characters creating no more than 4096 hash subdirectories, and you can see the numbers above for the likely reduction in the number of grandchild subdirectories/files.
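
The upper bounds above are just alphabet size raised to the number of characters. A small sketch that computes them and produces a prefix in each encoding follows. The helper names are hypothetical, and the URL-safe base-64 alphabet is my own choice, because the standard alphabet contains '/', which cannot appear in a file name:

```python
import base64
import hashlib

def max_buckets(alphabet_size, chars):
    """Upper bound on distinct prefixes: alphabet_size ** chars."""
    return alphabet_size ** chars

def hash_prefix(listname, encoding="hex", chars=3):
    """Return the first `chars` characters of MD5(listname) in an encoding."""
    digest = hashlib.md5(listname.encode("utf-8")).digest()
    if encoding == "hex":
        text = digest.hex()
    elif encoding == "base32":
        text = base64.b32encode(digest).decode("ascii")
    elif encoding == "base64":
        # Assumption: URL-safe variant ('-' and '_' instead of '+' and '/').
        text = base64.urlsafe_b64encode(digest).decode("ascii")
    else:
        raise ValueError("unknown encoding: %r" % encoding)
    return text[:chars]

for label, size, chars in [("hex", 16, 3), ("base32", 32, 2), ("base64", 64, 2)]:
    print(label, repr(hash_prefix("test-list", label, chars)),
          "at most", max_buckets(size, chars), "buckets")
```

One caveat with base-64: it is case-sensitive, so on a case-insensitive filesystem the distinct buckets would collapse together. Hex or base-32 avoids that.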
If you need to, you can take the hashing another level. It all depends on how cramped you are for space in your filenames, because there are also inode and iname caching issues to consider.
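
Taking the hashing another level could look something like this sketch, where successive slices of the same digest become nested directories. The function name and the choice of two hex characters per level are assumptions for illustration only:

```python
import hashlib
import posixpath

def nested_hash_dir(listname, root, levels=2, chars_per_level=2):
    """Return root/<slice 1>/<slice 2>/.../<listname> (hypothetical helper)."""
    digest = hashlib.md5(listname.encode("utf-8")).hexdigest()
    # Consecutive, non-overlapping slices of the digest, one per level.
    parts = [
        digest[i * chars_per_level:(i + 1) * chars_per_level]
        for i in range(levels)
    ]
    return posixpath.join(root, *parts, listname)

print(nested_hash_dir("test-list", "/var/mailman/archives/private"))
```

With two hex characters per level, no hash directory holds more than 256 subdirectories, at the cost of a longer path and one more directory lookup per access.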
-- Brad Knowles, brad@stop.mail-abuse.org
"Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety."
-- Benjamin Franklin (1706-1790), reply of the Pennsylvania
Assembly to the Governor, November 11, 1755
SAGE member since 1995. See http://www.sage.org/ for more info.