Issues with archiving directory and OS limitations
Hello,
I have recently come across a problem that prevents the creation of any new lists for our site.
The problem shows up as the list-creation process being unable to make the archive directories. It appears once the number of directories approaches 32,000.
How did it happen?
We have roughly 15,000 lists, and the archive directory normally holds two directories per list:
/var/mailman/archives/private/<listname>
/var/mailman/archives/private/<listname>.mbox
So the number of directories is effectively doubled, and Linux then has trouble creating any more.
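As a rough check of the arithmetic above, the following sketch counts the immediate subdirectories of a directory and computes the expected total for a given number of lists. The function names are hypothetical, the code assumes Python 3, and the figure of two directories per list is taken from the message above.

```python
import os

def count_subdirs(parent):
    """Count the immediate subdirectories of `parent` (symlinks are not followed)."""
    with os.scandir(parent) as entries:
        return sum(1 for entry in entries if entry.is_dir(follow_symlinks=False))

def expected_archive_dirs(num_lists, dirs_per_list=2):
    """Each list contributes <listname> and <listname>.mbox at the same level."""
    return num_lists * dirs_per_list

print(expected_archive_dirs(15000))  # → 30000
```

Running count_subdirs("/var/mailman/archives/private") on the affected host would show how close the directory is to the reported ceiling.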
My temp solution:
I have altered Site.py (line 52) to add the list name to the archive path a second time. This halved the number of directories at the /var/mailman/archives/private/ level and pushed the extra directories into their own named subdirectory. Now we can create new lists again. (In our situation the list population is updated daily, and the lists themselves are added and deleted as required.)
    def get_archpath(listname, domain=None, create=False, public=False):
        if public:
            subdir = mm_cfg.PUBLIC_ARCHIVE_FILE_DIR
        else:
            subdir = mm_cfg.PRIVATE_ARCHIVE_FILE_DIR
        path = os.path.join(subdir, listname, listname)
        if create:
            _makedir(path)
        return path
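To make the effect of that change concrete, here is a small standalone sketch comparing the two layouts. It does not import Mailman: PRIVATE_ARCHIVE_FILE_DIR is a hypothetical stand-in for the mm_cfg setting, the function names are made up for illustration, and it assumes the .mbox directory is derived by appending ".mbox" to the archive path.

```python
import posixpath

# Hypothetical stand-in for mm_cfg.PRIVATE_ARCHIVE_FILE_DIR
PRIVATE_ARCHIVE_FILE_DIR = "/var/mailman/archives/private"

def stock_archpath(listname, subdir=PRIVATE_ARCHIVE_FILE_DIR):
    """Layout before the change: one level, two entries per list."""
    return posixpath.join(subdir, listname)

def nested_archpath(listname, subdir=PRIVATE_ARCHIVE_FILE_DIR):
    """Layout after the change: the list name is repeated as a parent directory."""
    return posixpath.join(subdir, listname, listname)

for label, fn in (("stock", stock_archpath), ("nested", nested_archpath)):
    path = fn("mylist")
    # Assumption: the mbox directory sits next to the archive directory.
    print(label, path, path + ".mbox")
```

With the nested layout, both mylist and mylist.mbox live under a single mylist/ parent, so the top level holds one entry per list instead of two.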
Related problems (from the 'fix'):
- The HTML links are not working for the archive site, but it would be nice to have them functioning.
- Possible larger ramifications from the alteration of this function that I cannot see yet.
Advice from folks who are a lot more familiar with Mailman would be great, to point us at a more elegant solution.
Antony Somerville
Network Programmer / Project Manager: QUT AD Upgrade Project
Network Applications
Queensland University of Technology, Brisbane Australia
Phone +61 7 38644434 Fax +61 7 38642921
At 9:52 AM +1000 2005-10-25, AE Somerville wrote:
The problem shows up as the list-creation process being unable to make the archive directories. It appears once the number of directories approaches 32,000.
Most *nix OSes have problems with too many files (or subdirectories) within a given directory structure. Frequently, you start seeing problems at much lower numbers, like 1,000 or 10,000.
My temp solution:
I have altered Site.py (line 52) to add the list name to the archive path a second time. This halved the number of directories at the /var/mailman/archives/private/ level and pushed the extra directories into their own named subdirectory. Now we can create new lists again. (In our situation the list population is updated daily, and the lists themselves are added and deleted as required.)
This just pushes the horizon out; it doesn't solve the fundamental problem. IMO, you're better off doing a quick MD5 hash of the listname, slicing off the first (or last) few characters of the hash, and incorporating that into the path name.
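A minimal sketch of that idea, assuming Python 3 and the standard hashlib module. The function names hash_prefix and hashed_archpath are hypothetical and are not part of Mailman; how such a path would be wired into Site.get_archpath() is left open here.

```python
import hashlib
import posixpath

def hash_prefix(listname, nchars=3):
    """Return the first `nchars` hex characters of the MD5 of the list name."""
    digest = hashlib.md5(listname.encode("utf-8")).hexdigest()
    return digest[:nchars]

def hashed_archpath(subdir, listname, nchars=3):
    """Place the list's archive under a hash-named bucket directory."""
    return posixpath.join(subdir, hash_prefix(listname, nchars), listname)

print(hashed_archpath("/var/mailman/archives/private", "mylist"))
```

Because the bucket is computed from the list name alone, any code that knows the name can recompute the path without a lookup table.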
If you use hex characters instead of some other base, that's roughly a factor of sixteen reduction in the number of subdirectories/files for each character of hash. In practice, you'll get birthday collisions more frequently than you'd like, so count it as something closer to a four- to eightfold reduction.
With this technique, it doesn't take too many hash characters to greatly reduce the problem to a much more manageable size. Just three characters of a reasonably well distributed hash will result in no more than 4096 hash subdirectories at the parent, and probably something close to a factor of 64 to 512 reduction in the number of grandchild subdirectories/files within each hash subdirectory.
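One way to see how evenly the buckets fill is to hash a set of names and count the entries per bucket. The sketch below uses 15,000 synthetic names (an assumption; real list names will distribute somewhat differently) and reports the number of buckets in use and the size of the fullest one for one, two, and three hex characters.

```python
import hashlib
from collections import Counter

def bucket_sizes(names, nchars):
    """Map each hex-prefix bucket to the number of names that fall into it."""
    return Counter(
        hashlib.md5(name.encode("utf-8")).hexdigest()[:nchars] for name in names
    )

# Synthetic list names, used only for illustration.
names = ["list%05d" % i for i in range(15000)]

for nchars in (1, 2, 3):
    sizes = bucket_sizes(names, nchars)
    print(nchars, "chars:", len(sizes), "buckets, fullest holds", max(sizes.values()))
```

The fullest bucket is the number that matters for a per-directory limit, since the average hides any unevenness.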
If you go with base-32 instead, two base-32 characters would be no more than 1024 files in a single directory, and probably close to a factor of six to 32 reduction in the number of grandchild subdirectories/files per hash subdirectory.
Base-64 would let you get two characters creating no more than 4096 hash subdirectories, and you can see the numbers above for the likely reduction in the number of grandchild subdirectories/files.
If you need to, you can take the hashing another level. It all depends on how cramped you are for space in your filenames, because there are also inode and iname caching issues to consider.
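Taking the hashing another level could look like the sketch below, which splits the start of the hex digest into two nested bucket directories. The function name and the width of two characters per level are assumptions chosen for illustration.

```python
import hashlib
import posixpath

def two_level_archpath(subdir, listname, width=2):
    """Nest the archive under two hash-derived directory levels."""
    digest = hashlib.md5(listname.encode("utf-8")).hexdigest()
    level1 = digest[:width]            # at most 16**width entries at the top
    level2 = digest[width:2 * width]   # at most 16**width entries per level-1 dir
    return posixpath.join(subdir, level1, level2, listname)

print(two_level_archpath("/var/mailman/archives/private", "mylist"))
```

Each extra level lengthens every path by width + 1 characters, which is the filename-space trade-off mentioned above.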
-- Brad Knowles, <brad@stop.mail-abuse.org>
"Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety."
-- Benjamin Franklin (1706-1790), reply of the Pennsylvania Assembly to the Governor, November 11, 1755
SAGE member since 1995. See <http://www.sage.org/> for more info.
On 10/24/05 5:28 PM, "Brad Knowles" <brad@stop.mail-abuse.org> wrote:
Base-64 would let you get two characters creating no more than 4096 hash subdirectories, and you can see the numbers above for the likely reduction in the number of grandchild subdirectories/files.
Base 64 isn't a good idea for code which might run on case-insensitive file systems (e.g. Cygwin or Mac OS X). Base 36 would seem safer if this code is going to go into the official Mailman release sometime (which is probably a good idea).
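A base-36 bucket name uses only digits and lowercase letters, so two buckets can never differ by case alone. The sketch below is one possible way to derive such a name from an MD5 digest; the function name and the choice to reduce the digest modulo 36**nchars are assumptions, not something proposed in this thread.

```python
import hashlib
import string

ALPHABET = string.digits + string.ascii_lowercase  # 36 case-insensitive-safe symbols

def base36_prefix(listname, nchars=2):
    """Return a fixed-width base-36 bucket name derived from the MD5 digest."""
    value = int.from_bytes(hashlib.md5(listname.encode("utf-8")).digest(), "big")
    value %= 36 ** nchars              # at most 36**nchars distinct buckets
    chars = []
    for _ in range(nchars):
        value, rem = divmod(value, 36)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))    # zero-padded on the left by construction

print(base36_prefix("mylist"))
```

Two base-36 characters give at most 1296 buckets, which sits between the two-character base-32 and base-64 figures quoted above.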
--John
AE Somerville wrote:
Related problems (from the 'fix'):
- The HTML links are not working for the archive site, but it would be nice to have them functioning.
They don't work because they are constructed using the Archiver.GetBaseArchiveURL() method which doesn't use Site.get_archpath(). For public archives, assuming you haven't changed the default
PUBLIC_ARCHIVE_URL = 'http://%(hostname)s/pipermail/%(listname)s'
I think you can put
PUBLIC_ARCHIVE_URL = 'http://%(hostname)s/pipermail/%(listname)s/%(listname)s'
(watch out for the wrapped line) in mm_cfg.py to fix.
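To illustrate what the changed setting produces, the sketch below fills in the template with Python's dictionary-based % formatting, which is what the %(hostname)s and %(listname)s placeholders suggest. The helper name archive_url and the host name are hypothetical, and this is not Mailman's own code.

```python
# The modified template from the message above.
PUBLIC_ARCHIVE_URL = 'http://%(hostname)s/pipermail/%(listname)s/%(listname)s'

def archive_url(template, hostname, listname):
    """Substitute the named placeholders in the URL template."""
    return template % {"hostname": hostname, "listname": listname}

print(archive_url(PUBLIC_ARCHIVE_URL, "lists.example.org", "mylist"))
# → http://lists.example.org/pipermail/mylist/mylist
```

The repeated %(listname)s mirrors the doubled list name in the modified get_archpath(), so the URL path and the directory layout stay in step.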
For private archives, you will need to edit the definition of GetBaseArchiveURL() in Mailman/Archiver/Archiver.py or possibly you can make the old URL work with a rewrite rule in your web server.
- Possible larger ramifications from the alteration of this function that I cannot see yet.
The links in the archive itself are all relative, so that should be OK. I think you're probably OK in general if you fix the links as described under the first item, but I haven't really looked hard enough to verify this. Of course, if you patch Archiver.py, you have to maintain the patch across upgrades.
Advice from folks who are a lot more familiar with Mailman would be great, to point us at a more elegant solution.
Brad has addressed your basic solution and suggested ways for further reducing the size of the archives/private and archives/public directories. Of course, you eventually have the same issue with the lists/ directory.
-- Mark Sapiro <msapiro@value.net>
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan
participants (4)
- AE Somerville
- Brad Knowles
- John W. Baxter
- Mark Sapiro