Hi All,
This morning, I set out to improve the performance of the "mailman import21" command. If you have used it before, you know that it is slow. Until now, I had no idea why. Here were my guesses:
1. Too many database calls, with sqlite3 being its usual slow self.
   Although I had forgotten that the import is slow regardless of the database backend. Maybe we are making far too many queries?
2. Too many string comparisons.
   We all know string comparisons are slow, but how slow could they be?
3. Something wasteful being done over and over again.
Here is a rough estimate of the time it takes to import Mailman 2.1's config.pck for two lists:
151 members: 58 seconds
1429 members: 9 minutes
This is quite slow; 9 minutes is a lot. So, I did the usual Python profiling with the standard library's cProfile module, wrapped only around mailman.utilities.importer._import_roster. That is the function that takes the longest: if you have run the command, you know that most of the time goes into importing the list of members.
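For anyone who wants to repeat this kind of measurement, here is a minimal sketch of profiling a single function call with cProfile and pstats from the standard library. It does not import Mailman: import_roster below is a hypothetical stand-in for _import_roster, and profile_call is a helper name made up for this example.

```python
import cProfile
import io
import pstats


def import_roster(members):
    """Hypothetical stand-in for mailman.utilities.importer._import_roster."""
    return sorted(m.lower() for m in members)


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    # runcall() enables the profiler only for the duration of this one call.
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the outermost expensive calls come first.
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


if __name__ == "__main__":
    _, report = profile_call(import_roster, ["Bob@example.com", "alice@example.com"])
    print(report)
```

Sorting by cumulative time is what makes a dominant callee, like the per-member password hashing in the real run, stand out near the top of the report.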
Without even looking at the entire output, the problem was apparent, and it was none of the ones I had guessed:
    ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
         1    0.009    0.009   50.692   50.692  /home/maxking/Documents/mm3/core/src/mailman/utilities/importer.py:600(_import_roster)
       151    0.001    0.000   45.691    0.303  /home/maxking/Documents/mm3/core/src/mailman/utilities/passwords.py:35(encrypt)
About 90% of the time (45.7 of 50.7 seconds) is spent encrypting, i.e. hashing, user passwords, once for each imported member, at roughly 0.3 seconds per call. Well, duh: password hashing is deliberately expensive, and when you do it once per imported member, it is definitely going to be slow.
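As a quick sanity check, the profile numbers above can be tied together with a bit of arithmetic. The extrapolation to the 1429-member list is an assumption (constant cost per hashing call), not something I measured separately.

```python
# Figures copied from the cProfile output (seconds).
import_cumtime = 50.692   # importer.py:600(_import_roster)
encrypt_cumtime = 45.691  # passwords.py:35(encrypt)
encrypt_calls = 151       # one call per imported member

per_call = encrypt_cumtime / encrypt_calls
share = encrypt_cumtime / import_cumtime
print(f"{per_call:.3f} s per encrypt() call")   # 0.303 s per encrypt() call
print(f"{share:.0%} of _import_roster's time")  # 90% of _import_roster's time

# Rough extrapolation for the bigger list, ASSUMING the per-call cost is constant.
members = 1429
print(f"~{members * per_call / 60:.1f} minutes hashing for {members} members")
# ~7.2 minutes hashing for 1429 members
```

If that assumption holds, hashing alone would account for roughly 7 of the 9 minutes it took to import the larger list.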
Mailman 3 uses passlib for password hashing, so I set out to find a hashing algorithm that is much faster, ideally one with a C library wrapper we could use to speed things up. I settled on the argon2 hash with its supporting library argon2_cffi. Then I changed the config and tried the imports again:
151 members: 15.884 seconds
1429 members: 2 minutes 29 seconds
That was a significant improvement over the previous numbers: roughly 3.6x faster for both lists.
Another interesting fact, though, is that user passwords are mostly useless in Mailman 3. In Mailman 2, you had to set up a password per list (or one was auto-generated for you), and you needed it to log in to the web UI. In Mailman 3, however, the passwords in Core's database aren't used for logging in, since the web frontend stores the authentication tokens itself (social auth or passwords). In fact, users who sign up on Mailman 3 for the first time probably never have a password set in Mailman Core's database.
So, I commented out the code that actually imports the password (src/mailman/utilities/importer.py#L663-664), and, obviously, the import got even faster:
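The effect of skipping the password import can be sketched as follows. This is not the code in importer.py: every name here (Member, hash_password, import_roster, import_passwords) is hypothetical, and PBKDF2 from hashlib stands in for passlib's hashing. The point is only that without the per-member hashing call, the remaining cost is the rest of the import work.

```python
import hashlib
import os
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Member:
    """Hypothetical stand-in for an imported Mailman 3 member record."""
    email: str
    password: Optional[str] = None


def hash_password(plaintext: str, iterations: int = 100_000) -> str:
    """Deliberately slow KDF (PBKDF2-SHA256), standing in for passlib's encrypt()."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", plaintext.encode("utf-8"), salt, iterations)
    return f"{salt.hex()}${digest.hex()}"


def import_roster(roster: Dict[str, str], import_passwords: bool = False) -> List[Member]:
    """Hypothetical importer: one KDF call per member only if import_passwords is set."""
    members = []
    for email, plaintext in roster.items():
        member = Member(email=email)
        if import_passwords and plaintext:
            member.password = hash_password(plaintext)
        members.append(member)
    return members


if __name__ == "__main__":
    roster = {f"user{i}@example.com": "s3cret" for i in range(20)}
    for flag in (True, False):
        start = time.perf_counter()
        import_roster(roster, import_passwords=flag)
        print(f"import_passwords={flag}: {time.perf_counter() - start:.2f} s")
```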
151 members: 4 seconds
1429 members: 57 seconds
I am hoping to commit the change with the password import commented out, unless someone reminds me of a use for the passwords in Core's database. In that case, it might take a bit more work to find another way to improve the speed.
Thanks for reading!
-- thanks, Abhilash Raj (maxking)