Improving the speed of mailman import21
Hi All,
This morning, I set out to improve the performance of the "mailman import21" command. If you have used it in the past, you will know that it is slow. Until now, I never had a clear idea why. Here were my guesses:
Too many database calls, and sqlite3 being its usual self
Although, I forgot that the import is slow irrespective of the database backend. Maybe we are simply doing way too many queries?
Too many string comparisons
We all know string comparisons are slow, but how slow could they be?
Something wasteful being done over and over again.
Here is a rough estimate of the time it takes to import mailman2.1's config.pck for two lists:
151 members: 58 seconds
1429 members: 9 minutes
This is quite slow; 9 minutes is a lot. So, I set out to do the usual Python profiling using the standard library cProfile module, and wrapped it only around mailman.utilities.importer._import_roster. That function is the slowest one: if you have run the command, you know it spends most of its time importing the list of members.
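For anyone who wants to repeat this kind of measurement, here is a minimal sketch of wrapping a single function in cProfile. It is not the code used for the numbers in this mail; profile_call and slow_import are hypothetical names, and slow_import is only a stand-in for _import_roster.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


def slow_import(n):
    """Hypothetical stand-in for the function being profiled."""
    return sum(i * i for i in range(n))


result, report = profile_call(slow_import, 10000)
print(report)
```

Sorting by cumulative time is what makes a single expensive callee stand out near the top of the report.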
Without even looking at the entire output, the problem was apparent, and it was none of the ones I had guessed:
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     1    0.009    0.009   50.692   50.692  /home/maxking/Documents/mm3/core/src/mailman/utilities/importer.py:600(_import_roster)
   151    0.001    0.000   45.691    0.303  /home/maxking/Documents/mm3/core/src/mailman/utilities/passwords.py:35(encrypt)
90% of the time is spent trying to encrypt user passwords, once for each imported member. Well, duh, encryption is an expensive operation, and when you do it once per imported member, it is definitely going to be slow.
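The 90% figure can be checked from the two cumtime values in the profile output. The sketch below is only back-of-the-envelope arithmetic on numbers already quoted in this mail; the projection for the larger list assumes the per-call cost stays constant, which was not measured.

```python
# Numbers copied from the cProfile output for the 151-member list.
roster_cumtime = 50.692   # seconds spent in _import_roster
encrypt_cumtime = 45.691  # seconds spent in encrypt
encrypt_calls = 151

share = encrypt_cumtime / roster_cumtime
per_call = encrypt_cumtime / encrypt_calls

# Assumption: the per-call cost is the same for the 1429-member list.
projected_seconds = per_call * 1429

print(f"share of roster import spent in encrypt: {share:.1%}")
print(f"cost per encrypt call: {per_call:.3f} s")
print(f"projected encrypt time for 1429 members: {projected_seconds / 60:.1f} min")
```

Under that assumption, encryption alone would account for roughly 7 of the 9 minutes observed for the larger list.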
Mailman 3 uses passlib for crypto, so I set out to figure out whether there is a hashing algorithm that can do this much faster, and perhaps has a C library wrapper that we can use to speed things up. I settled on the argon2 hash with the supporting library argon2_cffi. Then I changed the config and tried the imports again:
151 members: 15.884 seconds
1429 members: 2 minutes 29 seconds
That was a significant improvement over the previous numbers.
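To illustrate why the hash and its cost settings dominate the import time, here is a minimal sketch using only Python's standard library. It does not use passlib or argon2_cffi: PBKDF2 from hashlib stands in for whatever scheme is configured, and the iteration counts and the helper names hash_password and time_roster are made up for illustration.

```python
import hashlib
import os
import time


def hash_password(password, iterations):
    """Hash one password with PBKDF2-HMAC-SHA256 and a random salt."""
    salt = os.urandom(16)
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )


def time_roster(member_count, iterations):
    """Return the seconds needed to hash one password per member."""
    start = time.perf_counter()
    for i in range(member_count):
        hash_password(f"password-{i}", iterations)
    return time.perf_counter() - start


# Same number of members, very different per-password cost.
cheap = time_roster(10, 1_000)
costly = time_roster(10, 50_000)
print(f"1,000 iterations:  {cheap:.3f} s")
print(f"50,000 iterations: {costly:.3f} s")
```

The total grows with both the per-password cost and the number of members, which is why a cheaper-to-compute scheme helps and skipping the hashing helps even more.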
Another interesting fact is that user passwords are kind of useless in Mailman 3. In Mailman 2 you had to set up a password, or one was auto-generated for you, per list, and you needed it to log in to the web UI. However, in Mailman 3, the passwords (in Core's database) aren't used for logging in, since the web frontend stores the authentication tokens (social auth or passwords). In fact, users who sign up for the first time on Mailman 3 probably never have a password set in Mailman Core's database.
So, I commented out the code that actually imports the password (src/mailman/utilities/importer.py#L663-664), and the import speed improved even more, obviously:
151 members: 4 seconds
1429 members: 57 seconds
I am hoping that I can commit the change with the code commented out, unless I am reminded of a use for the passwords in Core's database. In that case, it might take a bit more work to figure out another way to improve the speed.
Thanks for reading up!
-- thanks, Abhilash Raj (maxking)
Abhilash Raj writes:
> 90% of the time is spent trying to encrypt user passwords, once for each imported member. Well, duh, encryption is an expensive operation, and when you do it once per imported member, it is definitely going to be slow.
Why are we storing unencrypted passwords at all? Passwords are pretty low-security in any case, but this is asking for trouble.
> Another interesting fact is that user passwords are kind of useless in Mailman 3. In Mailman 2 you had to set up a password, or one was auto-generated for you, per list, and you needed it to log in to the web UI. However, in Mailman 3, the passwords (in Core's database) aren't used for logging in, since the web frontend stores the authentication tokens (social auth or passwords). In fact, users who sign up for the first time on Mailman 3 probably never have a password set in Mailman Core's database.
I'll trust you on that. Although it raises the question: if nobody has a password, why does it take so much time to encrypt no passwords?
> So, I commented out the code that actually imports the password (src/mailman/utilities/importer.py#L663-664)
I'm happy with this. This is a major breaking change *if* anyone is using Core passwords, which they probably aren't, but it deserves flashing lights and sirens in the release announcements.
Steve
-- Associate Professor Division of Policy and Planning Science http://turnbull.sk.tsukuba.ac.jp/ Faculty of Systems and Information Email: turnbull@sk.tsukuba.ac.jp University of Tsukuba Tel: 029-853-5175 Tennodai 1-1-1, Tsukuba 305-8573 JAPAN
On Sun, Oct 6, 2019, at 8:24 PM, Stephen J. Turnbull wrote:
> Abhilash Raj writes:
>> 90% of the time is spent trying to encrypt user passwords, once for each imported member. Well, duh, encryption is an expensive operation, and when you do it once per imported member, it is definitely going to be slow.
> Why are we storing unencrypted passwords at all? Passwords are pretty low-security in any case, but this is asking for trouble.
We store unencrypted passwords in Mailman 2.1. Today, they are encrypted when we import lists into Mailman 3. The process is a bit weird, though, because in 2.1 there was one password per user-mailing list pair. Now, since we store passwords per "User" (instead of per user-mailing list pair, a.k.a. "Member"), each newly imported list overwrites every user's password with the one set for that specific list in Mailman 2.1.
If these passwords are being used somewhere, I am sure it is already in a broken state ;).
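A toy sketch of that overwrite behaviour, to make the problem concrete. This is not Mailman's importer code: import_lists, the list names, and the plain dict standing in for the User table are all hypothetical.

```python
def import_lists(lists_in_order):
    """Collapse per-list passwords into one password per user.

    lists_in_order is a sequence of (list_name, {email: password}) pairs,
    imported one after another.
    """
    user_passwords = {}
    for _list_name, members in lists_in_order:
        for email, password in members.items():
            # One slot per user: a later list silently replaces the value.
            user_passwords[email] = password
    return user_passwords


announce = ("announce", {"anne@example.com": "pw-announce"})
devel = ("devel", {"anne@example.com": "pw-devel"})

print(import_lists([announce, devel]))  # {'anne@example.com': 'pw-devel'}
print(import_lists([devel, announce]))  # {'anne@example.com': 'pw-announce'}
```

The surviving password depends only on the order in which the lists were imported, not on anything the user chose.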
>> Another interesting fact is that user passwords are kind of useless in Mailman 3. In Mailman 2 you had to set up a password, or one was auto-generated for you, per list, and you needed it to log in to the web UI. However, in Mailman 3, the passwords (in Core's database) aren't used for logging in, since the web frontend stores the authentication tokens (social auth or passwords). In fact, users who sign up for the first time on Mailman 3 probably never have a password set in Mailman Core's database.
> I'll trust you on that. Although it raises the question: if nobody has a password, why does it take so much time to encrypt no passwords?
It is mostly when importing lists from 2.1 that it takes time to encrypt. In 2.1, everyone has to have a password to get access to the web UI, either manually set up or auto-generated.
>> So, I commented out the code that actually imports the password (src/mailman/utilities/importer.py#L663-664)
> I'm happy with this. This is a major breaking change *if* anyone is using Core passwords, which they probably aren't, but it deserves flashing lights and sirens in the release announcements.
Yep, I'll make a note of that and make sure to add it to the release announcement.
Although, this shouldn't be a breaking change for anyone using the Core passwords. It would only affect people porting lists over from Mailman 2.1 and hoping that the password from their last imported list would work for the Users, which is already going to be difficult for them.
-- thanks, Abhilash Raj (maxking)
On 10/6/19 10:11 AM, Abhilash Raj wrote:
> I am hoping that I can commit the change with the code commented out, unless I am reminded of a use for the passwords in Core's database. In that case, it might take a bit more work to figure out another way to improve the speed.
I'm not at all sure what's actually implemented, but there is a feature for pre-approving a post with an Approved: header with a password. This is also supposed to work to approve held posts, but approving/discarding held posts by email is broken anyway[1].
Lists have a moderator_password attribute which is an encrypted version of a plain text password that can be used for this purpose, but the original intent IIRC was that this could be the password of the user sending the mail and would be accepted if the user was an owner or moderator. As I said, I'm not sure (don't think) this is implemented, and a much better approach is to abandon the Approved: header in favor of a pgp signature from an owner/moderator.
The other possible use for this password is if a user imported by import21 wants to authenticate to Django, she might be able to use this password. I don't think that's the case now.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
On 10/7/19 5:37 PM, Mark Sapiro wrote:
> I'm not at all sure what's actually implemented, but there is a feature for pre-approving a post with an Approved: header with a password. This is also supposed to work to approve held posts, but approving/discarding held posts by email is broken anyway[1].
Forgot the reference
[1] https://gitlab.com/mailman/mailman/issues/169
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
On Mon, Oct 7, 2019, at 5:37 PM, Mark Sapiro wrote:
> On 10/6/19 10:11 AM, Abhilash Raj wrote:
>> I am hoping that I can commit the change with the code commented out, unless I am reminded of a use for the passwords in Core's database. In that case, it might take a bit more work to figure out another way to improve the speed.
> I'm not at all sure what's actually implemented, but there is a feature for pre-approving a post with an Approved: header with a password. This is also supposed to work to approve held posts, but approving/discarding held posts by email is broken anyway[1].
> Lists have a moderator_password attribute which is an encrypted version of a plain text password that can be used for this purpose, but the original intent IIRC was that this could be the password of the user sending the mail and would be accepted if the user was an owner or moderator. As I said, I'm not sure (don't think) this is implemented, and a much better approach is to abandon the Approved: header in favor of a pgp signature from an owner/moderator.
That's correct; it does seem to be implemented today, but using the moderator password.
I agree that it is better implemented using gpg signatures instead of passwords.
> The other possible use for this password is if a user imported by import21 wants to authenticate to Django, she might be able to use this password. I don't think that's the case now.
I don't think we should be doing this. It is better that the migration allows for a new, more secure password than to re-use old ones, which have been sent out over email in the past.
It is tricky how a multiple-password world gets translated to a single-password world. It makes the final password somewhat non-deterministic, depending on which mailing list was imported last, which does not sound right anyway.
-- thanks, Abhilash Raj (maxking)
On Oct 7, 2019, at 21:51, Abhilash Raj <maxking@asynchronous.in> wrote:
> It is tricky how a multiple-password world gets translated to a single-password world. It makes the final password somewhat non-deterministic, depending on which mailing list was imported last, which does not sound right anyway.
Maybe the answer is to simply not import passwords, and ask that users reset them if they need it, which they probably won’t.
-Barry
Participants (4): Abhilash Raj, Barry Warsaw, Mark Sapiro, Stephen J. Turnbull