[Moin-user] Very slow page saving caused by notifications

Steve McIntyre steve at einval.com
Sun Apr 15 16:56:06 EDT 2012


On Sun, Apr 15, 2012 at 10:18:07PM +0200, Paul Boddie wrote:
>On Sunday 15 April 2012 21:44:34 Steve McIntyre wrote:
>>
>> I've posted a bug at
>>
>>   http://moinmo.in/MoinMoinBugs/SubscribedPagesPerformanceProblem
>>
>> Quick summary:
>>
>> On a site with a large number of registered users
>> (e.g. wiki.debian.org), saving a page takes a very long time. With a
>> large number of users, the design of the page subscription system
>> doesn't scale well. Saving a page works well, but moin then scans all
>> the user data files looking for the subscribed_pages data. With
>> thousands of users registered, this can take a very long time; we're
>> seeing > 90 seconds on a wiki with more than 10,000 users.
>
>I can see that the offending code is in MoinMoin/Page.py, specifically the 
>getSubscribers method of the Page class. This looks like a classic case of 
>needing to "invert" the way the data is stored so that it can be queried more 
>efficiently - it's a bit like comparing the standard text search 
>functionality with Xapian-based searching, where the former relies on 
>scanning pages sequentially (pages yield terms), whereas the latter employs 
>such "inverted" storage of queryable information (terms yield pages).

Yup, exactly.
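
To make the inversion concrete, here is a rough sketch in plain Python. It
is not MoinMoin code: the dict of users, the function names and the sample
data are all made up for illustration, and it assumes subscriptions are
exact page names (anything pattern-based would need separate handling).

```python
from collections import defaultdict

def scan_subscribers(users, page_name):
    """Sequential form: visit every user on every lookup (users yield pages)."""
    return sorted(uid for uid, pages in users.items() if page_name in pages)

def build_index(users):
    """Inverted form: map each page name to the set of subscribed user ids."""
    index = defaultdict(set)
    for uid, pages in users.items():
        for page in pages:
            index[page].add(uid)
    return index

def lookup_subscribers(index, page_name):
    """One dictionary lookup, independent of the total number of users."""
    return sorted(index.get(page_name, ()))

# Hypothetical sample data: user id -> list of subscribed page names.
users = {"alice": ["FrontPage", "Debian"], "bob": ["Debian"], "carol": []}
index = build_index(users)
```

Both forms give the same answer for a given page; the difference is that the
scan touches every user on each save, while the index touches only the
entry for that page.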

>> This area needs fixing in some way - maybe add a cache in front of the
>> user lookup here, or store the subscribed_pages information
>> differently. I might be able to help with coding this, but I'd want to
>> see what other people think first in terms of a design.
>>
>> What do people think?
>
>I'd be inclined to index the subscription information so that there's a more 
>efficiently queryable structure (pages yielding subscribers) that can be used 
>in preference to the existing approach. Having subscriptions amend the index 
>when created would eliminate any need for periodic reindexing, and I think 
>you could implement this by having an event handler that can handle the 
>SubscribedToPageEvent type of event.

OK, I'll have a play at that now and see if I can get it working.
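
A minimal sketch of the "amend the index when a subscription is created"
idea might look like the following. Everything here is hypothetical: the
class and method names are invented, the index lives only in memory, and
how user_id and page_name would be pulled out of a real
SubscribedToPageEvent is deliberately not shown. A real version would also
need to persist the index and cope with concurrent writers.

```python
class SubscriptionIndex:
    """In-memory page -> subscribers map, kept current as events arrive."""

    def __init__(self):
        self._by_page = {}

    def on_subscribed(self, user_id, page_name):
        # Called when a user subscribes; no full rescan of users is needed.
        self._by_page.setdefault(page_name, set()).add(user_id)

    def on_unsubscribed(self, user_id, page_name):
        # Remove the user and drop the page entry once it becomes empty.
        subscribers = self._by_page.get(page_name)
        if subscribers is None:
            return
        subscribers.discard(user_id)
        if not subscribers:
            del self._by_page[page_name]

    def subscribers(self, page_name):
        # What a page save would call instead of scanning all user files.
        return sorted(self._by_page.get(page_name, ()))

idx = SubscriptionIndex()
idx.on_subscribed("alice", "FrontPage")
idx.on_subscribed("bob", "FrontPage")
idx.on_unsubscribed("alice", "FrontPage")
```

Because each change is applied as it happens, there is no periodic reindex
step; the cost is that unsubscribing and user deletion also have to update
the index, or it will drift from the user data.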

-- 
Steve McIntyre, Cambridge, UK.                                steve at einval.com
  Getting a SCSI chain working is perfectly simple if you remember that there
  must be exactly three terminations: one on one end of the cable, one on the
  far end, and the goat, terminated over the SCSI chain with a silver-handled
  knife whilst burning *black* candles. --- Anthony DeBoer
