[Mailman-Developers] Re: External subscriber lists in 2.1

Dale Newfield Dale@Newfield.org
Fri, 16 Aug 2002 12:56:13 -0400 (EDT)


On Sun, 23 Dec 2001, Barry A. Warsaw wrote:
> Note that if it was too expensive for getRegularMemberKeys() to return
> an in-memory list, it could (if you use Python 2.2) return an iterator
> object that implemented things in a more efficient manner, e.g. by
> paging through blocks.  I believe that any place where we expect a
> Python sequence (list) we could probably accept an iterator.
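A minimal sketch of the "paging through blocks" idea, written as a modern Python generator rather than a Python 2.2 iterator class. `paged_member_keys` and the `fetch_page(offset, limit)` backend call are hypothetical names, not part of Mailman's MemberAdaptor interface. `fetch_page` is assumed to return a list of at most `limit` keys, e.g. from an SQL query using LIMIT and OFFSET.

```python
def paged_member_keys(fetch_page, page_size=1000):
    """Yield member keys one at a time while fetching them page_size at a time.

    fetch_page(offset, limit) is a hypothetical backend call, e.g. an SQL
    query with LIMIT/OFFSET, returning a list of at most `limit` keys.
    Only one page is held in memory at any moment.
    """
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        for key in page:
            yield key
        if len(page) < page_size:
            return  # a short (or empty) page means the roster is exhausted
        offset += page_size


if __name__ == "__main__":
    roster = ["member%d@example.com" % i for i in range(2500)]
    fake_fetch = lambda offset, limit: roster[offset:offset + limit]
    # A for-loop or comprehension consumes it like any other sequence.
    print(sum(1 for _ in paged_member_keys(fake_fetch, page_size=1000)))
```

Offset-based paging can skip or repeat members if the roster changes mid-iteration, so a real backend might prefer to page by key (e.g. `WHERE key > last_key`).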

Do you know if this is in fact the case?  Is this how I should implement
it?

(While I've hacked in Python fairly regularly, I've not done serious
Python development since ~1996, so some of these questions will be fairly
basic "How does it work in Python 2.2?" ones.  Of course, this raises the
question of whether it's reasonable to require Python 2.2 at all: don't we
currently only require 2.1.3?)

Below are different ways the results of "get{Regular,Digest}MemberKeys"
and "getMembers" are used in various places in the codebase.  Will each of
them still work if those methods return an iterator instead of a list?
(For example, does len(iterator) work?)
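To make the len(iterator) question concrete, here is a quick check in modern Python (3.x) that runs the kinds of usage listed below against a generator. `member_keys()` is a hypothetical stand-in for a getRegularMemberKeys() that returns an iterator instead of a list.

```python
def member_keys():
    """Hypothetical stand-in for getRegularMemberKeys() returning an iterator."""
    for key in ["alice@example.com", "bob@example.com", "carol@example.com"]:
        yield key


def check_usage():
    results = {}

    # List comprehensions and for-loops accept any iterable.
    results["comprehension"] = [k.upper() for k in member_keys()]

    # len() needs __len__, which generators do not provide.
    try:
        len(member_keys())
        results["len"] = "ok"
    except TypeError:
        results["len"] = "TypeError"

    # '+' concatenation is defined for lists, not for generators.
    try:
        member_keys() + member_keys()
        results["concat"] = "ok"
    except TypeError:
        results["concat"] = "TypeError"

    # Generators have no .sort() method.
    results["has_sort"] = hasattr(member_keys(), "sort")

    # Truth testing: a generator object is always true, even if it yields nothing.
    empty = (k for k in [])
    results["empty_is_true"] = bool(empty)

    # 'in' works, but it consumes items up to and including the first match.
    it = member_keys()
    results["in"] = "bob@example.com" in it
    results["remaining"] = list(it)
    return results


if __name__ == "__main__":
    for name, value in check_usage().items():
        print(name, value)
```

In short, comprehensions, for-loops and `in` keep working (though `in` consumes the iterator), while `len()`, `+` and `.sort()` fail outright and a plain truth test silently becomes always true.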

I'm guessing that a good answer might be that we want to add more
general-purpose accessor methods in class MemberAdaptor (and have the rest
of the codebase use those where appropriate) so that if there's a more
efficient way for a specific backend to provide specific data, it is able
to do so.  I'll include the two methods I propose at the end of this
message.

in Handlers/CalcRecips.py:
    # Calculate the regular recipients of the message
    recips = [mlist.getMemberCPAddress(m)
              for m in mlist.getRegularMemberKeys()
              if mlist.getDeliveryStatus(m) == ENABLED]
and:
            recips = mlist.getMemberCPAddresses(mlist.getRegularMemberKeys() +
                                                mlist.getDigestMemberKeys())
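In the second CalcRecips.py excerpt, `+` is the problem: iterators don't support concatenation. One alternative is `itertools.chain`, sketched here in modern Python (itertools arrived in Python 2.3, so this would not work on 2.2 as-is). `FakeList` is a hypothetical minimal stand-in for the list object, `ENABLED = 0` is an assumed value, and the per-member comprehension replaces the original getMemberCPAddresses() call only so the sketch is self-contained.

```python
from itertools import chain

ENABLED = 0  # assumed value, standing in for MemberAdaptor.ENABLED


class FakeList:
    """Hypothetical minimal list object whose key methods return iterators."""

    def __init__(self, regular, digest, disabled=()):
        self._regular = list(regular)
        self._digest = list(digest)
        self._disabled = set(disabled)

    def getRegularMemberKeys(self):
        return iter(self._regular)

    def getDigestMemberKeys(self):
        return iter(self._digest)

    def getDeliveryStatus(self, key):
        return 1 if key in self._disabled else ENABLED

    def getMemberCPAddress(self, key):
        return key  # keys double as case-preserved addresses here


def regular_recipients(mlist):
    # Unchanged from the excerpt: the comprehension pulls one key at a time.
    return [mlist.getMemberCPAddress(m)
            for m in mlist.getRegularMemberKeys()
            if mlist.getDeliveryStatus(m) == ENABLED]


def all_recipients(mlist):
    # chain() replaces list '+': it walks the first iterator, then the second.
    return [mlist.getMemberCPAddress(m)
            for m in chain(mlist.getRegularMemberKeys(),
                           mlist.getDigestMemberKeys())]
```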

in Cgi/admin.py:
        if not mlist.nondigestable and mlist.getRegularMemberKeys():
and:
        if addr in mlist.getRegularMemberKeys():
and:
    # If there are more members than allowed by chunksize, then we split the
    # membership up alphabetically.  Otherwise just display them all.
    chunksz = mlist.admin_member_chunksize
    all = mlist.getMembers()
    all.sort(lambda x, y: cmp(x.lower(), y.lower()))
then:
            # BAW: There's got to be a more efficient way of doing this!
            names = [mlist.getMemberName(s) or '' for s in all]
            all = [a for n, a in zip(names, all)
                   if cre.search(n) or cre.search(a)]
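Each admin.py pattern has an iterator-friendly replacement; a sketch in modern Python follows. `has_any` replaces the bare truth test, `sorted()` with `key=str.lower` replaces the in-place cmp-based sort, and one comprehension replaces the names/zip double pass. All three helper names are hypothetical, and `get_name` stands in for mlist.getMemberName. For `addr in mlist.getRegularMemberKeys()`, a per-member lookup such as MemberAdaptor's isMember() would avoid scanning the roster at all.

```python
import re

_SENTINEL = object()


def has_any(iterable):
    """Return True if iterable yields at least one item; consumes at most one."""
    return next(iter(iterable), _SENTINEL) is not _SENTINEL


def sorted_members(members):
    """Case-insensitive sort; sorted() accepts any iterable, list.sort() does not."""
    return sorted(members, key=str.lower)


def filter_members(members, get_name, cre):
    """Keep members whose real name or address matches the compiled regexp cre.

    Single pass over any iterable; get_name may return None for members with
    no real name set, mirroring `mlist.getMemberName(s) or ''`.
    """
    return [a for a in members if cre.search(get_name(a) or "") or cre.search(a)]


if __name__ == "__main__":
    names = {"alice@example.com": "Alice Liddell", "bob@example.com": None}
    roster = sorted_members(iter(["bob@example.com", "alice@example.com"]))
    print(roster)
    print(filter_members(roster, names.get, re.compile("liddell", re.IGNORECASE)))
```

Because `has_any` consumes the first item, a caller that needs the members afterwards should fetch a fresh iterator.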

in HTMLFormatter.py:
            members = self.getRegularMemberKeys()
            for m in members:
                if not self.getMemberOption(m, conceal_sub):
                    people.append(m)
            num_concealed = len(members) - len(people)
and:
        member_len = len(self.getRegularMemberKeys())
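In the HTMLFormatter.py excerpt, `members` is iterated once and then passed to len(), which fails for an iterator. A sketch that counts during the single loop instead; `count_roster` and `count_items` are hypothetical helpers, and `is_concealed` stands in for `lambda m: self.getMemberOption(m, conceal_sub)`.

```python
def count_roster(keys, is_concealed):
    """One pass over keys: collect visible members and count concealed ones.

    Works for lists and one-shot iterators alike, since keys is walked once.
    """
    people = []
    num_concealed = 0
    for m in keys:
        if is_concealed(m):
            num_concealed += 1
        else:
            people.append(m)
    return people, num_concealed


def count_items(iterable):
    """len() substitute that also works on iterators (and consumes them)."""
    return sum(1 for _ in iterable)
```

For `member_len`, `count_items` works but still touches every member, which is exactly the cost a backend-specific count such as the proposed getNumMembers() could avoid.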





def getNumMembers(self, type, status, options, regexp=None):
    """Get the number of members of this mailing list matching type and
    status, and optionally matching the regular expression passed in.

    type is one of the module constants REGULAR, DIGEST, or EITHER.
    The tally should include just the appropriate type of members.

    status is a list containing some subset of the values ENABLED,
    UNKNOWN, BYUSER, BYADMIN, BYBOUNCE.  The tally should include only
    members whose status is in that list.

    options is a dictionary containing some number of {flag:boolean}
    pairs.  Only members with values matching that specified for each
    flag in the dictionary should be included in the tally.

    regexp is a string containing a regular expression to use as a filter.
    If this is not None, only members whose CPE or NAME matches the regexp
    should be included in the tally.  (I don't see a more efficient way to
    implement this in an SQL MemberAdaptor unless the only non-literal
    element in the regexp is ".*", since that could be matched with SQL's
    wildcard "%".  Even where this filter doesn't allow a more efficient
    implementation, though, it still belongs in the interface.)
    """
    raise NotImplementedError
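The ".*"-to-"%" idea in the getNumMembers() docstring could look roughly like this. `regexp_to_like` is a hypothetical helper: it accepts only literal characters, ".*", and an optional leading "^" or trailing "$", and returns None for anything else so the caller can fall back to filtering in Python. Since re.search() is unanchored, unanchored ends become "%". Literal "%" and "_" are backslash-escaped, which assumes the query uses `LIKE ... ESCAPE '\'`; LIKE's case sensitivity also varies between databases, so treat this as a sketch rather than a drop-in translation.

```python
# Regex metacharacters; any of these outside the ".*" runs and the
# leading "^" / trailing "$" anchors makes the pattern untranslatable.
_REGEX_SPECIAL = set(".^$*+?{}[]\\|()")


def regexp_to_like(regexp):
    """Translate a restricted regexp into an SQL LIKE pattern, or return None."""
    anchored_start = regexp.startswith("^")
    anchored_end = regexp.endswith("$")
    start = 1 if anchored_start else 0
    end = len(regexp) - 1 if anchored_end else len(regexp)
    body = regexp[start:end]

    out = []
    i = 0
    while i < len(body):
        if body.startswith(".*", i):
            out.append("%")  # ".*" matches any run of characters, like "%"
            i += 2
        elif body[i] in _REGEX_SPECIAL:
            return None  # some other regex construct: no simple LIKE form
        else:
            ch = body[i]
            # Escape LIKE's own wildcards so they match literally.
            out.append("\\" + ch if ch in "%_" else ch)
            i += 1

    like = "".join(out)
    if not anchored_start:
        like = "%" + like  # re.search() may match anywhere in the string
    if not anchored_end:
        like = like + "%"
    return like
```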

def getMemberIterator(self, type, status, options, style, order, regexp=None):
    """Get an iterator of members of this mailing list matching type and
    status, and optionally matching the regular expression passed in.

    type, status, options, and regexp are as used in getNumMembers().

    style specifies what each item yielded by the iterator should be: KEY,
    LCE, CPE, or NAME.  (For NAME, entries for members who have no RealName
    set will be None.)

    order specifies the order in which the iterator returns those items:
    KEY, LCE, CPE, or NAME.  (This part hasn't been thought through as
    thoroughly; are there other interesting orderings?)
    """
    raise NotImplementedError
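To check that the two proposed signatures fit together, here is an in-memory sketch in modern Python. Everything in it is hypothetical: the constant values, the `Member` record (keyed by lowercased address), and the choice to pass a `members` sequence to plain functions instead of defining MemberAdaptor methods. As the docstring describes, the regexp is applied with re.search() to both the CPE and the name.

```python
import re

# Hypothetical constants; a real MemberAdaptor module would define its own.
REGULAR, DIGEST, EITHER = "regular", "digest", "either"
ENABLED, UNKNOWN, BYUSER, BYADMIN, BYBOUNCE = range(5)
KEY, LCE, CPE, NAME = "key", "lce", "cpe", "name"


class Member:
    """Hypothetical in-memory member record; attribute names match the styles."""

    def __init__(self, cpe, name=None, digest=False, status=ENABLED, flags=None):
        self.cpe = cpe
        self.lce = cpe.lower()
        self.key = self.lce  # assumption: the member key is the lowercased address
        self.name = name
        self.digest = digest
        self.status = status
        self.flags = flags or {}


def _matches(m, type, status, options, cre):
    """Apply the type, status, options, and regexp filters to one member."""
    if type == REGULAR and m.digest:
        return False
    if type == DIGEST and not m.digest:
        return False
    if m.status not in status:
        return False
    for flag, wanted in options.items():
        if bool(m.flags.get(flag)) != wanted:
            return False
    if cre is not None and not (cre.search(m.cpe) or cre.search(m.name or "")):
        return False
    return True


def get_num_members(members, type, status, options, regexp=None):
    """Count matching members without building a list of them."""
    cre = re.compile(regexp) if regexp is not None else None
    return sum(1 for m in members if _matches(m, type, status, options, cre))


def get_member_iterator(members, type, status, options, style, order, regexp=None):
    """Yield the `style` field of each matching member, sorted by `order`."""
    cre = re.compile(regexp) if regexp is not None else None
    selected = [m for m in members if _matches(m, type, status, options, cre)]
    # Sorting needs all matches in memory; members whose sort field is None
    # (e.g. no RealName) come first.
    selected.sort(key=lambda m: (getattr(m, order) is not None,
                                 getattr(m, order) or ""))
    for m in selected:
        yield getattr(m, style)
```

An SQL-backed adaptor would more likely push the filters into WHERE, the tally into COUNT(*), and `order` into ORDER BY, streaming rows instead of building `selected` in memory.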




The rest of this message is for context since I'm responding to something
8 months old.

On Sun, 23 Dec 2001, Barry A. Warsaw wrote:
> >>>>> "JCL" == J C Lawrence <claw@kanga.nu> writes:
>     JCL> In 2.1 when used with external subscriber storage (eg SQL),
>     JCL> will the new equivalent of qrunner request and load the
>     JCL> entire subscriber DB onto the heap prior to broadcast?
>
> That's really up to the implementation of the MemberAdaptor interface
> for SQL (but fwiw, I'm not aware of such a beast).  Mailman's
> CalcRecips module loops through all the member addresses in a list
> comprehension, but all other information is requested a member at a
> time.
>
>     JCL> ObExcuse: Chap on -users asking about millions of
>     JCL> subscribers.
>
> Cool! :)