[python-ldap] ldap.OPT_DESC, async ops and paged search controls

Mark R Bannister mark at proseconsulting.co.uk
Thu Jan 22 14:33:16 CET 2015


On 21/01/2015 09:16, Michael Ströder wrote:
> Mark R Bannister wrote:
>> I've been using the new ldap.OPT_DESC feature introduced in python-ldap 2.4.17
>> and have a question concerning the use of it with asynchronous search
>> operations and paged search controls.
> Never used that myself.
>
> Why are you using paging? This only makes sense if you want to retrieve more
> than 1000 entries from MS AD.

I've tested this now without paged search controls and get the same problem.

The directory server is OpenLDAP 2.4.30 on Solaris 11.2 (it's listening 
on localhost).

I've attached two test scripts that examine this problem.  The output of 
testldap1.py run on a test DN that has 7 child entries:

    ldap.bind complete, fd 4
    ldap.search started
    calling select with ([4], [], []) , returned: ([4], [], [])
    ldap.result returned: (97, [])
    calling select with ([4], [], []) , returned: ([4], [], [])
    ldap.result returned: (None, None)
    calling select with ([4], [], []) , returned: ([4], [], [])
    ldap.result returned: (None, None)
    calling select with ([4], [], []) , returned: ([4], [], [])
    ldap.result returned: (None, None)
    calling select with ([4], [], []) , returned: ([4], [], [])
    ldap.result returned: (None, None)
    calling select with ([4], [], []) , returned: ([4], [], [])
    ldap.result returned: (None, None)
    calling select with ([4], [], []) , returned: ([4], [], [])
    ldap.result returned: (None, None)
    calling select with ([4], [], []) , returned: ([4], [], [])
    ldap.result returned: (None, None)
    calling select with ([4], [], []) , returned: ([4], [], [])
    ldap.result returned: (101, [ ... results here ... ])

So why is select being woken up once per entry?  Why isn't select being 
woken up once with the entire result set?  This is a simple case, and 
for 7 entries one could argue it doesn't matter, but when I'm dealing 
with 80,000 entries, that's a lot of unnecessary wake-ups.

>> Does this seem right to you and is there any way to optimise this? All 80,000
>> entries are taking about 15 seconds to read into Python using the python-ldap
>> module compared with 5 seconds for native C.
> For better comparison of the numbers could you please also test Python code
> without using the ldap.OPT_DESC feature:
>
> 1. using LDAPObject.search_ext_s()
>
> 2. using ldap.resiter
>
> 3. 1. and 2. with and without paging
>
> I'd also try to see what wakes up the select() by using wireshark.
>
> Ciao, Michael.
>
>
I can't use wireshark for this one because it's localhost.  However, 
truss works fine for me and demonstrates that each time ldap.result() is 
called it reads one entry from the fd, then returns (None, None).  I 
have to call ldap.result() as many times as there are entries before I 
actually get any meaningful results.  This doesn't seem right to me.
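
One way to cut down on the trips through select() might be to ask for
partial results and drain whatever is already available before sleeping
again.  The sketch below assumes that LDAPObject.result(all=0, timeout=0)
hands back one message per call and (None, None) once nothing more is
ready.  collect_search(), FakeConn and the wait callable are hypothetical
names invented for this illustration, and the two constants are the
numeric values of ldap.RES_SEARCH_ENTRY and ldap.RES_SEARCH_RESULT,
hard-coded so the sketch runs without python-ldap or a server.  It has not
been measured against a real directory, so whether it reduces wake-ups
there is untested.

```python
# Numeric values of ldap.RES_SEARCH_ENTRY and ldap.RES_SEARCH_RESULT,
# hard-coded so this sketch runs without python-ldap installed.
RES_SEARCH_ENTRY = 100
RES_SEARCH_RESULT = 101


def collect_search(conn, wait):
    """Collect the entries of one outstanding search on conn.

    conn is assumed to behave like an LDAPObject whose
    result(all=0, timeout=0) returns (None, None) when nothing is ready.
    wait is a callable that blocks until the connection is readable.
    Returns (entries, number_of_wake_ups).
    """
    entries = []
    wakeups = 0
    while True:
        wait()
        wakeups += 1
        # Drain every message that is already available before
        # going back to sleep.
        while True:
            rtype, rdata = conn.result(all=0, timeout=0)
            if rtype is None:
                break                      # nothing more ready, wait again
            if rtype == RES_SEARCH_ENTRY:
                entries.extend(rdata)
            elif rtype == RES_SEARCH_RESULT:
                return entries, wakeups    # search finished
            # any other message type (e.g. a bind result) is ignored


class FakeConn(object):
    """Hypothetical stand-in for an LDAPObject, for demonstration only."""

    def __init__(self, batches):
        # Each batch holds the messages that become available together.
        self._batches = [list(b) for b in batches]
        self._ready = self._batches.pop(0) if self._batches else []

    def result(self, all=0, timeout=0):
        if self._ready:
            return self._ready.pop(0)
        # Nothing buffered: report "no result yet" and make the next
        # batch available for the following wake-up.
        if self._batches:
            self._ready = self._batches.pop(0)
        return (None, None)


# Demonstration: a bind result and three entries arrive together,
# then one more entry plus the final search result.
demo = FakeConn([
    [(97, []),
     (RES_SEARCH_ENTRY, [("cn=a,dc=example", {})]),
     (RES_SEARCH_ENTRY, [("cn=b,dc=example", {})]),
     (RES_SEARCH_ENTRY, [("cn=c,dc=example", {})])],
    [(RES_SEARCH_ENTRY, [("cn=d,dc=example", {})]),
     (RES_SEARCH_RESULT, [])],
])
entries, wakeups = collect_search(demo, wait=lambda: None)
print("%d entries in %d wake-ups" % (len(entries), wakeups))
# prints "4 entries in 2 wake-ups"
```

With a real connection the call would presumably look like
collect_search(l, lambda: select([fd], [], [])), with fd taken from
l.get_option(ldap.OPT_DESC) as in testldap1.py.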

If I run the second attached test script (testldap2.py) and run truss on 
that, I see the same behaviour under the covers - a poll() and a 
separate read() for each individual entry.  I haven't tried 
ldap.resiter, but I'm sure it's the same as it uses LDAPObject.result3().

Perhaps this is some tuning I've missed from OpenLDAP, which seems to 
want to drip-feed one result at a time ... or is this normal?  It seems 
quite inefficient to me.

Thanks,
Mark.

-------------- next part --------------
#!/usr/bin/python
# LDAP async search for all entries directly underneath given base
import sys
import ldap
from select import select

base = sys.argv[1]
filter = "objectclass=*"
scope = ldap.SCOPE_ONELEVEL

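l = ldap.initialize("ldap://localhost")
# bind() is asynchronous: its result (type 97, ldap.RES_BIND) is collected
# by the first l.result() call in the loop below
l.bind("", "")
fd = l.get_option(ldap.OPT_DESC)
print "ldap.bind complete, fd %d" % fd
l.search(base, scope, filter)
print "ldap.search started"
d = None
# Loop until l.result() returns a non-empty data list
while d is None or len(d) == 0:
    print "calling select with ([%d], [], [])" % fd,
    sys.stdout.flush()
    # Block until the LDAP socket is readable
    r, w, e = select([fd], [], [])
    print ", returned: %s" % str((r, w, e))
    # timeout=0 polls without blocking; all defaults to 1
    (t, d) = l.result(timeout=0)
    print "ldap.result returned: %s" % str((t, d))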

l.unbind()
-------------- next part --------------
#!/usr/bin/python
# LDAP synchronous search for all entries directly underneath given base
import sys
import ldap

base = sys.argv[1]
filter = "objectclass=*"
scope = ldap.SCOPE_ONELEVEL

l = ldap.initialize("ldap://localhost")
l.bind_s("", "")
print "ldap.bind complete"
result = l.search_s(base, scope, filter)
print "ldap.search returned: %s" % str(result)
l.unbind()

