Performance penalty for using python-ldap

Mon Aug 4 18:44:05 CEST 2003

Ed .:
> Hi,
> 
> I was tuning an LDAP directory for a client last week and had cause to run 
> some before and after benchmarks.
> 
> Basically for a 3000 entry directory I wrote a python script which did the 
> following:
> 
> listed each entry using the filter (cn=*), both with python-ldap and by 
> invoking the shell to run the ldapsearch command. These were done twice: 
> returning all attributes and returning just the cn attribute
> 
> did 3000 random lookups using (cn=exact-match), and then (cn=exact-match*) 
> again using python-ldap and the ldapsearch command.
> 
> The searches were run twice on unloaded machines, the first time to 
> populate caches, the second time as a rough best-performance figure
> 
> The findings were somewhat surprising.
> 
> In the list-whole-directory search, ldapsearch was generally and 
> consistently at least 30% faster than python-ldap. I.e. these figures apply 
> before and after tuning the directory. Remember the python searches are 
> pre-bound while ldapsearch binds each time it is called.
> 
> In the random lookup test, the performance figures were comparable but this 
> compares calling python-ldap to do a search against spawning a shell, 
> running ldapsearch, binding then doing the search, i.e. the command line 
> search has a LOT more overhead.
> 
> I'm happy to run some tests to identify the cause to see if we can fix it, 
> any suggestions where to start?

Just ran through some tests of my own. I have a function that walks
through one ou (ou=users,dc=mydomain,dc=com), finds every entry with
objectclass=posixAccount and gets the uid of each account using the
function search_ext_s.

The result of the search is assigned to a variable named 'res', and then
the function returns. Execution time for this Python script is approx.
1m30s. Running the same query with ldapsearch from the OpenLDAP package,
the query executes, prints its output to the console and exits in some
30s or so.

The total number of entries found with that objectClass is approx. 27k.
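
For reference, a comparison like the one above could be set up roughly
as follows. This is only a sketch: the helper names (time_call,
fetch_uids, run_ldapsearch), the server URI and the anonymous bind are
assumptions for illustration, not the script that produced the numbers
above.

```python
import subprocess
import time

FILTER = "(objectClass=posixAccount)"


def time_call(fn, *args):
    """Run fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def fetch_uids(conn, base, scope):
    """Synchronous search; conn is assumed to be a bound python-ldap connection."""
    # Requesting only 'uid' keeps the returned entries small.
    return conn.search_ext_s(base, scope, FILTER, ["uid"])


def run_ldapsearch(uri, base):
    """Same query through the OpenLDAP command line client (simple bind)."""
    cmd = ["ldapsearch", "-x", "-LLL", "-H", uri, "-b", base, FILTER, "uid"]
    return subprocess.run(cmd, capture_output=True, check=True).stdout


def main():
    # Hypothetical server and base DN; adjust to the directory under test.
    import ldap  # python-ldap, imported here so the helpers load without it

    uri, base = "ldap://localhost", "ou=users,dc=mydomain,dc=com"
    conn = ldap.initialize(uri)
    conn.simple_bind_s()  # anonymous bind, an assumption
    res, secs = time_call(fetch_uids, conn, base, ldap.SCOPE_SUBTREE)
    print("python-ldap: %d entries in %.1fs" % (len(res), secs))
    out, secs = time_call(run_ldapsearch, uri, base)
    print("ldapsearch:  %d bytes in %.1fs" % (len(out), secs))
```

Calling main() runs both measurements once; running it twice and keeping
the second figure avoids measuring a cold cache.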

> General conclusions from my tests:
> 
> python-ldap has a surprising performance penalty
> 
> searching is helped by having ample cache (doh!)

I'm using OpenLDAP 2.1.22 with BerkeleyDB 4.1.25 with the latest patch.
With BDB as the backend, OpenLDAP cannot handle caching itself - BDB has
to. I found a mail on the openldap-software mailing list describing this
issue and how to set up caching for BDB. That did not help my LDAP search
from within Python, nor the one with ldapsearch.

> returning 1 attribute is much faster than returning all of them (doh!)

Here I got help from a colleague.
When we ran ldapsearch (from OpenLDAP) with the argument '-S uid' for
sorting, we got a result in about 40s. With -S, it seems that ldapsearch
fetches the entries one by one into some kind of data structure and
sorts them before displaying anything.

Now, it seems that the python-ldap module does the same thing: it
fetches the result entries one by one, like ldapsearch does when given
-S <attr> as an argument.

Is there any way to force search_ext_s to fetch all at once and not one
by one, other than changing the python-ldap source code?
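
One thing that might be worth trying without patching python-ldap is the
asynchronous API: search_ext() starts the search and result() collects
it, where the 'all' argument decides whether result() returns the whole
result set in one call (all=1) or one message at a time (all=0). A
minimal sketch, with hypothetical helper names and 'conn' assumed to be
an already bound connection; whether either variant changes the timings
above is untested.

```python
def fetch_all_at_once(conn, base, scope, filterstr, attrlist):
    """Start the search, then wait for the complete result set in one call."""
    msgid = conn.search_ext(base, scope, filterstr, attrlist)
    # all=1: result() returns only once every entry has arrived.
    rtype, rdata = conn.result(msgid, 1)
    return rdata


def iter_entries(conn, base, scope, filterstr, attrlist):
    """Start the search, then yield entries as they arrive from the server."""
    msgid = conn.search_ext(base, scope, filterstr, attrlist)
    while True:
        # all=0: result() returns one message per call.
        rtype, rdata = conn.result(msgid, 0)
        if not rdata:
            # The final search-done message carries no data.
            break
        # Note: any other message that carries data (for example a
        # search reference) would be yielded here as well.
        for entry in rdata:
            yield entry
```

Timing both helpers against search_ext_s on the same query would at
least show whether the time goes into collecting the entries or into
something else.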

--
Regards

Bjørn Ove Grøtan
"Resistance is futile.  You will be assimilated."