Performance penalty for using python-ldap
Bjørn Ove Grøtan
bjorn.grotan at itea.ntnu.no
Mon Aug 4 18:44:05 CEST 2003
Ed .:
> Hi,
>
> I was tuning an LDAP directory for a client last week and had cause to run
> some before and after benchmarks.
>
> Basically for a 3000 entry directory I wrote a python script which did the
> following:
>
> listed each entry using the filter (cn=*), both through python-ldap and by
> invoking the shell to run the ldapsearch command. These were done twice:
> returning all attributes and returning just the cn attribute
>
> did 3000 random lookups using (cn=exact-match), and then (cn=exact-match*)
> again using python-ldap and the ldapsearch command.
>
> The searches were run twice on unloaded machines, the first time to
> populate caches, the second time as a rough best-performance figure
>
> The findings were somewhat surprising.
>
> In the whole-directory listing, ldapsearch was generally and
> consistently at least 30% faster than python-ldap. These figures apply
> both before and after tuning the directory. Remember the python searches are
> pre-bound while ldapsearch binds each time it is called.
>
> In the random lookup test, the performance figures were comparable but this
> compares calling python-ldap to do a search against spawning a shell,
> running ldapsearch, binding then doing the search, i.e. the command line
> search has a LOT more overhead.
>
> I'm happy to run some tests to identify the cause to see if we can fix it,
> any suggestions where to start?
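The command-line half of a benchmark like this could be sketched roughly as
follows. This is only an illustration under assumptions, not Ed's script: the
helper names (ldapsearch_argv, best_of_two, time_ldapsearch), the URI
ldap://localhost and the anonymous simple bind (-x) are all made up for the
example, and the base DN has to be supplied by the caller.

```python
import subprocess
import time


def ldapsearch_argv(base, filterstr, attrs=(), uri="ldap://localhost"):
    """Build an ldapsearch command line (anonymous simple bind, assumed)."""
    # -x: simple bind, -LLL: plain LDIF output, -H: server URI, -b: search base
    return ["ldapsearch", "-x", "-LLL", "-H", uri, "-b", base, filterstr, *attrs]


def best_of_two(func):
    """Run func twice: the first run populates caches, the second is timed."""
    func()
    start = time.perf_counter()
    func()
    return time.perf_counter() - start


def time_ldapsearch(base, filterstr, attrs=()):
    """Time one ldapsearch invocation, including process start-up and bind."""
    argv = ldapsearch_argv(base, filterstr, attrs)
    return best_of_two(
        lambda: subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
    )
```

Note that the measured time includes spawning the process and binding, which
is exactly the extra overhead mentioned above for the command-line search.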
I just ran some tests of my own. I have a function that walks
one ou (ou=users,dc=mydomain,dc=com), finds every entry with
objectClass=posixAccount and gets the uid of each account using
search_ext_s.
The result of the search is assigned to a variable named 'res', and then
the function returns. Execution time for this Python script is approx.
1m30s. Running the same query with ldapsearch from the OpenLDAP package,
the query executes, prints its output to the console and exits in about 30s.
The total number of entries found with that objectClass is approx. 27k.
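The Python side of such a test can be sketched roughly like this. It is a
minimal sketch, not the actual script: the names fetch_uids, timed and main,
the URI ldap://localhost and the anonymous bind are assumptions for
illustration. Only search_ext_s, the base DN and the filter come from the
description above.

```python
import time

SCOPE_SUBTREE = 2  # same value as ldap.SCOPE_SUBTREE


def fetch_uids(conn, base="ou=users,dc=mydomain,dc=com"):
    """Return the uid values of every posixAccount entry below base."""
    res = conn.search_ext_s(
        base, SCOPE_SUBTREE, "(objectClass=posixAccount)", ["uid"]
    )
    uids = []
    for dn, attrs in res:
        if dn is None:  # search references carry no entry, skip them
            continue
        uids.extend(attrs.get("uid", []))
    return uids


def timed(func, *args):
    """Call func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def main():
    import ldap  # python-ldap, imported here so the helpers work without it

    conn = ldap.initialize("ldap://localhost")  # assumed server URI
    conn.simple_bind_s()  # anonymous bind (assumption)
    uids, elapsed = timed(fetch_uids, conn)
    print("%d uids in %.1fs" % (len(uids), elapsed))
```

Calling main() against a real server prints the number of uid values and the
wall-clock time for the whole search.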
> General conclusions from my tests:
>
> python-ldap has a surprising performance penalty
>
> searching is helped by having ample cache (doh!)
I'm using OpenLDAP 2.1.22 with BerkeleyDB 4.1.25 with the latest patch.
With BDB as the backend, OpenLDAP cannot handle caching itself - BDB has to.
I found a mail on the openldap-software mailing list describing this issue
and how to set up caching for BDB. That did not speed up my search from
within Python, nor the one with ldapsearch.
> returning 1 attribute is much faster than returning all of them (doh!)
Here I got help from a co-worker...
When we ran ldapsearch (from OpenLDAP) with the argument
'-S uid' for sorting, we got a result in about 40s. With -S,
it seems that ldapsearch fetches the entries one by one into some
kind of data structure and sorts them before displaying anything.
Now, it seems that the python-ldap module does the same thing: fetching
the results one by one, like ldapsearch does with -S <attr> as
argument.
Is there any way to force search_ext_s to fetch everything at once rather
than one by one, other than changing the python-ldap source code?
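One thing that might be worth trying is the asynchronous interface, where
the caller decides how results are collected. The sketch below uses search()
and result() with all=0, which hands back each message as it arrives instead
of building the complete list first. The name iter_entries is made up for
illustration, and I have not measured whether this is any faster, so treat it
as an experiment rather than a fix.

```python
RES_SEARCH_RESULT = 0x65  # same value as ldap.RES_SEARCH_RESULT
SCOPE_SUBTREE = 2         # same value as ldap.SCOPE_SUBTREE


def iter_entries(conn, base, filterstr, attrlist, scope=SCOPE_SUBTREE):
    """Yield (dn, attrs) pairs as the server delivers them."""
    msgid = conn.search(base, scope, filterstr, attrlist)
    while True:
        # all=0: return after each message instead of waiting for all of them
        rtype, rdata = conn.result(msgid, 0)
        if rtype == RES_SEARCH_RESULT:
            break  # final message, the search is complete
        for dn, attrs in rdata:
            if dn is not None:  # skip search references
                yield dn, attrs
```

With a bound connection this would be used as
`for dn, attrs in iter_entries(conn, base, "(objectClass=posixAccount)", ["uid"])`,
and passing 1 instead of 0 to result() would collect everything in one call.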
--
Regards
Bjørn Ove Grøtan
"Resistance is futile. You will be assimilated."