[python-ldap] Performance improvement for schema.tokenizer.split_tokens

Michael Ströder michael at stroeder.com
Sat Feb 18 05:08:16 EST 2017


Could you please also test with Tests/t_ldap_schema_tokenizer.py in recent python-ldap
2.4.32? Maybe you already did.

Ciao, Michael.

Christian Heimes wrote:
> I have been running into performance issues with split_tokens from the
> schema parser. The first request to a new WSGI process spends about 25
> to 30% of its time in split_tokens() while parsing LDAP schema.
> Consecutive requests benefit from a schema cache.
> 
> I was able to come up with a new implementation of split_tokens() which
> is about 8 times faster on Python 2. The new implementation uses a
> regular expression to split the schema string into tokens. It is
> able to parse over 3,000 schema lines from 389-DS and FreeIPA
> successfully, with the same result as the current split_tokens()
> function. Personally I find it easier to read and understand, too.
> 
> Please review my implementation and consider it for python-ldap.
> 
> Implementation with tests:
> https://github.com/tiran/fast_split_tokens
> 
> Background information:
> https://github.com/pyldap/pyldap/issues/85
> https://fedorahosted.org/freeipa/ticket/6679
> 
> Regards,
> Christian
