[python-ldap] Performance improvement for schema.tokenizer.split_tokens
Michael Ströder
michael at stroeder.com
Sat Feb 18 05:08:16 EST 2017
Could you please also test with Tests/t_ldap_schema_tokenizer.py in recent python-ldap
2.4.32? Maybe you already did.
Ciao, Michael.
Christian Heimes wrote:
> I have been running into performance issues with split_tokens from the
> schema parser. The first request to a new WSGI process spends about 25
> to 30% in split_tokens() while parsing LDAP schema. Consecutive requests
> benefit from a schema cache.
>
> I was able to come up with a new implementation of split_tokens() which
> is about 8 times faster on Python 2. The new implementation uses a
> regular expression to split the schema string into tokens. It is
> successfully able to parse over 3,000 schema lines from 389-DS and
> FreeIPA with the same result as the current split_tokens() function.
> Personally I find it easier to read and understand, too.
>
> Please review my implementation and consider it for python-ldap.
>
> Implementation with tests:
> https://github.com/tiran/fast_split_tokens
>
> Background information:
> https://github.com/pyldap/pyldap/issues/85
> https://fedorahosted.org/freeipa/ticket/6679
>
> Regards,
> Christian
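
[Editor's note: the message above describes the approach only; the actual
implementation lives in the linked fast_split_tokens repository. The following
is a minimal illustrative sketch of the regex-based tokenization idea, not the
code under review. The pattern and function body here are assumptions for
illustration; LDAP schema descriptions consist of parentheses, single-quoted
strings, and bare words.]

```python
import re

# Hypothetical sketch of a regex-based schema tokenizer.
# One alternation per token kind: parentheses, quoted strings, bare words.
TOKEN_RE = re.compile(
    r"""
    \(               # opening parenthesis
    | \)             # closing parenthesis
    | '([^']*)'      # single-quoted string; group 1 is the unquoted content
    | ([^'()\s]+)    # bare word: anything except quotes, parens, whitespace
    """,
    re.VERBOSE,
)

def split_tokens(s):
    """Split an LDAP schema description string into a flat token list."""
    tokens = []
    for match in TOKEN_RE.finditer(s):
        quoted, bare = match.group(1), match.group(2)
        if quoted is not None:
            tokens.append(quoted)      # quotes stripped
        elif bare is not None:
            tokens.append(bare)
        else:
            tokens.append(match.group(0))  # '(' or ')'
    return tokens
```

For example, `split_tokens("( 2.5.4.3 NAME ( 'cn' 'commonName' ) SUP name )")`
yields `['(', '2.5.4.3', 'NAME', '(', 'cn', 'commonName', ')', 'SUP', 'name', ')']`.
A single compiled pattern driven by `finditer()` avoids the per-character loop
of a hand-written tokenizer, which is where the speedup described above would
come from.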