[Doc-SIG] Building Python Document 30% faster.

Georg Brandl g.brandl at gmx.net
Sat Apr 4 16:42:04 CEST 2009


稲田直哉 wrote:
> Hi, all.
> 
> I'm a member of the Japanese translation project for the Python documentation.
> We finished translating the Python 2.5 documentation last year and are now
> working on the Python 2.6 documentation.
> 
> I find that building the documentation is a little slow, so I tried to tune
> docutils and Sphinx.

Great! I've already started tuning a bit with the docutils Node.traverse()
patch, but did not do much more than that.

> The attached patches make building the documentation about 30% faster
> (in my environment, roughly 330 sec -> 220 sec).
> 
> I posted sphinx.patch to bitbucket, but I don't know where to post docutils.patch.
> Could anyone review these patches?

I will, when I have a bit more time.

> These patches change the following:
> 
> 1. Use PyStemmer instead of PorterStemmer.
> PorterStemmer is implemented in Python and consumes about 50 seconds
> during the build.
> PyStemmer <http://pypi.python.org/pypi/PyStemmer/1.0.1> is implemented in C
> and consumes only 7 seconds.
> 
> However, the searchindex.js built with PyStemmer differs from the one built
> with PorterStemmer.

This could be a problem.  The client-side search implemented in JavaScript
uses exactly the same stemmer (which is necessary to be able to find all
words).  In short, if you can find a C implementation of the Porter stemmer
we could include it in Sphinx as an optional extension.
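
To illustrate why the index and the search code have to agree on the stemmer, here is a toy sketch. It is not Sphinx code: stem_a and stem_b are made-up stand-ins (neither is the real Porter or PyStemmer algorithm), and build_index and search are hypothetical helpers. It only shows that a word indexed under one stem cannot be found when the query is stemmed differently.

```python
# Toy illustration only: neither function below is the real Porter
# stemmer or PyStemmer. They just disagree on some words.

def stem_a(word):
    """Hypothetical build-time stemmer: strips a trailing 'ing' or 's'."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def stem_b(word):
    """Hypothetical query-time stemmer: strips only a trailing 's'."""
    if word.endswith("s") and len(word) > 3:
        return word[:-1]
    return word


def build_index(docs, stem):
    """Map each stemmed word to the set of document names containing it."""
    index = {}
    for name, text in docs.items():
        for word in text.lower().split():
            index.setdefault(stem(word), set()).add(name)
    return index


def search(index, query, stem):
    """Look up a single query word after stemming it."""
    return index.get(stem(query.lower()), set())


docs = {"tutorial": "building documents", "reference": "parsing options"}
index = build_index(docs, stem_a)

# Same stemmer on both sides: the word is found.
same = search(index, "building", stem_a)
# Different stemmer at query time: the lookup misses.
mixed = search(index, "building", stem_b)
```

In this toy case the index stores "build", so a query stemmed to "building" finds nothing.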

> 2. Avoid building OptionParser many times.
> Sphinx calls docutils.core.publish_parts() without the `settings` argument
> many times.
> This builds docutils.frontend.OptionParser many times, which consumes
> 29 seconds.
> 
> 3. Avoid building NestedStateMachine many times.
> NestedStateMachine is built and destroyed many times.
> Recycling that state machine gives a significant performance gain.

I assume that both of these are in the second commit I see on bitbucket?  Both
look like worthwhile optimizations.
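
The build-once idea behind items 2 and 3 can be sketched as below. This is only an illustration under stated assumptions, not the actual patch: it uses the standard library's optparse.OptionParser as a stand-in for docutils.frontend.OptionParser, and make_parser, render_slow and render_fast are hypothetical names. The point is simply that the defaults are computed once and reused instead of rebuilding the parser on every call.

```python
from optparse import OptionParser

build_count = 0  # counts how often the parser is constructed


def make_parser():
    """Stand-in for an option parser that is expensive to construct."""
    global build_count
    build_count += 1
    parser = OptionParser()
    parser.add_option("--tab-width", type="int", default=8)
    parser.add_option("--report-level", type="int", default=2)
    return parser


def render_slow(text):
    """Builds a fresh parser (and fresh defaults) on every call."""
    settings = make_parser().get_default_values()
    return text.expandtabs(settings.tab_width)


_cached_settings = None  # filled in on first use, then reused


def render_fast(text):
    """Builds the parser once and reuses the resulting settings object."""
    global _cached_settings
    if _cached_settings is None:
        _cached_settings = make_parser().get_default_values()
    return text.expandtabs(_cached_settings.tab_width)


fragments = ["a\tb"] * 100

slow = [render_slow(f) for f in fragments]
slow_builds = build_count                # one construction per call

fast = [render_fast(f) for f in fragments]
fast_builds = build_count - slow_builds  # a single construction in total
```

Both versions produce the same output; only the number of parser constructions differs. Note that a shared settings object is only safe if the callers do not modify it.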

Thanks,
Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


