From inada-n at klab.jp Sat Apr 4 15:57:25 2009
From: inada-n at klab.jp (=?ISO-2022-JP?B?GyRCMHBFREQ+OkgbKEI=?=)
Date: Sat, 4 Apr 2009 22:57:25 +0900
Subject: [Doc-SIG] Building Python Document 30% faster.
Message-ID: <2ca3242c0904040657j6f708f34nbcb8e51fb3393559@mail.gmail.com>

Hi, all.

I'm a member of the Japanese translation project for the Python
documentation. We completed translating the Python 2.5 documentation
last year and are now working on the Python 2.6 documentation.

I find that building the documentation is a little slow, so I tried
to tune docutils and Sphinx.

The attached patches make building the documentation about 30% faster.
(In my environment, roughly 330 sec -> 220 sec.)

I posted sphinx.patch to bitbucket, but I don't know where to post
docutils.patch. Could anyone review these patches?

The patches change the following:

1. Use PyStemmer instead of PorterStemmer.
   PorterStemmer is implemented in Python and consumes about 50 seconds
   during the build. PyStemmer is implemented in C and consumes only
   7 seconds.

   However, the searchindex.js built with PyStemmer differs from the
   one built with PorterStemmer.

2. Avoid building OptionParser many times.
   Sphinx calls docutils.core.publish_parts() many times without the
   `settings` argument. This builds docutils.frontend.OptionParser each
   time and consumes 29 seconds.

3. Avoid building NestedStateMachine many times.
   NestedStateMachine is built and destroyed many times. Recycling
   these state machines gives a significant performance gain.
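
A rough sketch of the idea behind items 2 and 3, for readers without the patches at hand. This is not the content of sphinx.patch or docutils.patch. It uses the standard-library optparse module as a stand-in for the docutils option parser, and every name in it (build_defaults, get_settings, render, Machine, MachinePool, parse_nested, the --tab-width option) is hypothetical. It only shows the general pattern: build the expensive object once, then reuse it.

```python
import copy
import optparse

# --- Item 2: build the option parser once, hand out copies of its defaults ---

parser_builds = 0  # how many times the expensive parser was constructed


def build_defaults():
    """Stand-in for the costly step (hypothetical option, not a docutils one)."""
    global parser_builds
    parser_builds += 1
    parser = optparse.OptionParser()
    parser.add_option("--tab-width", type="int", default=8)
    return parser.get_default_values()


_cached_defaults = None


def get_settings():
    """Return a private copy of the defaults; the parser is built on first use only."""
    global _cached_defaults
    if _cached_defaults is None:
        _cached_defaults = build_defaults()
    return copy.copy(_cached_defaults)


def render(text, settings=None):
    """Hypothetical caller that accepts prebuilt settings."""
    if settings is None:
        settings = build_defaults()  # slow path: a new parser on every call
    return text.expandtabs(settings.tab_width)


# --- Item 3: recycle machine objects instead of constructing one per use ---

class Machine:
    """Toy state machine; `constructed` counts how often __init__ runs."""
    constructed = 0

    def __init__(self):
        Machine.constructed += 1
        self.lines = []

    def reset(self):
        self.lines = []  # clear per-run state before the next user sees it


class MachinePool:
    def __init__(self):
        self._idle = []

    def acquire(self):
        # Reuse an idle machine if there is one; otherwise construct a new one.
        return self._idle.pop() if self._idle else Machine()

    def release(self, machine):
        machine.reset()
        self._idle.append(machine)


pool = MachinePool()


def parse_nested(depth):
    """Each nesting level needs its own live machine while it is running."""
    machine = pool.acquire()
    try:
        machine.lines.append(depth)
        if depth > 1:
            parse_nested(depth - 1)
    finally:
        pool.release(machine)


for _ in range(100):
    render("a\tb", settings=get_settings())
    parse_nested(3)

print(parser_builds, Machine.constructed)  # prints: 1 3
```

Without the cache and the pool, the same loop would construct 100 parsers and 300 machines. Nested parsing still needs one live machine per nesting level, so a pool reduces the number of constructions without removing them entirely.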

== before ==
    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 25720/459    0.997    0.000  134.085    0.292 tools/docutils/statemachine.py:178(run)
92281/1513    1.420    0.000  133.935    0.089 tools/docutils/statemachine.py:384(check_line)
     25720    0.184    0.000   89.628    0.003 tools/docutils/statemachine.py:129(__init__)
     25720    0.632    0.000   89.444    0.003 tools/docutils/statemachine.py:448(add_states)
    385800    1.665    0.000   88.813    0.000 tools/docutils/statemachine.py:436(add_state)
    385800    2.356    0.000   85.287    0.000 tools/docutils/statemachine.py:928(__init__)
    385800    1.793    0.000   82.931    0.000 tools/docutils/statemachine.py:566(__init__)

== after ==
    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 25720/459    1.051    0.000   68.175    0.149 tools/docutils/statemachine.py:178(run)
92281/1513    1.405    0.000   68.024    0.045 tools/docutils/statemachine.py:384(check_line)
      6862    0.031    0.000   24.241    0.004 tools/docutils/statemachine.py:129(__init__)
      6862    0.174    0.000   24.210    0.004 tools/docutils/statemachine.py:448(add_states)
    102930    0.430    0.000   24.036    0.000 tools/docutils/statemachine.py:436(add_state)
    102930    0.633    0.000   23.162    0.000 tools/docutils/statemachine.py:928(__init__)
    102930    0.549    0.000   22.529    0.000 tools/docutils/statemachine.py:566(__init__)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: sphinx.patch
Type: application/octet-stream
Size: 3930 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: docutils.patch
Type: application/octet-stream
Size: 1923 bytes
Desc: not available
URL:

From g.brandl at gmx.net Sat Apr 4 16:42:04 2009
From: g.brandl at gmx.net (Georg Brandl)
Date: Sat, 04 Apr 2009 16:42:04 +0200
Subject: [Doc-SIG] Building Python Document 30% faster.
In-Reply-To: <2ca3242c0904040657j6f708f34nbcb8e51fb3393559@mail.gmail.com>
References: <2ca3242c0904040657j6f708f34nbcb8e51fb3393559@mail.gmail.com>
Message-ID:

Naoki INADA wrote:
> Hi, all.
>
> I'm a member of the Japanese translation project for the Python
> documentation. We completed translating the Python 2.5 documentation
> last year and are now working on the Python 2.6 documentation.
>
> I find that building the documentation is a little slow, so I tried
> to tune docutils and Sphinx.

Great! I've already started tuning a bit with the docutils Node.traverse()
patch, but did not do much more than that.

> The attached patches make building the documentation about 30% faster.
> (In my environment, roughly 330 sec -> 220 sec.)
>
> I posted sphinx.patch to bitbucket, but I don't know where to post
> docutils.patch. Could anyone review these patches?

I will, when I have a bit more time.

> The patches change the following:
>
> 1. Use PyStemmer instead of PorterStemmer.
>    PorterStemmer is implemented in Python and consumes about 50 seconds
>    during the build. PyStemmer is implemented in C and consumes only
>    7 seconds.
>
>    However, the searchindex.js built with PyStemmer differs from the
>    one built with PorterStemmer.

This could be a problem. The client-side search implemented in JavaScript
uses exactly the same stemmer (which is necessary to be able to find all
words). In short, if you can find a C implementation of the Porter stemmer
we could include it in Sphinx as an optional extension.

> 2. Avoid building OptionParser many times.
>    Sphinx calls docutils.core.publish_parts() many times without the
>    `settings` argument. This builds docutils.frontend.OptionParser each
>    time and consumes 29 seconds.
>
> 3. Avoid building NestedStateMachine many times.
>    NestedStateMachine is built and destroyed many times. Recycling
>    these state machines gives a significant performance gain.

I assume that both of these are in the second commit I see on bitbucket?
Both look like worthy optimizations.

Thanks,
Georg

--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four.
Tabs are right out.

From fuzzyman at voidspace.org.uk Sat Apr 4 16:56:38 2009
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Sat, 04 Apr 2009 15:56:38 +0100
Subject: [Doc-SIG] Building Python Document 30% faster.
In-Reply-To: <2ca3242c0904040657j6f708f34nbcb8e51fb3393559@mail.gmail.com>
References: <2ca3242c0904040657j6f708f34nbcb8e51fb3393559@mail.gmail.com>
Message-ID: <49D77526.40508@voidspace.org.uk>

Hello,

There is a docutils-specific mailing list:

    docutils users

You will need to subscribe from sourceforge, or you can just post your
patch on sourceforge:

    http://docutils.sf.net

Another patch was recently submitted by Georg Brandl offering a similar
speedup. No idea if it is in the same area or not.

All the best,

Michael Foord

Naoki INADA wrote:
> Hi, all.
>
> I'm a member of the Japanese translation project for the Python
> documentation. We completed translating the Python 2.5 documentation
> last year and are now working on the Python 2.6 documentation.
>
> I find that building the documentation is a little slow, so I tried
> to tune docutils and Sphinx.
>
> The attached patches make building the documentation about 30% faster.
> (In my environment, roughly 330 sec -> 220 sec.)
>
> I posted sphinx.patch to bitbucket, but I don't know where to post
> docutils.patch. Could anyone review these patches?
>
> The patches change the following:
>
> 1. Use PyStemmer instead of PorterStemmer.
>    PorterStemmer is implemented in Python and consumes about 50 seconds
>    during the build. PyStemmer is implemented in C and consumes only
>    7 seconds.
>
>    However, the searchindex.js built with PyStemmer differs from the
>    one built with PorterStemmer.
>
> 2. Avoid building OptionParser many times.
>    Sphinx calls docutils.core.publish_parts() many times without the
>    `settings` argument. This builds docutils.frontend.OptionParser each
>    time and consumes 29 seconds.
>
> 3. Avoid building NestedStateMachine many times.
>    NestedStateMachine is built and destroyed many times. Recycling
>    these state machines gives a significant performance gain.
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Doc-SIG maillist - Doc-SIG at python.org
> http://mail.python.org/mailman/listinfo/doc-sig
>

--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

From aahz at pythoncraft.com Sat Apr 4 17:43:26 2009
From: aahz at pythoncraft.com (Aahz)
Date: Sat, 4 Apr 2009 08:43:26 -0700
Subject: [Doc-SIG] Building Python Document 30% faster.
In-Reply-To: <49D77526.40508@voidspace.org.uk>
References: <2ca3242c0904040657j6f708f34nbcb8e51fb3393559@mail.gmail.com> <49D77526.40508@voidspace.org.uk>
Message-ID: <20090404154326.GA2629@panix.com>

On Sat, Apr 04, 2009, Michael Foord wrote:
>
> There is a docutils-specific mailing list:
>
> docutils users

Actually, there are two docutils mailing lists, and I think that
docutils-develop is probably more appropriate for this.
--
Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are, by
definition, not smart enough to debug it." --Brian W. Kernighan

From inada-n at klab.jp Sat Apr 4 18:03:25 2009
From: inada-n at klab.jp (Naoki INADA)
Date: Sun, 5 Apr 2009 01:03:25 +0900
Subject: [Doc-SIG] Building Python Document 30% faster.
In-Reply-To:
References: <2ca3242c0904040657j6f708f34nbcb8e51fb3393559@mail.gmail.com>
Message-ID: <2ca3242c0904040903n6c640d9ep2cfb3570028be1fa@mail.gmail.com>

Hi Georg.

>> The attached patches make building the documentation about 30% faster.
>> (In my environment, roughly 330 sec -> 220 sec.)
>>
>> I posted sphinx.patch to bitbucket, but I don't know where to post
>> docutils.patch. Could anyone review these patches?
>
> I will, when I have a bit more time.

Thank you.

>> However, the searchindex.js built with PyStemmer differs from the
>> one built with PorterStemmer.
>
> This could be a problem. The client-side search implemented in JavaScript
> uses exactly the same stemmer (which is necessary to be able to find all
> words). In short, if you can find a C implementation of the Porter stemmer
> we could include it in Sphinx as an optional extension.

I see.
The original Porter stemmer is here:
http://tartarus.org/~martin/PorterStemmer/

It is implemented in C. I'll try to make a Python wrapper with SWIG and
compare the resulting searchindex.js. Please wait a while.

>> 2. Avoid building OptionParser many times.
>>    Sphinx calls docutils.core.publish_parts() many times without the
>>    `settings` argument. This builds docutils.frontend.OptionParser each
>>    time and consumes 29 seconds.
>>
>> 3. Avoid building NestedStateMachine many times.
>>    NestedStateMachine is built and destroyed many times. Recycling
>>    these state machines gives a significant performance gain.
>
> I assume that both of these are in the second commit I see on bitbucket?
> Both look like worthy optimizations.

The former is on bitbucket:
http://bitbucket.org/methane/sphinx-speedup/changeset/72fa0ceefcae/

The latter is not on bitbucket because NestedStateMachine is part of
docutils, not Sphinx.

--
Naoki INADA
KLab Inc.

From inada-n at klab.jp Sat Apr 4 18:06:10 2009
From: inada-n at klab.jp (Naoki INADA)
Date: Sun, 5 Apr 2009 01:06:10 +0900
Subject: [Doc-SIG] Building Python Document 30% faster.
In-Reply-To: <20090404154326.GA2629@panix.com>
References: <2ca3242c0904040657j6f708f34nbcb8e51fb3393559@mail.gmail.com> <49D77526.40508@voidspace.org.uk> <20090404154326.GA2629@panix.com>
Message-ID: <2ca3242c0904040906h6fd93e00n5ee5f65535430e82@mail.gmail.com>

> On Sat, Apr 04, 2009, Michael Foord wrote:
>>
>> There is a docutils-specific mailing list:
>>
>> docutils users
>
> Actually, there are two docutils mailing lists, and I think that
> docutils-develop is probably more appropriate for this.

OK. I'll subscribe to both.

--
Naoki INADA
KLab Inc.

From inada-n at klab.jp Sat Apr 4 23:01:35 2009
From: inada-n at klab.jp (Naoki INADA)
Date: Sun, 5 Apr 2009 06:01:35 +0900
Subject: [Doc-SIG] Building Python Document 30% faster.
In-Reply-To: <2ca3242c0904040903n6c640d9ep2cfb3570028be1fa@mail.gmail.com>
References: <2ca3242c0904040657j6f708f34nbcb8e51fb3393559@mail.gmail.com> <2ca3242c0904040903n6c640d9ep2cfb3570028be1fa@mail.gmail.com>
Message-ID: <2ca3242c0904041401l21dc61mbe9e24946e6928e3@mail.gmail.com>

>>> However, the searchindex.js built with PyStemmer differs from the
>>> one built with PorterStemmer.
>>
>> This could be a problem.
>> The client-side search implemented in JavaScript
>> uses exactly the same stemmer (which is necessary to be able to find all
>> words). In short, if you can find a C implementation of the Porter stemmer
>> we could include it in Sphinx as an optional extension.
>
> I see.
> The original Porter stemmer is here:
> http://tartarus.org/~martin/PorterStemmer/
>
> It is implemented in C. I'll try to make a Python wrapper with SWIG and
> compare the resulting searchindex.js. Please wait a while.

I made a Python wrapper!
http://bitbucket.org/methane/porterstemmer/

This is my first extension module, and it is still an alpha version.
But I can build the Python documentation with this porterstemmer, and
the resulting searchindex.js is the same as the original.

--
Naoki INADA
KLab Inc.

From g.brandl at gmx.net Thu Apr 9 22:13:34 2009
From: g.brandl at gmx.net (Georg Brandl)
Date: Thu, 09 Apr 2009 22:13:34 +0200
Subject: [Doc-SIG] Building Python Document 30% faster.
In-Reply-To: <2ca3242c0904040903n6c640d9ep2cfb3570028be1fa@mail.gmail.com>
References: <2ca3242c0904040657j6f708f34nbcb8e51fb3393559@mail.gmail.com> <2ca3242c0904040903n6c640d9ep2cfb3570028be1fa@mail.gmail.com>
Message-ID:

Naoki INADA wrote:
>>> 2. Avoid building OptionParser many times.
>>>    Sphinx calls docutils.core.publish_parts() many times without the
>>>    `settings` argument. This builds docutils.frontend.OptionParser each
>>>    time and consumes 29 seconds.
>>>
>>> 3. Avoid building NestedStateMachine many times.
>>>    NestedStateMachine is built and destroyed many times. Recycling
>>>    these state machines gives a significant performance gain.
>>
>> I assume that both of these are in the second commit I see on bitbucket?
>> Both look like worthy optimizations.
>
> The former is on bitbucket:
> http://bitbucket.org/methane/sphinx-speedup/changeset/72fa0ceefcae/

Thanks, merged! When porterstemmer is mature I'd also like to include it
in the Sphinx distribution as an optional extension.

> The latter is not on bitbucket because NestedStateMachine is part of
> docutils, not Sphinx.

OK, let's see. I'd first try to get the patch into docutils, once it
passes the tests. However, since most people will be using docutils 0.4
or 0.5, it might also make sense to make a monkey-patch version for
Sphinx, like the traverse one.

Georg

From fuzzyman at voidspace.org.uk Thu Apr 16 16:20:59 2009
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Thu, 16 Apr 2009 15:20:59 +0100
Subject: [Doc-SIG] Docs redirection
Message-ID: <49E73ECB.4080207@voidspace.org.uk>

For a search, Google referred me to this page, which returns a 404.
Should this page redirect?

http://docs.python.org/dev/2.5/lib/module-code.html

Michael

--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

From g.brandl at gmx.net Thu Apr 16 20:59:05 2009
From: g.brandl at gmx.net (Georg Brandl)
Date: Thu, 16 Apr 2009 20:59:05 +0200
Subject: [Doc-SIG] Docs redirection
In-Reply-To: <49E73ECB.4080207@voidspace.org.uk>
References: <49E73ECB.4080207@voidspace.org.uk>
Message-ID:

Michael Foord wrote:
> For a search, Google referred me to this page, which returns a 404.
> Should this page redirect?
>
> http://docs.python.org/dev/2.5/lib/module-code.html

Since that's a dev/2.5 page, I don't think it should. Google will
eventually throw that out of their index.

Georg

From fuzzyman at voidspace.org.uk Thu Apr 16 21:01:04 2009
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Thu, 16 Apr 2009 20:01:04 +0100
Subject: [Doc-SIG] Docs redirection
In-Reply-To:
References: <49E73ECB.4080207@voidspace.org.uk>
Message-ID: <49E78070.3070707@voidspace.org.uk>

Georg Brandl wrote:
> Michael Foord wrote:
>
>> For a search, Google referred me to this page, which returns a 404.
>> Should this page redirect?
>>
>> http://docs.python.org/dev/2.5/lib/module-code.html
>>
>
> Since that's a dev/2.5 page, I don't think it should. Google will
> eventually throw that out of their index.
>

Ok, fine.
My only thought was that it could cause other links to be broken, but as
it is a dev version hopefully there aren't too many of those.

Michael

> Georg

--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog