[Twisted-Python] Lore to Sphinx Conversion Progress Report 4

This time I think I'm gonna skip saying how I haven't gotten as much done as I would like...oh darn. Anyways, time for another gripping installment...

Progress:

- tables are now handled (mostly) properly, thanks to Zeth at http://commandline.org.uk/
- blockquote tags handled
- much improved whitespace/indentation handling
- some nicer styling thanks to Michael Thompson
- I've managed to convert the docs for the 3 Divmod projects with Lore docs, though I've yet to put them up anywhere.

Outstanding issues:

- two files in the Lore source are not yet being converted, but it looks like one of them is about to be removed (http://twistedmatrix.com/trac/ticket/4188), and it's not really a Lore doc anyways.
- due to ReST's insistence on "inline markup" being surrounded by whitespace or certain special characters, there are a lot of places where such inline markup gets jacked up because there is no whitespace in front of it. If I put whitespace in front of everything though, my indentation handling gets jacked up and 400+ Sphinx build warnings result. Not sure if I should spend the time to make whitespace handling really smart or if these should just be fixed manually post-conversion.
- cite tags still need handling...not hard, just haven't decided the best way to do it yet.
- Theming/styling: still mostly a TODO, though the new styling looks a lot better than the default to my eyes. I'm starting to think that eventually we might want to have 2 themes/styles...one to match the trac-based website, and one for bundled docs (docs tarballs, CHM files, etc.)
- auto-generated toctree directives are currently generated in alphabetical order, which makes the "prev" and "next" links mostly make no sense
- some of the Lore source files have nested "inline markup", which ReST disallows. This can be handled by:
    - fix the markup in the Lore source
    - figure out some kind of supersmart auto-conversion for every possible combination of nesting
    - just handle the outside level of nesting (what I'm doing now) and fix any problems manually post-conversion.
- xhtml entities are not currently resolved...mostly because it makes the build take a LOOOONG time. They can be though. This shouldn't be a problem.
- xhtml comments still need to be handled
- <code class="API"> tags need something better...right now they are just the same as <code> tags. Sphinx has a feature coming in 1.0 that would make this nice and maintainable in the long run, but I don't know that I want to wait for it. I may try to "backport" the extension or just come up with a separate solution.
- some of the generated links need fixing (e.g. links to directories, .py files)

In other news:

- Foolscap 0.5 was released today, which made me wonder what they use for docs...and it's Lore. I brought this up on IRC, and it was suggested by many that Lore should stick around even after the conversion according to the standard Twisted compatibility policy, to give anyone who still uses it time to migrate. This sounds like a fine idea to me. Any thoughts?

As always, the lore2sphinx code is here: http://bitbucket.org/khorn/lore2sphinx/

And the sample output of the conversion process is here: http://twistedsphinx.funsize.net/

Cheers,
Kevin Horn

On Jan 19, 2010, at 4:33 PM, Kevin Horn wrote:
Yay!
I don't really understand this problem. What do you mean about making whitespace handling really smart? Isn't this the sort of detail that docutils is supposed to handle for you?
That would certainly be nice, but is in no way required for the initial migration. Still, we should have a workable theme in order before we pull the trigger :).
> - some of the Lore source files have nested "inline markup", which ReST disallows.
Ugh. So ReST can't do this? That's pretty lame.
> - just handle the outside level of nesting (what I'm doing now) and fix any problems manually post-conversion.
I'm assuming there are very few instances of this, so that sounds fine.
Are you resolving them by downloading all the DTDs or something?
Since nobody really uses lore's API, the same compatibility policy doesn't really apply. In lore's case, I would say that the policy should be that we include it with X more releases just for packaging convenience, but stop doing maintenance immediately.

On 03:57 am, glyph@twistedmatrix.com wrote:
On IRC last night I brought up the idea that we could skip the conversion to ReST and use Sphinx with xhtml input documents. The conclusion seemed to be that this might be difficult, but no one was really sure what work would be involved in this approach. Kevin's already put a lot of effort into the conversion. It would be nice if someone else could investigate this.
As long as someone wants to do maintenance, I don't see any reason to stop them from doing it. We might mark all the Lore tickets lowest priority or otherwise signal that some subset of the "core" developers aren't interested in maintaining it... but then, how would that be any different from the status quo?

Jean-Paul

On Thu, Jan 21, 2010 at 3:03 AM, <exarkun@twistedmatrix.com> wrote:
Sphinx adds a bunch of extensions and conventions on top of docutils proper, and I have little to no idea of what would be involved in handling those issues.
If someone wants to maintain Lore, I certainly have no objection, but I don't think anyone really does. And it's not like it won't still be in the SVN repos back in the history someplace, even if it were to be "removed" from trunk.

I don't have strong feelings about it either way, but my feeling is Lore should maybe be officially deprecated for a release (or 2, 3, etc.) and then "removed" from trunk, and if someone wants to maintain it themselves, they can fork it and deal with it outside the Twisted project. Or it can be "resurrected" later on if someone wants. At the very least, don't force users to install Lore along with the rest of Twisted once Lore is out of general use. Maybe it could be a separate package like web2?

But if it were just up to me, I would just get rid of it, to avoid taking up developer time, cluttering up trac with open tickets, etc.

Kevin Horn

On Wed, Jan 20, 2010 at 9:57 PM, Glyph Lefkowitz <glyph@twistedmatrix.com> wrote:
OK, lemme esplain...no, would take too long, lemme sum up.

In some of the Lore docs, you have stuff like:

    some<em>stuff</em>

which naively translated to ReST looks like:

    some*stuff*

but since ReST wants whitespace and/or special characters surrounding "inline markup", docutils/Sphinx doesn't recognize it properly as markup and just sends it unmodified to the HTML (or whatever) output.

The obvious solution is to just surround all inline markup with spaces, since we're mainly targeting HTML output at the moment and a few extra spaces shouldn't hurt. Which would look like this:

    some *stuff*

But this turns out to cause other problems, specifically in the case where inline markup occurs at the beginning of a line, and the extra space jacks up the indentation (which ReST considers significant).

So the whitespace handling I was referring to was the output of whitespace from lore2sphinx.
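For illustration only, here is a minimal sketch of one possible middle ground; it is not what lore2sphinx currently does. ReST treats a backslash-escaped space (`\ `) as a separator that is dropped from the output, so a converter could emit it only where the neighbouring character would otherwise block markup recognition, and emit nothing at the start of a line. The helper name `emphasize`, its arguments, and the two character sets (a simplified subset of the ReST recognition rules) are assumptions made up for this example.

```python
# Hypothetical helper, not part of lore2sphinx.

# Simplified subset of the characters ReST accepts right before an
# inline-markup start-string and right after an end-string.
_OK_BEFORE = set(" \t\n-:/'\"<([{")
_OK_AFTER = set(" \t\n-.,:;!?/'\")]}>")


def emphasize(before, text, after=""):
    """Wrap `text` in ReST emphasis between `before` and `after`.

    An escaped space is added only where the adjacent character would
    stop docutils from recognizing the markup. Nothing is added at the
    start of a line, so indentation is left untouched.
    """
    lead = "" if (not before or before[-1] in _OK_BEFORE) else "\\ "
    trail = "" if (not after or after[0] in _OK_AFTER) else "\\ "
    return before + lead + "*" + text + "*" + trail + after
```

With this sketch, `emphasize("some", "stuff")` produces `some\ *stuff*`, while `emphasize("", "stuff")` produces plain `*stuff*` with no leading whitespace.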
Right, as I said "eventually".
It's a little bit lame, but I've found that it doesn't occur all that often in practice. Every markup language has its limitations, and this is one I can live with.
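As a rough sketch of the "outside level only" approach (using the stdlib ElementTree rather than lxml, and not taken from the lore2sphinx source), the converter can keep the outermost delimiter and flatten everything inside it to plain text. The `DELIMITERS` mapping and the `outer_only` name are invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from XHTML inline tags to ReST delimiters.
DELIMITERS = {"em": "*", "strong": "**", "code": "``"}


def outer_only(fragment):
    """Convert one inline element, keeping only the outermost markup."""
    element = ET.fromstring(fragment)
    delim = DELIMITERS[element.tag]
    # itertext() yields the text of the element and all its descendants
    # in document order, so nested tags vanish but their text is kept.
    text = "".join(element.itertext()).strip()
    return delim + text + delim
```

Any inner emphasis or code styling is lost, which is the part that would need fixing by hand after conversion.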
That's pretty much my plan.
The lxml parser can download DTDs in order to resolve external entities, but the way things are currently set up, it would end up doing this once for each Lore source file, which ends up making my build process take much longer. Because I didn't want to deal with this during development, and since there are only like 4 external entities in all of the Lore docs, I turned this off and told lxml to ignore the errors. It may be that there is a better way to do this (maybe cache the DTDs somehow), but I haven't really bothered with it yet. It's a simple matter to turn it back on when we're ready, even if there isn't an easy/convenient way to avoid the repeated downloads of the DTDs.
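One alternative to downloading or caching DTDs, offered here as an assumption-laden sketch rather than anything lore2sphinx does: rewrite the named XHTML entities as numeric character references before parsing, so the parser never needs a DTD at all. It relies on the stdlib table `html.entities.name2codepoint`; the function name `numeric_entities` is hypothetical, and the regex does not try to skip CDATA sections or comments.

```python
import re
from html.entities import name2codepoint

# The five entities every XML parser already understands without a DTD.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

_ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def numeric_entities(xhtml_source):
    """Replace named XHTML entities with numeric character references."""

    def _sub(match):
        name = match.group(1)
        # Leave XML's own entities and unknown names exactly as they are.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return _ENTITY_RE.sub(_sub, xhtml_source)
```

The rewritten source can then be handed to the parser with DTD loading left off; unknown entity names are passed through unchanged so that they still surface as parse errors.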
Fine with me.

Kevin Horn
