Re: [Mailman-Developers] Wiki woes
Hello,
I was just reading the discussion about Wiki migration from Confluence to MoinMoin on this list (mailman-developers). When this topic was first raised, a mailing list was set up along with some other resources for collaboration:
http://lists.bjdean.id.au/cgi-bin/mailman/listinfo/mmwiki
http://moinmo.in/ConfluenceConverter
I was under the impression that people would be following the dedicated mailing list for this work (mmwiki), but it would appear that this is not the case. The last message I sent on this topic was this one:
http://lists.bjdean.id.au/pipermail/mmwiki/2012q3/000095.html
In fact, it touches upon the very issue that seems to be causing problems now: Confluence appears to have changed and useful functionality has been removed. Although this affects Confluence users in a negative way, it may have an impact on the exported form used in any migration work as well.
If you would like a summary of the work I have done so far to migrate the content, take a look at the following mail:
http://lists.bjdean.id.au/pipermail/mmwiki/2012q2/000092.html
There's another mail (cross-posted to moin-user) which references the content repository used as the basis of this effort, too:
http://lists.bjdean.id.au/pipermail/mmwiki/2012q2/000094.html
As far as I know, the volunteer to whom I was responding has not done any work on this.
In short, after Bradley Dean's initial research, I have written something which can convert Confluence content, although there will undoubtedly be things that need finishing, and if you take a look at the ConfluenceConverter pages, you may also be able to identify functionality that needs deploying or implementing in MoinMoin depending on the project's needs.
So, I would recommend that you don't start from scratch on this. If you like, I can even make an example of the migrated content available on the Internet so that you can see what needs doing, but you will, of course, need to let me know. I was going to mail this list (mailman-developers), given that the other one (mmwiki) seems dormant, and apologise for not having done so earlier, but this initiative really needs input from the actual Wiki users to be worthwhile, in my opinion.
Please let me know if you want to take this work any further.
Paul
Hi Paul,
On Dec 11, 2012, at 11:15 PM, Paul Boddie wrote:
I was under the impression that people would be following the dedicated mailing list for this work (mmwiki), but it would appear that this is not the case.
Dang. You probably made me/us aware of the mailing list at one time, and if so, I apologize for not engaging on it.
In fact, it touches upon the very issue that seems to be causing problems now: Confluence appears to have changed and useful functionality has been removed. Although this affects Confluence users in a negative way, it may have an impact on the exported form used in any migration work as well.
Again, darn. I don't know if it helps but for this particular case, we can get you access to whatever data you need, that might not be publicly available via wiki.list.org.
In short, after Bradley Dean's initial research, I have written something which can convert Confluence content, although there will undoubtedly be things that need finishing, and if you take a look at the ConfluenceConverter pages, you may also be able to identify functionality that needs deploying or implementing in MoinMoin depending on the project's needs.
So, I would recommend that you don't start from scratch on this. If you like, I can even make an example of the migrated content available on the Internet so that you can see what needs doing, but you will, of course, need to let me know. I was going to mail this list (mailman-developers), given that the other one (mmwiki) seems dormant, and apologise for not having done so earlier, but this initiative really needs input from the actual Wiki users to be worthwhile, in my opinion.
Please let me know if you want to take this work any further.
Mark and Terri probably should weigh in, but my own feeling is that the conversion doesn't have to be of the highest fidelity. E.g. if it gets us 80-90% of the way, that's probably good enough. As Terri implies, I do think the wiki could use a good gardening, probably splitting content for MM2 and MM3 among other things. I'm loath to do much gardening on the current wiki if we're going to make a switch.
If this is something you're interested in helping with, it would certainly be greatly appreciated.
Is Moin 2.0 far enough along that we can just start using that?
(I love that you'll be able to author pages in reST. :)
We also need hosting, but I think we've had offers for that (sorry, I can't remember the details, but they're in the list archives I'm sure). Once we have hosting, I can ask Matt and John to give us some A records.
Cheers, -Barry
On Wednesday 12 December 2012 04:17:54 Barry Warsaw wrote:
Hi Paul,
On Dec 11, 2012, at 11:15 PM, Paul Boddie wrote:
I was under the impression that people would be following the dedicated mailing list for this work (mmwiki), but it would appear that this is not the case.
Dang. You probably made me/us aware of the mailing list at one time, and if so, I apologize for not engaging on it.
Actually, it was Bradley Dean who tried to get MoinMoin developers involved and who set up the list. I found the following message on this list about it:
http://mail.python.org/pipermail/mailman-developers/2011-July/021509.html
In fact, it touches upon the very issue that seems to be causing problems now: Confluence appears to have changed and useful functionality has been removed. Although this affects Confluence users in a negative way, it may have an impact on the exported form used in any migration work as well.
Again, darn. I don't know if it helps but for this particular case, we can get you access to whatever data you need, that might not be publicly available via wiki.list.org.
The first priority is to find out whether Confluence content can still be exported as XML. The data dumps that I originally used were XML serialisations of Hibernate databases, but given the user-visible changes from Confluence 3 to 4, I would need reassuring that Atlassian haven't gone and changed the back-end stuff as well.
To investigate this, I have just been attempting to use the "XML export" function from the "Advanced" tab of each space on wiki.list.org. Here's the link to the COM space's "XML export" function:
http://wiki.list.org/spaces/exportspacexml.action?key=COM
This did yield an export file that appears to contain data in a similar format to the original data dumps I managed to obtain. I don't know whether the files I have exported are comprehensive because I'm not even a user of the Wiki, let alone an administrator or someone with privileges, but maybe all the pages are public anyway.
Aside from the general structure of the exported files, I can see that the markup has been preserved in the textual content, but only for revisions before the Confluence 4 migration. Migrated markup is actually in some XHTML-like format, which is in some ways easier to work with than the original markup, but it will obviously need a different translator than the one handling the original markup.
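To make the "different translator" point concrete, here is a minimal sketch of what translating an XHTML-like fragment into Moin 1.x markup might look like. It is not the ConfluenceConverter code: the function names, the sample fragment and the small tag mapping are all made up for illustration, and it assumes the fragment is well-formed XML that uses only plain element names.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical mapping from XHTML-like inline elements to Moin 1.x markup.
INLINE = {"strong": "'''", "b": "'''", "em": "''", "i": "''"}

def collapse(text):
    """Collapse runs of whitespace in a text node (None becomes '')."""
    return re.sub(r"\s+", " ", text or "")

def inline(elem):
    """Render an element's text and children as one line of Moin markup."""
    parts = [collapse(elem.text)]
    for child in elem:
        mark = INLINE.get(child.tag, "")
        parts.append(mark + inline(child).strip() + mark)
        parts.append(collapse(child.tail))
    return "".join(parts)

def translate(fragment):
    """Translate a small XHTML-like fragment into Moin headings and paragraphs."""
    # Wrap the fragment so that it has a single root element.
    root = ET.fromstring("<body>%s</body>" % fragment)
    blocks = []
    for elem in root:
        text = inline(elem).strip()
        heading = re.fullmatch(r"h([1-5])", elem.tag)
        if heading:
            bars = "=" * int(heading.group(1))
            blocks.append("%s %s %s" % (bars, text, bars))
        else:
            blocks.append(text)
    return "\n\n".join(blocks)

sample = "<h1>Getting  help</h1>\n<p>Ask on the\n   <strong>users</strong> list.</p>"
print(translate(sample))
```

Real exported content would presumably need more than this: tables, links, macros, and any namespaced or entity-laden markup that a strict XML parser rejects unless it is declared first.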
[...]
Please let me know if you want to take this work any further.
Mark and Terri probably should weigh in, but my own feeling is that the conversion doesn't have to be of the highest fidelity. E.g. if it gets us 80-90% of the way, that's probably good enough. As Terri implies, I do think the wiki could use a good gardening, probably splitting content for MM2 and MM3 among other things. I'm loath to do much gardening on the current wiki if we're going to make a switch.
The aim would be to try and get the conversion as high-fidelity as possible with some experimentation around editing and playing with any required Moin features, and then we'd convert the whole thing one last time.
Some discussion about what should be converted can be found here:
http://moinmo.in/ConfluenceConverter/DevelopmentNotes/TransformProcess
Lacking from my current converter is any handling of attachments or identities, with the latter probably requiring some special modification of the import code to write specific user identities into the edit log.
Confluence has some weird functionality that doesn't always map to Moin concepts, like spaces, blog posts and page comments, but as I note on the above page these can be accommodated in Moin according to various page-naming conventions.
If this is something you're interested in helping with, it would certainly be greatly appreciated.
Is Moin 2.0 far enough along that we can just start using that?
Not really. It's something you can use, but there are still things that need to settle down in Moin 2 and there is obviously functionality that isn't yet ported. I aim to port much of my own work to Moin 2 at some point, but there's still a lot of mileage in Moin 1.x. (It's like Python 2 versus Python 3.)
(I love that you'll be able to author pages in reST. :)
You lose some of the more interesting features doing that, though, I think.
We also need hosting, but I think we've had offers for that (sorry, I can't remember the details, but they're in the list archives I'm sure). Once we have hosting, I can ask Matt and John to give us some A records.
For testing purposes, I can easily host this myself, but you'll obviously have to consider where to put the final Wiki. It's possible that the FSF already host things using MoinMoin, so maybe it would fit into their existing infrastructure, but this is something for you to decide.
For now, I have made the Wiki content available at the following location:
As noted, all current page revisions will look wrong, but historical (before Confluence 4) revisions should have been translated to a certain extent.
Paul
Paul Boddie writes:
(I love that you'll be able to author pages in reST. :)
You lose some of the more interesting features doing that, though, I think.
Like what?
And of course if you don't need those features you can use ReST, right? That is, there's some way to specify markup language per page, no?
I'm personally about +5 on having a ReST option. If available, I doubt I'd ever use anything else on a software project "dev and support" wiki.
On Thursday 13 December 2012 07:04:27 Stephen J. Turnbull wrote:
Paul Boddie writes:
(I love that you'll be able to author pages in reST. :)
You lose some of the more interesting features doing that, though, I think.
Like what?
I was thinking about macros, but you can apparently use them via a special directive.
And of course if you don't need those features you can use ReST, right? That is, there's some way to specify markup language per page, no?
Yes, you can specify "#format rst" at the top of the page to set the default syntax and also "#!rst" in a page region to set the syntax used within that region:
http://moinmo.in/HelpOnParsers/ReStructuredText
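As a rough illustration of the per-page format line, the sketch below reads a leading "#format" line from page text and falls back to the default wiki syntax otherwise. This is not Moin's own code; the function name and the sample pages are invented, and the region example only shows where "#!rst" would sit inside a page.

```python
def page_format(text, default="wiki"):
    """Return the format named by a leading '#format' line, else the default."""
    for line in text.splitlines():
        if not line.startswith("#"):
            break  # processing instructions only appear at the top of the page
        words = line.split()
        if words[0].lower() == "#format" and len(words) > 1:
            return words[1].lower()
    return default

# A page written entirely in reST.
rst_page = "#format rst\nMailman FAQ\n===========\n"

# A page in the default syntax with one reST region.
mixed_page = "= Mailman FAQ =\n{{{#!rst\nSome *reST* text in a region.\n}}}\n"

print(page_format(rst_page))
print(page_format(mixed_page))
```

The first call reports "rst" and the second "wiki", because the region marker only changes the syntax inside that region and not the page default.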
I'm personally about +5 on having a ReST option. If available, I doubt I'd ever use anything else on a software project "dev and support" wiki.
Well, it would be interesting to know which Confluence features are actively used so that we can focus on supporting their equivalents in Moin.
Paul
Paul Boddie wrote:
For now, I have made the Wiki content available at the following location:
As noted, all current page revisions will look wrong, but historical (before Confluence 4) revisions should have been translated to a certain extent.
I like it. And, I note that for a large majority of pages there have been no changes since the "Migrated to Confluence 4.0" change, so the current-1 rev of the page is good.
I haven't looked at much, but the biggest problem I see is for example <http://mmwiki.boddie.org.uk/DOC/Frequently%20Asked%20Questions?action=recall&rev=5> versus <http://wiki.list.org/display/DOC/Frequently+Asked+Questions>. The issue is the FAQ in particular relies on the listing of child pages to make a hierarchical TOC, and the converted pages have lost the child pages listing.
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
On Thursday 13 December 2012 18:59:40 Mark Sapiro wrote:
Paul Boddie wrote:
For now, I have made the Wiki content available at the following location:
As noted, all current page revisions will look wrong, but historical (before Confluence 4) revisions should have been translated to a certain extent.
I like it. And, I note that for a large majority of pages there have been no changes since the "Migrated to Confluence 4.0" change, so the current-1 rev of the page is good.
I actually don't think it would be hard to migrate the XHTML-like content, but if the bulk of the content is more readily translated, then we can avoid a lot of work.
I haven't looked at much, but the biggest problem I see is for example <http://mmwiki.boddie.org.uk/DOC/Frequently%20Asked%20Questions?action=recall&rev=5> versus <http://wiki.list.org/display/DOC/Frequently+Asked+Questions>. The issue is the FAQ in particular relies on the listing of child pages to make a hierarchical TOC, and the converted pages have lost the child pages listing.
Yes, the child pages are recorded, but they need to appear below their parents in the page hierarchy. Another problem I've found is related to pages with question marks in their titles. For example:
http://mmwiki.boddie.org.uk/DOC/Where%20do%20I%20go%20for%20help%3F
For some reason Moin encodes these titles correctly but doesn't interpret them properly, and this could be a legitimate bug. I'll look into this.
Paul
On 12-12-13 1:48 PM, Paul Boddie wrote:
I actually don't think it would be hard to migrate the XHTML-like content, but if the bulk of the content is more readily translated, then we can avoid a lot of work.
I'm making dumps of all the Confluence spaces we've got (you can't easily dump the whole wiki at once, but you *can* apparently dump each space). I'll dump all the public ones here:
https://www.dropbox.com/sh/tztu8sk6oet69oz/wP0_L7kcf7
(We've also got one private "cabal" space, but it looks like everything in it is really out of date, so I've done a dump for posterity but there's no point in spending time converting it.)
We can also do dumps as html and pdf, but I'm pretty sure xml will be more useful for data conversion. I don't know off the top of my head if they include change history or just the most recent content, but I do know that they contain attachments and comments on the pages.
On Saturday 15 December 2012 18:56:10 Terri Oda wrote:
On 12-12-13 1:48 PM, Paul Boddie wrote:
I actually don't think it would be hard to migrate the XHTML-like content, but if the bulk of the content is more readily translated, then we can avoid a lot of work.
I'm making dumps of all the Confluence spaces we've got (you can't easily dump the whole wiki at once, but you *can* apparently dump each space). I'll dump all the public ones here:
Yes, I dumped each space to see what the Confluence 4 migration had done. Unless you get to download more because you have an account, I suppose it isn't difficult for me (or anyone else) to obtain these dumps when they need them.
(We've also got one private "cabal" space, but it looks like everything in it is really out of date, so I've done a dump for posterity but there's no point in spending time converting it.)
We can also do dumps as html and pdf, but I'm pretty sure xml will be more useful for data conversion. I don't know off the top of my head if they include change history or just the most recent content, but I do know that they contain attachments and comments on the pages.
The XML dumps are serialisations of the Hibernate datastore, and they contain all the page versions plus comments and other related data. The attachments appear as separate files in the archive but will be referenced by the XML file. The page content itself is in an XHTML-like form for Confluence 4 but the Confluence Wiki markup for Confluence 3 and earlier.
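For anyone wanting to poke at a dump, something like the following sketch could give a first overview of which object types and page versions it contains. The sample document is invented, and the element and attribute names (object, property, class, name) are assumptions about the export format rather than something checked against the wiki.list.org exports, so adjust them to whatever the real file uses.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample; the element and attribute names are assumptions.
SAMPLE = """<hibernate-generic>
  <object class="Page">
    <id name="id">1</id>
    <property name="title">Frequently Asked Questions</property>
    <property name="version">1</property>
  </object>
  <object class="Page">
    <id name="id">2</id>
    <property name="title">Frequently Asked Questions</property>
    <property name="version">2</property>
  </object>
  <object class="Comment">
    <id name="id">3</id>
  </object>
</hibernate-generic>"""

def prop(obj, name):
    """Return the text of the named <property> child, or None if absent."""
    for child in obj.findall("property"):
        if child.get("name") == name:
            return child.text
    return None

def summarise(xml_text):
    """Count objects by class and collect version numbers per page title."""
    root = ET.fromstring(xml_text)
    classes = Counter()
    versions = {}
    for obj in root.iter("object"):
        classes[obj.get("class")] += 1
        if obj.get("class") == "Page":
            title = prop(obj, "title")
            versions.setdefault(title, []).append(int(prop(obj, "version")))
    return classes, versions

classes, versions = summarise(SAMPLE)
print(dict(classes))
print(versions)
```

For a full space export it would probably be better to stream the file with ET.iterparse instead of loading it in one go.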
Paul
On Thursday 13 December 2012 21:48:25 Paul Boddie wrote:
Another problem I've found is related to pages with question marks in their titles. For example:
http://mmwiki.boddie.org.uk/DOC/Where%20do%20I%20go%20for%20help%3F
For some reason Moin encodes these titles correctly but doesn't interpret them properly, and this could be a legitimate bug. I'll look into this.
As far as this is concerned, it appears to be a problem with mod_rewrite that only applies to the way I have hosted the content:
http://moinmo.in/MoinMoinBugs/CannotAccessPagesEndingWithAQuestionMark
https://issues.apache.org/bugzilla/show_bug.cgi?id=49642
It shouldn't be a problem for a Wiki deployed using the Alias, ScriptAlias or other full-privilege directives.
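To illustrate why a question mark in a title is delicate, here is a small sketch using Python's standard urllib.parse. It only shows the encoding itself and what happens if an already-decoded title is treated as a URL again; it is not a diagnosis of the mod_rewrite behaviour, and the page_url helper is a made-up name.

```python
from urllib.parse import quote, unquote, urlsplit

def page_url(base, page_name):
    """Build a URL for a page name, percent-encoding everything except '/'."""
    return base.rstrip("/") + "/" + quote(page_name, safe="/")

name = "DOC/Where do I go for help?"
url = page_url("http://mmwiki.boddie.org.uk", name)
print(url)

# Correct handling: split the URL first, then decode the path.
print(unquote(urlsplit(url).path))

# Faulty handling: decode first, then split. The '?' now looks like the
# start of a query string and vanishes from the path.
print(urlsplit(unquote(url)).path)
```

The first two lines of output keep the question mark (encoded as %3F, then decoded back), while the last one loses it, which is one way such a page could end up unreachable.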
Paul
On Dec 13, 2012, at 01:47 AM, Paul Boddie wrote:
The first priority is to find out whether Confluence content can still be exported as XML. The data dumps that I originally used were XML serialisations of Hibernate databases, but given the user-visible changes from Confluence 3 to 4, I would need reassuring that Atlassian haven't gone and changed the back-end stuff as well.
The admin page provides a link to generate a backup in zipped XML of the entire site plus attachments. I don't know what's in that because I can't download the backup. I'll have to make a support request to either get the backup or expose the download link on the admin pages.
Confluence has some weird functionality that doesn't always map to Moin concepts, like spaces, blog posts and page comments, but as I note on the above page these can be accommodated in Moin according to various page-naming conventions.
Cool. I don't necessarily think we need to keep the spaces distinction. I stopped using the blog feature a while ago.
If this is something you're interested in helping with, it would certainly be greatly appreciated.
Is Moin 2.0 far enough along that we can just start using that?
Not really. It's something you can use, but there are still things that need to settle down in Moin 2 and there is obviously functionality that isn't yet ported. I aim to port much of my own work to Moin 2 at some point, but there's still a lot of mileage in Moin 1.x. (It's like Python 2 versus Python 3.)
Or Mailman 2 vs. Mailman 3? I wonder what that's like... :)
For testing purposes, I can easily host this myself, but you'll obviously have to consider where to put the final Wiki. It's possible that the FSF already host things using MoinMoin, so maybe it would fit into their existing infrastructure, but this is something for you to decide.
We'll work that out. I definitely think we want something in the list.org domain.
For now, I have made the Wiki content available at the following location:
As noted, all current page revisions will look wrong, but historical (before Confluence 4) revisions should have been translated to a certain extent.
Cool, thanks for moving this forward. I'll make a support request to get the backup.
Cheers, -Barry
Hello,
Here's a quick update on the ConfluenceConverter effort. A few issues were raised after I let you know about the work originally done, and I have investigated them to yield the following results.
The availability of XML exports does not seem to be a problem. It doesn't look like my access to the export tools is any different from registered users, and the tools seem to give me enough to work with anyway.
The nature of Confluence 4 markup appears to be an XHTML variant which is in some ways easier to parse than the previous markup (in that the tokenisation is at least done by an XML parser), although the normalisation of whitespace is a bit tricky (as is often the case with XML dialects). Possibly the bulk of the work with this is to assess the use and nature of the markup and to write reasonable translations.
The notion of child pages does not exist in MoinMoin, but it is possible to construct lists of them and to add them to parent pages so that the relationships are at least recorded.
Similarly, MoinMoin does not have comment items on pages in the way Confluence does, but comments can be represented as subpages and then these pages can be included in the owner page.
I noticed that pages with question marks in their names weren't being correctly served, but this is actually a mod_rewrite issue specific to the way I am currently hosting the test site. My own local site does not exhibit the problem, nor should any decent way of hosting the site.
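The points above about child pages and comments could be sketched roughly as follows: build a list of children for each parent, then emit Moin markup that links to the children and includes the comment subpages. The page names, the "Comments/N" subpage convention and both function names are hypothetical and chosen only for this example; the real converter may organise things differently.

```python
from collections import defaultdict

def child_lists(parents):
    """Group page names by parent, given a {child: parent} mapping."""
    children = defaultdict(list)
    for child, parent in sorted(parents.items()):
        children[parent].append(child)
    return children

def parent_footer(page, children, comments):
    """Return Moin markup listing child pages and including comment subpages."""
    lines = []
    for child in children.get(page, []):
        lines.append(" * [[%s]]" % child)
    for number in range(1, comments.get(page, 0) + 1):
        # Hypothetical convention: comments are stored as numbered subpages.
        lines.append("<<Include(%s/Comments/%d)>>" % (page, number))
    return "\n".join(lines)

# Made-up page names and comment counts for illustration.
parents = {
    "DOC/FAQ/Installation": "DOC/FAQ",
    "DOC/FAQ/Delivery": "DOC/FAQ",
}
comments = {"DOC/FAQ": 1}

footer = parent_footer("DOC/FAQ", child_lists(parents), comments)
print(footer)
```

Page names containing characters that are special in link or macro syntax would need quoting before being written out like this.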
The current state of the conversion can be seen here:
The principal updates to this test site are that revisions migrated to Confluence 4 markup are translated, child pages are referenced from owner pages, and comments are included. However...
Some pages will look very wrong because things like tables are not yet translated. I also have to fine-tune the translation of links and combine the logic for both markup types.
Some child pages don't seem to be available, but this is a combination of the question mark issue (see above) and probably my lack of page name quoting when incorporating child page information.
Comments are not well presented at the moment, and I aim to investigate a few ways of improving their appearance.
For now, this will have to be all the work I am able to do on this project, but I will resume my efforts again in January. I hope that everyone feels reassured that this project is no longer forgotten. :-)
Paul
P.S. Some resources:
http://moinmo.in/ConfluenceConverter
http://moinmo.in/ConfluenceConverter/DevelopmentNotes/TaskList
On Dec 21, 2012, at 12:22 AM, Paul Boddie wrote:
For now, this will have to be all the work I am able to do on this project, but I will resume my efforts again in January. I hope that everyone feels reassured that this project is no longer forgotten. :-)
This is really great news Paul, thanks for all your efforts. Hopefully in 2013 we can actually get migrated over. I'll start the ball rolling with getting a site we can host it on.
Cheers, -Barry
Paul Boddie writes:
I was just reading the discussion about Wiki migration from Confluence to MoinMoin on this list (mailman-developers). When this topic was first raised, a mailing list was set up
IMO, this list is the appropriate place, and one shouldn't hesitate to post here about it as long as subjects are appropriate. If the traffic about the wiki migration starts to dominate, make a [topic] for it.
participants (5)
- Barry Warsaw
- Mark Sapiro
- Paul Boddie
- Stephen J. Turnbull
- Terri Oda