Wiki Migration Update
Hello,
It's been a couple of months or so since my last update, and I finally got round to doing some more work on the content migration from Confluence to MoinMoin.
As always, the results can be found here:
At the moment, I'm still using the archived content from April 2013. This time round, I took the opportunity to fix various parsing and translation issues, particularly with the Confluence wiki markup (not the XHTML variant that the most recent revisions of pages use), and to rework some fairly fundamental mechanisms involved in parsing the wiki markup.
Some previously mentioned fixes have been made:
http://mmwiki.boddie.org.uk/COM/Organizations_that_use_Mailman (Anchors are now generated for the headings.)
Some support for macros now exists, such as for the "anchor", "color" and "toc" macros.
Meanwhile, some things wait for more time and energy to be spent on them:
Author information is still not preserved in the page import process, but this will be tested in future.
The way comments are presented on pages still needs improving.
It might also be nice to have a list of attachments on pages that have them, and I will take a look to see how Confluence tends to present such things.
And some things won't get fixed until the hosting is changed, such as for pages ending in a question mark.
As I have mentioned previously, the source code for the converter can be found here:
http://hgweb.boddie.org.uk/ConfluenceConverter/
Please take a look at your favourite pages and let me know where improvements can be made to the conversion process.
Paul
On Jun 15, 2013, at 09:09 PM, Paul Boddie wrote:
It's been a couple of months or so since my last update, and I finally got round to doing some more work on the content migration from Confluence to MoinMoin.
As always, the results can be found here:
It's looking great Paul. I really appreciate your continued efforts here. I'd like to hear especially from Mark and the other top wiki editors what they think is still necessary before we can pull off the migration. We all know it won't be perfect, but I think it has to be just good enough that a manual cleanup is tractable. A migration would provide a good opportunity to do some much needed gardening. ;)
Here are some of my thoughts:
Once we migrate we can probably get rid of the spaces. I think that's a Confluence-ism that doesn't translate as well to Moin. That should be easy enough to do manually, right?
We'll want a moderate amount of theming to be more consistent with the web site, but the latter also is in dire need of an update.
The top link on the FAQ page doesn't work. http://mmwiki.boddie.org.uk/DOC/Frequently%20Asked%20Questions
Only the FAQ 4 page has sub-FAQ numbers. (BTW, do you know of any Moin feature to make creating and managing a FAQ nicer?)
How will we control wiki spam on the new site? Right now, we allow anyone to sign up and read, but they must request write access. When they do, we add their userid to a special group that has write to any wiki page (except the currently unused private pages). Can we have the same setup for Moin? I think it's *probably* okay to just have people re-request write access after a migration (no need to automate the user/group migration I think).
It seems to me the Moin wiki is pretty darn close. If Mark and others agree, I can start the ball rolling to request the necessary resources and DNS shuffles.
Cheers, -Barry
On Sunday 16 June 2013 17:43:48 Barry Warsaw wrote:
On Jun 15, 2013, at 09:09 PM, Paul Boddie wrote:
It's looking great Paul. I really appreciate your continued efforts here. I'd like to hear especially from Mark and the other top wiki editors what they think is still necessary before we can pull off the migration. We all know it won't be perfect, but I think it has to be just good enough that a manual cleanup is tractable. A migration would provide a good opportunity to do some much needed gardening. ;)
Indeed. Thanks for the feedback!
Here are some of my thoughts:
- Once we migrate we can probably get rid of the spaces. I think that's a Confluence-ism that doesn't translate as well to Moin. That should be easy enough to do manually, right?
The only apparent purpose of the spaces is to separate the content and to prevent name collisions, but I'm not sure there are any such collisions. Maybe I can check this.
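A check for such collisions could be sketched roughly as follows. This is only an illustration, assuming the converted page names are available as "SPACE/Title" strings; the function name and the sample names are hypothetical and not taken from the converter.

```python
from collections import defaultdict


def find_collisions(page_names):
    """Return titles that occur in more than one space.

    Each name is assumed to look like "SPACE/Title". The result maps a
    colliding title to the sorted list of spaces that contain it.
    """
    spaces_by_title = defaultdict(set)
    for name in page_names:
        # Split only on the first slash so that subpage titles stay intact.
        space, _, title = name.partition("/")
        spaces_by_title[title].add(space)
    return {
        title: sorted(spaces)
        for title, spaces in spaces_by_title.items()
        if len(spaces) > 1
    }


# Made-up sample names, for illustration only.
sample = ["COM/Home", "DOC/Home", "DOC/Frequently Asked Questions"]
collisions = find_collisions(sample)
```

If the result is empty for the real page list, dropping the space prefixes would not cause any page to overwrite another.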
We'll want a moderate amount of theming to be more consistent with the web site, but the latter also is in dire need of an update.
The top link on the FAQ page doesn't work. http://mmwiki.boddie.org.uk/DOC/Frequently%20Asked%20Questions
Yes, that page has a name ending in a question mark, and the way I host this publicly doesn't like that. It's a bug in mod_rewrite, and if I host it on my own local Apache instance with full control over the configuration, I don't get this problem. It will go away in future, I promise. :-)
Actually, I could have changed the page title generation to remove trailing question marks for this exercise; I already shorten page names where the filesystem would otherwise be upset (Moin needing to use the page names when storing pages).
- Only the FAQ 4 page has sub-FAQ numbers. (BTW, do you know of any Moin feature to make creating and managing a FAQ nicer?)
I'm sorry but I don't quite follow the first sentence here. All of the pages should show a list of questions in their respective section, but I see that only section 4 has numbered pages. Is that what you meant? The page names I take straight from Confluence, and you can see the same phenomenon in the existing wiki:
http://wiki.list.org/display/DOC/Frequently+Asked+Questions
Of other Moin sites providing FAQs, I can think of the Mercurial Wiki:
http://mercurial.selenic.com/wiki/FAQ
Here, they use the Include macro to bring in subpages providing each section, and the Moin TableOfContents macro is smart enough to see all the included content and make a huge TOC. They could go further and also provide edit links when including content: then, if anyone wanted to edit a section or a question, they would be able to find the link for the subpage and do so; editing the main page only really permits editing of the Include macros and little else, as seen in the raw text of the page:
http://mercurial.selenic.com/wiki/FAQ?action=raw
As you can see from the existing translation of the Mailman Wiki, it's possible to include many pages without having to name them all; take a look at the last line of the following, which is used to drag in all the comments on the page:
http://mmwiki.boddie.org.uk/COM/Home?action=raw
The ordering seems to be fixed on the page names concerned, however, making it somewhat awkward if you prefer a different ordering, but we could always provide a variant of the Include macro that supported other ordering capabilities, I imagine.
- How will we control wiki spam on the new site? Right now, we allow anyone to sign up and read, but they must request write access. When they do, we add their userid to a special group that has write to any wiki page (except the currently unused private pages). Can we have the same setup for Moin? I think it's *probably* okay to just have people re-request write access after a migration (no need to automate the user/group migration I think).
Moin is very flexible about access control, so we can almost certainly support what is needed. As for registration, I think there are extensions that require people to verify themselves using e-mail - the Debian Wiki may be using this, I think - and it's probably completely feasible to support this kind of mechanism.
As for migration, I haven't looked into this, but I don't see too many problems at least replicating the Confluence accounts, even if we can't migrate all the details.
(I think it's interesting to consider issues of authentication, and coincidentally with respect to the Summer of Code work, I've been playing with PGP-signed/encrypted interactions with Moin. So I look forward to seeing what people come up with around such interactions with Mailman.)
It seems to me the Moin wiki is pretty darn close. If Mark and others agree, I can start the ball rolling to request the necessary resources and DNS shuffles.
My main concern is that I've missed some weird markup behaviour and that we end up with pages where the markup is completely wrong throughout the history of the page (both Confluence markup and the XHTML variant that Confluence 4 and later use). But I'd like to think that I'm reaching the second half of the exercise at the very least. ;-)
Paul
P.S. I can perhaps regenerate the site to work around the question mark issue, if you want. Then, all of the content should be navigable.
On Jun 16, 2013, at 08:17 PM, Paul Boddie wrote:
- Once we migrate we can probably get rid of the spaces. I think that's a Confluence-ism that doesn't translate as well to Moin. That should be easy enough to do manually, right?
The only apparent purpose of the spaces is to separate the content and to prevent name collisions, but I'm not sure there are any such collisions. Maybe I can check this.
I'd be surprised if there were any collisions. Spaces seemed to be baked into Confluence, but for our modest wiki I never really saw much value in them.
We'll want a moderate amount of theming to be more consistent with the web site, but the latter also is in dire need of an update.
The top link on the FAQ page doesn't work. http://mmwiki.boddie.org.uk/DOC/Frequently%20Asked%20Questions
Yes, that page has a name ending in a question mark, and the way I host this publicly doesn't like that. It's a bug in mod_rewrite, and if I host it on my own local Apache instance with full control over the configuration, I don't get this problem. It will go away in future, I promise. :-)
Cool. That's another thing to take notice of: are there any deployment issues we'd need to inform the python.org admins of? We already run a couple of Moins on that infrastructure, so I'm hoping that it'll be pretty easy to bring up another one.
Actually, I could have changed the page title generation to remove trailing question marks for this exercise; I already shorten page names where the filesystem would otherwise be upset (Moin needing to use the page names when storing pages).
That's probably fine too.
- Only the FAQ 4 page has sub-FAQ numbers. (BTW, do you know of any Moin feature to make creating and managing a FAQ nicer?)
I'm sorry but I don't quite follow the first sentence here. All of the pages should show a list of questions in their respective section, but I see that only section 4 has numbered pages. Is that what you meant?
Yep.
The page names I take straight from Confluence, and you can see the same phenomenon in the existing wiki:
Yay. ;)
Of other Moin sites providing FAQs, I can think of the Mercurial Wiki:
http://mercurial.selenic.com/wiki/FAQ
Here, they use the Include macro to bring in subpages providing each section, and the Moin TableOfContents macro is smart enough to see all the included content and make a huge TOC. They could go further and also provide edit links when including content: then, if anyone wanted to edit a section or a question, they would be able to find the link for the subpage and do so; editing the main page only really permits editing of the Include macros and little else, as seen in the raw text of the page:
http://mercurial.selenic.com/wiki/FAQ?action=raw
As you can see from the existing translation of the Mailman Wiki, it's possible to include many pages without having to name them all; take a look at the last line of the following, which is used to drag in all the comments on the page:
http://mmwiki.boddie.org.uk/COM/Home?action=raw
The ordering seems to be fixed on the page names concerned, however, making it somewhat awkward if you prefer a different ordering, but we could always provide a variant of the Include macro that supported other ordering capabilities, I imagine.
Oh, I'm just pining for Guido's old FAQwizard. It was nice to be able to just add questions and answers, with a minimal amount of categorization and ordering, and then have them all collected and formatted correctly. It's not that big of a deal - I was mostly wondering how other projects maintained their FAQ.
- How will we control wiki spam on the new site? Right now, we allow anyone to sign up and read, but they must request write access. When they do, we add their userid to a special group that has write to any wiki page (except the currently unused private pages). Can we have the same setup for Moin? I think it's *probably* okay to just have people re-request write access after a migration (no need to automate the user/group migration I think).
Moin is very flexible about access control, so we can almost certainly support what is needed. As for registration, I think there are extensions that require people to verify themselves using e-mail - the Debian Wiki may be using this, I think - and it's probably completely feasible to support this kind of mechanism.
As for migration, I haven't looked into this, but I don't see too many problems at least replicating the Confluence accounts, even if we can't migrate all the details.
Sounds good, thanks.
(I think it's interesting to consider issues of authentication, and coincidentally with respect to the Summer of Code work, I've been playing with PGP-signed/encrypted interactions with Moin. So I look forward to seeing what people come up with around such interactions with Mailman.)
Me too!
It seems to me the Moin wiki is pretty darn close. If Mark and others agree, I can start the ball rolling to request the necessary resources and DNS shuffles.
My main concern is that I've missed some weird markup behaviour and that we end up with pages where the markup is completely wrong throughout the history of the page (both Confluence markup and the XHTML variant that Confluence 4 and later use). But I'd like to think that I'm reaching the second half of the exercise at the very least. ;-)
Paul
P.S. I can perhaps regenerate the site to work around the question mark issue, if you want. Then, all of the content should be navigable.
That would be fantastic, thanks. Let's wait for Mark's feedback and then we can start thinking about next steps.
Cheers, -Barry
On Monday 17 June 2013 17:13:01 Barry Warsaw wrote:
Oh, I'm just pining for Guido's old FAQwizard. It was nice to be able to just add questions and answers, with a minimal amount of categorization and ordering, and then have them all collected and formatted correctly. It's not that big of a deal - I was mostly wondering how other projects maintained their FAQ.
The FAQ wizard: those were the days! :-)
One thing I forgot about was the Python Wiki's "Asking for help" page, which seems to have been broken after the restoration of that wiki:
http://wiki.python.org/moin/Asking_for_Help
A similar thing is done by MoinMoin:
The person with the question (or bug in the latter case) makes a new page using the form given for that purpose. They then fill out the template and save the page. The page will then appear in the list of subpages on the original page.
Again, this lacks control over things like ordering, and it's arguably one step too many - it would be better to just be able to write a question into a box and submit it - so perhaps something more sophisticated would be nice. I have been working on a forms solution for Moin, and it probably isn't quite ready, but maybe I can just finish it off and give it a spin.
Paul
On 06/17/2013 08:13 AM, Barry Warsaw wrote:
That would be fantastic, thanks. Let's wait for Mark's feedback and then we can start thinking about next steps.
I am traveling and visiting family through the end of the month and haven't been following this too closely, but there is only one issue I'm concerned about and that's URLs to FAQ pages in archived list mail.
It's not a show stopper. We had the same issue when we converted from the FAQWizard, and it didn't seem to be a big deal, but here it is.
Every page (I think) in the wiki has three URLs.
E.g.
1. <http://wiki.list.org/display/DOC/Frequently+Asked+Questions>
2. <http://wiki.list.org/pages/viewpage.action?pageId=3604482>
3. <http://wiki.list.org/x/AgA3>
I gather that in the migration to Moin, the pages will have names/URLs like 1. but with spaces instead of pluses. This is fine, and if necessary a mapping from old to new URLs can be easily constructed.
Form 2. is trickier. Confluence seems to use somewhat arbitrarily either 1. or 2. in internal links, so form 2. is sometimes seen in list posts referring to FAQ articles.
Form 3. is what Confluence calls "Tiny Link: (useful for email)" and is available in the page's 'info' and (along with form 1.) in the "Link to this Page" Tool dialog. I use this all the time in list posts referring to FAQ articles.
I don't know what info is available in the Confluence dump, but it would be nice to have at least the 'tiny link' and maybe the pageId info so that mappings and maybe eventually redirections can be constructed to get from the old URLs to the new pages.
As I said, I don't think it's a show stopper, but it would be helpful.
Regarding the article numbers appearing in section 4 only: in the FAQWizard, all the articles were numbered. When Teri converted the FAQWizard, she dropped the numbers from the page titles. This proved controversial and Duncan Drury added them back, but only in section 4.
-- 
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
On Monday 17 June 2013 18:22:55 Mark Sapiro wrote:
Every page (I think) in the wiki has three URLs.
E.g.
1. <http://wiki.list.org/display/DOC/Frequently+Asked+Questions>
2. <http://wiki.list.org/pages/viewpage.action?pageId=3604482>
3. <http://wiki.list.org/x/AgA3>
I gather that in the migration to Moin, the pages will have names/URLs like 1. but with spaces instead of pluses. This is fine, and if necessary a mapping from old to new URLs can be easily constructed.
I had imagined that browsers would convert "+" to spaces in form 1, anyway, given that "+" is traditionally the encoding for a space, but it appears that Firefox may be encoding "+" and presenting the encoded URL to the server. Thus, the following fails:
http://mmwiki.boddie.org.uk/DOC/Frequently+Asked+Questions
Maybe some kind of rewrite rule could be used, or I could generate alias pages which redirect to the real pages.
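The form 1 translation itself could be sketched roughly like this. It is only an illustration: it assumes old URLs always have the shape /display/SPACE/Title with "+" standing for a space, and that the new wiki keeps the SPACE/Title layout of the test site. The function name and NEW_BASE are hypothetical.

```python
from urllib.parse import quote, unquote_plus, urlsplit

# Assumption: the test site stands in for the eventual new wiki location.
NEW_BASE = "http://mmwiki.boddie.org.uk"


def moin_url_for(old_url, new_base=NEW_BASE):
    """Translate a Confluence /display/ URL into a Moin-style URL.

    Returns None for URLs that are not of the /display/SPACE/Title form.
    """
    path = urlsplit(old_url).path
    prefix = "/display/"
    if not path.startswith(prefix):
        return None
    # "+" and %XX escapes both become literal characters in the page name.
    page_name = unquote_plus(path[len(prefix):])
    # Re-encode for the new site, keeping "/" between space and title.
    return new_base + "/" + quote(page_name)


new_url = moin_url_for(
    "http://wiki.list.org/display/DOC/Frequently+Asked+Questions"
)
```

The same function could drive either approach: generating rewrite rules in bulk, or generating the alias pages.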
Form 2. is trickier. Confluence seems to use somewhat arbitrarily either 1. or 2. in internal links, so form 2. is sometimes seen in list posts referring to FAQ articles.
The page/version identifiers are used by the converter, and they even appear in the raw text of the Moin pages as a "pragma", but we'd probably extract all these correspondences and deploy some kind of mapping resource that takes an identifier and performs a redirect to the appropriate Moin page.
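Such a mapping resource might look something like the following sketch. It assumes the identifier-to-page correspondences have already been extracted into a dictionary; the extraction step and the pragma format are not shown, and the names PAGE_IDS and redirect_target are hypothetical. The single entry uses the example page from Mark's message.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Assumption: filled in from the converter's output; one known example here.
PAGE_IDS = {3604482: "DOC/Frequently Asked Questions"}


def redirect_target(old_url, page_ids=PAGE_IDS,
                    new_base="http://mmwiki.boddie.org.uk"):
    """Return the new URL for a viewpage.action URL, or None if unknown."""
    parts = urlsplit(old_url)
    if parts.path != "/pages/viewpage.action":
        return None
    values = parse_qs(parts.query).get("pageId")
    if not values or not values[0].isdigit():
        return None
    page_name = page_ids.get(int(values[0]))
    if page_name is None:
        return None
    return new_base + "/" + quote(page_name)


target = redirect_target(
    "http://wiki.list.org/pages/viewpage.action?pageId=3604482"
)
```

A small web handler on the old hostname could call this and issue a redirect, falling back to a search page when the identifier is not known.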
Form 3. is what Confluence calls "Tiny Link: (useful for email)" and is available in the page's 'info' and (along with form 1.) in the "Link to this Page" Tool dialog. I use this all the time in list posts referring to FAQ articles.
I don't know what info is available in the Confluence dump, but it would be nice to have at least the 'tiny link' and maybe the pageId info so that mappings and maybe eventually redirections can be constructed to get from the old URLs to the new pages.
As I said, I don't think it's a show stopper, but it would be helpful.
I'm not sure these tiny links are in the Confluence dump, but if there's an algorithm to generate them, then maybe we can provide a similar mapping resource.
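One candidate algorithm is sketched below, with the caveat that it is an assumption rather than anything confirmed from the Confluence dump or source: the key is taken to be the base64 form of the page identifier packed as a 32-bit little-endian integer, with padding and trailing "A" characters removed and "/" and "+" swapped for "-" and "_". It does reproduce the one pair given in this thread (pageId=3604482 and /x/AgA3), but that is a single data point. The function names are hypothetical.

```python
import base64
import struct


def tiny_link_key(page_id):
    """Encode a page identifier as a tiny link key (assumed scheme)."""
    # Assumption: identifiers fit in 32 bits and are packed little-endian.
    raw = struct.pack("<L", page_id)
    text = base64.b64encode(raw).decode("ascii")
    # Drop "=" padding, then trailing "A" characters (encoded zero bits).
    text = text.rstrip("=").rstrip("A")
    # Assumed substitutions to keep the key safe in a URL path.
    return text.replace("/", "-").replace("+", "_")


def page_id_from_key(key):
    """Reverse tiny_link_key under the same assumptions."""
    text = key.replace("-", "/").replace("_", "+")
    # Restore the stripped characters: six data characters plus padding.
    text = text.ljust(6, "A") + "=="
    return struct.unpack("<L", base64.b64decode(text))[0]


key = tiny_link_key(3604482)
```

If this holds up against more pages, only the pageId correspondences would need to be extracted, and the /x/ links could be resolved by decoding the key and looking up the identifier.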
Regarding the article numbers appearing in section 4 only: in the FAQWizard, all the articles were numbered. When Teri converted the FAQWizard, she dropped the numbers from the page titles. This proved controversial and Duncan Drury added them back, but only in section 4.
The history of such matters is always interesting. :-)
Paul
On Jun 17, 2013, at 06:57 PM, Paul Boddie wrote:
I'm not sure these tiny links are in the Confluence dump, but if there's an algorithm to generate them, then maybe we can provide a similar mapping resource.
Just as a general feature request, it would be cool if Moin supported tiny urls. I use them in email too!
-Barry