[Moin-user] multilingual regexes (need expert regex writer)

Mariano Absatz el.baby at gmail.com
Mon Feb 8 09:31:45 EST 2010


Hi,

I have an old, almost abandoned moin 1.6 wiki which I had configured to
my taste...

Now I need another wiki, so I installed a fresh 1.9.1 on another server.

Both wikis are somewhat bilingual (Spanish/English)... not in the sense
of having dual-language pages, but most pages are in Spanish and some are in
English.

I had originally configured my old wiki to use Spanish regexes like this:

    page_category_regex = u'^Categoría[A-Z]'
    page_dict_regex = u'^Dicc(ionario)?[A-Z]'
    page_form_regex = u'^Form(ulario)?[A-Z]'
    page_group_regex = u'^Grupo[A-Z]'
    page_template_regex = u'^Plantilla[A-Z]'


After some time, since some of the people writing in English were used
to the English versions of these, I came up with a somewhat complex set
of bilingual regexes:

    page_category_regex = u'^Categor(y|ía)[A-Z]'
    page_dict_regex = u'([a-z]Dict$)|(^Dicc(ionario)?[A-Z])'
    page_form_regex = u'([a-z]Form$)|(^Form(ulario)?[A-Z])'
    page_group_regex = u'([a-z]Group$)|(^Grupo[A-Z])'
    page_template_regex = u'([a-z]Template$)|(^Plantilla[A-Z])|([a-z]Plantilla$)'

which worked OK.
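
To make the behaviour of the 1.6-style group pattern concrete, here is a
small sketch that runs it against a few page names. The page names are made
up for illustration, and the use of a search (rather than a full match) is
an assumption about how the wiki applies the pattern. The pattern is written
as a plain r'' string so the snippet also runs on Python 3.

```python
import re

# 1.6-style bilingual group pattern quoted above.
page_group_regex = r'([a-z]Group$)|(^Grupo[A-Z])'
group_re = re.compile(page_group_regex, re.UNICODE)

# Hypothetical page names, used only for illustration.
names = ['AdminGroup', 'GrupoDeAdministradores', 'FrontPage', 'Grupo']

# Assumption: a page name is tested with search(), not a full match.
matches = dict((name, bool(group_re.search(name))) for name in names)

for name in names:
    print('%-24s %s' % (name, matches[name]))
```

Only the first two names are recognized as group pages: the English form
needs a lowercase letter before the "Group" suffix, and the Spanish form
needs an uppercase letter after the "Grupo" prefix.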

Now, in version 1.9, I see moin is using the (?P<name>) Python regex
extension in order to set "all" and "key" named groups...

I don't know much Python, and although I can write a bit of "regexese",
I'm no regex wizard at all...

My first attempt was to mimic what I had in 1.6:

    page_group_regex = ur'(?P<all>((?P<key>\S+)Group)|(Grupo(?P<key>\S+)))'

However, even though I can see that the left and right sides of the '|'
are mutually exclusive, the regex engine can't, and it yields the following
error:
error: redefinition of group name u'key' as group 5; was group 3,
referer: https://wiki.example.org/GrupoDeAdministradores?action=fullsearch&value=linkto%3A%22GrupoDeAdministradores%22&context=180
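
The error can be reproduced outside moin with nothing but Python's re
module. The sketch below compiles the same pattern (as a plain r'' string,
since Python 3 does not accept the ur prefix) and prints the message; the
exact wording of the message may differ slightly between Python versions.

```python
import re

# Same pattern as the first attempt: the name "key" is used twice.
pattern = r'(?P<all>((?P<key>\S+)Group)|(Grupo(?P<key>\S+)))'

message = None
try:
    re.compile(pattern)
except re.error as exc:
    # Python's re rejects a repeated group name at compile time, even
    # when the two uses sit in different branches of an alternation.
    message = str(exc)

print(message)
```

Counting opening parentheses from the left gives "all" as group 1, the
English branch as group 2, the first "key" as group 3, the Spanish branch as
group 4 and the second "key" as group 5, which is where the numbers in the
message come from.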

I understand that the named groups are simply putting names to
existing numbered groups so my regex is clearly wrong...

Then I came up with a workaround like this:

    page_group_regex = ur'(?P<all>(Grupo)?(?P<key>\S+)(Group)?)'

which seems to work, but actually matches ANY string starting with an
uppercase letter :-(
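
A quick check of the workaround outside moin shows the problem. The page
names are made up for illustration, and match() is used here only as a
stand-in for whatever moin does internally.

```python
import re

# The workaround: optional prefix, greedy key, optional suffix.
workaround = re.compile(r'(?P<all>(Grupo)?(?P<key>\S+)(Group)?)')

# Hypothetical page names, used only for illustration.
names = ['GrupoDeAdministradores', 'AdminGroup', 'FrontPage']

# Map each name to the text captured by "key" (None if no match).
keys = {}
for name in names:
    m = workaround.match(name)
    keys[name] = m.group('key') if m else None
    print('%-24s key=%s' % (name, keys[name]))
```

Every name matches, including FrontPage. Because \S+ is greedy and the
suffix is optional, the "Group" suffix also ends up inside the key: for
AdminGroup the key is the whole name, not "Admin".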

Is there any way to write a regex that matches GrupoKey or KeyGroup
where "Key" is matched as the named group "key"?
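
One direction that might be worth trying is Python's conditional pattern,
(?(name)yes|no), which lets a single "key" group serve both forms: if the
"Grupo" prefix matched, nothing more is required, otherwise the "Group"
suffix must follow. This is only a sketch and has not been verified inside
moin 1.9. The extra group name "es", the helper group_key() and the ^...$
anchoring are assumptions made for the example.

```python
import re

# Single "key" group; "es" is a made-up name for the optional Spanish prefix.
# (?(es)|Group) means: if "es" matched, require nothing; else require "Group".
candidate = r'(?P<all>(?P<es>Grupo)?(?P<key>\S+)(?(es)|Group))'

# Assumption: the pattern is applied to the whole page name.
candidate_re = re.compile('^%s$' % candidate, re.UNICODE)

def group_key(page_name):
    """Return the captured key, or None if the name is not a group page."""
    m = candidate_re.match(page_name)
    return m.group('key') if m else None

# Hypothetical page names, used only for illustration.
for name in ['GrupoDeAdministradores', 'AdminGroup', 'FrontPage', 'Grupo']:
    print('%-24s key=%s' % (name, group_key(name)))
```

With these inputs the first two names yield the keys DeAdministradores and
Admin, while FrontPage and a bare Grupo are rejected. Unlike the 1.6
patterns, this sketch does not check the case of the letters next to the
prefix or suffix.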


-- 
Mariano Absatz - El Baby
www.clueless.com.ar



