[Moin-user] multilingual regexes (need expert regex writer)

Tobin Cataldo tcataldo at bham.lib.al.us
Mon Feb 8 13:48:25 EST 2010


Greetings,

I am no expert, but I think a positive lookahead assertion and a positive 
lookbehind assertion may be what you are after:


#!c:\python26\python.exe

import re

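names = ['OneGroup', 'GrupoDos', 'GrupoMyGroup']

# Optional "Grupo" prefix, then the key, then an optional "Group" suffix.
# The key must be preceded by "Grupo" (lookbehind) or followed by "Group"
# (lookahead), so names with neither affix do not match.
pattern = r'(?P<all>(Grupo)?(?P<key>((?<=Grupo)\S+|\S+(?=Group)))(Group)?)'

for item in names:
    m = re.match(pattern, item)
    if m:
        print(m.group('all') + " : " + m.group('key'))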


-- Output --

OneGroup : One
GrupoDos : Dos
GrupoMyGroup : MyGroup

Please note case #3: GrupoMyGroup matches with Grupo as the prefix and 
MyGroup as the key. The first alternative, the positive lookbehind on 
Grupo, succeeds, and the greedy \S+ then takes the rest of the name, 
including the trailing Group.
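
As a rough check (a sketch only, not tested inside moin), the same pattern 
can be wrapped in a small helper to confirm two things: that it rejects 
names carrying neither Grupo nor Group, which was the problem with the 
(Grupo)?...(Group)? workaround, and that Python refuses the variant that 
names "key" twice. The names group_key and compiles are made up for this 
example and are not part of moin.

```python
import re

# Same pattern as in the script above, compiled once.
GROUP_RE = re.compile(
    r'(?P<all>(Grupo)?(?P<key>((?<=Grupo)\S+|\S+(?=Group)))(Group)?)'
)


def group_key(page_name):
    """Return the 'key' part of a group page name, or None if no match.

    Hypothetical helper for illustration; not a MoinMoin function.
    """
    m = GROUP_RE.match(page_name)
    return m.group('key') if m else None


def compiles(pattern):
    """Report whether the re module accepts the pattern (hypothetical helper)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


for name in ['OneGroup', 'GrupoDos', 'GrupoMyGroup', 'FrontPage', 'Grupo']:
    print(name + " : " + str(group_key(name)))

# The earlier attempt defines the group name 'key' twice, which re rejects.
print(compiles(r'(?P<all>((?P<key>\S+)Group)|(Grupo(?P<key>\S+)))'))
```

FrontPage and a bare Grupo should come back as None here. Note that the 
pattern is not anchored at the end, so a name like MyGroupies would still 
match with key My; if that matters, a trailing $ after the "all" group is 
probably worth trying.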


Tobin Cataldo



Mariano Absatz wrote:
> Hi,
>
> I have an old moin 1.6 wiki almost abandoned which I had configured to
> my taste...
>
> Now I need another wiki, so I installed (in another server) a fresh 1.9.1.
>
> Both wikis are somewhat bilingual (Spanish/English)... not in the sense
> of dual-language pages, but most pages are in Spanish and some are in
> English.
>
> I had originally configured my old wiki to use Spanish regexes like this:
>
>     page_category_regex = u'^Categoría[A-Z]'
>     page_dict_regex = u'^Dicc(ionario)[A-Z]'
>     page_form_regex = u'^Form(ulario)?[A-Z]'
>     page_group_regex = u'^Grupo[A-Z]'
>     page_template_regex = u'^Plantilla[A-Z]'
>
>
> After some time, since some of the people writing in English were used
> to the English versions of these, I came up with a somewhat complex set
> of bilingual regexes:
>
>     page_category_regex = u'^Categor(y|ía)[A-Z]'
>     page_dict_regex = u'([a-z]Dict$)|(^Dicc(ionario)?[A-Z])'
>     page_form_regex = u'([a-z]Form$)|(^Form(ulario)?[A-Z])'
>     page_group_regex = u'([a-z]Group$)|(^Grupo[A-Z])'
>     page_template_regex =
> u'([a-z]Template$)|(^Plantilla[A-Z])|([a-z]Plantilla$)'
>
> which worked OK.
>
> Now, in version 1.9, I see moin is using the (?P<name>) Python regex
> extension in order to define "all" and "key" named groups...
>
> I don't know much Python, and even though I write a bit of "regexese",
> I'm no regex wizard at all...
>
> My first attempt was to mimic what I had in 1.6:
>
>     page_group_regex = ur'(?P<all>((?P<key>\S+)Group)|(Grupo(?P<key>\S+)))'
>
> However, even though I can see that the left and right sides of the '|'
> are mutually exclusive, the regex engine can't, and it yields the
> following error:
> error: redefinition of group name u'key' as group 5; was group 3,
> referer: https://wiki.example.org/GrupoDeAdministradores?action=fullsearch&value=linkto%3A%22GrupoDeAdministradores%22&context=180
>
> I understand that the named groups are simply putting names to
> existing numbered groups so my regex is clearly wrong...
>
> Now I came up with a workaround like this:
>
>     page_group_regex = ur'(?P<all>(Grupo)?(?P<key>\S+)(Group)?)'
>
> which seems to work, but actually matches ANY string starting with an
> uppercase letter :-(
>
> Is there any way to write a regex that matches GrupoKey or KeyGroup
> where "Key" is matched as the named group "key"?
>
>
>   




