MoinMoin WikiName and python regexes

Bengt Richter bokr at oz.net
Sun Jun 26 13:47:21 EDT 2005


On Wed, 8 Jun 2005 09:49:51 -0600, "Ara.T.Howard" <Ara.T.Howard at noaa.gov> wrote:

>
>hi-
>
>i know nada about python so please forgive me if this is way off base.  i'm
>trying to fix a bug in MoinMoin whereby
>
>   WordsWithTwoCapsInARowLike
>                     ^^
>                     ^^
>                     ^^
>
>do not become WikiNames.  this is because the wikiname pattern is
>basically
>
>   /([A-Z][a-z]+){2,}/
>
>but should be (IMHO)
>
>   /([A-Z]+[a-z]+){2,}/
That would take care of the example above, but does it change an official spec?
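
A quick way to check the difference is sketched below. It assumes plain ASCII
classes rather than config.chars_upper/config.chars_lower, the names old_rule and
new_rule are made up for illustration, and it uses re.fullmatch from current Python 3:

```python
import re

# Simplified ASCII-only stand-ins for the two patterns quoted above.
old_rule = re.compile(r'(?:[A-Z][a-z]+){2,}')
new_rule = re.compile(r'(?:[A-Z]+[a-z]+){2,}')

name = 'WordsWithTwoCapsInARowLike'

# fullmatch succeeds only if the whole string is a single WikiName.
old_ok = old_rule.fullmatch(name) is not None
new_ok = new_rule.fullmatch(name) is not None

# The relaxed rule also accepts names the old rule rejected outright.
widened = (old_rule.fullmatch('XMLHttpRequest') is None
           and new_rule.fullmatch('XMLHttpRequest') is not None)

print(old_ok)   # False: 'A' is not followed by a lowercase letter
print(new_ok)   # True: 'ARow' now counts as one word part
print(widened)  # True
```

The last check is the reason for asking about the spec: the relaxed pattern turns
more text into links than before, not just the one example.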

>
>however, the way the patterns are constructed like
>
>   word_rule = ur'(?:(?<![%(l)s])|^)%(parent)s(?:%(subpages)s(?:[%(u)s][%(l)s]+){2,})+(?![%(u)s%(l)s]+)' % {
>       'u': config.chars_upper,
>       'l': config.chars_lower,
>       'subpages': config.allow_subpages and (wikiutil.CHILD_PREFIX + '?') or '',
>       'parent': config.allow_subpages and (ur'(?:%s)?' % re.escape(PARENT_PREFIX)) or '',
>   }
>
>
>and i'm not that familiar with python syntax.  to me this looks like a map
>used to bind variables into the regex - or is it binding into a string then
>compiling that string into a regex - regexs don't seem to be literal objects
>in python AFAIK...  i'm thinking i need something like
>
>   word_rule = ur'(?:(?<![%(l)s])|^)%(parent)s(?:%(subpages)s(?:[%(u)s]+[%(l)s]+){2,})+(?![%(u)s%(l)s]+)' % {
>                                                                       ^
>                                                                       ^
>                                                                       ^
>       'u': config.chars_upper,
>       'l': config.chars_lower,
>       'subpages': config.allow_subpages and (wikiutil.CHILD_PREFIX + '?') or '',
>       'parent': config.allow_subpages and (ur'(?:%s)?' % re.escape(PARENT_PREFIX)) or '',
>   }
>
>and this seems to work - but i'm wondering what the 's' in '%(u)s' implies?
>obviously the u is the char range (unicode?)...  but what's the 's'?
'u' doesn't stand for unicode here. It is the key used to look up config.chars_upper in the dict. That value
could be unicode, and probably is. The 's' is the final part of a formatting spec, which says how to convert the
data looked up; 's' is for string, which leaves string data unchanged (unless, if I'm not mistaken, a conversion
to unicode is required).

All of the above is making use of the % operator of strings, as in the expression
    fmt % data
where fmt is a string containing ordinary characters and formatting specs, which are
substrings introduced by a leading '%'. The formatting specs take two basic
alternative forms: %<spec> or %(name)<spec>. If a '%' is followed by a parenthesized name,
as in '%(u)s', the data to be formatted is retrieved from data['u'], so data must be a mapping.
If there is no parenthesized name, the data is retrieved from data[i], where data must be a tuple and
i is the zero-based position of that format spec among the specs in fmt. Where there is no ambiguity
and there is only one datum, data may be written as the non-tuple value expression, e.g.,
instead of (123,) that data could be written as (123,)[0] or plain 123.
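
A minimal sketch of the three cases just described (the strings and values are
made up for illustration):

```python
# Positional form: data is a tuple, specs are filled in order.
positional = '%s has %d items' % ('cart', 3)

# Named form: data is a mapping, each spec names its key.
named = '[%(u)s][%(l)s]+' % {'u': 'A-Z', 'l': 'a-z'}

# Single datum: a bare value works in place of a one-element tuple.
single_tuple = 'n=%s' % (123,)
single_bare = 'n=%s' % 123

print(positional)                   # cart has 3 items
print(named)                        # [A-Z][a-z]+
print(single_tuple == single_bare)  # True
```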

In the word_rule above, %(u)s uses 'u' as a key to get data from the dictionary { 'u': config.chars_upper, ...}
and substitutes it into [%(u)s] as a string (that's what the 's' specifies). So config.chars_upper will
presumably hold a string value such as u'ABC..Z', which is then inserted in place of the %(u)s to
give u'...[ABC..Z]...'  (if fmt is unicode, the resulting string will be unicode, if I'm not mistaken).
The result is still an ordinary string; it only becomes a regex when it is passed to re.compile.
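
To see the two steps separately (string formatting first, regex compilation
second), here is a sketch of the modified rule. The values 'A-Z' and 'a-z' are
hypothetical stand-ins for config.chars_upper and config.chars_lower, subpages
are assumed to be switched off so both prefixes are empty, and it is written for
current Python 3, so the ur'' prefix is dropped:

```python
import re

# Hypothetical stand-ins for the MoinMoin settings; the real values
# come from config and may differ.
chars_upper = 'A-Z'
chars_lower = 'a-z'

template = (r'(?:(?<![%(l)s])|^)%(parent)s'
            r'(?:%(subpages)s(?:[%(u)s]+[%(l)s]+){2,})+'
            r'(?![%(u)s%(l)s]+)')

# Step 1: the % operator builds an ordinary string.
pattern_text = template % {
    'u': chars_upper,
    'l': chars_lower,
    'subpages': '',  # assumed empty: subpages not allowed in this sketch
    'parent': '',    # assumed empty for the same reason
}

# Step 2: only re.compile turns that string into a regex object.
word_rule = re.compile(pattern_text)

m = word_rule.search('see WordsWithTwoCapsInARowLike here')
print(pattern_text)
print(m.group(0))  # WordsWithTwoCapsInARowLike
```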

>
>i'm looking at
>
>   http://docs.python.org/lib/re-syntax.html
>   http://www.amk.ca/python/howto/regex/
>
See also
    http://www.python.org/doc/current/lib/typesseq-strings.html
(which IMO should be easier to find, but if you click on the index square
at the top right of any library reference page, you can see a "%formatting" link)

>and coming up dry.  sorry i don't have more time to rtfm - just want to
>implement this simple fix and get on to fcgi configuration! ;-)
>
>cheers.
>
>-a
>-- 
>===============================================================================
>| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
>| phone :: 303.497.6469
>| My religion is very simple.  My religion is kindness.
>| --Tenzin Gyatso
>===============================================================================
>

Regards,
Bengt Richter


