MoinMoin WikiName and python regexes
Bengt Richter
bokr at oz.net
Sun Jun 26 13:47:21 EDT 2005
On Wed, 8 Jun 2005 09:49:51 -0600, "Ara.T.Howard" <Ara.T.Howard at noaa.gov> wrote:
>
>hi-
>
>i know nada about python so please forgive me if this is way off base. i'm
>trying to fix a bug in MoinMoin whereby
>
> WordsWithTwoCapsInARowLike
>                   ^^
>                   ^^
>                   ^^
>
>do not become WikiNames. this is because the wikiname pattern is
>basically
>
> /([A-Z][a-z]+){2,}/
>
>but should be (IMHO)
>
> /([A-Z]+[a-z]+){2,}/
That would take care of the example above, but does it change an official spec?
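A quick check in Python 3 (the re module; fullmatch() needs 3.4 or later) shows the difference between the two basic patterns. 'CSSFileName' is a made-up test word. It is included because the answer to the spec question partly depends on which extra words the fix would turn into WikiNames:

```python
import re

# The basic current pattern and the proposed one, as given in the post.
old_rule = re.compile(r'([A-Z][a-z]+){2,}')
new_rule = re.compile(r'([A-Z]+[a-z]+){2,}')

word = 'WordsWithTwoCapsInARowLike'

# fullmatch() succeeds only if the pattern consumes the whole word.
print(bool(old_rule.fullmatch(word)))  # False: the 'A' in 'AR' needs a lowercase letter after it
print(bool(new_rule.fullmatch(word)))  # True: [A-Z]+ takes 'AR', then [a-z]+ takes 'ow'

# search() on the current pattern still finds the leading capitalized words.
print(old_rule.search(word).group())   # WordsWithTwoCapsIn

# The fix also admits words the current rule rejects, e.g. a leading acronym.
print(bool(old_rule.fullmatch('CSSFileName')))  # False
print(bool(new_rule.fullmatch('CSSFileName')))  # True
```

Note that search() on the bare current pattern still finds the capitalized prefix. The trailing (?!...) lookahead in word_rule appears to be there to reject such partial matches.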
>
>however, the way the patterns are constructed like
>
> word_rule = ur'(?:(?<![%(l)s])|^)%(parent)s(?:%(subpages)s(?:[%(u)s][%(l)s]+){2,})+(?![%(u)s%(l)s]+)' % {
> 'u': config.chars_upper,
> 'l': config.chars_lower,
> 'subpages': config.allow_subpages and (wikiutil.CHILD_PREFIX + '?') or '',
> 'parent': config.allow_subpages and (ur'(?:%s)?' % re.escape(PARENT_PREFIX)) or '',
> }
>
>
>and i'm not that familiar with python syntax. to me this looks like a map
>used to bind variables into the regex - or is it binding into a string then
>compiling that string into a regex - regexes don't seem to be literal objects
>in python AFAIK... i'm thinking i need something like
>
> word_rule = ur'(?:(?<![%(l)s])|^)%(parent)s(?:%(subpages)s(?:[%(u)s]+[%(l)s]+){2,})+(?![%(u)s%(l)s]+)' % {
>                                                                     ^
>                                                                     ^
>                                                                     ^
> 'u': config.chars_upper,
> 'l': config.chars_lower,
> 'subpages': config.allow_subpages and (wikiutil.CHILD_PREFIX + '?') or '',
> 'parent': config.allow_subpages and (ur'(?:%s)?' % re.escape(PARENT_PREFIX)) or '',
> }
>
>and this seems to work - but i'm wondering what the 's' in '%(u)s' implies?
>obviously the u is the char range (unicode?)... but what's the 's'?
'u' doesn't stand for unicode here. It is the key used to look up config.chars_upper in the dict. That value
could be unicode, and probably is. The 's' is the final part of a formatting spec, which says how to convert the
data looked up: 's' is for string, so non-string data is converted with str(), and string data is inserted unchanged
(unless, UIAM, a conversion to unicode is required).
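A minimal illustration of the named form, in Python 3. The dictionary values are made-up stand-ins, not MoinMoin's real character lists:

```python
# '%(key)s' looks up data['key'] and inserts it as a string.
fmt = '[%(u)s][%(l)s]+'
data = {'u': 'ABC', 'l': 'abc'}
result = fmt % data
print(result)  # [ABC][abc]+

# With 's', a non-string value is converted with str() first.
count = '%(n)s items' % {'n': 42}
print(count)   # 42 items
```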
All of the above makes use of the % operator of strings, as in the expression
    fmt % data
where fmt is a string containing ordinary characters and formatting specs, i.e., substrings
introduced by a leading '%' character. The formatting specs take two basic
alternative forms: %<spec> or %(name)<spec>. If a '%' is followed by a parenthesized name,
as in '%(u)s', the data to be formatted is retrieved from data['u'], so data must be a mapping such as a dict.
If there is no parenthesized name, the data is retrieved from data[i], where data must be a tuple and
i is the position of that spec among the format specs in fmt, counting from 0. Where there is no ambiguity,
i.e., there is only one spec and the single datum is not itself a tuple, data[0] may be written as the
bare value: instead of (123,), that data could be written as (123,)[0], or plain 123.
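The positional form and the one-datum shortcut, sketched in Python 3 (the values are arbitrary examples):

```python
# Positional specs take their data from a tuple, in order.
pair = '%s and %s' % ('spam', 'eggs')
print(pair)  # spam and eggs

# With a single spec, a non-tuple value may stand in for the 1-tuple.
one = '%d' % (123,)
same = '%d' % 123
print(one, same)  # 123 123

# A datum that is itself a tuple is ambiguous, so it must be wrapped.
point = (1, 2)
wrapped = 'point: %s' % (point,)
print(wrapped)  # point: (1, 2)
try:
    'point: %s' % point  # the tuple's items are taken as two separate data
except TypeError as exc:
    print('TypeError:', exc)
```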
In the word_rule above, %(u)s uses 'u' as a key to get data from the dictionary { 'u': config.chars_upper, ...}
and substitutes it into [%(u)s] as a string (that's what the 's' specifies). So if config.chars_upper
has a string value such as u'ABC..Z', as it presumably does, that value is inserted in place of the %(u)s to
give u'...[ABC..Z]...'. (If fmt is unicode, the resulting string will be unicode, UIAM.)
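On the earlier question about strings versus regexes: the % step produces an ordinary string, which only becomes a regex object when compiled. Below is a sketch in Python 3 (so ur'' is written as r''). It assumes ASCII-only stand-ins for config.chars_upper and config.chars_lower, and it assumes allow_subpages is false, so %(parent)s and %(subpages)s both format to ''. The real MoinMoin values, and the place where MoinMoin compiles the pattern, are not shown here:

```python
import re
import string

# Stand-ins (assumptions): ASCII letters only; allow_subpages taken as false,
# so the 'parent' and 'subpages' parts format to empty strings.
data = {
    'u': string.ascii_uppercase,
    'l': string.ascii_lowercase,
    'subpages': '',
    'parent': '',
}

# The current and proposed format strings from the post, with ur'' as r''.
old_fmt = r'(?:(?<![%(l)s])|^)%(parent)s(?:%(subpages)s(?:[%(u)s][%(l)s]+){2,})+(?![%(u)s%(l)s]+)'
new_fmt = r'(?:(?<![%(l)s])|^)%(parent)s(?:%(subpages)s(?:[%(u)s]+[%(l)s]+){2,})+(?![%(u)s%(l)s]+)'

# Step 1: '%' only fills in the pattern text; the results are plain strings.
old_text = old_fmt % data
new_text = new_fmt % data
print('[%(u)s]' % data)  # [ABCDEFGHIJKLMNOPQRSTUVWXYZ]

# Step 2: the text becomes a regex object only when it is compiled.
old_re = re.compile(old_text)
new_re = re.compile(new_text)

word = 'WordsWithTwoCapsInARowLike'
print(old_re.search(word).group())  # RowLike
print(new_re.search(word).group())  # WordsWithTwoCapsInARowLike
```

With these stand-ins, the current rule does not link the whole word. The lookahead rejects the capitalized prefix, and the only match left is the tail 'RowLike', which starts right after an uppercase letter and so passes the (?<!...) lookbehind. The proposed rule matches the whole word.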
>
>i'm looking at
>
> http://docs.python.org/lib/re-syntax.html
> http://www.amk.ca/python/howto/regex/
>
See also
http://www.python.org/doc/current/lib/typesseq-strings.html
(which IMO should be easier to find, but if you click on the index square
at the top right of any library reference page, you can see a "%formatting" link)
>and coming up dry. sorry i don't have more time to rtfm - just want to
>implement this simple fix and get on to fcgi configuration! ;-)
>
>cheers.
>
>-a
>--
>===============================================================================
>| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
>| phone :: 303.497.6469
>| My religion is very simple. My religion is kindness.
>| --Tenzin Gyatso
>===============================================================================
>
Regards,
Bengt Richter