[Tutor] miniwiki 1.3 BETA bugs
Barnaby Scott
bds at waywood.co.uk
Mon Feb 26 13:12:32 CET 2007
Kirk Z Bailey wrote:
> RE leaves me totally confuzzzeddded. Yep, so confuised I'm having
> trouble spelling it. Sp this one line will replace both words and give a
> reliable result?
>
> Barnaby Scott wrote:
> [snip]
>> No idea if it has anything to do with your problem, but it struck me
>> that the iswikiword() function (and processword() which seems to be a
>> helper for it) could be replaced with one line, and it would be reliable!
>>
>> def iswikiword(word):
>>     return bool(re.match('^([A-Z][a-z]+){2,}$', word))
>>
>> Of course you need to import re, but that seems a small price to pay!
>>
>> HTH
>>
>> Barnaby Scott
>>
>>
>
As far as I know this is 100% reliable - at least it works for me
(www.waywood.co.uk/MonkeyWiki/). I suggest you test the function to your
own satisfaction - feed it tricky 'possible' WikiWords, and see how it does!
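One way to do that testing is with a small table of cases. The expected answers below are my own reading of what the regex ought to do, and the word list is made up for illustration, so treat both as assumptions to check rather than a spec:

```python
import re

def iswikiword(word):
    # Two or more capitalised runs of letters, and nothing else
    return bool(re.match('^([A-Z][a-z]+){2,}$', word))

# Hypothetical test cases: (candidate, expected result)
cases = [
    ('WikiWord', True),
    ('MonkeyWiki', True),
    ('Wiki', False),                # only one capitalised word
    ('wikiWord', False),            # starts lower-case
    ('WikiWORD', False),            # capitals in 'WORD' lack lower-case letters
    ('Wiki2Word', False),           # digits not allowed
    ('anEmbeddedWikiWord', False),  # WikiWord buried inside another word
    ('', False),                    # empty string
]

for word, expected in cases:
    result = iswikiword(word)
    status = 'ok' if result == expected else 'MISMATCH'
    print(f'{status}: {word!r} -> {result}')
```

Anything that prints MISMATCH is either a bug in the pattern or a case where you and I disagree about what should count as a WikiWord.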
I know what you mean - RE syntax is an unruly beast to try and wrestle
with, but I am *so* glad I made the effort. I don't claim to be anything
like an expert, but I now find it very useful indeed.
Here's how the function's statement works in case you're interested:
bool(re.match('^([A-Z][a-z]+){2,}$', word))
re.match() will look for a match for us, according to the RE given as
the first argument, in the string given as the second.
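To make the argument order concrete: the pattern comes first, the string second. If the check runs often, the same pattern can also be compiled once with re.compile(); the name WIKIWORD_RE below is just my own illustrative choice. Python's re module caches recently used patterns anyway, so this is mostly a matter of taste:

```python
import re

pattern = '^([A-Z][a-z]+){2,}$'

# Module-level function: pattern first, then the string to test
m = re.match(pattern, 'WikiWord')
print(m.group(0))    # WikiWord

# Equivalent precompiled form (WIKIWORD_RE is a hypothetical name)
WIKIWORD_RE = re.compile(pattern)
m2 = WIKIWORD_RE.match('WikiWord')
print(m2.group(0))   # WikiWord
```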
^ means we demand that the pattern match from the beginning of the
string being tested - we don't want to say yes to
anEmbeddedWikiWordLikeThis. (In fact, because we are using re.match
instead of re.search, this is not strictly necessary, but it makes the
intent clearer.)
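Here is the difference in practice, using the example word from above. re.search() is the one where the ^ really earns its keep:

```python
import re

word = 'anEmbeddedWikiWordLikeThis'

# re.search scans along the string, so without ^ it finds the embedded run
unanchored = re.search('([A-Z][a-z]+){2,}$', word)
print(unanchored.group(0))   # EmbeddedWikiWordLikeThis

# With ^, re.search may only match at position 0, so it fails
anchored = re.search('^([A-Z][a-z]+){2,}$', word)
print(anchored)              # None

# re.match only tries position 0 anyway, with or without ^
matched = re.match('([A-Z][a-z]+){2,}$', word)
print(matched)               # None
```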
([A-Z][a-z]+) means we want a group of letters starting with one in
the range [A-Z], i.e. a capital, followed by [a-z]+, meaning one or more
lowercase letters ('one or more' is specified by the +). That whole
pattern is parenthesised because we want the next element to refer to
the whole thing, not just to the [a-z]+ immediately before it.
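To see what the parenthesised unit matches on its own, re.findall() can list every capitalised word in a candidate. This is only a way of inspecting the pattern, not part of iswikiword(); the last two lines are a side note on how Python treats a repeated group:

```python
import re

# The unit that {2,} repeats: one capital, then one or more lower-case letters
word_part = '[A-Z][a-z]+'

parts = re.findall(word_part, 'MonkeyWiki')
print(parts)     # ['Monkey', 'Wiki']

# The capitals in 'WORD' have no lower-case letters after them
broken = re.findall(word_part, 'WikiWORD')
print(broken)    # ['Wiki']

# A group repeated by {2,} only remembers its last repetition
m = re.match('^(' + word_part + '){2,}$', 'MonkeyWiki')
print(m.group(0), m.group(1))   # MonkeyWiki Wiki
```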
{2,} means we want a match only if our preceding pattern (i.e. a
capitalised word) occurs a minimum of 2 times in a row, and a maximum of
- well, we don't want to specify a maximum, so we leave it out.
(YouMightInFactWantToSpecifyTheMaximumNumberOfWordsThatAreAllowedToAppearInYourWikiLinksToStopPeopleDoingSillyThingsLikeThis).
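If you do want a ceiling, the number goes after the comma. The limit of 5 words below is an arbitrary figure I picked for illustration, and iswikiword_bounded is a hypothetical name:

```python
import re

def iswikiword_bounded(word, max_words=5):
    # {2,N}: at least two capitalised words, at most max_words of them.
    # %-formatting avoids having to double the literal { } as an f-string would.
    pattern = '^([A-Z][a-z]+){2,%d}$' % max_words
    return bool(re.match(pattern, word))

print(iswikiword_bounded('WikiWord'))                 # True
print(iswikiword_bounded('OneTwoThreeFourFive'))      # True
print(iswikiword_bounded('OneTwoThreeFourFiveSix'))   # False
```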
$ means we want a match only if the pattern reaches the end of the test
string - i.e. we don't want to match a WordLikeThis62734.
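Here is the effect of dropping the $. One caveat I would add, based on how Python's re module documents $: without the MULTILINE flag it also matches just before a newline at the very end of the string, so a word read from a file with its newline still attached would pass. re.fullmatch() (Python 3.4 and later) avoids that, if it matters for your wiki:

```python
import re

pattern = '^([A-Z][a-z]+){2,}$'

# Without $, re.match is happy to stop partway through the string
no_dollar = bool(re.match('^([A-Z][a-z]+){2,}', 'WordLikeThis62734'))
print(no_dollar)      # True

# With $, the trailing digits make it fail
with_dollar = bool(re.match(pattern, 'WordLikeThis62734'))
print(with_dollar)    # False

# $ also matches just before a final newline...
newline_case = bool(re.match(pattern, 'WikiWord\n'))
print(newline_case)   # True

# ...whereas fullmatch insists on consuming the whole string
full = bool(re.fullmatch('([A-Z][a-z]+){2,}', 'WikiWord\n'))
print(full)           # False
```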
As for bool(): it has nothing to do with RE, but if a match occurs, the result
returned by re.match() is a MatchObject instance, otherwise None. I have
used bool() to convert these two possible results into True or False,
though I guess this is not strictly necessary - the truth testing would
happen implicitly outside the function anyway. However it seems right to
return a boolean if that's what the function's obvious intent is.
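A quick way to see the two possible return values and what bool() makes of them (in current Python versions the match object's type is re.Match rather than MatchObject):

```python
import re

pattern = '^([A-Z][a-z]+){2,}$'

hit = re.match(pattern, 'WikiWord')
miss = re.match(pattern, 'notawikiword')

print(miss)                      # None
print(bool(hit), bool(miss))     # True False

# Without bool(), an if statement still does the right thing
if hit:
    print('treat as a link')
```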
HTH
Barnaby Scott