[Tutor] miniwiki 1.3 BETA bugs

Barnaby Scott bds at waywood.co.uk
Mon Feb 26 13:12:32 CET 2007


Kirk Z Bailey wrote:
> RE leaves me totally confuzzzeddded. Yep, so confuised I'm having 
> trouble spelling it. Sp this one line will replace both words and give a 
> reliable result?
> 
> Barnaby Scott wrote:
> [snip]
>> No idea if it has anything to do with your problem, but it struck me 
>> that the iswikiword() function (and processword() which seems to be a 
>> helper for it) could be replaced with one line, and it would be reliable!
>>
>> def iswikiword(word):
>>         return bool(re.match('^([A-Z][a-z]+){2,}$', word))
>>
>> Of course you need to import re, but that seems a small price to pay!
>>
>> HTH
>>
>> Barnaby Scott
>>
>>
> 

As far as I know this is 100% reliable - at least it works for me 
(www.waywood.co.uk/MonkeyWiki/). I suggest you test the function to your 
own satisfaction - feed it tricky 'possible' WikiWords, and see how it does!
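If it helps, here is a minimal sketch of that kind of test run. The list of "tricky" candidates and the answers I'd expect for each are my own made-up examples, so swap in whatever words your wiki actually sees:

```python
import re

def iswikiword(word):
    # Same one-liner as above: two or more capitalised words, nothing else.
    return bool(re.match('^([A-Z][a-z]+){2,}$', word))

# Hypothetical tricky candidates, mapped to the answer we'd hope for.
cases = {
    'FrontPage': True,
    'MonkeyWiki': True,
    'Front': False,              # only one capitalised word
    'frontPage': False,          # starts with a lowercase letter
    'FrontPAge': False,          # "PA" breaks the [A-Z][a-z]+ pattern
    'WordLikeThis62734': False,  # trailing digits
    'Front Page': False,         # embedded space
}

for word, expected in cases.items():
    result = iswikiword(word)
    status = 'ok' if result == expected else 'MISMATCH'
    print(repr(word), result, status)
```

Any line that prints MISMATCH is a word where the RE and your expectations disagree, which is exactly what you want to find out before trusting it.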

I know what you mean - RE syntax is an unruly beast to try and wrestle 
with, but I'm *so* glad I made the effort. I don't claim to be anything 
like an expert, but I now find it very useful indeed.

Here's how the function's statement works in case you're interested:

bool(re.match('^([A-Z][a-z]+){2,}$', word))

re.match() looks for a match for us, using the RE given as the first 
argument against the string given as the second.

^ means we demand that the pattern matches from the beginning of the 
string to be tested - we don't want to say yes to 
anEmbeddedWikiWordLikeThis. (In fact, because we are using re.match 
rather than re.search, this is not strictly necessary, but it makes the 
intent clearer.)
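To see why ^ is redundant with re.match but not with re.search, here is a small sketch. I've deliberately dropped both ^ and $ from the pattern so the difference between the two functions shows up:

```python
import re

pattern = '([A-Z][a-z]+){2,}'   # no ^ or $ on purpose
text = 'anEmbeddedWikiWordLikeThis'

# re.match only tries at position 0, so the leading "an" makes it fail.
print(re.match(pattern, text))            # prints None

# re.search scans along the string and finds the embedded run.
print(re.search(pattern, text).group())   # prints EmbeddedWikiWordLikeThis

# Putting ^ in front pins re.search to the start, like re.match.
print(re.search('^' + pattern, text))     # prints None
```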

([A-Z][a-z]+) means we want a group of letters, starting with one in 
the range [A-Z], i.e. a capital, followed by [a-z]+ , meaning one or more 
lowercase letters ('one or more' is specified by the +). That whole 
pattern is parenthesised because we want the next element to refer to 
the whole thing.
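One thing that surprised me about the parentheses: because the group is repeated, it only remembers the *last* word it matched. If you ever want the individual words (say, to display "Monkey Wiki"), re.findall on the unrepeated sub-pattern is one way to get them. The non-capturing (?:...) form at the end is an optional alternative I'm mentioning, not something the one-liner needs:

```python
import re

m = re.match('^([A-Z][a-z]+){2,}$', 'MonkeyWiki')
print(m.group(0))   # the whole match: MonkeyWiki
print(m.group(1))   # a repeated group keeps only its last repetition: Wiki

# To get every capitalised word, search for the sub-pattern on its own.
print(re.findall('[A-Z][a-z]+', 'MonkeyWiki'))   # ['Monkey', 'Wiki']

# (?:...) groups for the {2,} without capturing anything; same True/False result.
print(bool(re.match('^(?:[A-Z][a-z]+){2,}$', 'MonkeyWiki')))   # True
```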

{2,} means we want a match only if our preceding pattern (i.e. a 
capitalised word) occurs a minimum of 2 times in a row, and a maximum of 
- well, we don't want to specify a maximum, so we leave it out. 
(YouMightInFactWantToSpecifyTheMaximumNumberOfWordsThatAreAllowedToAppearInYourWikiLinksToStopPeopleDoingSillyThingsLikeThis).
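If you did want a maximum, you just put it after the comma. This is a sketch only: the limit of 5 and the name iswikiword_bounded are made up for illustration:

```python
import re

MAX_WORDS = 5   # hypothetical limit; pick whatever suits your wiki

def iswikiword_bounded(word, max_words=MAX_WORDS):
    # {2,N} demands at least 2 and at most N capitalised words.
    pattern = '^([A-Z][a-z]+){2,%d}$' % max_words
    return bool(re.match(pattern, word))

print(iswikiword_bounded('FrontPage'))                # True  (2 words)
print(iswikiword_bounded('OneTwoThreeFourFive'))      # True  (5 words)
print(iswikiword_bounded('OneTwoThreeFourFiveSix'))   # False (6 words)
```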

$ means we want a match only if the pattern reaches the end of the test 
string - i.e. we don't want to match a WordLikeThis62734.
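One Python subtlety worth adding to your tricky test words: according to the re documentation, $ matches at the end of the string *or just before a newline at the end of the string*. So a word with a trailing "\n" (for example a line read from a file without .strip()) still passes. \Z, or re.fullmatch in Python 3.4 and later, only accepts the true end of the string:

```python
import re

word = 'FrontPage\n'   # e.g. a line read from a file and not stripped

print(bool(re.match(r'^([A-Z][a-z]+){2,}$', word)))    # True: $ matches before the final \n
print(bool(re.match(r'^([A-Z][a-z]+){2,}\Z', word)))   # False: \Z means the very end
print(bool(re.fullmatch(r'([A-Z][a-z]+){2,}', word)))  # False: whole string must match

# Digits on the end are rejected by $ as intended.
print(bool(re.match(r'^([A-Z][a-z]+){2,}$', 'WordLikeThis62734')))   # False
```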

As for bool() - nothing to do with RE, but if a match occurs, re.match() 
returns a match object (a MatchObject instance), otherwise None. I have 
used bool() to convert these two possible results into True or False, 
though I guess this is not strictly necessary - the truth testing would 
happen implicitly outside the function anyway. However, it seems right to 
return a boolean if that's what the function's obvious intent is.
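A quick sketch of what that looks like in practice. Match objects are always true and None is false, so bool() just tidies the return value; the "link it" branch is only an illustration of the implicit truth test:

```python
import re

def iswikiword(word):
    return bool(re.match('^([A-Z][a-z]+){2,}$', word))

hit = re.match('^([A-Z][a-z]+){2,}$', 'FrontPage')
miss = re.match('^([A-Z][a-z]+){2,}$', 'frontpage')

print(miss is None)               # True: no match gives None
print(bool(hit), bool(miss))      # True False: a match object is always true
print(iswikiword('FrontPage'))    # True, and a real bool rather than a match object

# Implicit truth testing works on the raw re.match() result too.
if re.match('^([A-Z][a-z]+){2,}$', 'FrontPage'):
    print('link it')
```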

HTH


Barnaby Scott