[Tutor] is this a state machine? [designing a program to detect
WikiWords]
Kirk Bailey
idiot1 at netzero.net
Thu Aug 7 22:52:22 EDT 2003
Again I am impressed by my own ignorance of my craft. And by the wisdom of my
betters.
Danny Yoo wrote:
>
> On Thu, 7 Aug 2003, Kirk Bailey wrote:
>
>
>>def iswikiword(word): # test to see is a wikiword
>> caps='ABCDEFGHIJKLMNOPQRSTUVWXYZ' # define capitalletters
>> prior=0 # set inital state for flags
>> count=0 # wikiwords have 2+transitions
>> for letter in word: # examine each letter
>> if letter in caps: # if it IS a capital letter,
>> prior=1 # store that state
>> else: # and if not,
>> if prior==1: # but the previous one WAS,
>> prior=0 # reset the flag,
>> count=count+1 # anc count the transition!
>> else: # otherwise,
>> pass # chill out.
>> if count > 1: # ok, the results are in;
>> return 1 # if there are 2+, itis.
>> else: # and if not,
>> return 0 # say so.
>
>
>
> [Warning: case study of using finite state machines to recognize
> WikiWords.]
>
>
> Hi Kirk,
>
>
>
> Let's define what a "wiki word" is in English, before defining it in code.
> According to:
>
> """It is a concatenation of words (of two or more letters each)
> without white space between them, and with the first letter (only) of
> each component word capitalized."""
>
> http://c2.com/cgi/wiki?WikiWords
>
>
> The code above will work for a lot of cases, but will break on something
> like this:
>
> ###
>
>>>>iswikiword("ThisIsNOTAWikiWord")
>
> 1
> ###
>
>
> A finite state machine is also not allowed to do counting, so the 'count'
> variable in the function will disqualify it from being a finite state
> machine.
>
>
>
>
> We can imagine a finite state machine to recognize WikiWords that might
> look like this:
>
>
> capital?
> START --------------> On1stCapitalLetter
> |
> |
> | lowercase?
> |
> |
> | <--------+
> | |
> | | lowercase?
> | |
> V |
> FirstWord -----+
> |
> |
> | capital?
> |
> |
> V
> +---> OnAnotherCapitalLetter
> | |
> | |
> | | lowercase?
> | |
> | |
> | | <---------+
> capital? | | |
> | | | lowercase?
> | | |
> | V |
> | **ANOTHERWORD ----+
> | |
> | |
> | |
> +-------------+
>
>
>
> This looks suspiciously like a flow diagram, for which I apologize: it's
> hard drawing circles in ASCII, so I've omitted them. *grin*
>
>
> The diagram above has 5 states:
>
> [START, On1stCapitalLetter, FirstWord, OnAnotherCapitalLetter,
> ANOTHERWORD]
>
>
> Whew. That's a mouthful. Let's shorten the names slightly:
>
> [START, On1st, FirstWord, OnAnother, ANOTHERWORD]
>
>
> I've labeled the state that we start on as "START". We jump from state to
> state, depending if the next letter is uppercase or lowercase --- and if
> we ever "fall off" the machine, then the word definitely isn't a Wiki
> Word. Think of it as a board game.
>
>
> If it's a board game, how do we win? How do we know if we are recognizing
> a WikiWord? Well, if we're on the ANOTHERWORD state by the time we're at
> the end of our word, we'll consider that to be a WikiWord: that's the goal
> of any word that wants to have the Wiki nature.
>
>
>
> Before we put down any Python code, let's try it out on something like
> "WikiWord":
>
>
> ------
>
> time state next letter comments
> ---- ----- ----------- --------
>
> 1 START W 'W' is capitalized, so we jump to
> On1stCapitalLetter
>
> 2 On1st i 'i' is lowercased, so we jump to Firstword
>
> 3 FirstWord k 'k' is lowercased. There's an "lowercase?"
> arrow that points back to FirstWord, so we
> jump back to FirstWord.
>
> 4 FirstWord i Ditto.
>
> 5 FirstWord W Ok, we've seen another capital.
>
> 6 OnAnother o Yeah, now we can jump onto ANOTHERWORD!
>
> 7 ANOTHERWORD r ... but we can't celebrate until we finish
> reading the rest of our input!
>
> 8 ANOTHERWORD d Almost there.
>
> 9 ANOTHERWORD Done! And yes, 'WikiWord' is a WikiWord
>
> ------
>
>
> Once we try a few words against our game board, we'll probably feel more
> comfortable that it's doing the right thing. Now to implement it in
> Python! *grin*
>
>
>
> The diagram above is easy for us humans to trace, but not so convenient
> for computers. Let's make it slightly easier by "redrawing" that diagram
> as a table:
>
>
> capital? lowercase?
> -----------------------------------------------
> START | On1st FAIL
> On1st | FAIL FirstWord
> FirstWord | OnAnother FirstWord
> OnAnother | FAIL ANOTHERWORD
> ANOTHERWORD | OnAnother ANOTHERWORD
> FAIL | FAIL FAIL
>
>
> I've sneaked in one more state, a "FAIL" state that catches the situation
> when we "fall" off the diagram. Think of it as a black hole for failed
> WikiWords.
>
>
> If we want to be concise, we can do another renaming of the states of our
> machine, from words to numbers. Let's do that.
>
>
> capital? lowercase?
> --------------------------
> 0 | 1 5
> 1 | 5 2
> 2 | 3 2
> 3 | 5 4
> 4 | 3 4
> 5 | 5 5
>
>
>
> And something like this is easy to represent as a list:
>
> ###
>
>>>>transitions = [[1, 5],
>
> ... [5, 2],
> ... [3, 2],
> ... [5, 4],
> ... [3, 4],
> ... [5, 5]]
> ###
>
>
>
> And now, we can write something that uses this table to jump around our
> state diagram:
>
> ###
> def isWikiWord(word):
> """Code to demonstrate a Finite State Machine that recognizes
> WikiWords."""
> SUCCESS_STATE = 4
> TRANSITIONS = [[1, 5],
> [5, 2],
> [3, 2],
> [5, 4],
> [3, 4],
> [5, 5]]
> current_state = 0
> for letter in word:
> if letter.upper() == letter:
> current_state = TRANSITIONS[current_state][0]
> else:
> current_state = TRANSITIONS[current_state][1]
> return current_state == SUCCESS_STATE
> ###
>
>
> The code itself is simple --- if completely nonobvious. *grin* The real
> complexity of the code has been hidden by our transition state table, but
> the actual running of the code is pretty tame.
>
>
> But does it work?
>
> ###
>
>>>>isWikiWord("WikiWord")
>
> 1
>
>>>>isWikiWord("WikiWordS")
>
> 0
>
>>>>isWikiWord("The")
>
> 0
>
>>>>isWikiWord("TheS")
>
> 0
>
>>>>isWikiWord("SeeMe")
>
> 1
>
>>>>[(word, isWikiWord(word))
>
> ... for word in '''WikiWords is a MixedCase "word"
> ... which is useful in identifying
> ... and linking WikiPages'''.split()]
> [('WikiWords', 1), ('is', 0), ('a', 0),
> ('MixedCase', 1), ('"word"', 0), ('which', 0),
> ('is', 0), ('useful', 0), ('in', 0),
> ('identifying', 0), ('and', 0), ('linking', 0),
> ('WikiPages', 1)]
> ###
>
>
>
>
> One very cute thing about finite state machines is that they can be
> transformed into regular expressions. That is, we can first draw a
> diagram, and then calculate an equivalent regular expression that does the
> same work as the finite state machine.
>
>
> The process is slightly involved, and this post is too way long as it is.
> *grin* But here's the end result of the transformation:
>
> ###
>
>>>>regex = re.compile('''
>
> ... [A-Z][a-z]+
> ... [A-Z][a-z]
> ... ([A-Z][a-z] | [a-z])*
> ... ''', re.VERBOSE)
>
>>>>regex.match('Wiki')
>>>>regex.match('WikiWord')
>
> <_sre.SRE_Match object at 0x8168128>
>
>>>>regex.match('WikiWordsAreNeat')
>
> <_sre.SRE_Match object at 0x8126060>
>
>>>>regex.match('FooBAR')
>>>>regex.match('FooB')
>>>>regex.match('FooBar')
>
> <_sre.SRE_Match object at 0x8168128>
> ###
>
>
> If you're interested in this sort of stuff, you may want to look through
> "Introduction to the Theory of Computation":
>
> http://www-math.mit.edu/~sipser/book.html
>
>
> It's a pricey book, but awesome in its presentation of these ideas.
>
>
>
> I hope this helps!
>
>
>
--
--
end
Cheers!
Kirk D Bailey
+ think +
http://www.howlermonkey.net +-----+ http://www.tinylist.org
http://www.listville.net | BOX | http://www.sacredelectron.org
Thou art free"-ERIS +-----+ 'Got a light?'-Promethieus
+ think +
Fnord.
More information about the Tutor
mailing list