[Tutor] is this a state machine? [designing a program to detect WikiWords]

Kirk Bailey idiot1 at netzero.net
Thu Aug 7 22:52:22 EDT 2003


Again I am impressed by my own ignorance of my craft. And by the wisdom of my 
betters.

Danny Yoo wrote:

> 
> On Thu, 7 Aug 2003, Kirk Bailey wrote:
> 
> 
>>def iswikiword(word):				# test to see is a wikiword
>>	caps='ABCDEFGHIJKLMNOPQRSTUVWXYZ'	# define capitalletters
>>	prior=0					# set inital state for flags
>>	count=0					# wikiwords have 2+transitions
>>	for letter in word:			# examine each letter
>>		if letter in caps:		# if it IS a capital letter,
>>			prior=1			# store that state
>>		else:					# and if not,
>>			if prior==1:		# but the previous one WAS,
>>				prior=0		# reset the flag,
>>				count=count+1	# anc count the transition!
>>			else:				# otherwise,
>>				pass 			# chill out.
>>	if count > 1:				# ok, the results are in;
>>		return 1				# if there are 2+, itis.
>>	else:						# and if not,
>>		return 0				# say so.
> 
> 
> 
> [Warning: case study of using finite state machines to recognize
> WikiWords.]
> 
> 
> Hi Kirk,
> 
> 
> 
> Let's define what a "wiki word" is in English, before defining it in code.
> According to:
> 
>     """It is a concatenation of words (of two or more letters each)
>     without white space between them, and with the first letter (only) of
>     each component word capitalized."""
> 
>     http://c2.com/cgi/wiki?WikiWords
> 
> 
> The code above will work for a lot of cases, but will break on something
> like this:
> 
> ###
> 
>>>>iswikiword("ThisIsNOTAWikiWord")
> 
> 1
> ###
> 
> 
> A finite state machine is also not allowed to do counting, so the 'count'
> variable in the function will disqualify it from being a finite state
> machine.
> 
> 
> 
> 
> We can imagine a finite state machine to recognize WikiWords that might
> look like this:
> 
> 
>                    capital?
>           START --------------> On1stCapitalLetter
>                                       |
>                                       |
>                                       | lowercase?
>                                       |
>                                       |
>                                       | <--------+
>                                       |          |
>                                       |          | lowercase?
>                                       |          |
>                                       V          |
>                                   FirstWord -----+
>                                       |
>                                       |
>                                       | capital?
>                                       |
>                                       |
>                                       V
>                         +---> OnAnotherCapitalLetter
>                         |             |
>                         |             |
>                         |             | lowercase?
>                         |             |
>                         |             |
>                         |             | <---------+
>               capital?  |             |           |
>                         |             |           | lowercase?
>                         |             |           |
>                         |             V           |
>                         |       **ANOTHERWORD ----+
>                         |             |
>                         |             |
>                         |             |
>                         +-------------+
> 
> 
> 
> This looks suspiciously like a flow diagram, for which I apologize: it's
> hard drawing circles in ASCII, so I've omitted them.  *grin*
> 
> 
> The diagram above has 5 states:
> 
>     [START, On1stCapitalLetter, FirstWord, OnAnotherCapitalLetter,
>      ANOTHERWORD]
> 
> 
> Whew.  That's a mouthful.  Let's shorten the names slightly:
> 
>     [START, On1st, FirstWord, OnAnother, ANOTHERWORD]
> 
> 
> I've labeled the state that we start on as "START".  We jump from state to
> state, depending if the next letter is uppercase or lowercase --- and if
> we ever "fall off" the machine, then the word definitely isn't a Wiki
> Word.  Think of it as a board game.
> 
> 
> If it's a board game, how do we win?  How do we know if we are recognizing
> a WikiWord?  Well, if we're on the ANOTHERWORD state by the time we're at
> the end of our word, we'll consider that to be a WikiWord: that's the goal
> of any word that wants to have the Wiki nature.
> 
> 
> 
> Before we put down any Python code, let's try it out on something like
> "WikiWord":
> 
> 
> ------
> 
> time   state   next letter   comments
> ----   -----   -----------   --------
> 
>  1     START       W          'W' is capitalized, so we jump to
>                               On1stCapitalLetter
> 
>  2     On1st       i          'i' is lowercased, so we jump to Firstword
> 
>  3     FirstWord   k          'k' is lowercased.  There's an "lowercase?"
>                               arrow that points back to FirstWord, so we
>                               jump back to FirstWord.
> 
>  4     FirstWord   i          Ditto.
> 
>  5     FirstWord   W          Ok, we've seen another capital.
> 
>  6     OnAnother   o          Yeah, now we can jump onto ANOTHERWORD!
> 
>  7     ANOTHERWORD r          ... but we can't celebrate until we finish
>                               reading the rest of our input!
> 
>  8     ANOTHERWORD d          Almost there.
> 
>  9     ANOTHERWORD            Done!  And yes, 'WikiWord' is a WikiWord
> 
> ------
> 
> 
> Once we try a few words against our game board, we'll probably feel more
> comfortable that it's doing the right thing.  Now to implement it in
> Python!  *grin*
> 
> 
> 
> The diagram above is easy for us humans to trace, but not so convenient
> for computers.  Let's make it slightly easier by "redrawing" that diagram
> as a table:
> 
> 
>                     capital?      lowercase?
> -----------------------------------------------
> START        |      On1st          FAIL
> On1st        |      FAIL           FirstWord
> FirstWord    |      OnAnother      FirstWord
> OnAnother    |      FAIL           ANOTHERWORD
> ANOTHERWORD  |      OnAnother      ANOTHERWORD
> FAIL         |      FAIL           FAIL
> 
> 
> I've sneaked in one more state, a "FAIL" state that catches the situation
> when we "fall" off the diagram.  Think of it as a black hole for failed
> WikiWords.
> 
> 
> If we want to be concise, we can do another renaming of the states of our
> machine, from words to numbers.  Let's do that.
> 
> 
>       capital?   lowercase?
> --------------------------
> 0   |    1           5
> 1   |    5           2
> 2   |    3           2
> 3   |    5           4
> 4   |    3           4
> 5   |    5           5
> 
> 
> 
> And something like this is easy to represent as a list:
> 
> ###
> 
>>>>transitions = [[1, 5],
> 
> ...                [5, 2],
> ...                [3, 2],
> ...                [5, 4],
> ...                [3, 4],
> ...                [5, 5]]
> ###
> 
> 
> 
> And now, we can write something that uses this table to jump around our
> state diagram:
> 
> ###
> def isWikiWord(word):
>     """Code to demonstrate a Finite State Machine that recognizes
>        WikiWords."""
>     SUCCESS_STATE = 4
>     TRANSITIONS = [[1, 5],
>                    [5, 2],
>                    [3, 2],
>                    [5, 4],
>                    [3, 4],
>                    [5, 5]]
>     current_state = 0
>     for letter in word:
>         if letter.upper() == letter:
>             current_state = TRANSITIONS[current_state][0]
>         else:
>             current_state = TRANSITIONS[current_state][1]
>     return current_state == SUCCESS_STATE
> ###
> 
> 
> The code itself is simple --- if completely nonobvious.  *grin* The real
> complexity of the code has been hidden by our transition state table, but
> the actual running of the code is pretty tame.
> 
> 
> But does it work?
> 
> ###
> 
>>>>isWikiWord("WikiWord")
> 
> 1
> 
>>>>isWikiWord("WikiWordS")
> 
> 0
> 
>>>>isWikiWord("The")
> 
> 0
> 
>>>>isWikiWord("TheS")
> 
> 0
> 
>>>>isWikiWord("SeeMe")
> 
> 1
> 
>>>>[(word, isWikiWord(word))
> 
> ...  for word in '''WikiWords is a MixedCase "word"
> ...                 which is useful in identifying
> ...                 and linking WikiPages'''.split()]
> [('WikiWords', 1), ('is', 0), ('a', 0),
>  ('MixedCase', 1), ('"word"', 0), ('which', 0),
>  ('is', 0), ('useful', 0), ('in', 0),
>  ('identifying', 0), ('and', 0), ('linking', 0),
>  ('WikiPages', 1)]
> ###
> 
> 
> 
> 
> One very cute thing about finite state machines is that they can be
> transformed into regular expressions.  That is, we can first draw a
> diagram, and then calculate an equivalent regular expression that does the
> same work as the finite state machine.
> 
> 
> The process is slightly involved, and this post is too way long as it is.
> *grin* But here's the end result of the transformation:
> 
> ###
> 
>>>>regex = re.compile('''
> 
> ...     [A-Z][a-z]+
> ...     [A-Z][a-z]
> ...     ([A-Z][a-z] | [a-z])*
> ...     ''', re.VERBOSE)
> 
>>>>regex.match('Wiki')
>>>>regex.match('WikiWord')
> 
> <_sre.SRE_Match object at 0x8168128>
> 
>>>>regex.match('WikiWordsAreNeat')
> 
> <_sre.SRE_Match object at 0x8126060>
> 
>>>>regex.match('FooBAR')
>>>>regex.match('FooB')
>>>>regex.match('FooBar')
> 
> <_sre.SRE_Match object at 0x8168128>
> ###
> 
> 
> If you're interested in this sort of stuff, you may want to look through
> "Introduction to the Theory of Computation":
> 
>     http://www-math.mit.edu/~sipser/book.html
> 
> 
> It's a pricey book, but awesome in its presentation of these ideas.
> 
> 
> 
> I hope this helps!
> 
> 
> 

-- 

-- 

end

Cheers!
         Kirk D Bailey

  +                              think                                +
   http://www.howlermonkey.net  +-----+        http://www.tinylist.org
   http://www.listville.net     | BOX |  http://www.sacredelectron.org
   Thou art free"-ERIS          +-----+     'Got a light?'-Promethieus
  +                              think                                +

Fnord.




More information about the Tutor mailing list