Help Rewriting/Refactoring/Rethinking Parsing Algorithm

Boopy Bootles aschneid at
Mon Mar 19 02:56:43 CET 2001

I'm trying to write a simple piece of code to make programming by voice
recognition software easier to do.  I wrote a very simple function that
would, for example, convert "number customers equals 5" to
"numberCustomers = 5".  But once I started using it, I quickly
discovered several more cases I had to handle.  So far, this is the
list:

number customers                   ->  numberCustomers
Abstract Class                     ->  AbstractClass
if error type == 5 then            ->  if errorType == 5:
whiskey x-ray python equals 6      ->  wxPython = 6
normal whiskey type = 'peachy'     ->  whiskeyType = 'peachy'
print "what's wrong with %s?" % first name
                                   ->  print "what's wrong with %s?" % firstName

(the whiskey x-ray stuff is the International Communications Alphabet,
which you use when spelling something out in Dragon NaturallySpeaking if
NaturallySpeaking is having trouble understanding you).
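
To make the spelling rule concrete, here is a minimal sketch of just that
step: each code word collapses to its first letter, and everything else
passes through untouched.  The names ICA_WORDS and spell are made up for
this illustration, and the word list assumes the standard 26 code words
plus the alternate spelling 'xray'.

```python
# Hypothetical helper names; the word list is the standard spelling
# alphabet plus 'xray' as an alternate form of 'x-ray' (an assumption).
ICA_WORDS = ('alpha bravo charlie delta echo foxtrot golf hotel india '
             'juliet kilo lima mike november oscar papa quebec romeo '
             'sierra tango uniform victor whiskey x-ray xray yankee '
             'zulu').split()


def spell(words):
    """Replace each code word with its first letter; keep other words."""
    result = []
    for word in words:
        if word.lower() in ICA_WORDS:
            result.append(word[0])   # 'whiskey' -> 'w', 'x-ray' -> 'x'
        else:
            result.append(word)
    return result
```

Joining the result of spell(['whiskey', 'x-ray']) gives the "wx" prefix
used in the wxPython example above.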

I've gotten all but the last case to work.  But once I started trying to
incorporate the last case--not messing up quoted strings--my already
overly messy code turned into a hideous snarl.  I know there's _got_ to
be a better way to parse this input, but I don't have a clue where to
start.  I'd like to avoid building a full-blown language parser, which
seems like overkill.
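
One possible way to keep quoted strings intact without a full parser is
to cut the line into quoted and unquoted chunks first, and only hand the
unquoted chunks to the word translator.  The sketch below shows that
splitting step alone.  The names QUOTED and split_quoted are hypothetical,
and it assumes quotes are never escaped or nested, which may not hold for
real dictated input.

```python
import re

# Matches one double-quoted or single-quoted string.  Assumption: no
# backslash escapes inside the string.
QUOTED = re.compile(r'''("[^"]*"|'[^']*')''')


def split_quoted(text):
    """Return a list of (is_quoted, chunk) pairs in their original order.

    re.split with one capturing group alternates between the text outside
    a match (even positions) and the matched text (odd positions).
    """
    parts = []
    for index, chunk in enumerate(QUOTED.split(text)):
        if chunk:                      # skip empty strings at the edges
            parts.append((index % 2 == 1, chunk))
    return parts
```

A caller would copy chunks flagged as quoted straight to the output and
translate the rest.  A quote mark with no partner, such as a lone
apostrophe, matches nothing and so stays in an unquoted chunk.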

I've included the code below, which correctly translates all but the
last case.  Any thoughts would be greatly appreciated, esp. thoughts re:
a simple object-oriented approach (I'm pretty sure there is one, I just
don't have enough experience writing OO code to figure it out).

Anders Schneiderman

P.S. Once I've gotten this code in better shape, I'll post it
somewhere.  Even at this primitive stage, it makes a _huge_ difference
in writing code by voice (I use NaturallySpeaking plus Natlink, Joel
Gould's terrific system for writing NaturallySpeaking macros using
Python).


"""  routines for translating and otherwise manipulating
voice input into code.
"""

from string import *

SpecialWords = {'equals': ' = ', '=': ' = ', '==': ' == ', '%': ' % ',
   'if': 'if ', 'then': ':', 'elsif': 'elif', 'dot': '.', '.':'.',
   'open': ' = open(',
   'try': 'try:', 'init': '__init__ (self, ', 'define': 'def ',
   'finally':'finally:', 'tab':'   ', 'blank': ' '}
ICA = {'alpha': 1, 'bravo': 1, 'charlie':1, 'delta':1, 'echo':1,
       'foxtrot': 1, 'golf':1, 'hotel':1, 'india':1, 'juliet': 1,
       'kilo':1, 'lima': 1, 'mike':1, 'november':1, 'oscar':1,
       'papa':1, 'quebec':1, 'romeo':1, 'sierra':1, 'tango':1,
       'uniform':1, 'victor':1, 'whiskey':1, 'x-ray':1, 'xray':1,
       'yankee':1, 'zulu':1 }

quote = {"'":1, '"':1 }

def translate(words):
    """Given an array of words, translate into code.
The rules are:
 * In general, convert lists of words into wordWordWord
 * For certain words/symbols, aka "special words", convert 'em
 * For ICA words--alpha, bravo, etc.--convert to a single letter
 * If the word is "normal", use the next word exactly as is

NOTE:  Right now, this will NOT work on lines that are quoted:
    print 'this is a test' will translate to 'thisIsATest'.
    I need to find a cleaner way to solve this problem.
    """

    line = ''
    firstWord = 1
    normalWord = 0
    for word in words:
        if normalWord:
            # This word should be used exactly as is (prev word was "normal")
            if firstWord:
                # Don't capitalize the first word of a variable or a
                # word in a quote
                line = line + word
                firstWord = 0
            else:
                line = line + capitalize(word)
            normalWord = 0
        elif lower(word) == 'normal':
            normalWord = 1    # next word should be used exactly as is
        elif SpecialWords.has_key(word):
            line = line + SpecialWords[word]
            firstWord = 1
        elif ICA.has_key(lower(word)):
            # International Alphabet -- convert 'alpha' to 'a', etc.
            line = line + word[0]
            firstWord = 0
        elif firstWord:
            # Don't capitalize the first word of a variable or a word in
            # a quote
            line = line + word
            firstWord = 0
        else:
            line = line + capitalize(word)
    return line
