Help Rewriting/Refactoring/Rethinking Parsing Algorythm

Sun Mar 18 23:32:02 EST 2001

Not precisely what you're trying to do, but I do most of my programming with
voice dictation (Dragon NS 5), and I find that a few command phrases are
fairly useful and let me program in just about any editor:

	py-equal (pie-equal) ->  ==
	py-init              -> __init__
	py-main              -> __main__
	py-name              -> __name__ # would be redefined for you...
	py-def               -> def
	dublex (dooblex)     -> wx\Caps Next Word\No-Space After
	triple-quote         -> \No-Space ''' \No-Space
	py-len               -> len

Looking at your list, adding:
	py-caps  -> \No-Space-On\Caps-On
	py-norm  -> \No-Space-Off\Caps-Off
	py-name  -> <word> \No-Space-On\Caps-On   [not sure if you can do this without a macro...]

Would give you:
	py-name number customers py-norm
	py-caps Abstract Class py-norm
	if error type py-equal 5 then
	dublex-python equalsign 6
	py-name whiskey type py-norm equalsign 'peachy'
	print "what's wrong with %s?" % py-name first name py-norm

for the phrases on your list.  I find that the py- phrases give a decent
recognition rate.  With vocedit (for Dragon) you can easily set up the options
for the words (e.g. for dublex-, and making _ put no spaces around
itself).
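
To make the mode idea concrete, here is a minimal Python sketch that simulates what those phrases would produce. It is only an illustration, not Dragon or vocedit code: the `PHRASES` table, the `join_run` and `expand` functions, and the treatment of py-name/py-caps/py-norm as mode switches are my assumptions about how the macros would behave.

```python
# Hypothetical simulation of the dictation phrases; not Dragon/vocedit code.
PHRASES = {
    "py-equal": "==",
    "py-init": "__init__",
    "py-main": "__main__",
    "py-def": "def",
    "py-len": "len",
    "equalsign": "=",
}


def join_run(words, caps_all):
    """Join words with no spaces: CapWords if caps_all, else camelCase."""
    if not words:
        return ""
    first = words[0].capitalize() if caps_all else words[0]
    return first + "".join(w.capitalize() for w in words[1:])


def expand(utterance):
    """Expand a space-separated utterance into code-like text."""
    out = []          # finished tokens, joined with single spaces at the end
    run = None        # words collected while a no-space mode is active
    caps_all = False  # True for py-caps, False for py-name
    for word in utterance.split():
        if word in ("py-name", "py-caps"):
            if run:   # a mode was still open: close it first
                out.append(join_run(run, caps_all))
            run, caps_all = [], word == "py-caps"
        elif word == "py-norm" and run is not None:
            if run:
                out.append(join_run(run, caps_all))
            run = None
        elif run is not None:
            run.append(word)
        else:
            out.append(PHRASES.get(word, word))
    if run:           # mode never closed: flush what was collected
        out.append(join_run(run, caps_all))
    return " ".join(out)


print(expand("py-name number customers py-norm"))
print(expand("py-caps Abstract Class py-norm"))
```

Because everything outside a py-name or py-caps run passes through untouched, quoted strings are never mangled as long as the mode is switched off while dictating them.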

I like the idea of adding "py-name" modes to the mix (it would save lots of
mucking around with \Cap and \No-space).  Will give it a try.  The dublex-
thing saves me lots of headaches; I had a similar one for Fox when I was
using that library.

As for travelling the parsing path:
	I think Aycock's parsing framework would handle this kind of work fairly
easily.  It's got an extremely flexible algorithm (which is apparently fairly
slow, but should hold up for interactive work, I'd think).
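
Short of a full parsing framework, the quoted-string case might be handled by splitting the line into quoted and unquoted pieces first and translating only the unquoted ones. The sketch below shows that idea with the standard `re` module. `QUOTED`, `camel_case`, `camel_runs`, and `translate_line` are hypothetical names, and `camel_case` is a deliberately tiny stand-in for the real `translate` function in the quoted message (it knows nothing about special words or ICA letters).

```python
import re

# A single- or double-quoted string; escaped quotes are NOT handled here.
QUOTED = re.compile(r"""("[^"]*"|'[^']*')""")


def camel_case(words):
    """Join words as camelCase, leaving the first word untouched."""
    if not words:
        return ""
    return words[0] + "".join(w.capitalize() for w in words[1:])


def camel_runs(segment):
    """CamelCase each run of alphabetic words; pass other tokens through."""
    tokens, run = [], []
    for word in segment.split():
        if word.isalpha():
            run.append(word)
        else:
            if run:
                tokens.append(camel_case(run))
                run = []
            tokens.append(word)
    if run:
        tokens.append(camel_case(run))
    return tokens


def translate_line(line):
    """Translate unquoted text only; quoted strings are copied verbatim."""
    tokens = []
    # With a capturing group, re.split alternates: unquoted, quoted, unquoted...
    for i, piece in enumerate(QUOTED.split(line)):
        if i % 2 == 1:
            tokens.append(piece)
        else:
            tokens.extend(camel_runs(piece))
    return " ".join(tokens)


print(translate_line('print "what\'s wrong with %s?" % first name'))
```

The design choice is to keep the two concerns apart: one small step decides what is a string literal, and the word-joining logic never sees the inside of a quote.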

Good luck, will be interested to see what you finally build,
Mike

-----Original Message-----
From: Boopy Bootles [mailto:aschneid at mindspring.com]
Sent: Sunday, March 18, 2001 8:57 PM
To: python-list at python.org
Subject: Help Rewriting/Refactoring/Rethinking Parsing Algorythm

I'm trying to write a simple piece of code to make programming by voice
recognition software easier to do.  I wrote a very simple function that
would, for example, convert "number customers equals 5" to
"numberCustomers = 5".  But once I started using it, I quickly
discovered several more cases I had to handle.  So far, this is the
list:

number customers                               ->  numberCustomers
Abstract Class                                 ->  AbstractClass
if error type == 5 then                        ->  if errorType == 5:
whiskey x-ray python equals 6                  ->  wxPython = 6
normal whiskey type = 'peachy'                 ->  whiskeyType = 'peachy'
print "what's wrong with %s?" % first name     ->  print "what's wrong with %s?" % firstName

(The whiskey x-ray stuff is the International Communications Alphabet,
better known as the NATO phonetic alphabet, which you use when spelling
something out in Dragon NaturallySpeaking if NaturallySpeaking is having
trouble understanding you.)

I've gotten all but the last case to work.  But once I started trying to
incorporate the last case--not messing up quoted strings--my already
overly messy code turned into a hideous snarl.  I know there's _got_ to
be a better way to parse this input, but I don't have a clue where to
start.  I'd like to avoid building a full-blown language parser, which
seems like overkill.

I've included the code below, which correctly translates all but the
last case.  Any thoughts would be greatly appreciated, esp. thoughts re:
a simple object-oriented approach (I'm pretty sure there is one, I just
don't have enough experience writing OO code to figure it out).

Thanks,
Anders Schneiderman

P.S. Once I've gotten this code in better shape, I'll post it
somewhere.  Even at this primitive stage, it makes a _huge_ difference
in writing code by voice (I use NaturallySpeaking plus NatLink, Joel
Gould's terrific system for writing NaturallySpeaking macros using
Python).

---------------------------------------------------------------------------

""" voicecode.py:  routines for translating and otherwise manipulating
voice input into code.
"""

SpecialWords = {
    'equals': ' = ', '=': ' = ', '==': ' == ', '%': ' % ',
    'if': 'if ', 'then': ':', 'elsif': 'elif', 'dot': '.', '.': '.',
    'open': ' = open(',
    'try': 'try:', 'init': '__init__ (self, ', 'define': 'def ',
    'except': 'except:',
    'finally': 'finally:', 'tab': '   ', 'blank': ' ',
}
ICA = {'alpha': 1, 'bravo': 1, 'charlie': 1, 'delta': 1, 'echo': 1,
       'foxtrot': 1, 'golf': 1, 'hotel': 1, 'india': 1, 'juliet': 1,
       'kilo': 1, 'lima': 1, 'mike': 1, 'november': 1, 'oscar': 1,
       'papa': 1, 'quebec': 1, 'romeo': 1, 'sierra': 1, 'tango': 1,
       'uniform': 1, 'whiskey': 1, 'x-ray': 1, 'xray': 1, 'yankee': 1,
       'zulu': 1}

quote = {"'": 1, '"': 1}

def translate(words):
    """Given an array of words, translate into code.
The rules are:
 * In general, convert lists of words into wordWordWord
 * For certain words/symbols, aka "special words", convert 'em
 * For ICA words--alpha, bravo, etc.--convert to a single letter
 * If the word is "normal", use the next word exactly as is

NOTE:  Right now, this will NOT work on lines that are quoted:
    print 'this is a test' will translate to 'thisIsATest'.
    I need to find a cleaner way to solve this problem.
"""

    line = ''
    firstWord = 1
    normalWord = 0
    for word in words:
        if normalWord:
            # Prev word was 'normal': don't treat this word as a
            # special word or an ICA word
            if firstWord:
                # Don't capitalize the first word of a variable
                line = line + word
                firstWord = 0
            else:
                line = line + word.capitalize()
            normalWord = 0
        elif word.lower() == 'normal':
            normalWord = 1            # next word skips the lookups
        elif word in SpecialWords:
            line = line + SpecialWords[word]
            firstWord = 1
        elif word.lower() in ICA:
            # International Alphabet -- convert 'alpha' to 'a', etc.
            line = line + word[0]
            firstWord = 0
        elif firstWord:
            # Don't capitalize the first word of a variable
            line = line + word
            firstWord = 0
        else:
            line = line + word.capitalize()
    return line
