[Python-mode] more speech driven how-tos

Eric S. Johansson esj at harvee.org
Sat Jun 18 15:00:07 CEST 2011


> BTW, what are suitable returns from Emacs report functions for you?
>
> Choices that come to mind at the moment are:
>
> - simply get it returned
yes, as in:

position = Emacs_fetch("current character position", buffer="focus")
buffer_list = Emacs_fetch("buffer list")

obviously we have to figure out what's a reasonable set of first arguments for 
this function. I am perfectly willing to use internal Lisp names, including 
calling functions to get data. Actually, that seems like it might be a very 
reasonable thing to do. Sort of like SQL queries, only more understandable.
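
To make this concrete, here is a minimal sketch of what Emacs_fetch could look 
like, assuming the bridge to Emacs is emacsclient evaluating Lisp forms in a 
running Emacs server; the spoken-name-to-Lisp table and the function itself are 
assumptions of mine, not an existing API:

import subprocess

# map spoken query names to the Lisp forms that answer them
QUERIES = {
    "current character position": "(point)",
    "buffer list": "(mapcar #'buffer-name (buffer-list))",
}

def Emacs_fetch(what, buffer=None):
    """Evaluate the Lisp form registered for `what` and return its printed value."""
    form = QUERIES[what]
    if buffer == "focus":
        # run the query in the buffer of the currently selected window
        form = "(with-current-buffer (window-buffer (selected-window)) %s)" % form
    return subprocess.check_output(["emacsclient", "--eval", form]).decode().strip()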

> - display in mode-line

that seems to be more of an Emacs_set function. Could you elaborate on what 
you're thinking of?

> - message-buffer

actually, I was thinking of any buffer. If we had access to the current buffer, 
it would be possible to reimplement VR-mode in a more generic fashion.

> - tool-tip

Again, could you elaborate? I do suspect it's something you would set through 
the set-data function.

> - so-called speed-bar
>
> Instead of a simple return, it might be sent to a program...
>
>>
>> My currently preferred Emacs is Xemacs for political reasons[1]
>>
>> I'm not sure what you need in a technical description. Normally in a
>> speech recognition environment you use either fixed grammars or
>> continuous dictation. I am building a hybrid where you use a fixed
>> grammar with contextually dependent elements and interact with GUI
>> elements to make an unspeakable process speakable.
>>
>> the process of making the unspeakable speakable involves identifying and
>> extracting information from the application and transforming it into a
>> speakable form before displaying it in a second application which can be
>> manipulated. See blog.esjworks.com for more complete examples.
>>
>> I expect that most of the action routines for a complete grammar will
>> just be Emacs keystrokes invoking Emacs methods via keyboard input. It
>> would be nice to do a direct injection of commands to eliminate the
>> command-execution errors caused by too fast a rate of injecting
>> characters. A direct access channel would also allow us to query the
>> buffer for state information which could be used to influence the action
>> routine.
>>
>> The commands I asked for, which have no need to export information to
>> any external program, would help me get a better feel for whether I'm on
>> the right track or not. If there's something I use regularly and it
>> "feels" right, without vocal damage through excessive use, then I'm on
>> the right path. If not, I need to look at the problem again and come up
>> with a better solution.
>>
>> An example of a more complicated spoken command is the "get method"
>> command. The first thing the command does is search to the right for the
>> next method. An alias for it would be "get next method". Going in the
>> other direction would be "get previous method". Once the method was
>> identified, it would be placed in the region, mark on the left, point on
>> the right. The action routine for the grammar would then invoke a GUI
>> helper program to manipulate symbol names, passing the existing name
>> along to it. The resulting changed method name would be returned via a
>> different grammar and action routine, "use <transformation type>", and
>> the result would be placed back into the buffer, replacing what was in
>> the region.
>>
>> Making any sense?
>>
>>
>
> It does. However, it's a new and vast matter for me. So let's proceed step by 
> step and see how it goes.

I didn't get here overnight. It took me 18 years to become injured, 10 years 
to become frustrated with speech recognition, and then another five years to 
figure out how not to be frustrated, only to become frustrated again because I 
couldn't pay people to write the code for me. The joys of being a serial 
entrepreneur, self-employed type person.  If you think this is interesting, you 
should see what I'm doing for diabetes self-management tools. I really need to 
get that done so I can pass the demo to potential funders, which is part of the 
reason why I need these Emacs extensions. Everybody has an ulterior motive. :-)

>
> Let's start with the report-function, telling where you are in code.
> Agreed? So I will dig a little bit into the question of how the results from 
> Emacs are taken up in your environment.

this has been problematic from day one. The blind spot in speech recognition has 
been the assumption that getting information out of an application is "too 
hard," so commands have become open-loop: the content of the command is used to 
generate keystroke or menu injections that activate a function within the 
application. Emacs examples would be something like:

(date | time | log) stamp = stamp.string($1);

search (forward={ctrl+s}|back={ctrl+r}) = $1;
yank (again={Esc}y|it back={ctrl+y}) = $1;
go to line = {Alt+g};
repeat (it={ctrl+u} | twice={ctrl+u}2 | thrice={ctrl+u}3 ) = $1;

copy    (character={esc}xdelete-forward-char{enter}{ctrl+y} |
          word= {esc}d{ctrl+y}|
          line={esc}xset-mark{ctrl+e}{esc}w
         )= $1;

kill    (character = {ctrl+d} |
          word={esc}d|
          line={ctrl+k}
         )= $1;

delete  (character = {esc}xdelete-backward-char{enter}|
          word={esc}xbackward-kill-word{enter}|
          line={ctrl+u}0{ctrl+k}
         )= $1;

left    (character={ctrl+b}|
          word={esc}b|
          line={ctrl+u}0k{esc}xforward-line{enter}
         )= $1;

right   (character={ctrl+f}|
          word={esc}f|
          line={esc}xforward-line{enter}
         )= $1;

sorry if that doesn't make much sense, but the grammar is expressed in a rather 
odd way in Vocola, mixing action with grammar.

copy (character | word | line) is expressed as:

copy    (character={esc}xdelete-forward-char{enter}{ctrl+y} |
          word= {esc}d{ctrl+y}|
          line={esc}xset-mark{ctrl+e}{esc}w
         )= $1;

the right-hand side of each grammar expression is the key sequence emitted. 
Note: in this example, not all of the key sequences are correct; I fix them 
when I need them for the first time. You could say this is programming by 
placeholder.
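
For comparison, here is roughly the same copy command rendered as plain Python, 
purely as a sketch: the table maps each spoken alternative to the key sequence 
its action emits, and send_keys is a hypothetical stand-in for whatever 
mechanism the recognizer uses to inject keystrokes:

# hypothetical Python rendering of the Vocola "copy" command
COPY_ACTIONS = {
    "character": "{esc}xdelete-forward-char{enter}{ctrl+y}",
    "word": "{esc}d{ctrl+y}",
    "line": "{esc}xset-mark{ctrl+e}{esc}w",
}

def send_keys(sequence):
    # placeholder: a real action routine would inject these keystrokes
    # into the focused application
    print("inject:", sequence)

def copy(unit):
    send_keys(COPY_ACTIONS[unit])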

there has been no activity trying to find a general way to return data to the 
speech recognition environment. There have been some hacks using the cut and 
paste buffer mechanism, but that's rather unsophisticated. I think the best 
model is "tell me what I want" rather than trying to force anything into the 
speech recognition action routines. The only potential exception to that would 
be a change of some major state which necessitates moving to a new grammar and 
action routine state. For example, if you change modes, you would want a new 
grammar. We might handle that by making the first thing we ask be "what is the 
state"; if it's the same as the last time, we don't change anything. Otherwise 
we initiate a grammar change.
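
A minimal sketch of that state check, reusing the hypothetical Emacs_fetch from 
above (it assumes QUERIES also maps "major mode" to something like 
"(symbol-name major-mode)"); activate_grammar_for is likewise an assumed hook 
into the recognizer, not a real API:

last_state = None

def maybe_switch_grammar():
    global last_state
    # ask Emacs which major mode the focused buffer is in
    state = Emacs_fetch("major mode", buffer="focus")
    if state != last_state:
        # hypothetical: tear down the old grammar and load the one for this mode
        activate_grammar_for(state)
        last_state = state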

The mode-change example also highlights another difference between current 
thinking and what I think is a better solution. Current thinking only changes 
grammar if you change applications. I believe we should change grammar if an 
internal state changes, because that should allow you to disambiguate commands 
through reduction of scope (you'll hear me say that a lot). In python-mode, an 
example would be enabling commands for shell operation and debugging only after 
you enter a shell buffer, versus a globally active grammar which would require 
longer, more unique commands.

in summary, I would say that the easiest model is to continue triggering 
grammars and action routines the way we do now, with the action routines 
querying the application for data and pushing data to the application.
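
A toy action routine in that style, tying the pieces together; Emacs_fetch is 
the sketch from above, while symbol_helper and Emacs_send are hypothetical 
stand-ins for the GUI helper and the push channel:

def get_method():
    # query: ask Emacs to find the next method, select it as the region,
    # and hand back its name
    name = Emacs_fetch("select next method", buffer="focus")
    # pass the name to the GUI helper for transformation
    # (the real flow returns the result through a separate
    # "use <transformation type>" grammar; collapsed here for brevity)
    new_name = symbol_helper(name)
    # push: replace the region with the transformed name
    Emacs_send("replace region", new_name)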

I should probably stop here because I'm probably giving you intellectual 
heartburn at this point; it's a lot to digest and I'm late for my 
astronomy club's work party.  Observing sites don't maintain themselves.

Later
--- eric

