lexing nested parenthesis (for a Python Unix Shell)

Bengt Richter bokr at oz.net
Fri Aug 2 22:15:46 CEST 2002

On Wed, 31 Jul 2002 17:30:40 -0400, Dave Cinege <dcinege at psychosis.com> wrote:

>On Wednesday 31 July 2002 16:32, Bengt Richter wrote:
>> 	if 1 and (var1 or qm('-d /etc/')):
>> would already be legal Python.
>That's not the point. I'm not making legal Python but a 'short hand'
>subset, specifically a Python Unix Shell (aka bourne shell replacement)
>The idea right now is to parse and replace the short hand 
>with python constructs that are predefined, and then let python
>run all of it. It's either that or I basically handle ALL the
>parsing and pretty much recreate the wheel.
>You see I only want to deal with 'my' subset...python will then run
>(via compile()) and handle the remaining grammar, indentation, etc.
>Things get a bit more difficult in interactive mode, but I feel this
>is still the best route.
>> Maybe a few examples with un-nested and nested parens (and nested ?(...)
>> constructs??) together with what Python you would like to have them
>> transformed to would get you some useful help.
>I haven't specced it all out yet, but I've pretty much decided I want to
>contain almost everything within ()'s and prefix the first ( with an identifier.
>To put things in perspective:
>	In bash sh:	[ -d /etc/ ]
>	In pysh:	=(-d /etc/)  (Maybe =('-d /etc/') )
>	At runtime it will be parsed and replaced by:
>	pysh_test('-d', '/etc/')
>Some things will not be as easy as this, as they will not be a
>simple function name replacement. It can get ugly when I need
>to work recursively through nested functions. I need to work
>on the next item first so I know how to handle output.
>I.e., if the return is normally a list, and it's nested in
>something that requires a string, I have to account for that.
>In bash:
>	for line in $(cat *.py); do echo $line; done	# Yep time to retire this POS
>In pysh
>	for $line in !(cat *(*.py)): print $line ;;	# Ain't it pretty?
>	  $   == variable prefix (I might be able to avoid using this, dunno)
>	  !() == Command Substitution
>	  *() == Shell glob (might become seamless, i.e. I search for glob chars!)
>	  ;;  == explicit newline
>Parsed to python:
>	for line in pysh_cmdsub_inpath('cat',pysh_ListToArgStr(pysh_glob('*.py'))):
>		print line  	# You can visualize the pysh functions...
>> 'Better' is a waste of time unless we're working on the real problem ;-)
>The problem is Python already works. : > I KNOW how I can do all this, I just
>don't feel like writing a complete parser if I can reuse something Python 
>itself already uses for parsing.
You might consider breaking the source of your 'shorthand' into tokens of interest
using re. Jonathan Hogg has already provided a leg up. Your special shorthand
expressions seem to be parentheses with a prefix character [!*], or ';;', or $name,
and presumably unprefixed parens should work as usual. I suspect you can just let
$name be name in the first place, unless you need to do something special with specially
designated names, but we'll keep it in. I don't know what ';;' is supposed to do, but
if you do this:

 >>> import re
 >>> sh = "for $line in !(cat *(*.py)): print $line ;;   # Ain't it pretty?"
 >>> splitre = re.compile(r'([$!*][(]|[()]|[$][a-zA-Z_]\w*|;;)')
 >>> pieces = splitre.split(sh)
 >>> pieces
 ['for ', '$line', ' in ', '!(', 'cat ', '*(', '*.py', ')', '', ')', ': print ', '$line',
 ' ', ';;', "   # Ain't it pretty?"]

maybe the pieces list will give you ideas on how to convert it, e.g.,
(ignoring indentation problems and other things we don't know about yet ;-)

 >>> def munge(pieces, i=0):
 ...     ret = []
 ...     ihi = len(pieces)
 ...     while i < ihi:
 ...         p = pieces[i]
 ...         ps = p.strip()
 ...         if not p:
 ...             pass
 ...         elif not ps or ps[0] not in '$!*();':
 ...             ret.append(p)
 ...         elif ps[0] == '$':
 ...             ret.append(p[1:]) # just strip dollar sign for now
 ...         elif ps == '!(':
 ...             # always recurse on any left paren, and return on right paren
 ...             ret.append('pysh_cmdsub_inpath(')
 ...             ret.append(repr(pieces[i+1].strip()))
 ...             ret.append(', ') # comma after arg
 ...             s, i = munge(pieces, i+2) # returned i should be index of last item used
 ...             ret.append(s)
 ...         elif ps == '*(':
 ...             ret.append('pysh_ListToArgStr(pysh_glob(')
 ...             ret.append(repr(pieces[i+1].strip()))
 ...             s, i = munge(pieces, i+2) # returned i should be index of last item used
 ...             ret.append(s+')') # need extra right paren to close ListToArgStr, whatever that is ;-)
 ...         elif ps == ';;':
 ...             ret.append('\n') # ??
 ...         elif ps == ')':
 ...             ret.append(')')
 ...             return ''.join(ret), i
 ...         elif ps == '(':
 ...             ret.append('(')
 ...             s, i = munge(pieces, i+1)
 ...             ret.append(s)
 ...         else:
 ...             ret.append(p)
 ...         i += 1
 ...     return ''.join(ret), i-1
 >>> munge(pieces)
 ("for line in pysh_cmdsub_inpath('cat', pysh_ListToArgStr(pysh_glob('*.py'))): print line
 \n   # Ain't it pretty?", 14)
 >>> print munge(pieces)[0]
 for line in pysh_cmdsub_inpath('cat', pysh_ListToArgStr(pysh_glob('*.py'))): print line
    # Ain't it pretty?
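
For anyone trying this on a current interpreter, here is a rough Python 3 sketch of the same split-then-recurse idea. It is not a tested design: the helper names tokenize, SPLIT_RE and NAME_RE are made up for illustration, and it assumes that the piece right after a `!(` or `*(` is plain text (the command or the glob pattern). The pysh_* names are only emitted as text, exactly as in the session above.

```python
import re

# Same token classes as the session above: prefixed parens, bare parens,
# $name and ';;'. Using \w* lets one-letter names such as $x match too.
SPLIT_RE = re.compile(r'([$!*][(]|[()]|[$][a-zA-Z_]\w*|;;)')
NAME_RE = re.compile(r'[$][a-zA-Z_]\w*')

def tokenize(src):
    """Split src on the shorthand tokens, keeping the tokens themselves and
    dropping the empty strings re.split leaves between adjacent tokens."""
    return [piece for piece in SPLIT_RE.split(src) if piece]

def munge(tokens, i=0):
    """Translate tokens[i:]; return (python_text, index_of_next_unused_token).

    Recurses on every opening paren and returns on the matching ')'.
    """
    out = []
    while i < len(tokens):
        tok = tokens[i]
        i += 1
        if tok == ')':
            out.append(')')
            return ''.join(out), i
        if NAME_RE.fullmatch(tok):
            out.append(tok[1:])            # just drop the dollar sign
        elif tok == '!(':
            # Assumption: the next piece is plain text naming the command.
            first = repr(tokens[i].strip())
            inner, i = munge(tokens, i + 1)
            out.append('pysh_cmdsub_inpath(' + first + ', ' + inner)
        elif tok == '*(':
            # Assumption: the next piece is plain text holding the pattern.
            first = repr(tokens[i].strip())
            inner, i = munge(tokens, i + 1)
            # The extra ')' closes pysh_ListToArgStr.
            out.append('pysh_ListToArgStr(pysh_glob(' + first + inner + ')')
        elif tok == '(':
            inner, i = munge(tokens, i)
            out.append('(' + inner)
        elif tok == ';;':
            out.append('\n')               # placeholder, as in the session
        else:
            out.append(tok)
    return ''.join(out), i

sh = "for $line in !(cat *(*.py)): print $line ;;   # Ain't it pretty?"
text, next_index = munge(tokenize(sh))
print(text)
```

Filtering out the empty strings up front means the loop needs no special case for them, and returning the index of the next unused token (rather than the last one used) avoids the i-1 adjustment at the end. Like the original, it does nothing sensible with an unbalanced ')' and it leaves the `print line` text alone.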

(Not tested beyond what you see, and surely not exactly what you need, but it might
give you some ideas. BTW, I think pysh_ListToArgStr probably doesn't belong there. I'd
either include its functionality in pysh_glob or let pysh_cmdsub_inpath handle it
internally based on what it gets as a second arg, whichever factors better into
your overall vision. And ';;' almost certainly doesn't do what you had in mind ;-)
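
To make the second option concrete, here is a guess at what it could look like in Python 3. The function names come from Dave's example, but the bodies are pure assumption: I am supposing that pysh_glob returns a list of paths and that pysh_cmdsub_inpath returns the command's output as a list of lines.

```python
import glob
import shlex
import subprocess

def pysh_glob(pattern):
    """Assumed behaviour: return the sorted list of paths matching pattern."""
    return sorted(glob.glob(pattern))

def pysh_cmdsub_inpath(cmd, args=()):
    """Assumed behaviour: run cmd (looked up on PATH) and return its
    standard output as a list of lines.

    args may be a list of strings, passed through untouched, or a single
    string, which is split shell-style. Either way the caller never needs
    a separate list-to-string conversion step.
    """
    if isinstance(args, str):
        args = shlex.split(args)
    result = subprocess.run([cmd, *args], capture_output=True,
                            text=True, check=True)
    return result.stdout.splitlines()
```

With this factoring the generated code would read pysh_cmdsub_inpath('cat', pysh_glob('*.py')), and file names containing spaces survive because the list is never flattened into one string.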

Bengt Richter
