lexing nested parenthesis (for a Python Unix Shell)

Jim Meier jim at dsdd.org
Fri Aug 2 07:45:48 CEST 2002

On Wed, 31 Jul 2002 11:30:40 -0600, Dave Cinege wrote:

> On Wednesday 31 July 2002 16:32, Bengt Richter wrote:
>> 	if 1 and (var1 or qm('-d /etc/')):
>> would already be legal Python.
> That's not the point. I'm not making legal Python but a 'short hand'
> subset, specifically a Python Unix Shell (aka bourne shell replacement)
> To put things in perspective:
> 	In bash sh:	[ -d /etc/ ]
> 	In pysh:	=(-d /etc/)  (Maybe =('-d /etc/') )
> 	At runtime it will be parsed and replaced by:
> 	pysh_test('-d', '/etc/')

I think you definitely want to go check out section 18 of the standard
library reference, specifically the 'tokenize' and 'parser' modules. They
will save you huge amounts of wheel-reinventing.

A good approach might be to use the 'tokenize' module to lex your input,
then do simple fixups on patterns in the token stream. Then rebuild a
source string and have Python run it. (The 'parser' module is, strangely,
missing a parsing function that takes tokens instead of strings.)
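
A minimal sketch of that tokenize-and-fix-up idea, written for a current
Python 3. It handles only the quoted =('...') form on a single line, and
it assumes that '=' starts a test only at the start of the line or right
after a keyword, so ordinary assignments are left alone. rewrite_tests is
a made-up name; pysh_test is the runtime function from the quoted message.

```python
import ast
import io
import keyword
import tokenize


def rewrite_tests(line):
    """Rewrite =('...') into a pysh_test(...) call on one source line."""
    tokens = list(tokenize.generate_tokens(io.StringIO(line).readline))
    edits = []  # (start_col, end_col, replacement_text)
    for i in range(len(tokens) - 3):
        eq, lpar, arg, rpar = tokens[i:i + 4]
        prev = tokens[i - 1] if i > 0 else None
        # Assumption: '=' is the test operator only where an expression can
        # start, i.e. at the start of the line or right after a keyword.
        starts_expr = prev is None or keyword.iskeyword(prev.string)
        if (starts_expr
                and eq.string == '=' and lpar.string == '('
                and arg.type == tokenize.STRING and rpar.string == ')'):
            # Split the quoted shell-style argument on whitespace.
            words = ast.literal_eval(arg.string).split()
            call = 'pysh_test(%s)' % ', '.join(repr(w) for w in words)
            edits.append((eq.start[1], rpar.end[1], call))
    # Splice from right to left so earlier column offsets stay valid.
    for start, end, text in reversed(edits):
        line = line[:start] + text + line[end:]
    return line


print(rewrite_tests("if 1 and (var1 or =('-d /etc/')):"))
# if 1 and (var1 or pysh_test('-d', '/etc/')):
```

Splicing by column keeps the rest of the line untouched, which is easier
to reason about than regenerating the whole line from tokens.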

If you want to work at the grammar level, have a look at John Aycock's
SPARK parsing toolkit, which comes with a skeleton Python grammar already
implemented (for Python 1.5.2, but it's a good start). You'll be able to
massage your parse tree into a nested-list representation that you can
feed directly into the 'parser' module's 'compileast' function.
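
The 'parser' module was removed in Python 3.10, so on a current
interpreter the same idea (build a tree, then compile it directly) would
go through the 'ast' module instead. A rough sketch of that substitute
route, not of compileast itself; compile_test and the body of pysh_test
are hypothetical stand-ins, and only '-d' is handled.

```python
import ast
import os


def pysh_test(flag, path):
    """Hypothetical stand-in for the shell's test function; '-d' only."""
    if flag == '-d':
        return os.path.isdir(path)
    raise ValueError('unsupported test flag: %r' % flag)


def compile_test(flag, path):
    """Build the tree for pysh_test(flag, path) and compile it to code."""
    call = ast.Call(
        func=ast.Name(id='pysh_test', ctx=ast.Load()),
        args=[ast.Constant(value=flag), ast.Constant(value=path)],
        keywords=[],
    )
    tree = ast.Expression(body=call)
    ast.fix_missing_locations(tree)  # compile() needs line/column info
    return compile(tree, '<pysh>', 'eval')


code = compile_test('-d', os.curdir)
print(eval(code, {'pysh_test': pysh_test}))
```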


> In bash:
> 	for line in $(cat *.py); do echo $line; done	# Yep time to retire this POS
> In pysh
> 	for $line in !(cat *(*.py)): print $line ;;	# Ain't it pretty?
> 	FYI
> 	  $   == variable prefix (I might be able to avoid using this, dunno)
> 	  !() == Command Substitution
> 	  *() == Shell glob (might become seamless, ie I search for glob chars!)
> 	  ;;  == explicit newline

Ugh, definitely avoid the abhorrent '$' syntax - this is the year 2002, we
can do better. Just use Python variables and provide a simple function or
statement to export particular variables to child processes.
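
One way such an export function could look, as a sketch only: export and
child_getenv are made-up names, and the child here is just another Python
process started to show what it inherits.

```python
import os
import subprocess
import sys


def export(**variables):
    """Copy the given Python values into the environment of later children."""
    for name, value in variables.items():
        os.environ[name] = str(value)


def child_getenv(name):
    """Start a child Python process and return what it sees for `name`."""
    script = 'import os, sys; sys.stdout.write(os.environ.get(sys.argv[1], ""))'
    done = subprocess.run([sys.executable, '-c', script, name],
                          capture_output=True, text=True, check=True)
    return done.stdout


greeting = 'hello'                 # an ordinary Python variable
export(PYSH_GREETING=greeting)     # only exported names reach children
print(child_getenv('PYSH_GREETING'))
```

Everything that is not exported stays a plain Python variable, so no '$'
prefix is needed to tell the two kinds apart.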

For command substitution, I'd personally prefer something like rc's
syntax, i.e. `{cat *.py}, but since it basically comes down to your
favorite quoting character, it's not too important :)

The *() syntax will be difficult to parse around, and just gets in the way
of the user. I would try to avoid it.

I don't know what ';; == explicit newline' means .. can't the user just
press enter?

Let us know how the project goes ..

