
There are several "parser generator" tools that will compile a grammar specification to a parser; the best-known one is Bison, an open-source implementation of the classic Unix tool YACC (Yet Another Compiler Compiler).
Python has its own parser generator, SPARK. Unfortunately, while SPARK is quite powerful (that is, good at handling ambiguities in the spec), the Earley algorithm it uses gives O(n**3) worst-case performance in the generated parser, which makes it unusable in production for anything larger than a toy grammar.
The Python standard library includes a lexer class (shlex) suitable for a large class of shell-like syntaxes. As Guido has pointed out, regexps provide another line of attack on the problem.
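For instance, here is a minimal sketch of both approaches: shlex copes with shell-style quoting and comments, while a regex with named groups makes a serviceable ad-hoc tokenizer. (The token names NUM/NAME/OP/WS and the tiny expression being tokenized are invented for this example.)

    import re
    import shlex

    # shlex handles shell-style quoting and (optionally) comments:
    print(shlex.split("cp 'my file.txt' /tmp  # trailing comment", comments=True))
    # -> ['cp', 'my file.txt', '/tmp']

    # A regex with named groups as an ad-hoc tokenizer; alternatives are
    # tried left to right, so NUM claims digit runs before NAME can.
    # Note that finditer() silently skips characters no group matches.
    token_pat = re.compile(r'(?P<NUM>\d+)|(?P<NAME>\w+)|(?P<OP>[-+*/()])|(?P<WS>\s+)')
    for m in token_pat.finditer('x + 42*y'):
        if m.lastgroup != 'WS':
            print(m.lastgroup, m.group())
    # -> NAME x, OP +, NUM 42, OP *, NAME y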
SPARK can hardly claim the title of "Python's own parser generator". While it is a parser generator for Python programs and is itself written in Python, it is not distributed with Python.

"Python's own" would be pgen, which lives in the Parser subdirectory of the Python source tree. Pgen is used to parse the Python source code and construct a parse tree from it. As parser generators go, pgen is appropriately (and pythonically) stupid -- its power is restricted to that of LL(1) languages, equivalent to recursive-descent parsers. Its only interesting feature may be that it accepts the grammar in a regex-like notation. (That's what the *, ?, [], | and () meta-symbols in the file Grammar/Grammar are for.)

I would note that for small languages (much smaller than Python), writing a recursive-descent parser by hand is actually one of the most effective ways of creating a parser. I recently had the pleasure of writing a recursive-descent parser for a simple Boolean query language; there was absolutely no need to involve a big gun like a parser generator. OTOH I would not consider writing a recursive-descent parser by hand for Python's Grammar -- that's why I created pgen in the first place. :-)

Another note for Aahz: when it comes to scanning data that isn't really a programming language, e.g. email messages, the words parsing, scanning, lexing and tokenizing are often used pretty much interchangeably.

--Guido van Rossum (home page: http://www.python.org/~guido/)
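As an illustration of the kind of hand-written recursive-descent parser Guido describes, here is a minimal sketch for a simple Boolean query language. The grammar, the names, and the tuple-based parse tree are all invented for this example (it is not his actual parser); note how the grammar in the docstring reuses the *, | and () meta-symbols mentioned above, and how each nonterminal becomes one method.

    import re

    def tokenize(text):
        # Split into parens and words; anything else is an error.
        tokens = re.findall(r'\(|\)|\w+|\S', text)
        for tok in tokens:
            if tok not in '()' and not re.match(r'\w+$', tok):
                raise SyntaxError('unexpected character: %r' % tok)
        return tokens

    class QueryParser:
        """Recursive descent, one method per nonterminal:

            query:   orexpr
            orexpr:  andexpr ('OR' andexpr)*
            andexpr: notexpr ('AND' notexpr)*
            notexpr: 'NOT' notexpr | WORD | '(' orexpr ')'
        """
        def __init__(self, text):
            self.tokens = tokenize(text)
            self.pos = 0

        def peek(self):
            return self.tokens[self.pos] if self.pos < len(self.tokens) else None

        def next(self):
            tok = self.peek()
            self.pos += 1
            return tok

        def expect(self, tok):
            if self.next() != tok:
                raise SyntaxError('expected %r' % tok)

        def parse(self):
            tree = self.orexpr()
            if self.peek() is not None:
                raise SyntaxError('trailing junk: %r' % self.peek())
            return tree

        def orexpr(self):
            tree = self.andexpr()
            while self.peek() == 'OR':
                self.next()
                tree = ('or', tree, self.andexpr())
            return tree

        def andexpr(self):
            tree = self.notexpr()
            while self.peek() == 'AND':
                self.next()
                tree = ('and', tree, self.notexpr())
            return tree

        def notexpr(self):
            tok = self.next()
            if tok == 'NOT':
                return ('not', self.notexpr())
            if tok == '(':
                tree = self.orexpr()
                self.expect(')')
                return tree
            if tok is None or tok in ('AND', 'OR', ')'):
                raise SyntaxError('unexpected %r' % (tok,))
            return ('word', tok)

    print(QueryParser('spam AND NOT (eggs OR ham)').parse())
    # -> ('and', ('word', 'spam'),
    #            ('not', ('or', ('word', 'eggs'), ('word', 'ham'))))

The translation from an LL(1) grammar to code is mechanical, which is what makes this approach so effective for small languages: each ('OR' andexpr)* repetition becomes a while loop, each | alternative becomes a branch on the next token, and precedence falls out of which nonterminal calls which.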