[pypy-svn] r11423 - in pypy/dist/pypy/module/parser/recparser: . ebnf leftout python test test/samples tools

ludal at codespeak.net
Mon Apr 25 16:03:44 CEST 2005


Author: ludal
Date: Mon Apr 25 16:03:44 2005
New Revision: 11423

Added:
   pypy/dist/pypy/module/parser/recparser/
   pypy/dist/pypy/module/parser/recparser/README
   pypy/dist/pypy/module/parser/recparser/ebnf/
   pypy/dist/pypy/module/parser/recparser/ebnf/__init__.py
   pypy/dist/pypy/module/parser/recparser/ebnf/lexer.py
   pypy/dist/pypy/module/parser/recparser/ebnf/parse.py
   pypy/dist/pypy/module/parser/recparser/grammar.py
   pypy/dist/pypy/module/parser/recparser/leftout/
   pypy/dist/pypy/module/parser/recparser/leftout/builders.py
   pypy/dist/pypy/module/parser/recparser/leftout/compiler.py
   pypy/dist/pypy/module/parser/recparser/leftout/gen_ast.py
   pypy/dist/pypy/module/parser/recparser/leftout/parse_grammar.py
   pypy/dist/pypy/module/parser/recparser/leftout/pgen.py
   pypy/dist/pypy/module/parser/recparser/python/
   pypy/dist/pypy/module/parser/recparser/python/Grammar2.3
   pypy/dist/pypy/module/parser/recparser/python/Grammar2.4
   pypy/dist/pypy/module/parser/recparser/python/__init__.py
   pypy/dist/pypy/module/parser/recparser/python/lexer.py
   pypy/dist/pypy/module/parser/recparser/python/parse.py
   pypy/dist/pypy/module/parser/recparser/syntaxtree.py
   pypy/dist/pypy/module/parser/recparser/test/
   pypy/dist/pypy/module/parser/recparser/test/samples/
   pypy/dist/pypy/module/parser/recparser/test/samples/test_1.py
   pypy/dist/pypy/module/parser/recparser/test/samples/test_2.py
   pypy/dist/pypy/module/parser/recparser/test/samples/test_3.py
   pypy/dist/pypy/module/parser/recparser/test/samples/test_4.py
   pypy/dist/pypy/module/parser/recparser/test/samples/test_comment.py
   pypy/dist/pypy/module/parser/recparser/test/samples/test_encoding_declaration.py
   pypy/dist/pypy/module/parser/recparser/test/samples/test_encoding_declaration2.py
   pypy/dist/pypy/module/parser/recparser/test/samples/test_encoding_declaration3.py
   pypy/dist/pypy/module/parser/recparser/test/samples/test_function_calls.py
   pypy/dist/pypy/module/parser/recparser/test/samples/test_generator.py
   pypy/dist/pypy/module/parser/recparser/test/samples/test_import_statements.py
   pypy/dist/pypy/module/parser/recparser/test/samples/test_list_comps.py
   pypy/dist/pypy/module/parser/recparser/test/samples/test_numbers.py
   pypy/dist/pypy/module/parser/recparser/test/samples/test_ony_one_comment.py
   pypy/dist/pypy/module/parser/recparser/test/samples/test_redirected_prints.py
   pypy/dist/pypy/module/parser/recparser/test/samples/test_samples.py
   pypy/dist/pypy/module/parser/recparser/test/samples/test_simple_assignment.py
   pypy/dist/pypy/module/parser/recparser/test/samples/test_simple_class.py
   pypy/dist/pypy/module/parser/recparser/test/samples/test_simple_for_loop.py
   pypy/dist/pypy/module/parser/recparser/test/samples/test_simple_in_test.py
   pypy/dist/pypy/module/parser/recparser/test/samples/test_slice.py
   pypy/dist/pypy/module/parser/recparser/test/samples/test_whitespaces.py
   pypy/dist/pypy/module/parser/recparser/test/test_pytokenizer.py
   pypy/dist/pypy/module/parser/recparser/test/test_samples.py
   pypy/dist/pypy/module/parser/recparser/test/test_samples2.py
   pypy/dist/pypy/module/parser/recparser/tools/
   pypy/dist/pypy/module/parser/recparser/tools/tokenize.py
Log:
import into main svn
the following is the partial revision info from darcs:
  * rewrote test_samples.py for py.test
  * modified grammar for arglist
  * added dummy whitespaces test
  * added testcases for various function calls
  * updated test_encoding_declaration2.py to have a non-normalized encoding test case
  * added encoding normalization
  * tokenize.py is not used anymore
  * HACK patch for encoding declarations (remove me when a better solution is found)
  * added test snippets (esp. for encoding declarations)
  * added regexp to check encoding declarations
  * updated python/parse.py's main
  * prefix each test file with 'test_'
  * encoding declarations are not parsed correctly
  * fixed redirected prints (print >> f) syntax errors
  * unittest_pysource.py is out of date (see test/test_pytokenizer.py)
  * misc tidy / removed unused imports
  * added testcases for comments and "is not"
  * modified official Python Grammar to remove ambiguity
  * added class testcase
  * removed debug output
  * record first appearing comment not last
  * revert comment regexp change
  * added unit tests for python tokenizer
  * added time info + improved test script
  * added several small test snippets
  * added missing RBRACE symbol
  * fixed bugs with comments, numbers and slices
  * cleanup
  * grammar bugfix and recursion removal in Grammar2.3
  * improve grammar tree representation
  * Choose between python 2.3 and python 2.4 grammar
  * removed import lexers
  * added python.parse
  * Use the list of parsed keywords (from Grammar) instead of a hard-coded one
  * added parser tests
  * new tests and cleanup
  * rename simple_for_loop test to simple_in_test
  * make interface to tokenizer accept strings only
  * reorganization
  * export parse_grammar from ebnf
  * add __init__.py files
  * move back Grammar into python dir
  * rework python.lexer
  * move stuff around
  * add ebnf/lexer and move TokenSource to grammar
  * correct ebnf/parse
  * split python parsing and ebnf grammar parsing
  * new tests
  * disable debugging by default
  * move junk to leftout/
  * Reorganize grammar.py
  * Initial Revision



Added: pypy/dist/pypy/module/parser/recparser/README
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/README	Mon Apr 25 16:03:44 2005
@@ -0,0 +1,8 @@
+
+This is a 'standalone' version of the parser module.
+For now it needs '.' to be on the PYTHONPATH so that e.g.
+import ebnf # works
+
+This should change once we figure out how to integrate properly with
+PyPy and add an option to switch between the two parsers.
+

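As a concrete illustration, a hypothetical session might look like the
following (this assumes the current directory is this recparser directory,
with '.' on the PYTHONPATH as described above):

    import ebnf                      # needs '.' on PYTHONPATH
    # parse_grammar() takes a file-like object (see ebnf/parse.py below)
    # and returns a visitor holding the parsed grammar rules
    vis = ebnf.parse_grammar(file('python/Grammar2.4'))
    print vis.rules.keys()
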
Added: pypy/dist/pypy/module/parser/recparser/ebnf/__init__.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/ebnf/__init__.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1,2 @@
+
+from parse import parse_grammar

Added: pypy/dist/pypy/module/parser/recparser/ebnf/lexer.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/ebnf/lexer.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1,64 @@
+"""This is a lexer for a Python recursive descent parser
+it obeys the TokenSource interface defined for the grammar
+analyser in grammar.py
+"""
+
+import re
+from grammar import TokenSource
+
+DEBUG = False
+
+## Lexer for Python's grammar ########################################
+g_symdef = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*:",re.M)
+g_symbol = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*",re.M)
+g_string = re.compile(r"'[^']+'",re.M)
+g_tok = re.compile(r"\[|\]|\(|\)|\*|\+|\|",re.M)
+g_skip = re.compile(r"\s*(#.*$)?",re.M)
+
+class GrammarSource(TokenSource):
+    """The grammar tokenizer"""
+    def __init__(self, inpstring ):
+        TokenSource.__init__(self)
+        self.input = inpstring
+        self.pos = 0
+
+    def context(self):
+        return self.pos
+
+    def restore(self, ctx ):
+        self.pos = ctx
+
+    def next(self):
+        pos = self.pos
+        inp = self.input
+        m = g_skip.match(inp, pos)
+        while m and pos!=m.end():
+            pos = m.end()
+            if pos==len(inp):
+                self.pos = pos
+                return None, None
+            m = g_skip.match(inp, pos)
+        m = g_symdef.match(inp,pos)
+        if m:
+            tk = m.group(0)
+            self.pos = m.end()
+            return 'SYMDEF',tk[:-1]
+        m = g_tok.match(inp,pos)
+        if m:
+            tk = m.group(0)
+            self.pos = m.end()
+            return tk,tk
+        m = g_string.match(inp,pos)
+        if m:
+            tk = m.group(0)
+            self.pos = m.end()
+            return 'STRING',tk[1:-1]
+        m = g_symbol.match(inp,pos)
+        if m:
+            tk = m.group(0)
+            self.pos = m.end()
+            return 'SYMBOL',tk
+        raise ValueError("Unknown token at pos=%d context='%s'" % (pos,inp[pos:pos+20]) )
+
+    def debug(self):
+        return self.input[self.pos:self.pos+20]

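To illustrate the TokenSource contract, feeding a one-rule grammar through
GrammarSource yields (type, value) tuples like the following (a sketch based
on the regexps above; the trailing newline is what makes next() return the
(None, None) end marker):

    from lexer import GrammarSource
    src = GrammarSource("atom: NAME '+'\n")
    print src.next()   # ('SYMDEF', 'atom')
    print src.next()   # ('SYMBOL', 'NAME')
    print src.next()   # ('STRING', '+')
    print src.next()   # (None, None) -- end of input
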
Added: pypy/dist/pypy/module/parser/recparser/ebnf/parse.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/ebnf/parse.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1,253 @@
+#!/usr/bin/env python
+from grammar import BaseGrammarBuilder, Alternative, Sequence, Token, \
+     KleenStar, GrammarElement
+from lexer import GrammarSource
+
+import re
+py_name = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*", re.M)
+
+punct = ['>=', '<>', '!=', '<', '>', '<=', '==', '\\*=',
+         '//=', '%=', '^=', '<<=', '\\*\\*=', '\\|=',
+         '\\+=', '>>=', '=', '&=', '/=', '-=', ',', '^', '>>', '&',
+         '\\+', '\\*', '-', '/', '\\.', '\\*\\*', '%', '<<', '//',
+         '\\|', '\\)', '\\(', ';', ':', '@', '\\[', '\\]', '`',
+         '\\{', '\\}']
+
+py_punct = re.compile(r"""
+>=|<>|!=|<|>|<=|==|~|
+\*=|//=|%=|\^=|<<=|\*\*=|\|=|\+=|>>=|=|&=|/=|-=|
+,|\^|>>|&|\+|\*|-|/|\.|\*\*|%|<<|//|\||
+\)|\(|;|:|@|\[|\]|`|\{|\}
+""", re.M | re.X)
+
+
+TERMINALS = [
+    'NAME', 'NUMBER', 'STRING', 'NEWLINE', 'ENDMARKER',
+    'INDENT', 'DEDENT' ]
+
+
+## Grammar Visitors ##################################################
+# FIXME: parsertools.py ? parser/__init__.py ?
+
+class NameToken(Token):
+    """A token that is not a keyword"""
+    def __init__(self, keywords=None ):
+        Token.__init__(self, "NAME")
+        self.keywords = keywords
+
+    def match(self, source, builder):
+        """Matches a token.
+        the default implementation is to match any token whose type
+        corresponds to the object's name. You can extend Token
+        to match anything returned from the lexer. for exemple
+        type, value = source.next()
+        if type=="integer" and int(value)>=0:
+            # found
+        else:
+            # error unknown or negative integer
+        """
+        ctx = source.context()
+        tk_type, tk_value = source.next()
+        if tk_type==self.name:
+            # self.keywords is only filled in once the grammar has
+            # been visited, so guard against it still being None
+            if not self.keywords or tk_value not in self.keywords:
+                ret = builder.token( tk_type, tk_value, source )
+                return self.debug_return( ret, tk_type, tk_value )
+        source.restore( ctx )
+        return None
+        
+
+class EBNFVisitor(object):
+    def __init__(self):
+        self.rules = {}
+        self.terminals = {}
+        self.current_rule = None
+        self.current_subrule = 0
+        self.tokens = {}
+        self.items = []
+        self.terminals['NAME'] = NameToken()
+
+    def new_name( self ):
+        rule_name = ":%s_%s" % (self.current_rule, self.current_subrule)
+        self.current_subrule += 1
+        return rule_name
+
+    def new_item( self, itm ):
+        self.items.append( itm )
+        return itm
+    
+    def visit_grammar( self, node ):
+        # print "Grammar:"
+        for rule in node.nodes:
+            rule.visit(self)
+        # the rules are registered already
+        # we do a pass through the variables to detect
+        # terminal symbols from non terminals
+        for r in self.items:
+            for i,a in enumerate(r.args):
+                if a.name in self.rules:
+                    assert isinstance(a,Token)
+                    r.args[i] = self.rules[a.name]
+                    if a.name in self.terminals:
+                        del self.terminals[a.name]
+        # XXX .keywords also contains punctuation
+        self.terminals['NAME'].keywords = self.tokens.keys()
+
+    def visit_rule( self, node ):
+        symdef = node.nodes[0].value
+        self.current_rule = symdef
+        self.current_subrule = 0
+        alt = node.nodes[1]
+        rule = alt.visit(self)
+        if not isinstance( rule, Token ):
+            rule.name = symdef
+        self.rules[symdef] = rule
+        
+    def visit_alternative( self, node ):
+        items = [ node.nodes[0].visit(self) ]
+        items+= node.nodes[1].visit(self)        
+        if len(items)==1 and items[0].name.startswith(':'):
+            return items[0]
+        alt = Alternative( self.new_name(), *items )
+        return self.new_item( alt )
+
+    def visit_sequence( self, node ):
+        """ """
+        items = []
+        for n in node.nodes:
+            items.append( n.visit(self) )
+        if len(items)==1:
+            return items[0]
+        elif len(items)>1:
+            return self.new_item( Sequence( self.new_name(), *items) )
+        raise SyntaxError("Found empty sequence")
+
+    def visit_sequence_cont( self, node ):
+        """Returns a list of sequences (possibly empty)"""
+        return [n.visit(self) for n in node.nodes]
+
+    def visit_seq_cont_list(self, node):
+        return node.nodes[1].visit(self)
+    
+
+    def visit_symbol(self, node):
+        star_opt = node.nodes[1]
+        sym = node.nodes[0].value
+        terminal = self.terminals.get( sym )
+        if not terminal:
+            terminal = Token( sym )
+            self.terminals[sym] = terminal
+
+        return self.repeat( star_opt, terminal )
+
+    def visit_option( self, node ):
+        rule = node.nodes[1].visit(self)
+        return self.new_item( KleenStar( self.new_name(), 0, 1, rule ) )
+
+    def visit_group( self, node ):
+        rule = node.nodes[1].visit(self)
+        return self.repeat( node.nodes[3], rule )
+
+    def visit_STRING( self, node ):
+        value = node.value
+        tok = self.tokens.get(value)
+        if not tok:
+            if py_punct.match( value ):
+                tok = Token( value )
+            elif py_name.match( value ):
+                tok = Token('NAME', value)
+            else:
+                raise SyntaxError("Unknown STRING value ('%s')" % value )
+            self.tokens[value] = tok
+        return tok
+
+    def visit_sequence_alt( self, node ):
+        res = node.nodes[0].visit(self)
+        assert isinstance( res, GrammarElement )
+        return res
+
+    def repeat( self, star_opt, myrule ):
+        if star_opt.nodes:
+            rule_name = self.new_name()
+            tok = star_opt.nodes[0].nodes[0]
+            if tok.value == '+':
+                return self.new_item( KleenStar( rule_name, _min=1, rule = myrule ) )
+            elif tok.value == '*':
+                return self.new_item( KleenStar( rule_name, _min=0, rule = myrule ) )
+            else:
+                raise SyntaxError("Got symbol star_opt with value='%s'" % tok.value )
+        return myrule
+
+
+def grammar_grammar():
+    """Builds the grammar for the grammar file
+
+    Here's the description of the grammar's grammar ::
+
+      grammar: rule+
+      rule: SYMDEF alternative
+      
+      alternative: sequence ( '|' sequence )*
+      star: '*' | '+'
+      sequence: (SYMBOL star? | STRING | option | group star? )+
+      option: '[' alternative ']'
+      group: '(' alternative ')' star?    
+    """
+    # star: '*' | '+'
+    star          = Alternative( "star", Token('*'), Token('+') )
+    star_opt      = KleenStar  ( "star_opt", 0, 1, rule=star )
+
+    # rule: SYMDEF alternative
+    symbol        = Sequence(    "symbol", Token('SYMBOL'), star_opt )
+    symboldef     = Token(       "SYMDEF" )
+    alternative   = Sequence(    "alternative" )
+    rule          = Sequence(    "rule", symboldef, alternative )
+
+    # grammar: rule+
+    grammar       = KleenStar(   "grammar", _min=1, rule=rule )
+
+    # alternative: sequence ( '|' sequence )*
+    sequence      = KleenStar(   "sequence", 1 )
+    seq_cont_list = Sequence(    "seq_cont_list", Token('|'), sequence )
+    sequence_cont = KleenStar(   "sequence_cont",0, rule=seq_cont_list )
+    
+    alternative.args = [ sequence, sequence_cont ]
+
+    # option: '[' alternative ']'
+    option        = Sequence(    "option", Token('['), alternative, Token(']') )
+
+    # group: '(' alternative ')'
+    group         = Sequence(    "group",  Token('('), alternative, Token(')'), star_opt )
+
+    # sequence: (SYMBOL star? | STRING | option | group star?)+
+    string = Token('STRING')
+    alt           = Alternative( "sequence_alt", symbol, string, option, group ) 
+    sequence.args = [ alt ]
+    
+    return grammar
+
+
+def parse_grammar(stream):
+    """parses the grammar file
+
+    stream : file-like object representing the grammar to parse
+    """
+    source = GrammarSource(stream.read())
+    rule = grammar_grammar()
+    builder = BaseGrammarBuilder()
+    result = rule.match(source, builder)
+    node = builder.stack[-1]
+    vis = EBNFVisitor()
+    node.visit(vis)
+    return vis
+
+
+from pprint import pprint
+if __name__ == "__main__":
+    grambuild = parse_grammar(file('../python/Grammar'))
+    for i,r in enumerate(grambuild.items):
+        print "%  3d : %s" % (i, r)
+    pprint(grambuild.terminals.keys())
+    pprint(grambuild.tokens)
+    print "|".join(grambuild.tokens.keys() )
+

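Beyond the __main__ block above, the result of parse_grammar() can be
inspected programmatically; a sketch (assuming the recparser directory is
the current directory, as the README notes):

    from ebnf import parse_grammar
    vis = parse_grammar(file('python/Grammar2.4'))
    # vis.rules maps rule names to GrammarElement trees;
    # vis.terminals and vis.tokens hold terminals and keywords/punctuation
    file_input = vis.rules['file_input']
    print file_input.display()
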
Added: pypy/dist/pypy/module/parser/recparser/grammar.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/grammar.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1,296 @@
+"""
+A generic recursive descent parser.
+The grammar is defined as a composition of objects;
+the objects of the grammar are:
+Alternative : as in S -> A | B | C
+Sequence    : as in S -> A B C
+KleenStar   : as in S -> A* or S -> A+
+Token       : a lexer token
+"""
+
+DEBUG = 0
+
+#### Abstract interface for a lexer/tokenizer
+class TokenSource(object):
+    """Abstract base class for a source tokenizer"""
+    def context(self):
+        """Returns a context to restore the state of the object later"""
+
+    def restore(self, ctx):
+        """Restore the context"""
+
+    def next(self):
+        """Returns the next token from the source
+        a token is a tuple : (type,value) or (None,None) if the end of the
+        source has been found
+        """
+
+    def current_line(self):
+        """Returns the current line number"""
+        return 0
+
+
+######################################################################
+
+from syntaxtree import SyntaxNode, TempSyntaxNode, TokenNode
+
+class BaseGrammarBuilder(object):
+    """Base/default class for a builder"""
+    def __init__( self, rules=None, debug=0):
+        self.rules = rules or {} # a dictionary of grammar rules for debug/reference
+        self.debug = debug
+        self.stack = []
+
+    def context(self):
+        """Returns the state of the builder to be restored later"""
+        #print "Save Stack:", self.stack
+        return len(self.stack)
+
+    def restore(self, ctx):
+        del self.stack[ctx:]
+        #print "Restore Stack:", self.stack
+        
+    def alternative(self, rule, source):
+        # Do nothing, keep rule on top of the stack
+        if rule.is_root():
+            elems = self.stack[-1].expand()
+            self.stack[-1] = SyntaxNode(rule.name, source, *elems)
+            if self.debug:
+                self.stack[-1].dumpstr()
+        return True
+
+    def sequence(self, rule, source, elts_number):
+        """ """
+        items = []
+        for node in self.stack[-elts_number:]:
+            items += node.expand()
+        if rule.is_root():
+            node_type = SyntaxNode
+        else:
+            node_type = TempSyntaxNode
+        # replace N elements with 1 element regrouping them
+        if elts_number >= 1:
+            elem = node_type(rule.name, source, *items)
+            del self.stack[-elts_number:]
+            self.stack.append(elem)
+        elif elts_number == 0:
+            self.stack.append(node_type(rule.name, source))
+        if self.debug:
+            self.stack[-1].dumpstr()
+        return True
+
+    def token(self, name, value, source):
+        self.stack.append(TokenNode(name, source, value))
+        if self.debug:
+            self.stack[-1].dumpstr()
+        return True
+
+
+######################################################################
+# Grammar Elements Classes (Alternative, Sequence, KleenStar, Token) #
+######################################################################
+class GrammarElement(object):
+    """Base parser class"""
+    def __init__(self, name):
+        # the rule name
+        self.name = name
+        self.args = []
+        self._is_root = False
+
+    def is_root(self):
+        """This is a root node of the grammar, that is one that will
+        be included in the syntax tree"""
+        if self.name!=":" and self.name.startswith(":"):
+            return False
+        return True
+
+    def match(self, source, builder):
+        """Try to match a grammar rule
+
+        If the next set of tokens matches this grammar element, uses
+        <builder> to build an appropriate object; otherwise returns None.
+
+        /!\ If the tokens did not match the current grammar
+        element, then <source> is restored as it was before the
+        call to the match() method
+        """
+        return None
+    
+    def __str__(self):
+        return self.display(0)
+
+    def __repr__(self):
+        return self.display(0)
+
+    def display(self, level):
+        """Helper function used to represent the grammar.
+        mostly used for debugging the grammar itself"""
+        return "GrammarElement"
+
+
+    def debug_return(self, ret, *args ):
+        # FIXME: use a wrapper of match() methods instead of debug_return()
+        #        to prevent additional indirection
+        if ret and DEBUG>0:
+            sargs = ",".join( [ str(i) for i in args ] )
+            print "matched %s (%s): %s" % (self.__class__.__name__, sargs, self.display() )
+        return ret
+
+class Alternative(GrammarElement):
+    """Represents an alternative in a grammar rule (as in S -> A | B | C)"""
+    def __init__(self, name, *args):
+        GrammarElement.__init__(self, name )
+        self.args = list(args)
+        for i in self.args:
+            assert isinstance( i, GrammarElement )
+
+    def match(self, source, builder):
+        """If any of the rules in self.args matches
+        returns the object built from the first rules that matches
+        """
+        if DEBUG>1:
+            print "try alt:", self.display()
+        for rule in self.args:
+            m = rule.match( source, builder )
+            if m:
+                ret = builder.alternative( self, source )
+                return self.debug_return( ret )
+        return False
+
+    def display(self, level=0):
+        if level==0:
+            name =  self.name + " -> "
+        elif not self.name.startswith(":"):
+            return self.name
+        else:
+            name = ""
+        items = [ a.display(1) for a in self.args ]
+        return name+"(" + "|".join( items ) + ")"
+        
+
+class Sequence(GrammarElement):
+    """Reprensents a Sequence in a grammar rule (as in S -> A B C)"""
+    def __init__(self, name, *args):
+        GrammarElement.__init__(self, name )
+        self.args = list(args)
+        for i in self.args:
+            assert isinstance( i, GrammarElement )
+
+    def match(self, source, builder):
+        """matches all of the symbols in order"""
+        if DEBUG>1:
+            print "try seq:", self.display()
+        ctx = source.context()
+        bctx = builder.context()
+        for rule in self.args:
+            m = rule.match(source, builder)
+            if not m:
+                # Restore needed because some rules may have been matched
+                # before the one that failed
+                source.restore(ctx)
+                builder.restore(bctx)
+                return None
+        ret = builder.sequence(self, source, len(self.args))
+        return self.debug_return( ret )
+
+    def display(self, level=0):
+        if level == 0:
+            name = self.name + " -> "
+        elif not self.name.startswith(":"):
+            return self.name
+        else:
+            name = ""
+        items = [a.display(1) for a in self.args]
+        return name + "(" + " ".join( items ) + ")"
+
+class KleenStar(GrammarElement):
+    """Represents a KleenStar in a grammar rule as in (S -> A+) or (S -> A*)"""
+    def __init__(self, name, _min = 0, _max = -1, rule=None):
+        GrammarElement.__init__( self, name )
+        self.args = [rule]
+        self.min = _min
+        if _max == 0:
+            raise ValueError("KleenStar needs max==-1 or max>1")
+        self.max = _max
+        self.star = "x"
+
+    def match(self, source, builder):
+        """matches a number of times self.args[0]. the number must be comprised
+        between self._min and self._max inclusive. -1 is used to represent infinity"""
+        if DEBUG>1:
+            print "try kle:", self.display()
+        ctx = source.context()
+        bctx = builder.context()
+        rules = 0
+        rule = self.args[0]
+        while True:
+            m = rule.match(source, builder)
+            if not m:
+                # Rule should be matched at least 'min' times
+                if rules<self.min:
+                    source.restore(ctx)
+                    builder.restore(bctx)
+                    return None
+                ret = builder.sequence(self, source, rules)
+                return self.debug_return( ret, rules )
+            rules += 1
+            if self.max>0 and rules == self.max:
+                ret = builder.sequence(self, source, rules)
+                return self.debug_return( ret, rules )
+
+    def display(self, level=0):
+        if level==0:
+            name =  self.name + " -> "
+        elif not self.name.startswith(":"):
+            return self.name
+        else:
+            name = ""
+        star = "{%d,%d}" % (self.min,self.max)
+        if self.min==0 and self.max==1:
+            star = "?"
+        elif self.min==0 and self.max==-1:
+            star = "*"
+        elif self.min==1 and self.max==-1:
+            star = "+"
+        s = self.args[0].display(1)
+        return name + "%s%s" % (s, star)
+
+            
+class Token(GrammarElement):
+    """Represents a Token in a grammar rule (a lexer token)"""
+    def __init__( self, name, value = None):
+        GrammarElement.__init__( self, name )
+        self.value = value
+
+    def match(self, source, builder):
+        """Matches a token.
+        The default implementation is to match any token whose type
+        corresponds to the object's name. You can extend Token
+        to match anything returned from the lexer, for example:
+        type, value = source.next()
+        if type == "integer" and int(value) >= 0:
+            # found
+        else:
+            # error: unknown or negative integer
+        """
+        ctx = source.context()
+        tk_type, tk_value = source.next()
+        if tk_type==self.name:
+            if self.value is None:
+                ret = builder.token( tk_type, tk_value, source )
+                return self.debug_return( ret, tk_type )
+            elif self.value == tk_value:
+                ret = builder.token( tk_type, tk_value, source )
+                return self.debug_return( ret, tk_type, tk_value )
+        if DEBUG>1:
+            print "tried tok:", self.display()
+        source.restore( ctx )
+        return None
+
+    def display(self, level=0):
+        if self.value is None:
+            return "<%s>" % self.name
+        else:
+            return "<%s>=='%s'" % (self.name, self.value)
+    
+

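As the module docstring says, a grammar is just a composition of these
objects. A self-contained sketch of that idea follows (the list-based
TokenSource below is hypothetical -- the real lexers live in ebnf/ and
python/ -- and dumpstr() is the debug helper from syntaxtree used by the
builder above):

    from grammar import TokenSource, BaseGrammarBuilder, \
         Sequence, KleenStar, Token

    class ListSource(TokenSource):
        """Minimal TokenSource serving pre-made (type, value) tuples"""
        def __init__(self, tokens):
            self.tokens = tokens + [(None, None)]
            self.pos = 0
        def context(self):
            return self.pos
        def restore(self, ctx):
            self.pos = ctx
        def next(self):
            tok = self.tokens[self.pos]
            self.pos += 1
            return tok

    # S -> 'a'+ 'b'   (the ':'-prefixed name keeps the repetition
    # out of the syntax tree, as is_root() explains)
    rule = Sequence("S",
                    KleenStar(":S_0", _min=1, rule=Token('a')),
                    Token('b'))
    src = ListSource([('a', 'a'), ('a', 'a'), ('b', 'b')])
    builder = BaseGrammarBuilder()
    if rule.match(src, builder):
        print builder.stack[-1].dumpstr()
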
Added: pypy/dist/pypy/module/parser/recparser/leftout/builders.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/leftout/builders.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1,185 @@
+"""DEPRECATED"""
+
+raise DeprecationWarning("This module is broken and out of date. Don't use it!")
+from grammar import BaseGrammarBuilder, Alternative, Token, Sequence, KleenStar
+
+class BuilderToken(object):
+    def __init__(self, name, value):
+        self.name = name
+        self.value = value
+
+    def __str__(self):
+        return "%s=(%s)" % (self.name, self.value)
+
+    def display(self, indent=""):
+        print indent,self.name,"=",self.value,
+        
+class BuilderRule(object):
+    def __init__(self, name, values):
+        self.name = name
+        self.values = values
+
+    def __str__(self):
+        return "%s=(%s)" % (self.name, self.values)
+
+    def display(self, indent=""):
+        print indent,self.name,'('
+        for v in self.values:
+            v.display(indent+"|  ")
+            print ","
+        print indent,')',
+
+class SimpleBuilder(object):
+    """Default builder class (print output)"""
+    def __init__(self):
+        self.gramrules = {}
+
+    def alternative( self, name, value, source ):
+        print "alt:", self.gramrules.get(name, name), "   --", source.debug()
+        #print "Alternative", name
+        return BuilderRule( name, [value] )
+
+    def sequence( self, name, values, source ):
+        print "seq:", self.gramrules.get(name, name), "   --", source.debug()
+        #print "Sequence", name
+        return BuilderRule( name, values)
+    
+    def token( self, name, value, source ):
+        print "tok:", self.gramrules.get(name, name), "   --", source.debug()
+        #print "Token", name, value
+        return BuilderToken( name, value )
+
+
+class GrammarBuilder(BaseGrammarBuilder):
+    """Builds a grammar from a grammar desc"""
+    def __init__(self):
+        self.rules = {}
+        self.terminals = {}
+        self.rule_idx = 0
+        self.items = []
+        self.tokens = {}
+
+    def alternative( self, name, source ):
+        pass
+    
+    def sequence( self, name, source, N ):
+        #print "seq:", name, "->", source.debug()
+        #print "Sequence", name
+        meth = getattr(self, "build_%s" % name, None)
+        if meth:
+            return meth(values)
+        raise RuntimeError( "symbol %s unhandled" % name )
+    
+    def token( self, name, value, source ):
+        #print "tok:", name, "->", source.debug()
+        #print "Token", name, value
+        if name=="SYMDEF":
+            return value
+        elif name=="STRING":
+            tok = self.tokens.get(value)
+            if not tok:
+                tok = Token(value)
+                self.tokens[value] = tok
+            return tok
+        elif name=="SYMBOL":
+            sym = self.terminals.get(value)
+            if not sym:
+                sym = Token(value)
+                self.terminals[value] = sym
+            return sym
+        elif name in ('*','+','(','[',']',')','|',):
+            return name
+        return BuilderToken( name, value )
+
+    def build_sequence( self, values ):
+        """sequence: sequence_alt+
+        sequence_alt: symbol | STRING | option | group star?
+        """
+        if len(values)==1:
+            return values[0]
+        if len(values)>1:
+            seq = Sequence( self.get_name(), *values )
+            self.items.append(seq)
+            debug_rule( seq )
+            return seq
+        return True
+
+    def get_name(self):
+        s = "Rule_%03d" % self.rule_idx
+        self.rule_idx += 1
+        return s
+    
+    def build_rule( self, values ):
+        rule_def = values[0]
+        rule_alt = values[1]
+        if not isinstance(rule_alt,Token):
+            rule_alt.name = rule_def
+        self.rules[rule_def] = rule_alt
+        return True
+
+    def build_alternative( self, values ):
+        if len(values[1])>0:
+            alt = Alternative( self.get_name(), values[0], *values[1] )
+            debug_rule( alt )
+            self.items.append(alt)
+            return alt
+        else:
+            return values[0]
+
+    def build_star_opt( self, values ):
+        """star_opt: star?"""
+        if values:
+            return values[0]
+        else:
+            return True
+
+    def build_seq_cont_list( self, values ):
+        """seq_cont_list: '|' sequence """
+        return values[1]
+
+    def build_symbol( self, values ):
+        """symbol: SYMBOL star?"""
+        sym = values[0]
+        star = values[1]
+        if star is True:
+            return sym
+        _min = 0
+        _max = -1
+        if star=='*':
+            _min = 0
+        elif star=='+':
+            _min = 1
+        sym = KleenStar( self.get_name(), _min, _max, rule=sym )
+        sym.star = star
+        debug_rule( sym )
+        self.items.append(sym)
+        return sym
+    
+    def build_group( self, values ):
+        """group:  '(' alternative ')' star?"""
+        return self.build_symbol( [ values[1], values[3] ] )
+     
+    def build_option( self, values ):
+        """option: '[' alternative ']'"""
+        sym = KleenStar( self.get_name(), 0, 1, rule=values[1] )
+        debug_rule( sym )
+        self.items.append(sym)
+        return sym
+
+    def build_sequence_cont( self, values ):
+        """sequence_cont: seq_cont_list*"""
+        return values
+
+    def build_grammar( self, values ):
+        """ grammar: rules+"""
+        # the rules are registered already
+        # we do a pass through the variables to detect
+        # terminal symbols from non terminals
+        for r in self.items:
+            for i,a in enumerate(r.args):
+                if a.name in self.rules:
+                    assert isinstance(a,Token)
+                    r.args[i] = self.rules[a.name]
+                    if a.name in self.terminals:
+                        del self.terminals[a.name]
+

Added: pypy/dist/pypy/module/parser/recparser/leftout/compiler.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/leftout/compiler.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1,19 @@
+
+
+class CompileContext(object):
+    pass
+
+class CompilerVisitor(object):
+    def __init__(self):
+        self.com = CompileContext()
+
+    def visit_single_input( self, n ):
+        pass
+
+    def visit_file_input( self, n ):
+        pass
+
+    
+
+    
+

Added: pypy/dist/pypy/module/parser/recparser/leftout/gen_ast.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/leftout/gen_ast.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1,26 @@
+
+
+
+
+
+from pgen import grammar_grammar, GrammarSource, GrammarVisitor
+from grammar import BaseGrammarBuilder
+
+
+def parse_grammar( fic ):
+    src = GrammarSource( fic )
+    rule = grammar_grammar()
+    builder = BaseGrammarBuilder()
+    result = rule.match( src, builder )
+    return builder
+
+if __name__ == "__main__":
+    import sys
+    fic = file('Grammar','r')
+    grambuild = parse_grammar( fic )
+    print grambuild.stack
+    node = grambuild.stack[-1]
+    vis = GrammarVisitor()
+    node.visit(vis)
+    for i,r in enumerate(vis.items):
+        print "%  3d : %s" % (i, r)

Added: pypy/dist/pypy/module/parser/recparser/leftout/parse_grammar.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/leftout/parse_grammar.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1,29 @@
+
+
+from pgen import grammar_grammar, GrammarSource, GrammarVisitor
+from grammar import BaseGrammarBuilder
+
+
+
+
+def parse_grammar( fic ):
+    src = GrammarSource( fic )
+    rule = grammar_grammar()
+    builder = BaseGrammarBuilder()
+    result = rule.match( src, builder )
+    if not result:
+        print src.debug()
+        raise SyntaxError("at %s" % src.debug() )
+    return builder
+
+if __name__ == "__main__":
+    import sys
+    fic = file('Grammar','r')
+    grambuild = parse_grammar( fic )
+    print grambuild.stack
+    node = grambuild.stack[-1]
+    vis = GrammarVisitor()
+    node.visit(vis)
+    for i,r in enumerate(vis.items):
+        print "%  3d : %s" % (i, r)
+

Added: pypy/dist/pypy/module/parser/recparser/leftout/pgen.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/leftout/pgen.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1,480 @@
+#
+# Generate a Python syntax analyser from Python's grammar.
+# The grammar comes from the Grammar file in the Python source tree.
+#
+from pylexer import PythonSource
+import pylexer
+DEBUG=0
+
+class BuilderToken(object):
+    def __init__(self, name, value):
+        self.name = name
+        self.value = value
+
+    def __str__(self):
+        return "%s=(%s)" % (self.name, self.value)
+
+    def display(self, indent=""):
+        print indent,self.name,"=",self.value,
+        
+class BuilderRule(object):
+    def __init__(self, name, values):
+        self.name = name
+        self.values = values
+
+    def __str__(self):
+        return "%s=(%s)" % (self.name, self.values)
+
+    def display(self, indent=""):
+        print indent,self.name,'('
+        for v in self.values:
+            v.display(indent+"|  ")
+            print ","
+        print indent,')',
+
+class SimpleBuilder(object):
+    """Default builder class (print output)"""
+    def __init__(self):
+        self.gramrules = {}
+
+    def alternative( self, name, value, source ):
+        print "alt:", self.gramrules.get(name, name), "   --", source.debug()
+        #print "Alternative", name
+        return BuilderRule( name, [value] )
+
+    def sequence( self, name, values, source ):
+        print "seq:", self.gramrules.get(name, name), "   --", source.debug()
+        #print "Sequence", name
+        return BuilderRule( name, values)
+    
+    def token( self, name, value, source ):
+        print "tok:", self.gramrules.get(name, name), "   --", source.debug()
+        #print "Token", name, value
+        return BuilderToken( name, value )
+        
+
+import re
+import grammar
+from grammar import Token, Alternative, KleenStar, Sequence, TokenSource, BaseGrammarBuilder, Proxy, Pgen
+
+g_symdef = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*:",re.M)
+g_symbol = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*",re.M)
+g_string = re.compile(r"'[^']+'",re.M)
+g_tok = re.compile(r"\[|\]|\(|\)|\*|\+|\|",re.M)
+g_skip = re.compile(r"\s*(#.*$)?",re.M)
+
+class GrammarSource(TokenSource):
+    """The grammar tokenizer"""
+    def __init__(self, inpstream ):
+        TokenSource.__init__(self)
+        self.input = inpstream.read()
+        self.pos = 0
+
+    def context(self):
+        return self.pos
+
+    def restore(self, ctx ):
+        self.pos = ctx
+
+    def next(self):
+        pos = self.pos
+        inp = self.input
+        m = g_skip.match(inp, pos)
+        while m and pos!=m.end():
+            pos = m.end()
+            if pos==len(inp):
+                self.pos = pos
+                return None, None
+            m = g_skip.match(inp, pos)
+        m = g_symdef.match(inp,pos)
+        if m:
+            tk = m.group(0)
+            self.pos = m.end()
+            return 'SYMDEF',tk[:-1]
+        m = g_tok.match(inp,pos)
+        if m:
+            tk = m.group(0)
+            self.pos = m.end()
+            return tk,tk
+        m = g_string.match(inp,pos)
+        if m:
+            tk = m.group(0)
+            self.pos = m.end()
+            return 'STRING',tk[1:-1]
+        m = g_symbol.match(inp,pos)
+        if m:
+            tk = m.group(0)
+            self.pos = m.end()
+            return 'SYMBOL',tk
+        raise ValueError("Unknown token at pos=%d context='%s'" % (pos,inp[pos:pos+20]) )
+
+    def debug(self):
+        return self.input[self.pos:self.pos+20]
+
+def debug_rule( rule ):
+    nm = rule.__class__.__name__
+    print nm, rule.name, "->",
+    if nm=='KleenStar':
+        print "(%d,%d,%s)" % (rule.min, rule.max, rule.star),
+    for x in rule.args:
+        print x.name,
+    print
+
+# debugging disabled by default: this redefinition overrides the
+# verbose debug_rule() above
+def debug_rule( *args ):
+    pass
+
+
+class GrammarBuilder(BaseGrammarBuilder):
+    """Builds a grammar from a grammar desc"""
+    def __init__(self):
+        self.rules = {}
+        self.terminals = {}
+        self.rule_idx = 0
+        self.items = []
+        self.tokens = {}
+
+    def alternative( self, name, source ):
+        pass
+    
+    def sequence( self, name, source, N ):
+        #print "seq:", name, "->", source.debug()
+        #print "Sequence", name
+        meth = getattr(self, "build_%s" % name, None)
+        if meth:
+            return meth(values)
+        raise RuntimeError( "symbol %s unhandled" % name )
+    
+    def token( self, name, value, source ):
+        #print "tok:", name, "->", source.debug()
+        #print "Token", name, value
+        if name=="SYMDEF":
+            return value
+        elif name=="STRING":
+            tok = self.tokens.get(value)
+            if not tok:
+                tok = Token(value)
+                self.tokens[value] = tok
+            return tok
+        elif name=="SYMBOL":
+            sym = self.terminals.get(value)
+            if not sym:
+                sym = Token(value)
+                self.terminals[value] = sym
+            return sym
+        elif name in ('*','+','(','[',']',')','|',):
+            return name
+        return BuilderToken( name, value )
+
+    def build_sequence( self, values ):
+        """sequence: sequence_alt+
+        sequence_alt: symbol | STRING | option | group star?
+        """
+        if len(values)==1:
+            return values[0]
+        if len(values)>1:
+            seq = Sequence( self.get_name(), *values )
+            self.items.append(seq)
+            debug_rule( seq )
+            return seq
+        return True
+
+    def get_name(self):
+        s = "Rule_%03d" % self.rule_idx
+        self.rule_idx += 1
+        return s
+    
+    def build_rule( self, values ):
+        rule_def = values[0]
+        rule_alt = values[1]
+        if not isinstance(rule_alt,Token):
+            rule_alt.name = rule_def
+        self.rules[rule_def] = rule_alt
+        return True
+
+    def build_alternative( self, values ):
+        if len(values[1])>0:
+            alt = Alternative( self.get_name(), values[0], *values[1] )
+            debug_rule( alt )
+            self.items.append(alt)
+            return alt
+        else:
+            return values[0]
+
+    def build_star_opt( self, values ):
+        """star_opt: star?"""
+        if values:
+            return values[0]
+        else:
+            return True
+
+    def build_seq_cont_list( self, values ):
+        """seq_cont_list: '|' sequence """
+        return values[1]
+
+    def build_symbol( self, values ):
+        """symbol: SYMBOL star?"""
+        sym = values[0]
+        star = values[1]
+        if star is True:
+            return sym
+        _min = 0
+        _max = -1
+        if star=='*':
+            _min = 0
+        elif star=='+':
+            _min = 1
+        sym = KleenStar( self.get_name(), _min, _max, rule=sym )
+        sym.star = star
+        debug_rule( sym )
+        self.items.append(sym)
+        return sym
+    
+    def build_group( self, values ):
+        """group:  '(' alternative ')' star?"""
+        return self.build_symbol( [ values[1], values[3] ] )
+     
+    def build_option( self, values ):
+        """option: '[' alternative ']'"""
+        sym = KleenStar( self.get_name(), 0, 1, rule=values[1] )
+        debug_rule( sym )
+        self.items.append(sym)
+        return sym
+
+    def build_sequence_cont( self, values ):
+        """sequence_cont: seq_cont_list*"""
+        return values
+
+    def build_grammar( self, values ):
+        """ grammar: rules+"""
+        # the rules are registered already
+        # we do a pass through the variables to detect
+        # terminal symbols from non terminals
+        for r in self.items:
+            for i,a in enumerate(r.args):
+                if a.name in self.rules:
+                    assert isinstance(a,Token)
+                    r.args[i] = self.rules[a.name]
+                    if a.name in self.terminals:
+                        del self.terminals[a.name]
+
+
+class GrammarVisitor(object):
+    def __init__(self):
+        self.rules = {}
+        self.terminals = {}
+        self.current_rule = None
+        self.current_subrule = 0
+        self.tokens = {}
+        self.items = []
+
+    def new_name( self ):
+        rule_name = ":%s_%s" % (self.current_rule, self.current_subrule)
+        self.current_subrule += 1
+        return rule_name
+
+    def new_item( self, itm ):
+        self.items.append( itm )
+        return itm
+    
+    def visit_grammar( self, node ):
+        print "Grammar:"
+        for rule in node.nodes:
+            rule.visit(self)
+        # the rules are registered already
+        # we do a pass through the variables to detect
+        # terminal symbols from non terminals
+        for r in self.items:
+            for i,a in enumerate(r.args):
+                if a.name in self.rules:
+                    assert isinstance(a,Token)
+                    r.args[i] = self.rules[a.name]
+                    if a.name in self.terminals:
+                        del self.terminals[a.name]
+
+    def visit_rule( self, node ):
+        symdef = node.nodes[0].value
+        self.current_rule = symdef
+        self.current_subrule = 0
+        alt = node.nodes[1]
+        rule = alt.visit(self)
+        if not isinstance( rule, Token ):
+            rule.name = symdef
+        self.rules[symdef] = rule
+        
+    def visit_alternative( self, node ):
+        items = [ node.nodes[0].visit(self) ]
+        items+= node.nodes[1].visit(self)        
+        if len(items)==1:
+            return items[0]
+        alt = Alternative( self.new_name(), *items )
+        return self.new_item( alt )
+
+    def visit_sequence( self, node ):
+        """ """
+        items = []
+        for n in node.nodes:
+            items.append( n.visit(self) )
+        if len(items)==1:
+            return items[0]
+        elif len(items)>1:
+            return self.new_item( Sequence( self.new_name(), *items) )
+        raise SyntaxError("Found empty sequence")
+
+    def visit_sequence_cont( self, node ):
+        """Returns a list of sequences (possibly empty)"""
+        L = []
+        for n in node.nodes:
+            L.append( n.visit(self) )
+        return L
+
+    def visit_seq_cont_list( self, node ):
+        return node.nodes[1].visit(self)
+    
+
+    def visit_symbol( self, node ):
+        star_opt = node.nodes[1]
+        sym = node.nodes[0].value
+        terminal = self.terminals.get( sym )
+        if not terminal:
+            terminal = Token( sym )
+            self.terminals[sym] = terminal
+
+        return self.repeat( star_opt, terminal )
+
+    def visit_option( self, node ):
+        rule = node.nodes[1].visit(self)
+        return self.new_item( KleenStar( self.new_name(), 0, 1, rule ) )
+
+    def visit_group( self, node ):
+        rule = node.nodes[1].visit(self)
+        return self.repeat( node.nodes[3], rule )
+
+    def visit_STRING( self, node ):
+        value = node.value
+        tok = self.tokens.get(value)
+        if not tok:
+            if pylexer.py_punct.match( value ):
+                tok = Token( value )
+            elif pylexer.py_name.match( value ):
+                tok = Token('NAME',value)
+            else:
+                raise SyntaxError("Unknown STRING value ('%s')" % value )
+            self.tokens[value] = tok
+        return tok
+
+    def visit_sequence_alt( self, node ):
+        res = node.nodes[0].visit(self)
+        assert isinstance( res, Pgen )
+        return res
+
+    def repeat( self, star_opt, myrule ):
+        if star_opt.nodes:
+            rule_name = self.new_name()
+            tok = star_opt.nodes[0].nodes[0]
+            if tok.value == '+':
+                return self.new_item( KleenStar( rule_name, _min=1, rule = myrule ) )
+            elif tok.value == '*':
+                return self.new_item( KleenStar( rule_name, _min=0, rule = myrule ) )
+            else:
+                raise SyntaxError("Got symbol star_opt with value='%s'" % tok.value )
+        return myrule
+        
+    
+_grammar = """
+grammar: rule+
+rule: SYMDEF alternative
+
+alternative: sequence ( '|' sequence )*
+star: '*' | '+'
+sequence: (SYMBOL star? | STRING | option | group star? )+
+option: '[' alternative ']'
+group: '(' alternative ')' star?
+"""
+def grammar_grammar():
+    """Builds the grammar for the grammar file
+    """
+    # star: '*' | '+'
+    star          = Alternative( "star", Token('*'), Token('+') )
+    star_opt      = KleenStar  ( "star_opt", 0, 1, rule=star )
+
+    # rule: SYMDEF alternative
+    symbol        = Sequence(    "symbol", Token('SYMBOL'), star_opt )
+    symboldef     = Token(       "SYMDEF" )
+    alternative   = Sequence(    "alternative" )
+    rule          = Sequence(    "rule", symboldef, alternative )
+
+    # grammar: rule+
+    grammar       = KleenStar(   "grammar", _min=1, rule=rule )
+
+    # alternative: sequence ( '|' sequence )*
+    sequence      = KleenStar(   "sequence", 1 )
+    seq_cont_list = Sequence(    "seq_cont_list", Token('|'), sequence )
+    sequence_cont = KleenStar(   "sequence_cont",0, rule=seq_cont_list )
+    
+    alternative.args = [ sequence, sequence_cont ]
+
+    # option: '[' alternative ']'
+    option        = Sequence(    "option", Token('['), alternative, Token(']') )
+
+    # group: '(' alternative ')'
+    group         = Sequence(    "group",  Token('('), alternative, Token(')'), star_opt )
+
+    # sequence: (SYMBOL | STRING | option | group )+
+    string = Token('STRING')
+    alt           = Alternative( "sequence_alt", symbol, string, option, group ) 
+    sequence.args = [ alt ]
+    
+    return grammar
+
+
+def parse_python( pyf, gram ):
+    target = gram.rules['file_input']
+    src = PythonSource( pyf.read() )
+    builder = BaseGrammarBuilder(debug=False, rules=gram.rules)
+    #    for r in gram.items:
+    #        builder.gramrules[r.name] = rg
+    result = target.match( src, builder )
+    print result, builder.stack
+    if not result:
+        print src.debug()
+        raise SyntaxError("at %s" % src.debug() )
+    return builder
+    
+
+from pprint import pprint
+def parse_grammar( fic ):
+    src = GrammarSource( fic )
+    rule = grammar_grammar()
+    builder = BaseGrammarBuilder()
+    result = rule.match( src, builder )
+    node = builder.stack[-1]
+    vis = GrammarVisitor()
+    node.visit(vis)
+
+    return vis
+
+
+if __name__ == "__main__":
+    grammar.DEBUG = False
+    import sys
+    fic = file('Grammar','r')
+    grambuild = parse_grammar( fic )
+    if len(sys.argv)>1:
+        print "-"*20
+        print
+        pyf = file(sys.argv[1],'r')
+        DEBUG = 0
+        builder = parse_python( pyf, grambuild )
+        #print "**", builder.stack
+        if builder.stack:
+            print builder.stack[-1].dumpstr()
+            tp1 = builder.stack[-1]
+            import parser
+            tp2 = parser.suite( file(sys.argv[1]).read() )
+        
+    else:
+        for i,r in enumerate(grambuild.items):
+            print "%  3d : %s" % (i, r)
+        pprint(grambuild.terminals.keys())
+        pprint(grambuild.tokens)
+        print "|".join(grambuild.tokens.keys() )

Added: pypy/dist/pypy/module/parser/recparser/python/Grammar2.3
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/python/Grammar2.3	Mon Apr 25 16:03:44 2005
@@ -0,0 +1,108 @@
+# Grammar for Python
+
+# Note:  Changing the grammar specified in this file will most likely
+#        require corresponding changes in the parser module
+#        (../Modules/parsermodule.c).  If you can't make the changes to
+#        that module yourself, please co-ordinate the required changes
+#        with someone who can; ask around on python-dev for help.  Fred
+#        Drake <fdrake at acm.org> will probably be listening there.
+
+# Commands for Kees Blom's railroad program
+#diagram:token NAME
+#diagram:token NUMBER
+#diagram:token STRING
+#diagram:token NEWLINE
+#diagram:token ENDMARKER
+#diagram:token INDENT
+#diagram:output\input python.bla
+#diagram:token DEDENT
+#diagram:output\textwidth 20.04cm\oddsidemargin  0.0cm\evensidemargin 0.0cm
+#diagram:rules
+
+# Start symbols for the grammar:
+#	single_input is a single interactive statement;
+#	file_input is a module or sequence of commands read from an input file;
+#	eval_input is the input for the eval() and input() functions.
+# NB: compound_stmt in single_input is followed by extra NEWLINE!
+single_input: NEWLINE | simple_stmt | compound_stmt NEWLINE
+file_input: (NEWLINE | stmt)* ENDMARKER
+eval_input: testlist NEWLINE* ENDMARKER
+
+funcdef: 'def' NAME parameters ':' suite
+parameters: '(' [varargslist] ')'
+varargslist: (fpdef ['=' test] ',')* ('*' NAME [',' '**' NAME] | '**' NAME) | fpdef ['=' test] (',' fpdef ['=' test])* [',']
+fpdef: NAME | '(' fplist ')'
+fplist: fpdef (',' fpdef)* [',']
+
+stmt: simple_stmt | compound_stmt
+simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE
+small_stmt: expr_stmt | print_stmt  | del_stmt | pass_stmt | flow_stmt | import_stmt | global_stmt | exec_stmt | assert_stmt
+expr_stmt: testlist (augassign testlist | ('=' testlist)*)
+augassign: '+=' | '-=' | '*=' | '/=' | '%=' | '&=' | '|=' | '^=' | '<<=' | '>>=' | '**=' | '//='
+# For normal assignments, additional restrictions enforced by the interpreter
+print_stmt: 'print' ( '>>' test [ (',' test)+ [','] ] | [ test (',' test)* [','] ] )
+del_stmt: 'del' exprlist
+pass_stmt: 'pass'
+flow_stmt: break_stmt | continue_stmt | return_stmt | raise_stmt | yield_stmt
+break_stmt: 'break'
+continue_stmt: 'continue'
+return_stmt: 'return' [testlist]
+yield_stmt: 'yield' testlist
+raise_stmt: 'raise' [test [',' test [',' test]]]
+import_stmt: 'import' dotted_as_name (',' dotted_as_name)* | 'from' dotted_name 'import' ('*' | import_as_name (',' import_as_name)*)
+import_as_name: NAME [NAME NAME]
+dotted_as_name: dotted_name [NAME NAME]
+dotted_name: NAME ('.' NAME)*
+global_stmt: 'global' NAME (',' NAME)*
+exec_stmt: 'exec' expr ['in' test [',' test]]
+assert_stmt: 'assert' test [',' test]
+
+compound_stmt: if_stmt | while_stmt | for_stmt | try_stmt | funcdef | classdef
+if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]
+while_stmt: 'while' test ':' suite ['else' ':' suite]
+for_stmt: 'for' exprlist 'in' testlist ':' suite ['else' ':' suite]
+try_stmt: ('try' ':' suite (except_clause ':' suite)+ #diagram:break
+           ['else' ':' suite] | 'try' ':' suite 'finally' ':' suite)
+# NB compile.c makes sure that the default except clause is last
+except_clause: 'except' [test [',' test]]
+suite: simple_stmt | NEWLINE INDENT stmt+ DEDENT
+
+test: and_test ('or' and_test)* | lambdef
+and_test: not_test ('and' not_test)*
+not_test: 'not' not_test | comparison
+comparison: expr (comp_op expr)*
+comp_op: '<'|'>'|'=='|'>='|'<='|'<>'|'!='|'in'|'not' 'in'|'is' 'not'|'is'
+expr: xor_expr ('|' xor_expr)*
+xor_expr: and_expr ('^' and_expr)*
+and_expr: shift_expr ('&' shift_expr)*
+shift_expr: arith_expr (('<<'|'>>') arith_expr)*
+arith_expr: term (('+'|'-') term)*
+term: factor (('*'|'/'|'%'|'//') factor)*
+factor: ('+'|'-'|'~') factor | power
+power: atom trailer* ['**' factor]
+atom: '(' [testlist] ')' | '[' [listmaker] ']' | '{' [dictmaker] '}' | '`' testlist1 '`' | NAME | NUMBER | STRING+
+listmaker: test ( list_for | (',' test)* [','] )
+lambdef: 'lambda' [varargslist] ':' test
+trailer: '(' ')' | '(' arglist ')' | '[' subscriptlist ']' | '.' NAME
+subscriptlist: subscript (',' subscript)* [',']
+subscript: '.' '.' '.' | [test] ':' [test] [sliceop] | test
+sliceop: ':' [test]
+exprlist: expr (',' expr)* [',']
+testlist: test (',' test)* [',']
+testlist_safe: test [(',' test)+ [',']]
+dictmaker: test ':' test (',' test ':' test)* [',']
+
+classdef: 'class' NAME ['(' testlist ')'] ':' suite
+
+# arglist: (argument ',')* (argument [',']| '*' test [',' '**' test] | '**' test)
+arglist: (argument ',')* ( '*' test [',' '**' test] | '**' test | argument | [argument ','] )
+argument: [test '='] test	# Really [keyword '='] test
+
+list_iter: list_for | list_if
+list_for: 'for' exprlist 'in' testlist_safe [list_iter]
+list_if: 'if' test [list_iter]
+
+testlist1: test (',' test)*
+
+# not used in grammar, but may appear in "node" passed from Parser to Compiler
+encoding_decl: NAME

Added: pypy/dist/pypy/module/parser/recparser/python/Grammar2.4
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/python/Grammar2.4	Mon Apr 25 16:03:44 2005
@@ -0,0 +1,118 @@
+# Grammar for Python
+
+# Note:  Changing the grammar specified in this file will most likely
+#        require corresponding changes in the parser module
+#        (../Modules/parsermodule.c).  If you can't make the changes to
+#        that module yourself, please co-ordinate the required changes
+#        with someone who can; ask around on python-dev for help.  Fred
+#        Drake <fdrake at acm.org> will probably be listening there.
+
+# Commands for Kees Blom's railroad program
+#diagram:token NAME
+#diagram:token NUMBER
+#diagram:token STRING
+#diagram:token NEWLINE
+#diagram:token ENDMARKER
+#diagram:token INDENT
+#diagram:output\input python.bla
+#diagram:token DEDENT
+#diagram:output\textwidth 20.04cm\oddsidemargin  0.0cm\evensidemargin 0.0cm
+#diagram:rules
+
+# Start symbols for the grammar:
+#	single_input is a single interactive statement;
+#	file_input is a module or sequence of commands read from an input file;
+#	eval_input is the input for the eval() and input() functions.
+# NB: compound_stmt in single_input is followed by extra NEWLINE!
+single_input: NEWLINE | simple_stmt | compound_stmt NEWLINE
+file_input: (NEWLINE | stmt)* ENDMARKER
+eval_input: testlist NEWLINE* ENDMARKER
+
+decorator: '@' dotted_name [ '(' [arglist] ')' ] NEWLINE
+decorators: decorator+
+funcdef: [decorators] 'def' NAME parameters ':' suite
+parameters: '(' [varargslist] ')'
+varargslist: (fpdef ['=' test] ',')* ('*' NAME [',' '**' NAME] | '**' NAME) | fpdef ['=' test] (',' fpdef ['=' test])* [',']
+fpdef: NAME | '(' fplist ')'
+fplist: fpdef (',' fpdef)* [',']
+
+stmt: simple_stmt | compound_stmt
+simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE
+small_stmt: expr_stmt | print_stmt  | del_stmt | pass_stmt | flow_stmt | import_stmt | global_stmt | exec_stmt | assert_stmt
+expr_stmt: testlist (augassign testlist | ('=' testlist)*)
+augassign: '+=' | '-=' | '*=' | '/=' | '%=' | '&=' | '|=' | '^=' | '<<=' | '>>=' | '**=' | '//='
+# For normal assignments, additional restrictions enforced by the interpreter
+print_stmt: 'print' ( '>>' test [ (',' test)+ [','] ] | [ test (',' test)* [','] ] )
+del_stmt: 'del' exprlist
+pass_stmt: 'pass'
+flow_stmt: break_stmt | continue_stmt | return_stmt | raise_stmt | yield_stmt
+break_stmt: 'break'
+continue_stmt: 'continue'
+return_stmt: 'return' [testlist]
+yield_stmt: 'yield' testlist
+raise_stmt: 'raise' [test [',' test [',' test]]]
+import_stmt: import_name | import_from
+import_name: 'import' dotted_as_names
+import_from: 'from' dotted_name 'import' ('*' | '(' import_as_names ')' | import_as_names)
+import_as_name: NAME [NAME NAME]
+dotted_as_name: dotted_name [NAME NAME]
+import_as_names: import_as_name (',' import_as_name)* [',']
+dotted_as_names: dotted_as_name (',' dotted_as_name)*
+dotted_name: NAME ('.' NAME)*
+global_stmt: 'global' NAME (',' NAME)*
+exec_stmt: 'exec' expr ['in' test [',' test]]
+assert_stmt: 'assert' test [',' test]
+
+compound_stmt: if_stmt | while_stmt | for_stmt | try_stmt | funcdef | classdef
+if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]
+while_stmt: 'while' test ':' suite ['else' ':' suite]
+for_stmt: 'for' exprlist 'in' testlist ':' suite ['else' ':' suite]
+try_stmt: ('try' ':' suite (except_clause ':' suite)+ #diagram:break
+           ['else' ':' suite] | 'try' ':' suite 'finally' ':' suite)
+# NB compile.c makes sure that the default except clause is last
+except_clause: 'except' [test [',' test]]
+suite: simple_stmt | NEWLINE INDENT stmt+ DEDENT
+
+test: and_test ('or' and_test)* | lambdef
+and_test: not_test ('and' not_test)*
+not_test: 'not' not_test | comparison
+comparison: expr (comp_op expr)*
+comp_op: '<'|'>'|'=='|'>='|'<='|'<>'|'!='|'in'|'not' 'in'|'is' 'not'|'is'
+expr: xor_expr ('|' xor_expr)*
+xor_expr: and_expr ('^' and_expr)*
+and_expr: shift_expr ('&' shift_expr)*
+shift_expr: arith_expr (('<<'|'>>') arith_expr)*
+arith_expr: term (('+'|'-') term)*
+term: factor (('*'|'/'|'%'|'//') factor)*
+factor: ('+'|'-'|'~') factor | power
+power: atom trailer* ['**' factor]
+atom: '(' [testlist_gexp] ')' | '[' [listmaker] ']' | '{' [dictmaker] '}' | '`' testlist1 '`' | NAME | NUMBER | STRING+
+listmaker: test ( list_for | (',' test)* [','] )
+testlist_gexp: test ( gen_for | (',' test)* [','] )
+lambdef: 'lambda' [varargslist] ':' test
+trailer: '(' [arglist] ')' | '[' subscriptlist ']' | '.' NAME
+subscriptlist: subscript (',' subscript)* [',']
+subscript: '.' '.' '.' | [test] ':' [test] [sliceop] | test
+sliceop: ':' [test]
+exprlist: expr (',' expr)* [',']
+testlist: test (',' test)* [',']
+testlist_safe: test [(',' test)+ [',']]
+dictmaker: test ':' test (',' test ':' test)* [',']
+
+classdef: 'class' NAME ['(' testlist ')'] ':' suite
+
+arglist: (argument ',')* (argument [',']| '*' test [',' '**' test] | '**' test)
+argument: [test '='] test [gen_for] # Really [keyword '='] test
+
+list_iter: list_for | list_if
+list_for: 'for' exprlist 'in' testlist_safe [list_iter]
+list_if: 'if' test [list_iter]
+
+gen_iter: gen_for | gen_if
+gen_for: 'for' exprlist 'in' test [gen_iter]
+gen_if: 'if' test [gen_iter]
+
+testlist1: test (',' test)*
+
+# not used in grammar, but may appear in "node" passed from Parser to Compiler
+encoding_decl: NAME

Added: pypy/dist/pypy/module/parser/recparser/python/__init__.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/python/__init__.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1,60 @@
+__all__ = [ "parse_file_input", "parse_single_input", "parse_eval_input",
+            "python_grammar", "PYTHON_GRAMMAR" ]
+
+from parse import parse_file_input, parse_single_input, parse_eval_input
+import os
+import sys
+
+_ver = ".".join([str(i) for i in sys.version_info[:2]])
+PYTHON_GRAMMAR = os.path.join( os.path.dirname(__file__), "Grammar" + _ver )
+
+def python_grammar():
+    """returns a """
+    from ebnf import parse_grammar
+    level = get_debug()
+    set_debug( 0 )
+    gram = parse_grammar( file(PYTHON_GRAMMAR) )
+    set_debug( level )
+    return gram
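+# A hedged usage sketch -- the grammar's "rules" mapping is what the parsing
+# entry points in parse.py rely on:
+#     gram = python_grammar()
+#     target = gram.rules["file_input"]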
+
+def get_debug():
+    """Return debug level"""
+    import grammar
+    return grammar.DEBUG
+
+def set_debug( level ):
+    """sets debug mode to <level>"""
+    import grammar
+    grammar.DEBUG = level
+
+
+def python_parse(filename):
+    """parse <filename> using CPython's parser module and return nested tuples
+    """
+    pyf = file(filename)
+    import parser
+    tp2 = parser.suite(pyf.read())
+    return tp2.totuple()
+
+
+def _get_encoding(builder):
+    if hasattr(builder, '_source_encoding'):
+        return builder._source_encoding
+    return None
+
+def pypy_parse(filename):
+    """parse <filename> using PyPy's parser module and return nested tuples
+    """
+    pyf = file(filename)
+    builder = parse_file_input(pyf, python_grammar())
+    pyf.close()
+    if builder.stack:
+        # print builder.stack[-1]
+        root_node = builder.stack[-1]
+        nested_tuples = root_node.totuple()
+        source_encoding = _get_encoding(builder)
+        if source_encoding is None:
+            return nested_tuples
+        else:
+            return (323, nested_tuples, source_encoding)
+    return None # XXX raise an exception instead
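+# The nested tuples mirror CPython's parser.suite().totuple() shape: each
+# inner node is (symbol_number, child, ...) and each leaf is
+# (token_number, string); when an encoding declaration was seen, the tree is
+# wrapped as (323, tree, encoding), 323 presumably being symbol.encoding_decl.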

Added: pypy/dist/pypy/module/parser/recparser/python/lexer.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/python/lexer.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1,387 @@
+"""This is a lexer for a Python recursive descent parser
+it obeys the TokenSource interface defined for the grammar
+analyser in grammar.py
+"""
+
+from grammar import TokenSource
+
+DEBUG = False
+import re
+
+KEYWORDS = [
+    'and', 'assert', 'break', 'class', 'continue', 'def', 'del',
+    'elif', 'if', 'import', 'in', 'is', 'finally', 'for', 'from',
+    'global', 'else', 'except', 'exec', 'lambda', 'not', 'or',
+    'pass', 'print', 'raise', 'return', 'try', 'while', 'yield'
+    ]
+
+py_keywords = re.compile(r'(%s)$' % ('|'.join(KEYWORDS)), re.M | re.X)
+
+py_punct = re.compile(r"""
+<>|!=|==|~|
+<=|<<=|<<|<|
+>=|>>=|>>|>|
+\*=|\*\*=|\*\*|\*|
+//=|/=|//|/|
+%=|\^=|\|=|\+=|=|&=|-=|
+,|\^|&|\+|-|\.|%|\||
+\)|\(|;|:|@|\[|\]|`|\{|\}
+""", re.M | re.X)
+
+g_symdef = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*:", re.M)
+g_string = re.compile(r"'[^']+'", re.M)
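+# g_symdef matches grammar rule heads such as "file_input:" and g_string
+# matches quoted terminals such as "'def'"; they presumably serve the EBNF
+# grammar lexer rather than the Python one.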
+py_name = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*", re.M)
+py_comment = re.compile(r"#.*$|[ \t\014]*$", re.M)
+py_ws = re.compile(r" *", re.M)
+py_skip = re.compile(r"[ \t\014]*(#.*$)?", re.M)
+py_encoding = re.compile(r"coding[:=]\s*([-\w.]+)")
+# py_number = re.compile(r"0x[0-9a-z]+|[0-9]+l|([0-9]+\.[0-9]*|\.[0-9]+|[0-9]+)(e[+-]?[0-9]+)?j?||[0-9]+", re.I)
+
+# 0x[\da-f]+l matches hexadecimal numbers, possibly defined as long
+# \d+l matches and only matches long integers
+# (\d+\.\d*|\.\d+|\d+)(e[+-]?\d+)?j? matches simple integers,
+#   exponential notations and complex
+py_number = re.compile(r"""0x[\da-f]+l?|
+\d+l|
+(\d+\.\d*|\.\d+|\d+)(e[+-]?\d+)?j?
+""", re.I | re.X)
+
+def _normalize_encoding(encoding):
+    """returns normalized name for <encoding>
+
+    see dist/src/Parser/tokenizer.c 'get_normal_name()'
+    for implementation details / reference
+
+    NOTE: for now, parser.suite() raises a MemoryError when
+          a bad encoding is used. (SF bug #979739)
+    """
+    # lower() + '_' / '-' conversion
+    encoding = encoding.replace('_', '-').lower()
+    if encoding.startswith('utf-8'):
+        return 'utf-8'
+    for variant in ('latin-1', 'iso-latin-1', 'iso-8859-1'):
+        if encoding.startswith(variant):
+            return 'iso-8859-1'
+    return encoding
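+# For instance (illustrative):
+#     _normalize_encoding('UTF_8')   -> 'utf-8'
+#     _normalize_encoding('Latin-1') -> 'iso-8859-1'
+#     _normalize_encoding('ascii')   -> 'ascii'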
+
+class PythonSource(TokenSource):
+    """The Python tokenizer"""
+    def __init__(self, inpstring):
+        TokenSource.__init__(self)
+        self.input = inpstring
+        self.pos = 0
+        self.indent = 0
+        self.indentstack = [ 0 ]
+        self.atbol = True
+        self.line = 1
+        self._current_line = 1
+        self.pendin = 0 # indentation change waiting to be reported
+        self.level = 0
+        self.linestart = 0
+        self.stack = []
+        self.stack_pos = 0
+        self.comment = ''
+        self.encoding = None
+        
+    def current_line(self):
+        return self._current_line
+
+    def context(self):
+        return self.stack_pos
+
+    def restore(self, ctx):
+        self.stack_pos = ctx
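+    # context()/restore() give the recursive descent parser a cheap way to
+    # backtrack over tokens already buffered in self.stack; a sketch of the
+    # intended use:
+    #     ctx = source.context()
+    #     if not alternative.match(source, builder):
+    #         source.restore(ctx)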
+
+    def _next(self):
+        """returns the next token from source"""
+        inp = self.input
+        pos = self.pos
+        input_length = len(inp)
+        if pos >= input_length:
+            return self.end_of_file()
+        # Beginning of line
+        if self.atbol:
+            self.linestart = pos
+            col = 0
+            m = py_ws.match(inp, pos)
+            pos = m.end()
+            col = pos - self.linestart
+            self.atbol = False
+            # skip blanklines
+            m = py_comment.match(inp, pos)
+            if m:
+                if not self.comment:
+                    self.comment = m.group(0)
+                # <HACK> XXX FIXME: encoding management
+                if self.line <= 2:
+                    # self.comment can be the previous comment, so don't use it
+                    comment = m.group(0)[1:]
+                    m_enc = py_encoding.search(comment)
+                    if m_enc is not None:
+                        self.encoding = _normalize_encoding(m_enc.group(1))
+                # </HACK>
+                self.pos = m.end() + 1
+                self.line += 1
+                self.atbol = True
+                return self._next()
+            # the current block is more indented than the previous one
+            if col > self.indentstack[-1]:
+                self.indentstack.append(col)
+                return "INDENT", None
+            # the current block is less indented than the previous one
+            while col < self.indentstack[-1]:
+                self.pendin += 1
+                self.indentstack.pop(-1)
+            if col != self.indentstack[-1]:
+                raise SyntaxError("Indentation Error")
+        if self.pendin > 0:
+            self.pendin -= 1
+            return "DEDENT", None
+        m = py_skip.match(inp, pos)
+        if m.group(0)[-1:] == '\n':
+            self.line += 1
+        self.comment = m.group(1) or ''
+        pos = m.end() # always match
+        if pos >= input_length:
+            return self.end_of_file()
+        self.pos = pos
+
+        # STRING
+        c = inp[pos]
+        if c in ('r','R'):
+            if pos < input_length-1 and inp[pos+1] in ("'",'"'):
+                return self.next_string(raw=1)
+        elif c in ('u','U'):
+            if pos < input_length-1:
+                if inp[pos+1] in ("r",'R'):
+                    if pos<input_length-2 and inp[pos+2] in ("'",'"'):
+                        return self.next_string( raw = 1, uni = 1 )
+                elif inp[pos+1] in ( "'", '"' ):
+                    return self.next_string( uni = 1 )
+        elif c in ( '"', "'" ):
+            return self.next_string()
+
+        # NAME
+        m = py_name.match(inp, pos)
+        if m:
+            self.pos = m.end()
+            val = m.group(0)
+#            if py_keywords.match(val):
+#                return val, None
+            return "NAME", val
+
+        # NUMBER
+        m = py_number.match(inp, pos)
+        if m:
+            self.pos = m.end()
+            return "NUMBER", m.group(0)
+
+        # NEWLINE
+        if c == '\n':
+            self.pos += 1
+            self.line += 1
+            if self.level > 0:
+                return self._next()
+            else:
+                self.atbol = True
+                comment = self.comment
+                self.comment = ''
+                return "NEWLINE", comment
+
+        if c == '\\':
+            if pos < input_length-1 and inp[pos+1] == '\n':
+                self.pos += 2
+                return self._next()
+        
+        m = py_punct.match(inp, pos)
+        if m:
+            punct = m.group(0)
+            if punct in ( '(', '{', '[' ):
+                self.level += 1
+            if punct in ( ')', '}', ']' ):
+                self.level -= 1
+            self.pos = m.end()
+            return punct, None
+        raise SyntaxError("Unrecognized token '%s'" % inp[pos:pos+20] )
+
+    def next(self):
+        if self.stack_pos >= len(self.stack):
+            tok, val = self._next()
+            self.stack.append( (tok, val, self.line) )
+            self._current_line = self.line
+        else:
+            tok,val,line = self.stack[self.stack_pos]
+            self._current_line = line
+        self.stack_pos += 1
+        if DEBUG:
+            print "%d/%d: %s, %s" % (self.stack_pos, len(self.stack), tok, val)
+        return (tok, val)
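+    # For the source "a = 1" (no trailing newline), successive next() calls
+    # should yield roughly:
+    #     ('NAME', 'a'), ('=', None), ('NUMBER', '1'),
+    #     ('NEWLINE', ''), ('ENDMARKER', None)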
+            
+    def end_of_file(self):
+        """return DEDENT and ENDMARKER"""
+        if len(self.indentstack) == 1:
+            self.indentstack.pop(-1)
+            return "NEWLINE", '' #self.comment
+        elif len(self.indentstack) > 1:
+            self.indentstack.pop(-1)
+            return "DEDENT", None
+        return "ENDMARKER", None
+
+
+    def next_string(self, raw=0, uni=0):
+        pos = self.pos + raw + uni
+        inp = self.input
+        quote = inp[pos]
+        qsize = 1
+        if inp[pos:pos+3] == 3*quote:
+            pos += 3
+            quote = 3*quote
+            qsize = 3
+        else:
+            pos += 1
+        while True:
+            if inp[pos:pos+qsize] == quote:
+                s = inp[self.pos:pos+qsize]
+                self.pos = pos+qsize
+                return "STRING", s
+            # an unterminated single-quoted string ends at the newline
+            if inp[pos] == "\n" and qsize == 1:
+                return None, None
+            if inp[pos] == "\\":
+                pos += 1
+            pos += 1
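+        # Note: the returned string keeps its quotes and any r/u prefix,
+        # e.g. ("STRING", "'abc'") for the source 'abc'.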
+
+    def debug(self):
+        """return context for debug information"""
+        if not hasattr(self, '_lines'):
+            # split lines only once
+            self._lines = self.input.splitlines()
+        return 'line %s : %s' % (self.line, self._lines[self.line-1])
+
+    ## ONLY refactor ideas ###########################################
+##     def _mynext(self):
+##         """returns the next token from source"""
+##         inp = self.input
+##         pos = self.pos
+##         input_length = len(inp)
+##         if pos >= input_length:
+##             return self.end_of_file()
+##         # Beginning of line
+##         if self.atbol:
+##             self.linestart = pos
+##             col = 0
+##             m = py_ws.match(inp, pos)
+##             pos = m.end()
+##             col = pos - self.linestart
+##             self.atbol = False
+##             # skip blanklines
+##             m = py_comment.match(inp, pos)
+##             if m:
+##                 self.pos = m.end() + 1
+##                 self.line += 1
+##                 self.atbol = True
+##                 return self._next()
+##             # the current block is more indented than the previous one
+##             if col > self.indentstack[-1]:
+##                 self.indentstack.append(col)
+##                 return "INDENT", None
+##             # the current block is less indented than the previous one
+##             while col < self.indentstack[-1]:
+##                 self.pendin += 1
+##                 self.indentstack.pop(-1)
+##             if col != self.indentstack[-1]:
+##                 raise SyntaxError("Indentation Error")
+##         if self.pendin > 0:
+##             self.pendin -= 1
+##             return "DEDENT", None
+##         m = py_skip.match(inp, pos)
+##         if m.group(0)[-1:] == '\n':
+##             self.line += 1
+##         pos = m.end() # always match
+##         if pos >= input_length:
+##             return self.end_of_file()
+##         self.pos = pos
+
+##         c = inp[pos]
+##         chain = (self._check_string, self._check_name, self._check_number,
+##                  self._check_newline, self._check_backslash, self._check_punct)
+##         for check_meth in chain:
+##             token_val_pair = check_meth(c, pos)
+##             if token_val_pair is not None:
+##                 return token_val_pair
+        
+
+##     def _check_string(self, c, pos):
+##         inp = self.input
+##         input_length = len(inp)
+##         # STRING
+##         if c in ('r', 'R'):
+##             if pos < input_length-1 and inp[pos+1] in ("'",'"'):
+##                 return self.next_string(raw=1)
+##         elif c in ('u','U'):
+##             if pos < input_length - 1:
+##                 if inp[pos+1] in ("r", 'R'):
+##                     if pos<input_length-2 and inp[pos+2] in ("'",'"'):
+##                         return self.next_string(raw = 1, uni = 1)
+##                 elif inp[pos+1] in ( "'", '"' ):
+##                     return self.next_string(uni = 1)
+##         elif c in ( '"', "'" ):
+##             return self.next_string()
+##         return None
+
+##     def _check_name(self, c, pos):
+##         inp = self.input
+##         # NAME
+##         m = py_name.match(inp, pos)
+##         if m:
+##             self.pos = m.end()
+##             val = m.group(0)
+##             if py_keywords.match(val):
+##                 return val, None
+##             return "NAME", val
+##         return None
+
+##     def _check_number(self, c, pos):
+##         inp = self.input
+##         # NUMBER
+##         m = py_number.match(inp, pos)
+##         if m:
+##             self.pos = m.end()
+##             return "NUMBER", m.group(0)
+##         return None
+
+##     def _check_newline(self, c, pos):
+##         # NEWLINE
+##         if c == '\n':
+##             self.pos += 1
+##             self.line += 1
+##             if self.level > 0:
+##                 return self._next()
+##             else:
+##                 self.atbol = True
+##                 return "NEWLINE", None
+##         return None
+            
+##     def _check_backslash(self, c, pos):
+##         inp = self.input
+##         input_length = len(inp)
+##         if c == '\\':
+##             if pos < input_length-1 and inp[pos+1] == '\n':
+##                 self.pos += 2
+##                 return self._next()
+##         return None
+
+##     def _check_punct(self, c, pos):
+##         inp = self.input
+##         input_length = len(inp)
+##         m = py_punct.match(inp, pos)
+##         if m:
+##             punct = m.group(0)
+##             if punct in ( '(', '{' ):
+##                 self.level += 1
+##             if punct in ( ')', '}' ):
+##                 self.level -= 1
+##             self.pos = m.end()
+##             return punct, None
+##         raise SyntaxError("Unrecognized token '%s'" % inp[pos:pos+20] )
+

Added: pypy/dist/pypy/module/parser/recparser/python/parse.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/python/parse.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1,50 @@
+#!/usr/bin/env python
+from grammar import BaseGrammarBuilder
+from lexer import PythonSource
+from ebnf import parse_grammar
+from pprint import pprint
+import sys
+import python
+
+
+def parse_python_source( textsrc, gram, goal ):
+    """Parse a python source according to goal"""
+    target = gram.rules[goal]
+    src = PythonSource(textsrc)
+    builder = BaseGrammarBuilder(debug=False, rules=gram.rules)
+    result = target.match(src, builder)
+    # <HACK> XXX find a clean way to process encoding declarations
+    if src.encoding:
+        builder._source_encoding = src.encoding
+    # </HACK>
+    if not result:
+        print src.debug()
+        raise SyntaxError("at %s" % src.debug() )
+    return builder
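+# A hedged usage sketch (this is what parse_file_input below boils down to):
+#     builder = parse_python_source(file("some_file.py").read(),
+#                                   python.python_grammar(), "file_input")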
+
+def parse_file_input(pyf, gram):
+    """Parse a python file"""
+    return parse_python_source( pyf.read(), gram, "file_input" )
+    
+def parse_single_input(textsrc, gram):
+    """Parse a python file"""
+    return parse_python_source( textsrc, gram, "single_input" )
+
+def parse_eval_input(textsrc, gram):
+    """Parse a python file"""
+    return parse_python_source( textsrc, gram, "eval_input" )
+
+if __name__ == "__main__":
+    if len(sys.argv) < 2:
+        print "python parse.py [-d N] test_file.py"
+        sys.exit(1)
+    if sys.argv[1] == "-d":
+        debug_level = int(sys.argv[2])
+        test_file = sys.argv[3]
+    else:
+        test_file = sys.argv[1]
+    print "-"*20
+    print
+    print "pyparse \n", python.pypy_parse(test_file)
+    print "parser  \n", python.python_parse(test_file)
+

Added: pypy/dist/pypy/module/parser/recparser/syntaxtree.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/syntaxtree.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1,151 @@
+import symbol
+import token
+
+TOKEN_MAP = {
+    "STRING" : token.STRING,
+    "NUMBER" : token.NUMBER,
+    "NAME" : token.NAME,
+    "NEWLINE" : token.NEWLINE,
+    "DEDENT" : token.DEDENT,
+    "ENDMARKER" : token.ENDMARKER,
+    "INDENT" : token.INDENT,
+    "NEWLINE" : token.NEWLINE,
+    "NT_OFFSET" : token.NT_OFFSET,
+    "N_TOKENS" : token.N_TOKENS,
+    "OP" : token.OP,
+    "?ERRORTOKEN" : token.ERRORTOKEN,
+    "&" : token.AMPER,
+    "&=" : token.AMPEREQUAL,
+    "`" : token.BACKQUOTE,
+    "^" : token.CIRCUMFLEX,
+    "^=" : token.CIRCUMFLEXEQUAL,
+    ":" : token.COLON,
+    "," : token.COMMA,
+    "." : token.DOT,
+    "//" : token.DOUBLESLASH,
+    "//=" : token.DOUBLESLASHEQUAL,
+    "**" : token.DOUBLESTAR,
+    "**=" : token.DOUBLESTAREQUAL,
+    "==" : token.EQEQUAL,
+    "=" : token.EQUAL,
+    ">" : token.GREATER,
+    ">=" : token.GREATEREQUAL,
+    "{" : token.LBRACE,
+    "}" : token.RBRACE,
+    "<<" : token.LEFTSHIFT,
+    "<<=" : token.LEFTSHIFTEQUAL,
+    "<" : token.LESS,
+    "<=" : token.LESSEQUAL,
+    "(" : token.LPAR,
+    "[" : token.LSQB,
+    "-=" : token.MINEQUAL,
+    "-" : token.MINUS,
+    "!=" : token.NOTEQUAL,
+    "<>" : token.NOTEQUAL,
+    "%" : token.PERCENT,
+    "%=" : token.PERCENTEQUAL,
+    "+" : token.PLUS,
+    "+=" : token.PLUSEQUAL,
+    ")" : token.RBRACE,
+    ">>" : token.RIGHTSHIFT,
+    ">>=" : token.RIGHTSHIFTEQUAL,
+    ")" : token.RPAR,
+    "]" : token.RSQB,
+    ";" : token.SEMI,
+    "/" : token.SLASH,
+    "/=" : token.SLASHEQUAL,
+    "*" : token.STAR,
+    "*=" : token.STAREQUAL,
+    "~" : token.TILDE,
+    "|" : token.VBAR,
+    "|=" : token.VBAREQUAL,
+    }
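+# TOKEN_MAP translates this lexer's token names and punctuation strings into
+# CPython's numeric token ids, e.g. TOKEN_MAP["**="] == token.DOUBLESTAREQUAL.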
+    
+
+
+
+class SyntaxNode(object):
+    """A syntax node"""
+    def __init__(self, name, source, *args):
+        self.name = name
+        self.nodes = list(args)
+        self.lineno = source.current_line()
+        
+    def dumptree(self, treenodes, indent):
+        treenodes.append(self.name)
+        if len(self.nodes) > 1:
+            treenodes.append(" -> (\n")
+            treenodes.append(indent+" ")
+            for node in self.nodes:
+                node.dumptree(treenodes, indent+" ")
+            treenodes.append(")\n")
+            treenodes.append(indent)
+        elif len(self.nodes) == 1:
+            treenodes.append(" ->\n")
+            treenodes.append(indent+" ")
+            self.nodes[0].dumptree(treenodes, indent+" ")
+
+    def dumpstr(self):
+        treenodes = []
+        self.dumptree(treenodes, "")
+        return "".join(treenodes)
+
+    def __repr__(self):
+        return "<node [%s] at 0x%x>" % (self.name, id(self))
+
+    def __str__(self):
+        return "(%s)"  % self.name
+
+    def visit(self, visitor):
+        visit_meth = getattr(visitor, "visit_%s" % self.name, None)
+        if visit_meth:
+            return visit_meth(self)
+        # helper function for nodes that have only one subnode:
+        if len(self.nodes) == 1:
+            return self.nodes[0].visit(visitor)
+        raise RuntimeError("Unknonw Visitor for %r" % self.name)
+
+    def expand(self):
+        return [ self ]
+
+    def totuple(self):
+        l = [getattr(symbol, self.name, (0,self.name) )]
+        l += [node.totuple() for node in self.nodes]
+        return tuple(l)
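+        # e.g. a node named "expr_stmt" becomes
+        #     (symbol.expr_stmt, child.totuple(), ...)
+        # and a name unknown to the symbol module gets (0, name) as its head.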
+    
+
+class TempSyntaxNode(SyntaxNode):
+    """A temporary syntax node to represent intermediate rules"""
+    def expand(self):
+        return self.nodes
+
+class TokenNode(SyntaxNode):
+    """A token node"""
+    def __init__(self, name, source, value):
+        SyntaxNode.__init__(self, name, source)
+        self.value = value
+
+    def dumptree(self, treenodes, indent):
+        if self.value:
+            treenodes.append("%s='%s' (%d) " % (self.name, self.value, self.lineno))
+        else:
+            treenodes.append("'%s' (%d) " % (self.name, self.lineno))
+
+    def __repr__(self):
+        if self.value is not None:
+            return "<%s=%s>" % ( self.name, repr(self.value))
+        else:
+            return "<%s!>" % (self.name,)
+
+    def totuple(self):
+        num = TOKEN_MAP.get(self.name, -1)
+        if num == -1:
+            print "Unknown", self.name, self.value
+        if self.value is not None:
+            val = self.value
+        else:
+            if self.name not in ("NEWLINE", "INDENT", "DEDENT", "ENDMARKER"):
+                val = self.name
+            else:
+                val = self.value or ''
+        return (num, val)
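+        # For instance (illustrative):
+        #     TokenNode("NUMBER", source, "42").totuple() == (token.NUMBER, "42")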

Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_1.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_1.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1,3 @@
+
+x = y + 1
+

Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_2.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_2.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1,16 @@
+
+
+L = []
+print L[0:10]
+
+def f():
+    print 1
+   # deliberately misindented comment
+x = 1
+s = "asd"
+
+class A:
+    def f():
+        pass
+
+

Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_3.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_3.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1 @@
+a[1:]

Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_4.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_4.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1 @@
+a is not None

Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_comment.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_comment.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1,6 @@
+x = 0x1L # comment
+a = 1 # yo
+ # hello
+# world
+a = 2
+# end

Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_encoding_declaration.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_encoding_declaration.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1,2 @@
+# -*- coding: ISO-8859-1 -*-
+a = 1 # keep this statement for now (see test_only_one_comment.py)

Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_encoding_declaration2.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_encoding_declaration2.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1,3 @@
+#!/usr/bin/env python
+# coding: ISO_LATIN_1
+a = 1

Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_encoding_declaration3.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_encoding_declaration3.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1,5 @@
+
+
+# coding: ISO-8859-1
+# encoding on the third line <=> no encoding
+a = 1

Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_function_calls.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_function_calls.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1,11 @@
+f()
+f(a)
+f(a,)
+f(a,b)
+f(a, b,)
+f(*args)
+f(**kwargs)
+f(*args, **kwargs)
+f(a, *args, **kwargs)
+f(a, b, *args, **kwargs)
+a = 1

Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_generator.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_generator.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1,3 @@
+def f(n):
+    for i in range(n):
+        yield n

Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_import_statements.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_import_statements.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1,2 @@
+import os
+import os.path as osp

Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_list_comps.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_list_comps.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1,4 @@
+[i for i in range(10) if i%2 == 0]
+# same list on several lines
+[i for i in range(10)
+ if i%2 == 0]

Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_numbers.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_numbers.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1,8 @@
+a = 1
+a = -1
+a = 1.
+a = .2
+a = 1.2
+a = 1e3
+a = 1.3e4
+a = -1.3

Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_ony_one_comment.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_ony_one_comment.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1 @@
+# only one comment

Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_redirected_prints.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_redirected_prints.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1 @@
+print >> f

Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_samples.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_samples.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1,96 @@
+
+
+
+import os, os.path as osp
+import sys
+from ebnf import parse_grammar
+from python import python_parse, pypy_parse, set_debug
+from pprint import pprint
+import grammar
+grammar.DEBUG = False
+from symbol import sym_name
+
+
+def name(elt):
+    return "%s[%d]"% (sym_name.get(elt,elt),elt)
+
+def read_samples_dir():
+    return [osp.join('samples', fname) for fname in os.listdir('samples')
+            if fname.endswith('.py')]
+
+
+def print_sym_tuple( tup ):
+    print "\n(",
+    for elt in tup:
+        if type(elt)==int:
+            print name(elt),
+        elif type(elt)==str:
+            print repr(elt),
+        else:
+            print_sym_tuple(elt)
+    print ")",
+
+def assert_tuples_equal(tup1, tup2, curpos = (), disp=""):
+    if disp:
+        print "\n"+disp+"(",
+    for index, (elt1, elt2) in enumerate(zip(tup1, tup2)):
+        if disp and elt1==elt2 and type(elt1)==int:
+            print name(elt1),
+        if elt1 != elt2:
+            if type(elt1) is tuple and type(elt2) is tuple:
+                if disp:
+                    disp=disp+" "
+                assert_tuples_equal(elt1, elt2, curpos + (index,), disp)
+            print
+            print "TUP1"
+            print_sym_tuple(tup1)
+            print
+            print "TUP2"
+            print_sym_tuple(tup2)
+            
+            raise AssertionError('Found difference at %s : %s != %s' %
+                                 (curpos, name(elt1), name(elt2) ), curpos)
+    if disp:
+        print ")",
+
+def test_samples( samples ):
+    for sample in samples:
+        pypy_tuples = pypy_parse(sample)
+        python_tuples = python_parse(sample)
+        print "="*20
+        print file(sample).read()
+        print "-"*10
+        pprint(pypy_tuples)
+        print "-"*10
+        pprint(python_tuples)
+        try:
+            assert_tuples_equal( python_tuples, pypy_tuples, disp=" " )
+            assert python_tuples == pypy_tuples
+        except AssertionError,e:
+            print
+            print "python_tuples"
+            show( python_tuples, e.args[-1] )
+            print
+            print "pypy_tuples"
+            show( pypy_tuples, e.args[-1] )
+            raise
+
+
+def show( tup, idxs ):
+    for level, i in enumerate(idxs):
+        print " "*level , tup
+        tup=tup[i]
+    print tup
+
+if __name__=="__main__":
+    import getopt
+    opts, args = getopt.getopt( sys.argv[1:], "d:", [] )
+    for opt, val in opts:
+        if opt=="-d":
+            set_debug(int(val))
+    if args:
+        samples = args
+    else:
+        samples = read_samples_dir()
+
+    test_samples( samples )

Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_simple_assignment.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_simple_assignment.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1 @@
+x = 1

Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_simple_class.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_simple_class.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1,12 @@
+class A:
+    
+    def with_white_spaces_before(self):
+        pass
+
+
+    def another_method(self, foo):
+        """with a docstring
+        on several lines
+        # with a sharpsign
+        """
+        self.bar = foo

Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_simple_for_loop.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_simple_for_loop.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1,3 @@
+for x in range(10):
+   pass
+

Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_simple_in_test.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_simple_in_test.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1 @@
+x in range(10)

Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_slice.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_slice.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1 @@
+a[1:]

Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_whitespaces.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_whitespaces.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1,2 @@
+l = []
+l    .     append   (     12          )

Added: pypy/dist/pypy/module/parser/recparser/test/test_pytokenizer.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/test_pytokenizer.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1,111 @@
+import unittest
+from python.lexer import PythonSource, py_number, g_symdef, g_string, py_name, \
+     py_comment, py_ws, py_punct
+
+class TokenValPair(tuple):
+    token = 'Override me'
+    def __new__(cls, val = None):
+        return tuple.__new__(cls, (cls.token, val))
+
+TokenMap = {
+    'Equals' : "=",
+    'NonePair' : None,
+    }
+ctx = globals()
+for classname in ('Number', 'String', 'EndMarker', 'NewLine', 'Dedent', 'Name',
+                  'Equals', 'NonePair', 'SymDef', 'Symbol'):
+    classdict = {'token' : TokenMap.get(classname, classname.upper())}
+    ctx[classname] = type(classname, (TokenValPair,), classdict)
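+# The generated classes are tiny tuple factories, for instance:
+#     Number('42') == ('NUMBER', '42')
+#     Equals()     == ('=', None)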
+
+
+PUNCTS = [ '>=', '<>', '!=', '<', '>', '<=', '==', '*=',
+           '//=', '%=', '^=', '<<=', '**=', '|=',
+           '+=', '>>=', '=', '&=', '/=', '-=', ',', '^',
+           '>>', '&', '+', '*', '-', '/', '.', '**',
+           '%', '<<', '//', '|', ')', '(', ';', ':',
+           '@', '[', ']', '`', '{', '}',
+           ]
+
+
+BAD_SYNTAX_STMTS = [
+    # "yo yo",
+    """for i in range(10):
+    print i
+  print 'bad dedent here'""",
+    """for i in range(10):
+  print i
+    print 'Bad indentation here'""",
+    ]
+
+def parse_source(source):
+    lexer = PythonSource(source)
+    tokens = []
+    last_token = ''
+    while last_token != 'ENDMARKER':
+        last_token, value = lexer.next()
+        tokens.append((last_token, value))
+    return tokens
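+# For example, parse_source("x") should produce
+#     [('NAME', 'x'), ('NEWLINE', ''), ('ENDMARKER', None)]
+# given how PythonSource.end_of_file() emits the closing tokens.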
+
+
+NUMBERS = [
+    '1', '1.23', '1.', '0',
+    '1L', '1l',
+    '0x12L', '0x12l', '0X12', '0x12',
+    '1j', '1J',
+    '1e2', '1.2e4',
+    '0.1', '0.', '0.12', '.2',
+    ]
+
+BAD_NUMBERS = [
+    'j', '0xg', '0xj', '0xJ',
+    ]
+
+class PythonSourceTC(unittest.TestCase):
+    """ """
+    def setUp(self):
+        pass
+
+    def test_empty_string(self):
+        """make sure defined regexps don't match empty string"""
+        rgxes = {'numbers' : py_number,
+                 'defsym'  : g_symdef,
+                 'strings' : g_string,
+                 'names'   : py_name,
+                 'punct'   : py_punct,
+                 }
+        for label, rgx in rgxes.items():
+            self.assert_(rgx.match('') is None, '%s matches empty string' % label)
+
+    def test_several_lines_list(self):
+        """tests list definition on several lines"""
+        s = """['a'
+        ]"""
+        tokens = parse_source(s)
+        self.assertEquals(tokens, [('[', None), ('STRING', "'a'"), (']', None),
+                                   ('NEWLINE', ''), ('ENDMARKER', None)])
+
+    def test_numbers(self):
+        """make sure all kind of numbers are correctly parsed"""
+        for number in NUMBERS:
+            self.assertEquals(parse_source(number)[0], ('NUMBER', number))
+            neg = '-%s' % number
+            self.assertEquals(parse_source(neg)[:2],
+                              [('-', None), ('NUMBER', number)])
+        for number in BAD_NUMBERS:
+            self.assertNotEquals(parse_source(number)[0], ('NUMBER', number))
+    
+    def test_hex_number(self):
+        tokens = parse_source("a = 0x12L")
+        self.assertEquals(tokens, [('NAME', 'a'), ('=', None),
+                                   ('NUMBER', '0x12L'), ('NEWLINE', ''),
+                                   ('ENDMARKER', None)])
+        
+    def test_punct(self):
+        for pstr in PUNCTS:
+            tokens = parse_source( pstr )
+            self.assertEqual( tokens[0][0], pstr )
+
+
+if __name__ == '__main__':
+    unittest.main()
+

Added: pypy/dist/pypy/module/parser/recparser/test/test_samples.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/test_samples.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1,95 @@
+"""test module for CPython / PyPy nested tuples comparison"""
+
+import os, os.path as osp
+import sys
+from ebnf import parse_grammar
+from python import python_parse, pypy_parse, set_debug
+from pprint import pprint
+import grammar
+grammar.DEBUG = False
+from symbol import sym_name
+
+
+def name(elt):
+    return "%s[%s]"% (sym_name.get(elt,elt),elt)
+
+def read_samples_dir():
+    return [osp.join('samples', fname) for fname in os.listdir('samples') if fname.endswith('.py')]
+
+def print_sym_tuple(nested, level=0, limit=15, names=False, trace=()):
+    buf = []
+    if level <= limit:
+        buf.append("%s(" % (" "*level))
+    else:
+        buf.append("(")
+    for index, elt in enumerate(nested):
+        # Test if debugging and if on last element of error path
+        if trace and not trace[1:] and index == trace[0]:
+            buf.append('\n----> ')
+        if type(elt) is int:
+            if names:
+                buf.append(name(elt))
+            else:
+                buf.append(str(elt))
+            buf.append(', ')
+        elif type(elt) is str:
+            buf.append(repr(elt))
+        else:
+            if level < limit:
+                buf.append('\n')
+            buf.extend(print_sym_tuple(elt, level+1, limit,
+                                       names, trace[1:]))
+    buf.append(')')
+    return buf
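+# ''.join(print_sym_tuple(tree, names=True)) renders the nested tuples with
+# one "(" per nesting level, indenting up to `limit` levels, and prefixes the
+# element selected by the optional trace path with a "---->" marker.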
+
+def assert_tuples_equal(tup1, tup2, curpos = ()):
+    for index, (elt1, elt2) in enumerate(zip(tup1, tup2)):
+        if elt1 != elt2:
+            if type(elt1) is tuple and type(elt2) is tuple:
+                assert_tuples_equal(elt1, elt2, curpos + (index,))
+            raise AssertionError('Found difference at %s : %s != %s' %
+                                 (curpos, name(elt1), name(elt2) ), curpos)
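+# curpos is the path of child indexes leading down to the mismatch, e.g.
+# (1, 0, 2); it is attached to the AssertionError so the caller can pass it
+# to print_sym_tuple(..., trace=error_path) to highlight the offending node.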
+
+from time import time, clock
+def test_samples( samples ):
+    time_reports = {}
+    for sample in samples:
+        print "testing", sample
+        tstart1, cstart1 = time(), clock()
+        pypy_tuples = pypy_parse(sample)
+        tstart2, cstart2 = time(), clock()
+        python_tuples = python_parse(sample)
+        time_reports[sample] = (time() - tstart2, tstart2-tstart1, clock() - cstart2, cstart2-cstart1 )
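+        # durations: (cpython wall, pypy wall, cpython cpu, pypy cpu)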
+        #print "-"*10, "PyPy parse results", "-"*10
+        #print ''.join(print_sym_tuple(pypy_tuples, names=True))
+        #print "-"*10, "CPython parse results", "-"*10
+        #print ''.join(print_sym_tuple(python_tuples, names=True))
+        print
+        try:
+            assert_tuples_equal(pypy_tuples, python_tuples)
+        except AssertionError,e:
+            error_path = e.args[-1]
+            print "ERROR PATH =", error_path
+            print "="*80
+            print file(sample).read()
+            print "="*80
+            print "-"*10, "PyPy parse results", "-"*10
+            print ''.join(print_sym_tuple(pypy_tuples, names=True, trace=error_path))
+            print "-"*10, "CPython parse results", "-"*10
+            print ''.join(print_sym_tuple(python_tuples, names=True, trace=error_path))
+            print "Failed on (%s)" % sample
+            # raise
+    pprint(time_reports)
+
+if __name__=="__main__":
+    import getopt
+    opts, args = getopt.getopt( sys.argv[1:], "d:", [] )
+    for opt, val in opts:
+        if opt == "-d":
+            set_debug(int(val))
+    if args:
+        samples = args
+    else:
+        samples = read_samples_dir()
+
+    test_samples( samples )

Added: pypy/dist/pypy/module/parser/recparser/test/test_samples2.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/test_samples2.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1,70 @@
+"""test module for CPython / PyPy nested tuples comparison"""
+import os, os.path as osp
+from python import python_parse, pypy_parse
+from pprint import pprint
+import grammar
+grammar.DEBUG = False
+from symbol import sym_name
+
+def name(elt):
+    return "%s[%s]"% (sym_name.get(elt,elt),elt)
+
+def print_sym_tuple(nested, level=0, limit=15, names=False, trace=()):
+    buf = []
+    if level <= limit:
+        buf.append("%s(" % (" "*level))
+    else:
+        buf.append("(")
+    for index, elt in enumerate(nested):
+        # Test if debugging and if on last element of error path
+        if trace and not trace[1:] and index == trace[0]:
+            buf.append('\n----> ')
+        if type(elt) is int:
+            if names:
+                buf.append(name(elt))
+            else:
+                buf.append(str(elt))
+            buf.append(', ')
+        elif type(elt) is str:
+            buf.append(repr(elt))
+        else:
+            if level < limit:
+                buf.append('\n')
+            buf.extend(print_sym_tuple(elt, level+1, limit,
+                                       names, trace[1:]))
+    buf.append(')')
+    return buf
+
+def assert_tuples_equal(tup1, tup2, curpos = ()):
+    for index, (elt1, elt2) in enumerate(zip(tup1, tup2)):
+        if elt1 != elt2:
+            if type(elt1) is tuple and type(elt2) is tuple:
+                assert_tuples_equal(elt1, elt2, curpos + (index,))
+            raise AssertionError('Found difference at %s : %s != %s\n' %
+                                 (curpos, name(elt1), name(elt2) ), curpos)
+
+def test_samples():
+    samples_dir = osp.join(osp.dirname(__file__), 'samples')
+    for fname in os.listdir(samples_dir):
+        if not fname.endswith('.py'):
+            continue
+        abspath = osp.join(samples_dir, fname)
+        yield check_parse, abspath
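+# py.test collects each yielded (check_parse, path) pair as a separate
+# generative test, one per sample file.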
+
+def check_parse(filepath):
+    pypy_tuples = pypy_parse(filepath)
+    python_tuples = python_parse(filepath)
+    try:
+        assert_tuples_equal(pypy_tuples, python_tuples)
+    except AssertionError, e:
+        error_path = e.args[-1]
+        print "ERROR PATH =", error_path
+        print "="*80
+        print file(filepath).read()
+        print "="*80
+        print "-"*10, "PyPy parse results", "-"*10
+        print ''.join(print_sym_tuple(pypy_tuples, names=True, trace=error_path))
+        print "-"*10, "CPython parse results", "-"*10
+        print ''.join(print_sym_tuple(python_tuples, names=True, trace=error_path))
+        assert False, filepath
+    

Added: pypy/dist/pypy/module/parser/recparser/tools/tokenize.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/tools/tokenize.py	Mon Apr 25 16:03:44 2005
@@ -0,0 +1,15 @@
+
+import sys
+from python.lexer import PythonSource
+
+
+def parse_file(filename):
+    f = file(filename).read()
+    src = PythonSource(f)
+    token = src.next()
+    while token!=("ENDMARKER",None) and token!=(None,None):
+        print token
+        token = src.next()
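+# Running this prints one (token, value) pair per line, e.g.
+#     ('NAME', 'x')
+#     ('=', None)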
+
+if __name__ == '__main__':
+    parse_file(sys.argv[1])


