[pypy-svn] r11423 - in pypy/dist/pypy/module/parser/recparser: . ebnf leftout python test test/samples tools
ludal at codespeak.net
Mon Apr 25 16:03:44 CEST 2005
Author: ludal
Date: Mon Apr 25 16:03:44 2005
New Revision: 11423
Added:
pypy/dist/pypy/module/parser/recparser/
pypy/dist/pypy/module/parser/recparser/README
pypy/dist/pypy/module/parser/recparser/ebnf/
pypy/dist/pypy/module/parser/recparser/ebnf/__init__.py
pypy/dist/pypy/module/parser/recparser/ebnf/lexer.py
pypy/dist/pypy/module/parser/recparser/ebnf/parse.py
pypy/dist/pypy/module/parser/recparser/grammar.py
pypy/dist/pypy/module/parser/recparser/leftout/
pypy/dist/pypy/module/parser/recparser/leftout/builders.py
pypy/dist/pypy/module/parser/recparser/leftout/compiler.py
pypy/dist/pypy/module/parser/recparser/leftout/gen_ast.py
pypy/dist/pypy/module/parser/recparser/leftout/parse_grammar.py
pypy/dist/pypy/module/parser/recparser/leftout/pgen.py
pypy/dist/pypy/module/parser/recparser/python/
pypy/dist/pypy/module/parser/recparser/python/Grammar2.3
pypy/dist/pypy/module/parser/recparser/python/Grammar2.4
pypy/dist/pypy/module/parser/recparser/python/__init__.py
pypy/dist/pypy/module/parser/recparser/python/lexer.py
pypy/dist/pypy/module/parser/recparser/python/parse.py
pypy/dist/pypy/module/parser/recparser/syntaxtree.py
pypy/dist/pypy/module/parser/recparser/test/
pypy/dist/pypy/module/parser/recparser/test/samples/
pypy/dist/pypy/module/parser/recparser/test/samples/test_1.py
pypy/dist/pypy/module/parser/recparser/test/samples/test_2.py
pypy/dist/pypy/module/parser/recparser/test/samples/test_3.py
pypy/dist/pypy/module/parser/recparser/test/samples/test_4.py
pypy/dist/pypy/module/parser/recparser/test/samples/test_comment.py
pypy/dist/pypy/module/parser/recparser/test/samples/test_encoding_declaration.py
pypy/dist/pypy/module/parser/recparser/test/samples/test_encoding_declaration2.py
pypy/dist/pypy/module/parser/recparser/test/samples/test_encoding_declaration3.py
pypy/dist/pypy/module/parser/recparser/test/samples/test_function_calls.py
pypy/dist/pypy/module/parser/recparser/test/samples/test_generator.py
pypy/dist/pypy/module/parser/recparser/test/samples/test_import_statements.py
pypy/dist/pypy/module/parser/recparser/test/samples/test_list_comps.py
pypy/dist/pypy/module/parser/recparser/test/samples/test_numbers.py
pypy/dist/pypy/module/parser/recparser/test/samples/test_ony_one_comment.py
pypy/dist/pypy/module/parser/recparser/test/samples/test_redirected_prints.py
pypy/dist/pypy/module/parser/recparser/test/samples/test_samples.py
pypy/dist/pypy/module/parser/recparser/test/samples/test_simple_assignment.py
pypy/dist/pypy/module/parser/recparser/test/samples/test_simple_class.py
pypy/dist/pypy/module/parser/recparser/test/samples/test_simple_for_loop.py
pypy/dist/pypy/module/parser/recparser/test/samples/test_simple_in_test.py
pypy/dist/pypy/module/parser/recparser/test/samples/test_slice.py
pypy/dist/pypy/module/parser/recparser/test/samples/test_whitespaces.py
pypy/dist/pypy/module/parser/recparser/test/test_pytokenizer.py
pypy/dist/pypy/module/parser/recparser/test/test_samples.py
pypy/dist/pypy/module/parser/recparser/test/test_samples2.py
pypy/dist/pypy/module/parser/recparser/tools/
pypy/dist/pypy/module/parser/recparser/tools/tokenize.py
Log:
import into main svn
the following is the partial revision info from darcs:
* rewrote test_samples.py for py.test
* modified grammar for arglist
* added dummy whitespaces test
* added test cases for various function calls
* updated test_encoding_declaration2.py to have a non-normalized encoding test case
* added encoding normalization
* tokenize.py is not used anymore
* HACK patch for encoding declarations (remove me when a better solution is found)
* added test snippets (esp. for encoding declarations)
* added regexp to check encoding declarations
* updated python/parse.py's main
* prefix each test file with 'test_'
* encoding declarations are not parsed correctly
* fixed redirected prints (print >> f) syntax errors
* unittest_pysource.py is out of date (see test/test_pytokenizer.py)
* misc tidy / removed unused imports
* added testcases for comments and "is not"
* modified official Python Grammar to remove ambiguity
* added class testcase
* removed debug output
* record first appearing comment not last
* revert comment regexp change
* added unit tests for python tokenizer
* added time info + improved test script
* added several small test snippets
* added missing RBRACE symbol
* fixed bugs with comments, numbers and slices
* cleanup
* grammar bugfix and recursion removal in Grammar2.3
* improve grammar tree representation
* Choose between python 2.3 and python 2.4 grammar
* removed import lexers
* added python.parse
* Use the list of parsed keywords (from Grammar) instead of a hard-coded one
* added parser tests
* new tests and cleanup
* rename simple_for_loop test to simple_in_test
* make interface to tokenizer accept strings only
* reorganization
* export parse_grammar from ebnf
* add __init__.py files
* move back Grammar into python dir
* rework python.lexer
* move stuff around
* add ebnf/lexer and move TokenSource to grammar
* correct ebnf/parse
* split python parsing and ebnf grammar parsing
* new tests
* disable debugging by default
* move junk to leftout/
* Reorganize grammar.py
* Initial Revision
Added: pypy/dist/pypy/module/parser/recparser/README
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/README Mon Apr 25 16:03:44 2005
@@ -0,0 +1,8 @@
+
+This is a 'standalone' version of the parser module.
+As of now it needs '.' to be on the PYTHONPATH so that e.g.
+import ebnf  # works
+
+This should change once we figure out how to integrate properly with
+PyPy and add an option to switch between the two parsers.
+
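For illustration, a minimal usage sketch matching the README (run from the
recparser directory so that '.' is on the PYTHONPATH; the grammar file name
comes from the python/ directory added below):

    import ebnf
    # parse_grammar is re-exported from ebnf/parse.py; it returns an
    # EBNFVisitor whose .rules dict maps symbol names to grammar objects
    vis = ebnf.parse_grammar(file('python/Grammar2.4'))
    print vis.rules.keys()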
Added: pypy/dist/pypy/module/parser/recparser/ebnf/__init__.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/ebnf/__init__.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1,2 @@
+
+from parse import parse_grammar
Added: pypy/dist/pypy/module/parser/recparser/ebnf/lexer.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/ebnf/lexer.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1,64 @@
+"""This is a lexer for a Python recursive descent parser
+it obeys the TokenSource interface defined for the grammar
+analyser in grammar.py
+"""
+
+import re
+from grammar import TokenSource
+
+DEBUG = False
+
+## Lexer for Python's grammar ########################################
+g_symdef = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*:",re.M)
+g_symbol = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*",re.M)
+g_string = re.compile(r"'[^']+'",re.M)
+g_tok = re.compile(r"\[|\]|\(|\)|\*|\+|\|",re.M)
+g_skip = re.compile(r"\s*(#.*$)?",re.M)
+
+class GrammarSource(TokenSource):
+ """The grammar tokenizer"""
+ def __init__(self, inpstring ):
+ TokenSource.__init__(self)
+ self.input = inpstring
+ self.pos = 0
+
+ def context(self):
+ return self.pos
+
+ def restore(self, ctx ):
+ self.pos = ctx
+
+ def next(self):
+ pos = self.pos
+ inp = self.input
+ m = g_skip.match(inp, pos)
+ while m and pos!=m.end():
+ pos = m.end()
+ if pos==len(inp):
+ self.pos = pos
+ return None, None
+ m = g_skip.match(inp, pos)
+ m = g_symdef.match(inp,pos)
+ if m:
+ tk = m.group(0)
+ self.pos = m.end()
+ return 'SYMDEF',tk[:-1]
+ m = g_tok.match(inp,pos)
+ if m:
+ tk = m.group(0)
+ self.pos = m.end()
+ return tk,tk
+ m = g_string.match(inp,pos)
+ if m:
+ tk = m.group(0)
+ self.pos = m.end()
+ return 'STRING',tk[1:-1]
+ m = g_symbol.match(inp,pos)
+ if m:
+ tk = m.group(0)
+ self.pos = m.end()
+ return 'SYMBOL',tk
+ raise ValueError("Unknown token at pos=%d context='%s'" % (pos,inp[pos:pos+20]) )
+
+ def debug(self):
+ return self.input[self.pos:self.pos+20]
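To illustrate the TokenSource protocol implemented above, a small sketch
(assuming '.' is on the PYTHONPATH as per the README):

    from ebnf.lexer import GrammarSource
    src = GrammarSource("rule: SYMBOL '+'")
    # next() yields (type, value) pairs, then (None, None) at end of input
    while True:
        tok_type, tok_value = src.next()
        if tok_type is None:
            break
        print tok_type, tok_value
    # prints: SYMDEF rule / SYMBOL SYMBOL / STRING +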
Added: pypy/dist/pypy/module/parser/recparser/ebnf/parse.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/ebnf/parse.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1,253 @@
+#!/usr/bin/env python
+from grammar import BaseGrammarBuilder, Alternative, Sequence, Token, \
+ KleenStar, GrammarElement
+from lexer import GrammarSource
+
+import re
+py_name = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*", re.M)
+
+punct = ['>=', '<>', '!=', '<', '>', '<=', '==', '\\*=',
+ '//=', '%=', '^=', '<<=', '\\*\\*=', '\\|=', '\\+=',
+ '>>=', '=', '&=', '/=', '-=', ',', '^', '>>', '&', '\\+',
+ '\\*', '-', '/', '\\.', '\\*\\*', '%', '<<', '//', '\\|',
+ '\\)', '\\(', ';', ':', '@', '\\[', '\\]', '`', '\\{', '\\}']
+
+py_punct = re.compile(r"""
+>=|<>|!=|<|>|<=|==|~|
+\*=|//=|%=|\^=|<<=|\*\*=|\|=|\+=|>>=|=|&=|/=|-=|
+,|\^|>>|&|\+|\*|-|/|\.|\*\*|%|<<|//|\||
+\)|\(|;|:|@|\[|\]|`|\{|\}
+""", re.M | re.X)
+
+
+TERMINALS = [
+ 'NAME', 'NUMBER', 'STRING', 'NEWLINE', 'ENDMARKER',
+ 'INDENT', 'DEDENT' ]
+
+
+## Grammar Visitors ##################################################
+# FIXME: parsertools.py ? parser/__init__.py ?
+
+class NameToken(Token):
+ """A token that is not a keyword"""
+ def __init__(self, keywords=None ):
+ Token.__init__(self, "NAME")
+ self.keywords = keywords
+
+ def match(self, source, builder):
+ """Matches a NAME token whose value is not a keyword.
+ Keywords listed in self.keywords are rejected here; they are
+ matched instead by the dedicated Token('NAME', value) objects
+ built in EBNFVisitor.visit_STRING. On failure the source is
+ restored and None is returned.
+ """
+ ctx = source.context()
+ tk_type, tk_value = source.next()
+ if tk_type==self.name:
+ if tk_value not in self.keywords:
+ ret = builder.token( tk_type, tk_value, source )
+ return self.debug_return( ret, tk_type, tk_value )
+ source.restore( ctx )
+ return None
+
+
+class EBNFVisitor(object):
+ def __init__(self):
+ self.rules = {}
+ self.terminals = {}
+ self.current_rule = None
+ self.current_subrule = 0
+ self.tokens = {}
+ self.items = []
+ self.terminals['NAME'] = NameToken()
+
+ def new_name( self ):
+ rule_name = ":%s_%s" % (self.current_rule, self.current_subrule)
+ self.current_subrule += 1
+ return rule_name
+
+ def new_item( self, itm ):
+ self.items.append( itm )
+ return itm
+
+ def visit_grammar( self, node ):
+ # print "Grammar:"
+ for rule in node.nodes:
+ rule.visit(self)
+ # the rules are registered already
+ # we do a pass through the variables to separate
+ # terminal symbols from non-terminals
+ for r in self.items:
+ for i,a in enumerate(r.args):
+ if a.name in self.rules:
+ assert isinstance(a,Token)
+ r.args[i] = self.rules[a.name]
+ if a.name in self.terminals:
+ del self.terminals[a.name]
+ # XXX .keywords also contains punctuations
+ self.terminals['NAME'].keywords = self.tokens.keys()
+
+ def visit_rule( self, node ):
+ symdef = node.nodes[0].value
+ self.current_rule = symdef
+ self.current_subrule = 0
+ alt = node.nodes[1]
+ rule = alt.visit(self)
+ if not isinstance( rule, Token ):
+ rule.name = symdef
+ self.rules[symdef] = rule
+
+ def visit_alternative( self, node ):
+ items = [ node.nodes[0].visit(self) ]
+ items+= node.nodes[1].visit(self)
+ if len(items)==1 and items[0].name.startswith(':'):
+ return items[0]
+ alt = Alternative( self.new_name(), *items )
+ return self.new_item( alt )
+
+ def visit_sequence( self, node ):
+ """ """
+ items = []
+ for n in node.nodes:
+ items.append( n.visit(self) )
+ if len(items)==1:
+ return items[0]
+ elif len(items)>1:
+ return self.new_item( Sequence( self.new_name(), *items) )
+ raise SyntaxError("Found empty sequence")
+
+ def visit_sequence_cont( self, node ):
+ """Returns a list of sequences (possibly empty)"""
+ return [n.visit(self) for n in node.nodes]
+
+ def visit_seq_cont_list(self, node):
+ return node.nodes[1].visit(self)
+
+
+ def visit_symbol(self, node):
+ star_opt = node.nodes[1]
+ sym = node.nodes[0].value
+ terminal = self.terminals.get( sym )
+ if not terminal:
+ terminal = Token( sym )
+ self.terminals[sym] = terminal
+
+ return self.repeat( star_opt, terminal )
+
+ def visit_option( self, node ):
+ rule = node.nodes[1].visit(self)
+ return self.new_item( KleenStar( self.new_name(), 0, 1, rule ) )
+
+ def visit_group( self, node ):
+ rule = node.nodes[1].visit(self)
+ return self.repeat( node.nodes[3], rule )
+
+ def visit_STRING( self, node ):
+ value = node.value
+ tok = self.tokens.get(value)
+ if not tok:
+ if py_punct.match( value ):
+ tok = Token( value )
+ elif py_name.match( value ):
+ tok = Token('NAME', value)
+ else:
+ raise SyntaxError("Unknown STRING value ('%s')" % value )
+ self.tokens[value] = tok
+ return tok
+
+ def visit_sequence_alt( self, node ):
+ res = node.nodes[0].visit(self)
+ assert isinstance( res, GrammarElement )
+ return res
+
+ def repeat( self, star_opt, myrule ):
+ if star_opt.nodes:
+ rule_name = self.new_name()
+ tok = star_opt.nodes[0].nodes[0]
+ if tok.value == '+':
+ return self.new_item( KleenStar( rule_name, _min=1, rule = myrule ) )
+ elif tok.value == '*':
+ return self.new_item( KleenStar( rule_name, _min=0, rule = myrule ) )
+ else:
+ raise SyntaxError("Got symbol star_opt with value='%s'" % tok.value )
+ return myrule
+
+
+def grammar_grammar():
+ """Builds the grammar for the grammar file
+
+ Here's the description of the grammar's grammar ::
+
+ grammar: rule+
+ rule: SYMDEF alternative
+
+ alternative: sequence ( '|' sequence )*
+ star: '*' | '+'
+ sequence: (SYMBOL star? | STRING | option | group star? )+
+ option: '[' alternative ']'
+ group: '(' alternative ')' star?
+ """
+ # star: '*' | '+'
+ star = Alternative( "star", Token('*'), Token('+') )
+ star_opt = KleenStar ( "star_opt", 0, 1, rule=star )
+
+ # rule: SYMDEF alternative (SYMDEF is "SYMBOL ':'" fused by the lexer)
+ symbol = Sequence( "symbol", Token('SYMBOL'), star_opt )
+ symboldef = Token( "SYMDEF" )
+ alternative = Sequence( "alternative" )
+ rule = Sequence( "rule", symboldef, alternative )
+
+ # grammar: rule+
+ grammar = KleenStar( "grammar", _min=1, rule=rule )
+
+ # alternative: sequence ( '|' sequence )*
+ sequence = KleenStar( "sequence", 1 )
+ seq_cont_list = Sequence( "seq_cont_list", Token('|'), sequence )
+ sequence_cont = KleenStar( "sequence_cont",0, rule=seq_cont_list )
+
+ alternative.args = [ sequence, sequence_cont ]
+
+ # option: '[' alternative ']'
+ option = Sequence( "option", Token('['), alternative, Token(']') )
+
+ # group: '(' alternative ')' star?
+ group = Sequence( "group", Token('('), alternative, Token(')'), star_opt )
+
+ # sequence: (SYMBOL | STRING | option | group )+
+ string = Token('STRING')
+ alt = Alternative( "sequence_alt", symbol, string, option, group )
+ sequence.args = [ alt ]
+
+ return grammar
+
+
+def parse_grammar(stream):
+ """parses the grammar file
+
+ stream : file-like object representing the grammar to parse
+ """
+ source = GrammarSource(stream.read())
+ rule = grammar_grammar()
+ builder = BaseGrammarBuilder()
+ result = rule.match(source, builder)
+ node = builder.stack[-1]
+ vis = EBNFVisitor()
+ node.visit(vis)
+ return vis
+
+
+from pprint import pprint
+if __name__ == "__main__":
+ grambuild = parse_grammar(file('../python/Grammar'))
+ for i,r in enumerate(grambuild.items):
+ print "% 3d : %s" % (i, r)
+ pprint(grambuild.terminals.keys())
+ pprint(grambuild.tokens)
+ print "|".join(grambuild.tokens.keys() )
+
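A sketch of how the parsed grammar can be fed to the matcher defined in
grammar.py below (the sample file path is hypothetical; this mirrors what
python/parse.py and python/__init__.py do later in this commit):

    from ebnf import parse_grammar
    from grammar import BaseGrammarBuilder
    from python.lexer import PythonSource

    gram = parse_grammar(file('python/Grammar2.4'))
    start = gram.rules['file_input']          # a GrammarElement
    src = PythonSource(file('test/samples/test_1.py').read())
    builder = BaseGrammarBuilder(rules=gram.rules)
    if start.match(src, builder):
        print builder.stack[-1]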
Added: pypy/dist/pypy/module/parser/recparser/grammar.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/grammar.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1,296 @@
+"""
+a generic recursive descent parser
+the grammar is defined as a composition of objects
+the objects of the grammar are :
+Alternative : as in S -> A | B | C
+Sequence : as in S -> A B C
+KleenStar : as in S -> A* or S -> A+
+Token : a lexer token
+"""
+
+DEBUG = 0
+
+#### Abstract interface for a lexer/tokenizer
+class TokenSource(object):
+ """Abstract base class for a source tokenizer"""
+ def context(self):
+ """Returns a context to restore the state of the object later"""
+
+ def restore(self, ctx):
+ """Restore the context"""
+
+ def next(self):
+ """Returns the next token from the source
+ a token is a tuple : (type,value) or (None,None) if the end of the
+ source has been found
+ """
+
+ def current_line(self):
+ """Returns the current line number"""
+ return 0
+
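+# Backtracking sketch (illustrative): every rule saves the tokenizer state
+# before matching and rewinds it on failure:
+#   ctx = source.context()
+#   ... try sub-rules ...
+#   source.restore(ctx)  # on failure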
+
+######################################################################
+
+from syntaxtree import SyntaxNode, TempSyntaxNode, TokenNode
+
+class BaseGrammarBuilder(object):
+ """Base/default class for a builder"""
+ def __init__( self, rules=None, debug=0):
+ self.rules = rules or {} # a dictionary of grammar rules for debug/reference
+ self.debug = debug
+ self.stack = []
+
+ def context(self):
+ """Returns the state of the builder to be restored later"""
+ #print "Save Stack:", self.stack
+ return len(self.stack)
+
+ def restore(self, ctx):
+ del self.stack[ctx:]
+ #print "Restore Stack:", self.stack
+
+ def alternative(self, rule, source):
+ # Do nothing, keep rule on top of the stack
+ if rule.is_root():
+ elems = self.stack[-1].expand()
+ self.stack[-1] = SyntaxNode(rule.name, source, *elems)
+ if self.debug:
+ self.stack[-1].dumpstr()
+ return True
+
+ def sequence(self, rule, source, elts_number):
+ """ """
+ items = []
+ for node in self.stack[-elts_number:]:
+ items += node.expand()
+ if rule.is_root():
+ node_type = SyntaxNode
+ else:
+ node_type = TempSyntaxNode
+ # replace N elements with 1 element regrouping them
+ if elts_number >= 1:
+ elem = node_type(rule.name, source, *items)
+ del self.stack[-elts_number:]
+ self.stack.append(elem)
+ elif elts_number == 0:
+ self.stack.append(node_type(rule.name, source))
+ if self.debug:
+ self.stack[-1].dumpstr()
+ return True
+
+ def token(self, name, value, source):
+ self.stack.append(TokenNode(name, source, value))
+ if self.debug:
+ self.stack[-1].dumpstr()
+ return True
+
+
+######################################################################
+# Grammar Elements Classes (Alternative, Sequence, KleenStar, Token) #
+######################################################################
+class GrammarElement(object):
+ """Base parser class"""
+ def __init__(self, name):
+ # the rule name
+ self.name = name
+ self.args = []
+ self._is_root = False
+
+ def is_root(self):
+ """This is a root node of the grammar, that is one that will
+ be included in the syntax tree"""
+ if self.name!=":" and self.name.startswith(":"):
+ return False
+ return True
+
+ def match(self, source, builder):
+ """Try to match a grammar rule
+
+ If next set of tokens matches this grammar element, use <builder>
+ to build an appropriate object, otherwise returns None.
+
+ /!\ If the sets of element didn't match the current grammar
+ element, then the <source> is restored as it was before the
+ call to the match() method
+ """
+ return None
+
+ def __str__(self):
+ return self.display(0)
+
+ def __repr__(self):
+ return self.display(0)
+
+ def display(self, level):
+ """Helper function used to represent the grammar.
+ mostly used for debugging the grammar itself"""
+ return "GrammarElement"
+
+
+ def debug_return(self, ret, *args ):
+ # FIXME: use a wrapper of match() methods instead of debug_return()
+ # to prevent additional indirection
+ if ret and DEBUG>0:
+ sargs = ",".join( [ str(i) for i in args ] )
+ print "matched %s (%s): %s" % (self.__class__.__name__, sargs, self.display() )
+ return ret
+
+class Alternative(GrammarElement):
+ """Represents an alternative in a grammar rule (as in S -> A | B | C)"""
+ def __init__(self, name, *args):
+ GrammarElement.__init__(self, name )
+ self.args = list(args)
+ for i in self.args:
+ assert isinstance( i, GrammarElement )
+
+ def match(self, source, builder):
+ """If any of the rules in self.args matches
+ returns the object built from the first rules that matches
+ """
+ if DEBUG>1:
+ print "try alt:", self.display()
+ for rule in self.args:
+ m = rule.match( source, builder )
+ if m:
+ ret = builder.alternative( self, source )
+ return self.debug_return( ret )
+ return False
+
+ def display(self, level=0):
+ if level==0:
+ name = self.name + " -> "
+ elif not self.name.startswith(":"):
+ return self.name
+ else:
+ name = ""
+ items = [ a.display(1) for a in self.args ]
+ return name+"(" + "|".join( items ) + ")"
+
+
+class Sequence(GrammarElement):
+ """Reprensents a Sequence in a grammar rule (as in S -> A B C)"""
+ def __init__(self, name, *args):
+ GrammarElement.__init__(self, name )
+ self.args = list(args)
+ for i in self.args:
+ assert isinstance( i, GrammarElement )
+
+ def match(self, source, builder):
+ """matches all of the symbols in order"""
+ if DEBUG>1:
+ print "try seq:", self.display()
+ ctx = source.context()
+ bctx = builder.context()
+ for rule in self.args:
+ m = rule.match(source, builder)
+ if not m:
+ # Restore needed because some rules may have been matched
+ # before the one that failed
+ source.restore(ctx)
+ builder.restore(bctx)
+ return None
+ ret = builder.sequence(self, source, len(self.args))
+ return self.debug_return( ret )
+
+ def display(self, level=0):
+ if level == 0:
+ name = self.name + " -> "
+ elif not self.name.startswith(":"):
+ return self.name
+ else:
+ name = ""
+ items = [a.display(1) for a in self.args]
+ return name + "(" + " ".join( items ) + ")"
+
+class KleenStar(GrammarElement):
+ """Represents a KleenStar in a grammar rule as in (S -> A+) or (S -> A*)"""
+ def __init__(self, name, _min = 0, _max = -1, rule=None):
+ GrammarElement.__init__( self, name )
+ self.args = [rule]
+ self.min = _min
+ if _max == 0:
+ raise ValueError("KleenStar needs max==-1 or max>1")
+ self.max = _max
+ self.star = "x"
+
+ def match(self, source, builder):
+ """matches a number of times self.args[0]. the number must be comprised
+ between self._min and self._max inclusive. -1 is used to represent infinity"""
+ if DEBUG>1:
+ print "try kle:", self.display()
+ ctx = source.context()
+ bctx = builder.context()
+ rules = 0
+ rule = self.args[0]
+ while True:
+ m = rule.match(source, builder)
+ if not m:
+ # Rule should be matched at least 'min' times
+ if rules<self.min:
+ source.restore(ctx)
+ builder.restore(bctx)
+ return None
+ ret = builder.sequence(self, source, rules)
+ return self.debug_return( ret, rules )
+ rules += 1
+ if self.max>0 and rules == self.max:
+ ret = builder.sequence(self, source, rules)
+ return self.debug_return( ret, rules )
+
+ def display(self, level=0):
+ if level==0:
+ name = self.name + " -> "
+ elif not self.name.startswith(":"):
+ return self.name
+ else:
+ name = ""
+ star = "{%d,%d}" % (self.min,self.max)
+ if self.min==0 and self.max==1:
+ star = "?"
+ elif self.min==0 and self.max==-1:
+ star = "*"
+ elif self.min==1 and self.max==-1:
+ star = "+"
+ s = self.args[0].display(1)
+ return name + "%s%s" % (s, star)
+
+
+class Token(GrammarElement):
+ """Represents a Token in a grammar rule (a lexer token)"""
+ def __init__( self, name, value = None):
+ GrammarElement.__init__( self, name )
+ self.value = value
+
+ def match(self, source, builder):
+ """Matches a token.
+ the default implementation is to match any token whose type
+ corresponds to the object's name. You can extend Token
+ to match anything returned from the lexer. for exemple
+ type, value = source.next()
+ if type=="integer" and int(value)>=0:
+ # found
+ else:
+ # error unknown or negative integer
+ """
+ ctx = source.context()
+ tk_type, tk_value = source.next()
+ if tk_type==self.name:
+ if self.value is None:
+ ret = builder.token( tk_type, tk_value, source )
+ return self.debug_return( ret, tk_type )
+ elif self.value == tk_value:
+ ret = builder.token( tk_type, tk_value, source )
+ return self.debug_return( ret, tk_type, tk_value )
+ if DEBUG>1:
+ print "tried tok:", self.display()
+ source.restore( ctx )
+ return None
+
+ def display(self, level=0):
+ if self.value is None:
+ return "<%s>" % self.name
+ else:
+ return "<%s>=='%s'" % (self.name, self.value)
+
+
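To make the classes above concrete, a small self-contained sketch (the
ListSource class and its token list are invented for illustration;
dumpstr() comes from syntaxtree.py, which is part of this commit but not
shown in this section):

    from grammar import TokenSource, BaseGrammarBuilder, Sequence, KleenStar, Token

    class ListSource(TokenSource):
        """Feeds pre-made (type, value) pairs to the matcher."""
        def __init__(self, tokens):
            self.tokens = tokens
            self.pos = 0
        def context(self):
            return self.pos
        def restore(self, ctx):
            self.pos = ctx
        def next(self):
            if self.pos >= len(self.tokens):
                return None, None
            tok = self.tokens[self.pos]
            self.pos += 1
            return tok

    # namelist: NAME (',' NAME)*  -- names starting with ':' are not kept
    # as separate nodes in the syntax tree (see is_root() above)
    name = Token('NAME')
    tail = Sequence(':namelist_1', Token(','), name)
    namelist = Sequence('namelist', name,
                        KleenStar(':namelist_2', _min=0, rule=tail))

    builder = BaseGrammarBuilder()
    src = ListSource([('NAME', 'a'), (',', ','), ('NAME', 'b')])
    if namelist.match(src, builder):
        print builder.stack[-1].dumpstr()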
Added: pypy/dist/pypy/module/parser/recparser/leftout/builders.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/leftout/builders.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1,185 @@
+"""DEPRECATED"""
+
+raise DeprecationWarning("This module is broken and out of date. Don't use it !")
+from grammar import BaseGrammarBuilder, Alternative, Token, Sequence, KleenStart
+
+class BuilderToken(object):
+ def __init__(self, name, value):
+ self.name = name
+ self.value = value
+
+ def __str__(self):
+ return "%s=(%s)" % (self.name, self.value)
+
+ def display(self, indent=""):
+ print indent,self.name,"=",self.value,
+
+class BuilderRule(object):
+ def __init__(self, name, values):
+ self.name = name
+ self.values = values
+
+ def __str__(self):
+ return "%s=(%s)" % (self.name, self.values)
+
+ def display(self, indent=""):
+ print indent,self.name,'('
+ for v in self.values:
+ v.display(indent+"| ")
+ print ","
+ print indent,')',
+
+class SimpleBuilder(object):
+ """Default builder class (print output)"""
+ def __init__(self):
+ self.gramrules = {}
+
+ def alternative( self, name, value, source ):
+ print "alt:", self.gramrules.get(name, name), " --", source.debug()
+ #print "Alternative", name
+ return BuilderRule( name, [value] )
+
+ def sequence( self, name, values, source ):
+ print "seq:", self.gramrules.get(name, name), " --", source.debug()
+ #print "Sequence", name
+ return BuilderRule( name, values)
+
+ def token( self, name, value, source ):
+ print "tok:", self.gramrules.get(name, name), " --", source.debug()
+ #print "Token", name, value
+ return BuilderToken( name, value )
+
+
+class GrammarBuilder(BaseGrammarBuilder):
+ """Builds a grammar from a grammar desc"""
+ def __init__(self):
+ self.rules = {}
+ self.terminals = {}
+ self.rule_idx = 0
+ self.items = []
+ self.tokens = {}
+
+ def alternative( self, name, source ):
+ pass
+
+ def sequence( self, name, source, N ):
+ #print "seq:", name, "->", source.debug()
+ #print "Sequence", name
+ meth = getattr(self, "build_%s" % name, None)
+ if meth:
+ return meth(values)
+ raise RuntimeError( "symbol %s unhandled" % name )
+
+ def token( self, name, value, source ):
+ #print "tok:", name, "->", source.debug()
+ #print "Token", name, value
+ if name=="SYMDEF":
+ return value
+ elif name=="STRING":
+ tok = self.tokens.get(value)
+ if not tok:
+ tok = Token(value)
+ self.tokens[value] = tok
+ return tok
+ elif name=="SYMBOL":
+ sym = self.terminals.get(value)
+ if not sym:
+ sym = Token(value)
+ self.terminals[value] = sym
+ return sym
+ elif name in ('*','+','(','[',']',')','|',):
+ return name
+ return BuilderToken( name, value )
+
+ def build_sequence( self, values ):
+ """sequence: sequence_alt+
+ sequence_alt: symbol | STRING | option | group star?
+ """
+ if len(values)==1:
+ return values[0]
+ if len(values)>1:
+ seq = Sequence( self.get_name(), *values )
+ self.items.append(seq)
+ debug_rule( seq )
+ return seq
+ return True
+
+ def get_name(self):
+ s = "Rule_%03d" % self.rule_idx
+ self.rule_idx += 1
+ return s
+
+ def build_rule( self, values ):
+ rule_def = values[0]
+ rule_alt = values[1]
+ if not isinstance(rule_alt,Token):
+ rule_alt.name = rule_def
+ self.rules[rule_def] = rule_alt
+ return True
+
+ def build_alternative( self, values ):
+ if len(values[1])>0:
+ alt = Alternative( self.get_name(), values[0], *values[1] )
+ debug_rule( alt )
+ self.items.append(alt)
+ return alt
+ else:
+ return values[0]
+
+ def build_star_opt( self, values ):
+ """star_opt: star?"""
+ if values:
+ return values[0]
+ else:
+ return True
+
+ def build_seq_cont_list( self, values ):
+ """seq_cont_list: '|' sequence """
+ return values[1]
+
+ def build_symbol( self, values ):
+ """symbol: SYMBOL star?"""
+ sym = values[0]
+ star = values[1]
+ if star is True:
+ return sym
+ _min = 0
+ _max = -1
+ if star=='*':
+ _min = 0
+ elif star=='+':
+ _min = 1
+ sym = KleenStar( self.get_name(), _min, _max, rule=sym )
+ sym.star = star
+ debug_rule( sym )
+ self.items.append(sym)
+ return sym
+
+ def build_group( self, values ):
+ """group: '(' alternative ')' star?"""
+ return self.build_symbol( [ values[1], values[3] ] )
+
+ def build_option( self, values ):
+ """option: '[' alternative ']'"""
+ sym = KleenStar( self.get_name(), 0, 1, rule=values[1] )
+ debug_rule( sym )
+ self.items.append(sym)
+ return sym
+
+ def build_sequence_cont( self, values ):
+ """sequence_cont: seq_cont_list*"""
+ return values
+
+ def build_grammar( self, values ):
+ """ grammar: rules+"""
+ # the rules are registered already
+ # we do a pass through the variables to detect
+ # terminal symbols from non terminals
+ for r in self.items:
+ for i,a in enumerate(r.args):
+ if a.name in self.rules:
+ assert isinstance(a,Token)
+ r.args[i] = self.rules[a.name]
+ if a.name in self.terminals:
+ del self.terminals[a.name]
+
Added: pypy/dist/pypy/module/parser/recparser/leftout/compiler.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/leftout/compiler.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1,19 @@
+
+
+class CompileContext(object):
+ pass
+
+class CompilerVisitor(object):
+ def __init__(self):
+ self.com = CompileContext()
+
+ def visit_single_input( self, n ):
+ pass
+
+ def visit_file_input( self, n ):
+ pass
+
+
+
+
+
Added: pypy/dist/pypy/module/parser/recparser/leftout/gen_ast.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/leftout/gen_ast.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1,26 @@
+
+
+
+
+
+from pgen import grammar_grammar, GrammarSource, GrammarVisitor
+from grammar import BaseGrammarBuilder
+
+
+def parse_grammar( fic ):
+ src = GrammarSource( fic )
+ rule = grammar_grammar()
+ builder = BaseGrammarBuilder()
+ result = rule.match( src, builder )
+ return builder
+
+if __name__ == "__main__":
+ import sys
+ fic = file('Grammar','r')
+ grambuild = parse_grammar( fic )
+ print grambuild.stack
+ node = grambuild.stack[-1]
+ vis = GrammarVisitor()
+ node.visit(vis)
+ for i,r in enumerate(vis.items):
+ print "% 3d : %s" % (i, r)
Added: pypy/dist/pypy/module/parser/recparser/leftout/parse_grammar.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/leftout/parse_grammar.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1,29 @@
+
+
+from pgen import grammar_grammar, GrammarSource, GrammarVisitor
+from grammar import BaseGrammarBuilder
+
+
+
+
+def parse_grammar( fic ):
+ src = GrammarSource( fic )
+ rule = grammar_grammar()
+ builder = BaseGrammarBuilder()
+ result = rule.match( src, builder )
+ if not result:
+ print src.debug()
+ raise SyntaxError("at %s" % src.debug() )
+ return builder
+
+if __name__ == "__main__":
+ import sys
+ fic = file('Grammar','r')
+ grambuild = parse_grammar( fic )
+ print grambuild.stack
+ node = grambuild.stack[-1]
+ vis = GrammarVisitor()
+ node.visit(vis)
+ for i,r in enumerate(vis.items):
+ print "% 3d : %s" % (i, r)
+
Added: pypy/dist/pypy/module/parser/recparser/leftout/pgen.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/leftout/pgen.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1,480 @@
+#
+# Generate a Python Syntax analyser from the Python's grammar
+# The grammar comes from the Grammar file in Python source tree
+#
+from pylexer import PythonSource
+import pylexer
+DEBUG=0
+
+class BuilderToken(object):
+ def __init__(self, name, value):
+ self.name = name
+ self.value = value
+
+ def __str__(self):
+ return "%s=(%s)" % (self.name, self.value)
+
+ def display(self, indent=""):
+ print indent,self.name,"=",self.value,
+
+class BuilderRule(object):
+ def __init__(self, name, values):
+ self.name = name
+ self.values = values
+
+ def __str__(self):
+ return "%s=(%s)" % (self.name, self.values)
+
+ def display(self, indent=""):
+ print indent,self.name,'('
+ for v in self.values:
+ v.display(indent+"| ")
+ print ","
+ print indent,')',
+
+class SimpleBuilder(object):
+ """Default builder class (print output)"""
+ def __init__(self):
+ self.gramrules = {}
+
+ def alternative( self, name, value, source ):
+ print "alt:", self.gramrules.get(name, name), " --", source.debug()
+ #print "Alternative", name
+ return BuilderRule( name, [value] )
+
+ def sequence( self, name, values, source ):
+ print "seq:", self.gramrules.get(name, name), " --", source.debug()
+ #print "Sequence", name
+ return BuilderRule( name, values)
+
+ def token( self, name, value, source ):
+ print "tok:", self.gramrules.get(name, name), " --", source.debug()
+ #print "Token", name, value
+ return BuilderToken( name, value )
+
+
+import re
+import grammar
+from grammar import Token, Alternative, KleenStar, Sequence, TokenSource, BaseGrammarBuilder, Proxy, Pgen
+
+g_symdef = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*:",re.M)
+g_symbol = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*",re.M)
+g_string = re.compile(r"'[^']+'",re.M)
+g_tok = re.compile(r"\[|\]|\(|\)|\*|\+|\|",re.M)
+g_skip = re.compile(r"\s*(#.*$)?",re.M)
+
+class GrammarSource(TokenSource):
+ """The grammar tokenizer"""
+ def __init__(self, inpstream ):
+ TokenSource.__init__(self)
+ self.input = inpstream.read()
+ self.pos = 0
+
+ def context(self):
+ return self.pos
+
+ def restore(self, ctx ):
+ self.pos = ctx
+
+ def next(self):
+ pos = self.pos
+ inp = self.input
+ m = g_skip.match(inp, pos)
+ while m and pos!=m.end():
+ pos = m.end()
+ if pos==len(inp):
+ self.pos = pos
+ return None, None
+ m = g_skip.match(inp, pos)
+ m = g_symdef.match(inp,pos)
+ if m:
+ tk = m.group(0)
+ self.pos = m.end()
+ return 'SYMDEF',tk[:-1]
+ m = g_tok.match(inp,pos)
+ if m:
+ tk = m.group(0)
+ self.pos = m.end()
+ return tk,tk
+ m = g_string.match(inp,pos)
+ if m:
+ tk = m.group(0)
+ self.pos = m.end()
+ return 'STRING',tk[1:-1]
+ m = g_symbol.match(inp,pos)
+ if m:
+ tk = m.group(0)
+ self.pos = m.end()
+ return 'SYMBOL',tk
+ raise ValueError("Unknown token at pos=%d context='%s'" % (pos,inp[pos:pos+20]) )
+
+ def debug(self):
+ return self.input[self.pos:self.pos+20]
+
+def debug_rule( rule ):
+ nm = rule.__class__.__name__
+ print nm, rule.name, "->",
+ if nm=='KleenStar':
+ print "(%d,%d,%s)" % (rule.min, rule.max, rule.star),
+ for x in rule.args:
+ print x.name,
+ print
+
+def debug_rule( *args ):
+ pass
+
+
+class GrammarBuilder(BaseGrammarBuilder):
+ """Builds a grammar from a grammar desc"""
+ def __init__(self):
+ self.rules = {}
+ self.terminals = {}
+ self.rule_idx = 0
+ self.items = []
+ self.tokens = {}
+
+ def alternative( self, name, source ):
+ pass
+
+ def sequence( self, name, source, N ):
+ #print "seq:", name, "->", source.debug()
+ #print "Sequence", name
+ meth = getattr(self, "build_%s" % name, None)
+ if meth:
+ return meth(values)
+ raise RuntimeError( "symbol %s unhandled" % name )
+
+ def token( self, name, value, source ):
+ #print "tok:", name, "->", source.debug()
+ #print "Token", name, value
+ if name=="SYMDEF":
+ return value
+ elif name=="STRING":
+ tok = self.tokens.get(value)
+ if not tok:
+ tok = Token(value)
+ self.tokens[value] = tok
+ return tok
+ elif name=="SYMBOL":
+ sym = self.terminals.get(value)
+ if not sym:
+ sym = Token(value)
+ self.terminals[value] = sym
+ return sym
+ elif name in ('*','+','(','[',']',')','|',):
+ return name
+ return BuilderToken( name, value )
+
+ def build_sequence( self, values ):
+ """sequence: sequence_alt+
+ sequence_alt: symbol | STRING | option | group star?
+ """
+ if len(values)==1:
+ return values[0]
+ if len(values)>1:
+ seq = Sequence( self.get_name(), *values )
+ self.items.append(seq)
+ debug_rule( seq )
+ return seq
+ return True
+
+ def get_name(self):
+ s = "Rule_%03d" % self.rule_idx
+ self.rule_idx += 1
+ return s
+
+ def build_rule( self, values ):
+ rule_def = values[0]
+ rule_alt = values[1]
+ if not isinstance(rule_alt,Token):
+ rule_alt.name = rule_def
+ self.rules[rule_def] = rule_alt
+ return True
+
+ def build_alternative( self, values ):
+ if len(values[1])>0:
+ alt = Alternative( self.get_name(), values[0], *values[1] )
+ debug_rule( alt )
+ self.items.append(alt)
+ return alt
+ else:
+ return values[0]
+
+ def build_star_opt( self, values ):
+ """star_opt: star?"""
+ if values:
+ return values[0]
+ else:
+ return True
+
+ def build_seq_cont_list( self, values ):
+ """seq_cont_list: '|' sequence """
+ return values[1]
+
+ def build_symbol( self, values ):
+ """symbol: SYMBOL star?"""
+ sym = values[0]
+ star = values[1]
+ if star is True:
+ return sym
+ _min = 0
+ _max = -1
+ if star=='*':
+ _min = 0
+ elif star=='+':
+ _min = 1
+ sym = KleenStar( self.get_name(), _min, _max, rule=sym )
+ sym.star = star
+ debug_rule( sym )
+ self.items.append(sym)
+ return sym
+
+ def build_group( self, values ):
+ """group: '(' alternative ')' star?"""
+ return self.build_symbol( [ values[1], values[3] ] )
+
+ def build_option( self, values ):
+ """option: '[' alternative ']'"""
+ sym = KleenStar( self.get_name(), 0, 1, rule=values[1] )
+ debug_rule( sym )
+ self.items.append(sym)
+ return sym
+
+ def build_sequence_cont( self, values ):
+ """sequence_cont: seq_cont_list*"""
+ return values
+
+ def build_grammar( self, values ):
+ """ grammar: rules+"""
+ # the rules are registered already
+ # we do a pass through the variables to detect
+ # terminal symbols from non terminals
+ for r in self.items:
+ for i,a in enumerate(r.args):
+ if a.name in self.rules:
+ assert isinstance(a,Token)
+ r.args[i] = self.rules[a.name]
+ if a.name in self.terminals:
+ del self.terminals[a.name]
+
+
+class GrammarVisitor(object):
+ def __init__(self):
+ self.rules = {}
+ self.terminals = {}
+ self.current_rule = None
+ self.current_subrule = 0
+ self.tokens = {}
+ self.items = []
+
+ def new_name( self ):
+ rule_name = ":%s_%s" % (self.current_rule, self.current_subrule)
+ self.current_subrule += 1
+ return rule_name
+
+ def new_item( self, itm ):
+ self.items.append( itm )
+ return itm
+
+ def visit_grammar( self, node ):
+ print "Grammar:"
+ for rule in node.nodes:
+ rule.visit(self)
+ # the rules are registered already
+ # we do a pass through the variables to detect
+ # terminal symbols from non terminals
+ for r in self.items:
+ for i,a in enumerate(r.args):
+ if a.name in self.rules:
+ assert isinstance(a,Token)
+ r.args[i] = self.rules[a.name]
+ if a.name in self.terminals:
+ del self.terminals[a.name]
+
+ def visit_rule( self, node ):
+ symdef = node.nodes[0].value
+ self.current_rule = symdef
+ self.current_subrule = 0
+ alt = node.nodes[1]
+ rule = alt.visit(self)
+ if not isinstance( rule, Token ):
+ rule.name = symdef
+ self.rules[symdef] = rule
+
+ def visit_alternative( self, node ):
+ items = [ node.nodes[0].visit(self) ]
+ items+= node.nodes[1].visit(self)
+ if len(items)==1:
+ return items[0]
+ alt = Alternative( self.new_name(), *items )
+ return self.new_item( alt )
+
+ def visit_sequence( self, node ):
+ """ """
+ items = []
+ for n in node.nodes:
+ items.append( n.visit(self) )
+ if len(items)==1:
+ return items[0]
+ elif len(items)>1:
+ return self.new_item( Sequence( self.new_name(), *items) )
+ raise SyntaxError("Found empty sequence")
+
+ def visit_sequence_cont( self, node ):
+ """Returns a list of sequences (possibly empty)"""
+ L = []
+ for n in node.nodes:
+ L.append( n.visit(self) )
+ return L
+
+ def visit_seq_cont_list( self, node ):
+ return node.nodes[1].visit(self)
+
+
+ def visit_symbol( self, node ):
+ star_opt = node.nodes[1]
+ sym = node.nodes[0].value
+ terminal = self.terminals.get( sym )
+ if not terminal:
+ terminal = Token( sym )
+ self.terminals[sym] = terminal
+
+ return self.repeat( star_opt, terminal )
+
+ def visit_option( self, node ):
+ rule = node.nodes[1].visit(self)
+ return self.new_item( KleenStar( self.new_name(), 0, 1, rule ) )
+
+ def visit_group( self, node ):
+ rule = node.nodes[1].visit(self)
+ return self.repeat( node.nodes[3], rule )
+
+ def visit_STRING( self, node ):
+ value = node.value
+ tok = self.tokens.get(value)
+ if not tok:
+ if pylexer.py_punct.match( value ):
+ tok = Token( value )
+ elif pylexer.py_name.match( value ):
+ tok = Token('NAME',value)
+ else:
+ raise SyntaxError("Unknown STRING value ('%s')" % value )
+ self.tokens[value] = tok
+ return tok
+
+ def visit_sequence_alt( self, node ):
+ res = node.nodes[0].visit(self)
+ assert isinstance( res, Pgen )
+ return res
+
+ def repeat( self, star_opt, myrule ):
+ if star_opt.nodes:
+ rule_name = self.new_name()
+ tok = star_opt.nodes[0].nodes[0]
+ if tok.value == '+':
+ return self.new_item( KleenStar( rule_name, _min=1, rule = myrule ) )
+ elif tok.value == '*':
+ return self.new_item( KleenStar( rule_name, _min=0, rule = myrule ) )
+ else:
+ raise SyntaxError("Got symbol star_opt with value='%s'" % tok.value )
+ return myrule
+
+
+_grammar = """
+grammar: rule+
+rule: SYMDEF alternative
+
+alternative: sequence ( '|' sequence )*
+star: '*' | '+'
+sequence: (SYMBOL star? | STRING | option | group star? )+
+option: '[' alternative ']'
+group: '(' alternative ')' star?
+"""
+def grammar_grammar():
+ """Builds the grammar for the grammar file
+ """
+ # star: '*' | '+'
+ star = Alternative( "star", Token('*'), Token('+') )
+ star_opt = KleenStar ( "star_opt", 0, 1, rule=star )
+
+ # rule: SYMDEF alternative (SYMDEF is "SYMBOL ':'" fused by the lexer)
+ symbol = Sequence( "symbol", Token('SYMBOL'), star_opt )
+ symboldef = Token( "SYMDEF" )
+ alternative = Sequence( "alternative" )
+ rule = Sequence( "rule", symboldef, alternative )
+
+ # grammar: rule+
+ grammar = KleenStar( "grammar", _min=1, rule=rule )
+
+ # alternative: sequence ( '|' sequence )*
+ sequence = KleenStar( "sequence", 1 )
+ seq_cont_list = Sequence( "seq_cont_list", Token('|'), sequence )
+ sequence_cont = KleenStar( "sequence_cont",0, rule=seq_cont_list )
+
+ alternative.args = [ sequence, sequence_cont ]
+
+ # option: '[' alternative ']'
+ option = Sequence( "option", Token('['), alternative, Token(']') )
+
+ # group: '(' alternative ')'
+ group = Sequence( "group", Token('('), alternative, Token(')'), star_opt )
+
+ # sequence: (SYMBOL | STRING | option | group )+
+ string = Token('STRING')
+ alt = Alternative( "sequence_alt", symbol, string, option, group )
+ sequence.args = [ alt ]
+
+ return grammar
+
+
+def parse_python( pyf, gram ):
+ target = gram.rules['file_input']
+ src = PythonSource( pyf.read() )
+ builder = BaseGrammarBuilder(debug=False, rules=gram.rules)
+ # for r in gram.items:
+ # builder.gramrules[r.name] = rg
+ result = target.match( src, builder )
+ print result, builder.stack
+ if not result:
+ print src.debug()
+ raise SyntaxError("at %s" % src.debug() )
+ return builder
+
+
+from pprint import pprint
+def parse_grammar( fic ):
+ src = GrammarSource( fic )
+ rule = grammar_grammar()
+ builder = BaseGrammarBuilder()
+ result = rule.match( src, builder )
+ node = builder.stack[-1]
+ vis = GrammarVisitor()
+ node.visit(vis)
+
+ return vis
+
+
+if __name__ == "__main__":
+ grammar.DEBUG = False
+ import sys
+ fic = file('Grammar','r')
+ grambuild = parse_grammar( fic )
+ if len(sys.argv)>1:
+ print "-"*20
+ print
+ pyf = file(sys.argv[1],'r')
+ DEBUG = 0
+ builder = parse_python( pyf, grambuild )
+ #print "**", builder.stack
+ if builder.stack:
+ print builder.stack[-1].dumpstr()
+ tp1 = builder.stack[-1]
+ import parser
+ tp2 = parser.suite( file(sys.argv[1]).read() )
+
+ else:
+ for i,r in enumerate(grambuild.items):
+ print "% 3d : %s" % (i, r)
+ pprint(grambuild.terminals.keys())
+ pprint(grambuild.tokens)
+ print "|".join(grambuild.tokens.keys() )
Added: pypy/dist/pypy/module/parser/recparser/python/Grammar2.3
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/python/Grammar2.3 Mon Apr 25 16:03:44 2005
@@ -0,0 +1,108 @@
+# Grammar for Python
+
+# Note: Changing the grammar specified in this file will most likely
+# require corresponding changes in the parser module
+# (../Modules/parsermodule.c). If you can't make the changes to
+# that module yourself, please co-ordinate the required changes
+# with someone who can; ask around on python-dev for help. Fred
+# Drake <fdrake at acm.org> will probably be listening there.
+
+# Commands for Kees Blom's railroad program
+#diagram:token NAME
+#diagram:token NUMBER
+#diagram:token STRING
+#diagram:token NEWLINE
+#diagram:token ENDMARKER
+#diagram:token INDENT
+#diagram:output\input python.bla
+#diagram:token DEDENT
+#diagram:output\textwidth 20.04cm\oddsidemargin 0.0cm\evensidemargin 0.0cm
+#diagram:rules
+
+# Start symbols for the grammar:
+# single_input is a single interactive statement;
+# file_input is a module or sequence of commands read from an input file;
+# eval_input is the input for the eval() and input() functions.
+# NB: compound_stmt in single_input is followed by extra NEWLINE!
+single_input: NEWLINE | simple_stmt | compound_stmt NEWLINE
+file_input: (NEWLINE | stmt)* ENDMARKER
+eval_input: testlist NEWLINE* ENDMARKER
+
+funcdef: 'def' NAME parameters ':' suite
+parameters: '(' [varargslist] ')'
+varargslist: (fpdef ['=' test] ',')* ('*' NAME [',' '**' NAME] | '**' NAME) | fpdef ['=' test] (',' fpdef ['=' test])* [',']
+fpdef: NAME | '(' fplist ')'
+fplist: fpdef (',' fpdef)* [',']
+
+stmt: simple_stmt | compound_stmt
+simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE
+small_stmt: expr_stmt | print_stmt | del_stmt | pass_stmt | flow_stmt | import_stmt | global_stmt | exec_stmt | assert_stmt
+expr_stmt: testlist (augassign testlist | ('=' testlist)*)
+augassign: '+=' | '-=' | '*=' | '/=' | '%=' | '&=' | '|=' | '^=' | '<<=' | '>>=' | '**=' | '//='
+# For normal assignments, additional restrictions enforced by the interpreter
+print_stmt: 'print' ( '>>' test [ (',' test)+ [','] ] | [ test (',' test)* [','] ] )
+del_stmt: 'del' exprlist
+pass_stmt: 'pass'
+flow_stmt: break_stmt | continue_stmt | return_stmt | raise_stmt | yield_stmt
+break_stmt: 'break'
+continue_stmt: 'continue'
+return_stmt: 'return' [testlist]
+yield_stmt: 'yield' testlist
+raise_stmt: 'raise' [test [',' test [',' test]]]
+import_stmt: 'import' dotted_as_name (',' dotted_as_name)* | 'from' dotted_name 'import' ('*' | import_as_name (',' import_as_name)*)
+import_as_name: NAME [NAME NAME]
+dotted_as_name: dotted_name [NAME NAME]
+dotted_name: NAME ('.' NAME)*
+global_stmt: 'global' NAME (',' NAME)*
+exec_stmt: 'exec' expr ['in' test [',' test]]
+assert_stmt: 'assert' test [',' test]
+
+compound_stmt: if_stmt | while_stmt | for_stmt | try_stmt | funcdef | classdef
+if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]
+while_stmt: 'while' test ':' suite ['else' ':' suite]
+for_stmt: 'for' exprlist 'in' testlist ':' suite ['else' ':' suite]
+try_stmt: ('try' ':' suite (except_clause ':' suite)+ #diagram:break
+ ['else' ':' suite] | 'try' ':' suite 'finally' ':' suite)
+# NB compile.c makes sure that the default except clause is last
+except_clause: 'except' [test [',' test]]
+suite: simple_stmt | NEWLINE INDENT stmt+ DEDENT
+
+test: and_test ('or' and_test)* | lambdef
+and_test: not_test ('and' not_test)*
+not_test: 'not' not_test | comparison
+comparison: expr (comp_op expr)*
+comp_op: '<'|'>'|'=='|'>='|'<='|'<>'|'!='|'in'|'not' 'in'|'is' 'not'|'is'
+expr: xor_expr ('|' xor_expr)*
+xor_expr: and_expr ('^' and_expr)*
+and_expr: shift_expr ('&' shift_expr)*
+shift_expr: arith_expr (('<<'|'>>') arith_expr)*
+arith_expr: term (('+'|'-') term)*
+term: factor (('*'|'/'|'%'|'//') factor)*
+factor: ('+'|'-'|'~') factor | power
+power: atom trailer* ['**' factor]
+atom: '(' [testlist] ')' | '[' [listmaker] ']' | '{' [dictmaker] '}' | '`' testlist1 '`' | NAME | NUMBER | STRING+
+listmaker: test ( list_for | (',' test)* [','] )
+lambdef: 'lambda' [varargslist] ':' test
+trailer: '(' ')' | '(' arglist ')' | '[' subscriptlist ']' | '.' NAME
+subscriptlist: subscript (',' subscript)* [',']
+subscript: '.' '.' '.' | [test] ':' [test] [sliceop] | test
+sliceop: ':' [test]
+exprlist: expr (',' expr)* [',']
+testlist: test (',' test)* [',']
+testlist_safe: test [(',' test)+ [',']]
+dictmaker: test ':' test (',' test ':' test)* [',']
+
+classdef: 'class' NAME ['(' testlist ')'] ':' suite
+
+# arglist: (argument ',')* (argument [',']| '*' test [',' '**' test] | '**' test)
+arglist: (argument ',')* ( '*' test [',' '**' test] | '**' test | argument | [argument ','] )
+argument: [test '='] test # Really [keyword '='] test
+
+list_iter: list_for | list_if
+list_for: 'for' exprlist 'in' testlist_safe [list_iter]
+list_if: 'if' test [list_iter]
+
+testlist1: test (',' test)*
+
+# not used in grammar, but may appear in "node" passed from Parser to Compiler
+encoding_decl: NAME
Added: pypy/dist/pypy/module/parser/recparser/python/Grammar2.4
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/python/Grammar2.4 Mon Apr 25 16:03:44 2005
@@ -0,0 +1,118 @@
+# Grammar for Python
+
+# Note: Changing the grammar specified in this file will most likely
+# require corresponding changes in the parser module
+# (../Modules/parsermodule.c). If you can't make the changes to
+# that module yourself, please co-ordinate the required changes
+# with someone who can; ask around on python-dev for help. Fred
+# Drake <fdrake at acm.org> will probably be listening there.
+
+# Commands for Kees Blom's railroad program
+#diagram:token NAME
+#diagram:token NUMBER
+#diagram:token STRING
+#diagram:token NEWLINE
+#diagram:token ENDMARKER
+#diagram:token INDENT
+#diagram:output\input python.bla
+#diagram:token DEDENT
+#diagram:output\textwidth 20.04cm\oddsidemargin 0.0cm\evensidemargin 0.0cm
+#diagram:rules
+
+# Start symbols for the grammar:
+# single_input is a single interactive statement;
+# file_input is a module or sequence of commands read from an input file;
+# eval_input is the input for the eval() and input() functions.
+# NB: compound_stmt in single_input is followed by extra NEWLINE!
+single_input: NEWLINE | simple_stmt | compound_stmt NEWLINE
+file_input: (NEWLINE | stmt)* ENDMARKER
+eval_input: testlist NEWLINE* ENDMARKER
+
+decorator: '@' dotted_name [ '(' [arglist] ')' ] NEWLINE
+decorators: decorator+
+funcdef: [decorators] 'def' NAME parameters ':' suite
+parameters: '(' [varargslist] ')'
+varargslist: (fpdef ['=' test] ',')* ('*' NAME [',' '**' NAME] | '**' NAME) | fpdef ['=' test] (',' fpdef ['=' test])* [',']
+fpdef: NAME | '(' fplist ')'
+fplist: fpdef (',' fpdef)* [',']
+
+stmt: simple_stmt | compound_stmt
+simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE
+small_stmt: expr_stmt | print_stmt | del_stmt | pass_stmt | flow_stmt | import_stmt | global_stmt | exec_stmt | assert_stmt
+expr_stmt: testlist (augassign testlist | ('=' testlist)*)
+augassign: '+=' | '-=' | '*=' | '/=' | '%=' | '&=' | '|=' | '^=' | '<<=' | '>>=' | '**=' | '//='
+# For normal assignments, additional restrictions enforced by the interpreter
+print_stmt: 'print' ( '>>' test [ (',' test)+ [','] ] | [ test (',' test)* [','] ] )
+del_stmt: 'del' exprlist
+pass_stmt: 'pass'
+flow_stmt: break_stmt | continue_stmt | return_stmt | raise_stmt | yield_stmt
+break_stmt: 'break'
+continue_stmt: 'continue'
+return_stmt: 'return' [testlist]
+yield_stmt: 'yield' testlist
+raise_stmt: 'raise' [test [',' test [',' test]]]
+import_stmt: import_name | import_from
+import_name: 'import' dotted_as_names
+import_from: 'from' dotted_name 'import' ('*' | '(' import_as_names ')' | import_as_names)
+import_as_name: NAME [NAME NAME]
+dotted_as_name: dotted_name [NAME NAME]
+import_as_names: import_as_name (',' import_as_name)* [',']
+dotted_as_names: dotted_as_name (',' dotted_as_name)*
+dotted_name: NAME ('.' NAME)*
+global_stmt: 'global' NAME (',' NAME)*
+exec_stmt: 'exec' expr ['in' test [',' test]]
+assert_stmt: 'assert' test [',' test]
+
+compound_stmt: if_stmt | while_stmt | for_stmt | try_stmt | funcdef | classdef
+if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]
+while_stmt: 'while' test ':' suite ['else' ':' suite]
+for_stmt: 'for' exprlist 'in' testlist ':' suite ['else' ':' suite]
+try_stmt: ('try' ':' suite (except_clause ':' suite)+ #diagram:break
+ ['else' ':' suite] | 'try' ':' suite 'finally' ':' suite)
+# NB compile.c makes sure that the default except clause is last
+except_clause: 'except' [test [',' test]]
+suite: simple_stmt | NEWLINE INDENT stmt+ DEDENT
+
+test: and_test ('or' and_test)* | lambdef
+and_test: not_test ('and' not_test)*
+not_test: 'not' not_test | comparison
+comparison: expr (comp_op expr)*
+comp_op: '<'|'>'|'=='|'>='|'<='|'<>'|'!='|'in'|'not' 'in'|'is' 'not'|'is'
+expr: xor_expr ('|' xor_expr)*
+xor_expr: and_expr ('^' and_expr)*
+and_expr: shift_expr ('&' shift_expr)*
+shift_expr: arith_expr (('<<'|'>>') arith_expr)*
+arith_expr: term (('+'|'-') term)*
+term: factor (('*'|'/'|'%'|'//') factor)*
+factor: ('+'|'-'|'~') factor | power
+power: atom trailer* ['**' factor]
+atom: '(' [testlist_gexp] ')' | '[' [listmaker] ']' | '{' [dictmaker] '}' | '`' testlist1 '`' | NAME | NUMBER | STRING+
+listmaker: test ( list_for | (',' test)* [','] )
+testlist_gexp: test ( gen_for | (',' test)* [','] )
+lambdef: 'lambda' [varargslist] ':' test
+trailer: '(' [arglist] ')' | '[' subscriptlist ']' | '.' NAME
+subscriptlist: subscript (',' subscript)* [',']
+subscript: '.' '.' '.' | [test] ':' [test] [sliceop] | test
+sliceop: ':' [test]
+exprlist: expr (',' expr)* [',']
+testlist: test (',' test)* [',']
+testlist_safe: test [(',' test)+ [',']]
+dictmaker: test ':' test (',' test ':' test)* [',']
+
+classdef: 'class' NAME ['(' testlist ')'] ':' suite
+
+arglist: (argument ',')* (argument [',']| '*' test [',' '**' test] | '**' test)
+argument: [test '='] test [gen_for] # Really [keyword '='] test
+
+list_iter: list_for | list_if
+list_for: 'for' exprlist 'in' testlist_safe [list_iter]
+list_if: 'if' test [list_iter]
+
+gen_iter: gen_for | gen_if
+gen_for: 'for' exprlist 'in' test [gen_iter]
+gen_if: 'if' test [gen_iter]
+
+testlist1: test (',' test)*
+
+# not used in grammar, but may appear in "node" passed from Parser to Compiler
+encoding_decl: NAME
Added: pypy/dist/pypy/module/parser/recparser/python/__init__.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/python/__init__.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1,60 @@
+__all__ = [ "parse_file_input", "parse_single_input", "parse_eval_input",
+ "python_grammar", "PYTHON_GRAMMAR" ]
+
+from parse import parse_file_input, parse_single_input, parse_eval_input
+import os
+import sys
+
+_ver = ".".join([str(i) for i in sys.version_info[:2]])
+PYTHON_GRAMMAR = os.path.join( os.path.dirname(__file__), "Grammar" + _ver )
+
+def python_grammar():
+ """returns a """
+ from ebnf import parse_grammar
+ level = get_debug()
+ set_debug( 0 )
+ gram = parse_grammar( file(PYTHON_GRAMMAR) )
+ set_debug( level )
+ return gram
+
+def get_debug():
+ """Return debug level"""
+ import grammar
+ return grammar.DEBUG
+
+def set_debug( level ):
+ """sets debug mode to <level>"""
+ import grammar
+ grammar.DEBUG = level
+
+
+def python_parse(filename):
+ """parse <filename> using CPython's parser module and return nested tuples
+ """
+ pyf = file(filename)
+ import parser
+ tp2 = parser.suite(pyf.read())
+ return tp2.totuple()
+
+
+def _get_encoding(builder):
+ if hasattr(builder, '_source_encoding'):
+ return builder._source_encoding
+ return None
+
+def pypy_parse(filename):
+ """parse <filename> using PyPy's parser module and return nested tuples
+ """
+ pyf = file(filename)
+ builder = parse_file_input(pyf, python_grammar())
+ pyf.close()
+ if builder.stack:
+ # print builder.stack[-1]
+ root_node = builder.stack[-1]
+ nested_tuples = root_node.totuple()
+ source_encoding = _get_encoding(builder)
+ if source_encoding is None:
+ return nested_tuples
+ else:
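+ # 323 is the number of the encoding_decl symbol
+ # (cf. symbol.encoding_decl in CPython 2.3)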
+ return (323, nested_tuples, source_encoding)
+ return None # XXX raise an exception instead
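A quick sketch of the two entry points defined above (the sample path is
one of the test files added in this commit; comparing the two results is
presumably what test/test_samples.py automates):

    from python import pypy_parse, python_parse
    ours = pypy_parse('test/samples/test_1.py')
    ref = python_parse('test/samples/test_1.py')
    print ours == ref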
Added: pypy/dist/pypy/module/parser/recparser/python/lexer.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/python/lexer.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1,387 @@
+"""This is a lexer for a Python recursive descent parser
+it obeys the TokenSource interface defined for the grammar
+analyser in grammar.py
+"""
+
+from grammar import TokenSource
+
+DEBUG = False
+import re
+
+KEYWORDS = [
+ 'and', 'assert', 'break', 'class', 'continue', 'def', 'del',
+ 'elif', 'if', 'import', 'in', 'is', 'finally', 'for', 'from',
+ 'global', 'else', 'except', 'exec', 'lambda', 'not', 'or',
+ 'pass', 'print', 'raise', 'return', 'try', 'while', 'yield'
+ ]
+
+py_keywords = re.compile(r'(%s)$' % ('|'.join(KEYWORDS)), re.M | re.X)
+
+py_punct = re.compile(r"""
+<>|!=|==|~|
+<=|<<=|<<|<|
+>=|>>=|>>|>|
+\*=|\*\*=|\*\*|\*|
+//=|/=|//|/|
+%=|\^=|\|=|\+=|=|&=|-=|
+,|\^|&|\+|-|\.|%|\||
+\)|\(|;|:|@|\[|\]|`|\{|\}
+""", re.M | re.X)
+
+g_symdef = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*:", re.M)
+g_string = re.compile(r"'[^']+'", re.M)
+py_name = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*", re.M)
+py_comment = re.compile(r"#.*$|[ \t\014]*$", re.M)
+py_ws = re.compile(r" *", re.M)
+py_skip = re.compile(r"[ \t\014]*(#.*$)?", re.M)
+py_encoding = re.compile(r"coding[:=]\s*([-\w.]+)")
+# py_number = re.compile(r"0x[0-9a-z]+|[0-9]+l|([0-9]+\.[0-9]*|\.[0-9]+|[0-9]+)(e[+-]?[0-9]+)?j?||[0-9]+", re.I)
+
+# 0x[\da-f]+l? matches hexadecimal numbers, with an optional long suffix
+# \d+l matches and only matches long integers
+# (\d+\.\d*|\.\d+|\d+)(e[+-]?\d+)?j? matches simple integers,
+# exponential notations and complex
+py_number = re.compile(r"""0x[\da-f]+l?|
+\d+l|
+(\d+\.\d*|\.\d+|\d+)(e[+-]?\d+)?j?
+""", re.I | re.X)
+
+def _normalize_encoding(encoding):
+ """returns normalized name for <encoding>
+
+ see dist/src/Parser/tokenizer.c 'get_normal_name()'
+ for implementation details / reference
+
+ NOTE: for now, parser.suite() raises a MemoryError when
+ a bad encoding is used. (SF bug #979739)
+ """
+ # lower() + '_' / '-' conversion
+ encoding = encoding.replace('_', '-').lower()
+ if encoding.startswith('utf-8'):
+ return 'utf-8'
+ for variant in ('latin-1', 'iso-latin-1', 'iso-8859-1'):
+ if encoding.startswith(variant):
+ return 'iso-8859-1'
+ return encoding
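+
+# Illustrative behaviour of _normalize_encoding (a sketch, not exhaustive):
+#   _normalize_encoding('UTF_8')   -> 'utf-8'
+#   _normalize_encoding('Latin-1') -> 'iso-8859-1'
+#   _normalize_encoding('ascii')   -> 'ascii'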
+
+class PythonSource(TokenSource):
+ """The Python tokenizer"""
+ def __init__(self, inpstring):
+ TokenSource.__init__(self)
+ self.input = inpstring
+ self.pos = 0
+ self.indent = 0
+ self.indentstack = [ 0 ]
+ self.atbol = True
+ self.line = 1
+ self._current_line = 1
+        self.pendin = 0 # number of pending DEDENT tokens still to be reported
+ self.level = 0
+ self.linestart = 0
+ self.stack = []
+ self.stack_pos = 0
+ self.comment = ''
+ self.encoding = None
+
+ def current_line(self):
+ return self._current_line
+
+ def context(self):
+ return self.stack_pos
+
+ def restore(self, ctx):
+ self.stack_pos = ctx
+
+ def _next(self):
+ """returns the next token from source"""
+ inp = self.input
+ pos = self.pos
+ input_length = len(inp)
+ if pos >= input_length:
+ return self.end_of_file()
+ # Beginning of line
+ if self.atbol:
+ self.linestart = pos
+ col = 0
+ m = py_ws.match(inp, pos)
+ pos = m.end()
+ col = pos - self.linestart
+ self.atbol = False
+ # skip blanklines
+ m = py_comment.match(inp, pos)
+ if m:
+ if not self.comment:
+ self.comment = m.group(0)
+ # <HACK> XXX FIXME: encoding management
+ if self.line <= 2:
+ # self.comment can be the previous comment, so don't use it
+ comment = m.group(0)[1:]
+ m_enc = py_encoding.search(comment)
+ if m_enc is not None:
+ self.encoding = _normalize_encoding(m_enc.group(1))
+ # </HACK>
+ self.pos = m.end() + 1
+ self.line += 1
+ self.atbol = True
+ return self._next()
+ # the current block is more indented than the previous one
+ if col > self.indentstack[-1]:
+ self.indentstack.append(col)
+ return "INDENT", None
+ # the current block is less indented than the previous one
+ while col < self.indentstack[-1]:
+ self.pendin += 1
+ self.indentstack.pop(-1)
+ if col != self.indentstack[-1]:
+ raise SyntaxError("Indentation Error")
+ if self.pendin > 0:
+ self.pendin -= 1
+ return "DEDENT", None
+ m = py_skip.match(inp, pos)
+ if m.group(0)[-1:] == '\n':
+ self.line += 1
+ self.comment = m.group(1) or ''
+ pos = m.end() # always match
+ if pos >= input_length:
+ return self.end_of_file()
+ self.pos = pos
+
+ # STRING
+ c = inp[pos]
+ if c in ('r','R'):
+ if pos < input_length-1 and inp[pos+1] in ("'",'"'):
+ return self.next_string(raw=1)
+ elif c in ('u','U'):
+ if pos < input_length-1:
+ if inp[pos+1] in ("r",'R'):
+ if pos<input_length-2 and inp[pos+2] in ("'",'"'):
+ return self.next_string( raw = 1, uni = 1 )
+ elif inp[pos+1] in ( "'", '"' ):
+ return self.next_string( uni = 1 )
+ elif c in ( '"', "'" ):
+ return self.next_string()
+
+ # NAME
+ m = py_name.match(inp, pos)
+ if m:
+ self.pos = m.end()
+ val = m.group(0)
+# if py_keywords.match(val):
+# return val, None
+ return "NAME", val
+
+ # NUMBER
+ m = py_number.match(inp, pos)
+ if m:
+ self.pos = m.end()
+ return "NUMBER", m.group(0)
+
+ # NEWLINE
+ if c == '\n':
+ self.pos += 1
+ self.line += 1
+ if self.level > 0:
+ return self._next()
+ else:
+ self.atbol = True
+ comment = self.comment
+ self.comment = ''
+ return "NEWLINE", comment
+
+ if c == '\\':
+ if pos < input_length-1 and inp[pos+1] == '\n':
+ self.pos += 2
+ return self._next()
+
+ m = py_punct.match(inp, pos)
+ if m:
+ punct = m.group(0)
+ if punct in ( '(', '{', '[' ):
+ self.level += 1
+ if punct in ( ')', '}', ']' ):
+ self.level -= 1
+ self.pos = m.end()
+ return punct, None
+ raise SyntaxError("Unrecognized token '%s'" % inp[pos:pos+20] )
+
+ def next(self):
+ if self.stack_pos >= len(self.stack):
+ tok, val = self._next()
+ self.stack.append( (tok, val, self.line) )
+ self._current_line = self.line
+ else:
+ tok,val,line = self.stack[self.stack_pos]
+ self._current_line = line
+ self.stack_pos += 1
+ if DEBUG:
+ print "%d/%d: %s, %s" % (self.stack_pos, len(self.stack), tok, val)
+ return (tok, val)
+
+ def end_of_file(self):
+ """return DEDENT and ENDMARKER"""
+ if len(self.indentstack) == 1:
+ self.indentstack.pop(-1)
+ return "NEWLINE", '' #self.comment
+ elif len(self.indentstack) > 1:
+ self.indentstack.pop(-1)
+ return "DEDENT", None
+ return "ENDMARKER", None
+
+
+ def next_string(self, raw=0, uni=0):
+ pos = self.pos + raw + uni
+ inp = self.input
+ quote = inp[pos]
+ qsize = 1
+ if inp[pos:pos+3] == 3*quote:
+ pos += 3
+ quote = 3*quote
+ qsize = 3
+ else:
+ pos += 1
+ while True:
+ if inp[pos:pos+qsize] == quote:
+ s = inp[self.pos:pos+qsize]
+ self.pos = pos+qsize
+ return "STRING", s
+            # a bare newline terminates (and invalidates) a single-quoted string
+            if inp[pos] == "\n" and qsize == 1:
+ return None, None
+ if inp[pos] == "\\":
+ pos += 1
+ pos += 1
+
+ def debug(self):
+ """return context for debug information"""
+ if not hasattr(self, '_lines'):
+ # split lines only once
+ self._lines = self.input.splitlines()
+ return 'line %s : %s' % (self.line, self._lines[self.line-1])
+
+ ## ONLY refactor ideas ###########################################
+## def _mynext(self):
+## """returns the next token from source"""
+## inp = self.input
+## pos = self.pos
+## input_length = len(inp)
+## if pos >= input_length:
+## return self.end_of_file()
+## # Beginning of line
+## if self.atbol:
+## self.linestart = pos
+## col = 0
+## m = py_ws.match(inp, pos)
+## pos = m.end()
+## col = pos - self.linestart
+## self.atbol = False
+## # skip blanklines
+## m = py_comment.match(inp, pos)
+## if m:
+## self.pos = m.end() + 1
+## self.line += 1
+## self.atbol = True
+## return self._next()
+## # the current block is more indented than the previous one
+## if col > self.indentstack[-1]:
+## self.indentstack.append(col)
+## return "INDENT", None
+## # the current block is less indented than the previous one
+## while col < self.indentstack[-1]:
+## self.pendin += 1
+## self.indentstack.pop(-1)
+## if col != self.indentstack[-1]:
+## raise SyntaxError("Indentation Error")
+## if self.pendin > 0:
+## self.pendin -= 1
+## return "DEDENT", None
+## m = py_skip.match(inp, pos)
+## if m.group(0)[-1:] == '\n':
+## self.line += 1
+## pos = m.end() # always match
+## if pos >= input_length:
+## return self.end_of_file()
+## self.pos = pos
+
+## c = inp[pos]
+## chain = (self._check_string, self._check_name, self._check_number,
+## self._check_newline, self._check_backslash, self._check_punct)
+## for check_meth in chain:
+## token_val_pair = check_meth(c, pos)
+## if token_val_pair is not None:
+## return token_val_pair
+
+
+## def _check_string(self, c, pos):
+## inp = self.input
+## input_length = len(inp)
+## # STRING
+## if c in ('r', 'R'):
+## if pos < input_length-1 and inp[pos+1] in ("'",'"'):
+## return self.next_string(raw=1)
+## elif c in ('u','U'):
+## if pos < input_length - 1:
+## if inp[pos+1] in ("r", 'R'):
+## if pos<input_length-2 and inp[pos+2] in ("'",'"'):
+## return self.next_string(raw = 1, uni = 1)
+## elif inp[pos+1] in ( "'", '"' ):
+## return self.next_string(uni = 1)
+## elif c in ( '"', "'" ):
+## return self.next_string()
+## return None
+
+## def _check_name(self, c, pos):
+## inp = self.input
+## # NAME
+## m = py_name.match(inp, pos)
+## if m:
+## self.pos = m.end()
+## val = m.group(0)
+## if py_keywords.match(val):
+## return val, None
+## return "NAME", val
+## return None
+
+## def _check_number(self, c, pos):
+## inp = self.input
+## # NUMBER
+## m = py_number.match(inp, pos)
+## if m:
+## self.pos = m.end()
+## return "NUMBER", m.group(0)
+## return None
+
+## def _check_newline(self, c, pos):
+## # NEWLINE
+## if c == '\n':
+## self.pos += 1
+## self.line += 1
+## if self.level > 0:
+## return self._next()
+## else:
+## self.atbol = True
+## return "NEWLINE", None
+## return None
+
+## def _check_backslash(self, c, pos):
+## inp = self.input
+## input_length = len(inp)
+## if c == '\\':
+## if pos < input_length-1 and inp[pos+1] == '\n':
+## self.pos += 2
+## return self._next()
+## return None
+
+## def _check_punct(self, c, pos):
+## inp = self.input
+## input_length = len(inp)
+## m = py_punct.match(inp, pos)
+## if m:
+## punct = m.group(0)
+## if punct in ( '(', '{' ):
+## self.level += 1
+## if punct in ( ')', '}' ):
+## self.level -= 1
+## self.pos = m.end()
+## return punct, None
+## raise SyntaxError("Unrecognized token '%s'" % inp[pos:pos+20] )
+
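
The TokenSource protocol above can be exercised directly. A small sketch (assuming the recparser directory is on sys.path; the trailing NEWLINE emitted before ENDMARKER comes from end_of_file()):

    from python.lexer import PythonSource

    src = PythonSource("x = y + 1\n")
    token = src.next()
    while token != ("ENDMARKER", None):
        print token      # ('NAME', 'x'), ('=', None), ('NAME', 'y'), ...
        token = src.next()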
Added: pypy/dist/pypy/module/parser/recparser/python/parse.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/python/parse.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1,51 @@
+#!/usr/bin/env python
+from grammar import BaseGrammarBuilder
+from lexer import PythonSource
+from ebnf import parse_grammar
+from pprint import pprint
+import sys
+import python
+
+
+def parse_python_source( textsrc, gram, goal ):
+ """Parse a python source according to goal"""
+ target = gram.rules[goal]
+ src = PythonSource(textsrc)
+ builder = BaseGrammarBuilder(debug=False, rules=gram.rules)
+ result = target.match(src, builder)
+ # <HACK> XXX find a clean way to process encoding declarations
+ if src.encoding:
+ builder._source_encoding = src.encoding
+ # </HACK>
+ if not result:
+ print src.debug()
+ raise SyntaxError("at %s" % src.debug() )
+ return builder
+
+def parse_file_input(pyf, gram):
+ """Parse a python file"""
+ return parse_python_source( pyf.read(), gram, "file_input" )
+
+def parse_single_input(textsrc, gram):
+ """Parse a python file"""
+ return parse_python_source( textsrc, gram, "single_input" )
+
+def parse_eval_input(textsrc, gram):
+ """Parse a python file"""
+ return parse_python_source( textsrc, gram, "eval_input" )
+
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+ print "python parse.py [-d N] test_file.py"
+ sys.exit(1)
+ if sys.argv[1] == "-d":
+ debug_level = int(sys.argv[2])
+ test_file = sys.argv[3]
+ else:
+ test_file = sys.argv[1]
+ print "-"*20
+ print
+ print "pyparse \n", python.pypy_parse(test_file)
+ print "parser \n", python.python_parse(test_file)
+
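
A sketch of driving the eval entry point by hand; it assumes, as pypy_parse() does, that the builder's stack ends up holding the root SyntaxNode:

    import python
    from python.parse import parse_eval_input

    gram = python.python_grammar()
    builder = parse_eval_input("1 + 2 * 3", gram)
    print builder.stack[-1].dumpstr()    # pretty-prints the syntax tree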
Added: pypy/dist/pypy/module/parser/recparser/syntaxtree.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/syntaxtree.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1,149 @@
+import symbol
+import token
+
+TOKEN_MAP = {
+ "STRING" : token.STRING,
+ "NUMBER" : token.NUMBER,
+ "NAME" : token.NAME,
+ "NEWLINE" : token.NEWLINE,
+ "DEDENT" : token.DEDENT,
+ "ENDMARKER" : token.ENDMARKER,
+ "INDENT" : token.INDENT,
+ "NEWLINE" : token.NEWLINE,
+ "NT_OFFSET" : token.NT_OFFSET,
+ "N_TOKENS" : token.N_TOKENS,
+ "OP" : token.OP,
+ "?ERRORTOKEN" : token.ERRORTOKEN,
+ "&" : token.AMPER,
+ "&=" : token.AMPEREQUAL,
+ "`" : token.BACKQUOTE,
+ "^" : token.CIRCUMFLEX,
+ "^=" : token.CIRCUMFLEXEQUAL,
+ ":" : token.COLON,
+ "," : token.COMMA,
+ "." : token.DOT,
+ "//" : token.DOUBLESLASH,
+ "//=" : token.DOUBLESLASHEQUAL,
+ "**" : token.DOUBLESTAR,
+ "**=" : token.DOUBLESTAREQUAL,
+ "==" : token.EQEQUAL,
+ "=" : token.EQUAL,
+ ">" : token.GREATER,
+ ">=" : token.GREATEREQUAL,
+ "{" : token.LBRACE,
+ "}" : token.RBRACE,
+ "<<" : token.LEFTSHIFT,
+ "<<=" : token.LEFTSHIFTEQUAL,
+ "<" : token.LESS,
+ "<=" : token.LESSEQUAL,
+ "(" : token.LPAR,
+ "[" : token.LSQB,
+ "-=" : token.MINEQUAL,
+ "-" : token.MINUS,
+ "!=" : token.NOTEQUAL,
+ "<>" : token.NOTEQUAL,
+ "%" : token.PERCENT,
+ "%=" : token.PERCENTEQUAL,
+ "+" : token.PLUS,
+ "+=" : token.PLUSEQUAL,
+ ")" : token.RBRACE,
+ ">>" : token.RIGHTSHIFT,
+ ">>=" : token.RIGHTSHIFTEQUAL,
+ ")" : token.RPAR,
+ "]" : token.RSQB,
+ ";" : token.SEMI,
+ "/" : token.SLASH,
+ "/=" : token.SLASHEQUAL,
+ "*" : token.STAR,
+ "*=" : token.STAREQUAL,
+ "~" : token.TILDE,
+ "|" : token.VBAR,
+ "|=" : token.VBAREQUAL,
+ }
+
+
+
+
+class SyntaxNode(object):
+ """A syntax node"""
+ def __init__(self, name, source, *args):
+ self.name = name
+ self.nodes = list(args)
+ self.lineno = source.current_line()
+
+ def dumptree(self, treenodes, indent):
+ treenodes.append(self.name)
+ if len(self.nodes) > 1:
+ treenodes.append(" -> (\n")
+ treenodes.append(indent+" ")
+ for node in self.nodes:
+ node.dumptree(treenodes, indent+" ")
+ treenodes.append(")\n")
+ treenodes.append(indent)
+ elif len(self.nodes) == 1:
+ treenodes.append(" ->\n")
+ treenodes.append(indent+" ")
+ self.nodes[0].dumptree(treenodes, indent+" ")
+
+ def dumpstr(self):
+ treenodes = []
+ self.dumptree(treenodes, "")
+ return "".join(treenodes)
+
+ def __repr__(self):
+ return "<node [%s] at 0x%x>" % (self.name, id(self))
+
+ def __str__(self):
+ return "(%s)" % self.name
+
+ def visit(self, visitor):
+ visit_meth = getattr(visitor, "visit_%s" % self.name, None)
+ if visit_meth:
+ return visit_meth(self)
+ # helper function for nodes that have only one subnode:
+ if len(self.nodes) == 1:
+ return self.nodes[0].visit(visitor)
+ raise RuntimeError("Unknonw Visitor for %r" % self.name)
+
+ def expand(self):
+ return [ self ]
+
+ def totuple(self):
+ l = [getattr(symbol, self.name, (0,self.name) )]
+ l += [node.totuple() for node in self.nodes]
+ return tuple(l)
+
+
+class TempSyntaxNode(SyntaxNode):
+ """A temporary syntax node to represent intermediate rules"""
+ def expand(self):
+ return self.nodes
+
+class TokenNode(SyntaxNode):
+ """A token node"""
+ def __init__(self, name, source, value):
+ SyntaxNode.__init__(self, name, source)
+ self.value = value
+
+ def dumptree(self, treenodes, indent):
+ if self.value:
+ treenodes.append("%s='%s' (%d) " % (self.name, self.value, self.lineno))
+ else:
+ treenodes.append("'%s' (%d) " % (self.name, self.lineno))
+
+ def __repr__(self):
+ if self.value is not None:
+ return "<%s=%s>" % ( self.name, repr(self.value))
+ else:
+ return "<%s!>" % (self.name,)
+
+ def totuple(self):
+ num = TOKEN_MAP.get(self.name, -1)
+ if num == -1:
+ print "Unknown", self.name, self.value
+ if self.value is not None:
+ val = self.value
+ else:
+ if self.name not in ("NEWLINE", "INDENT", "DEDENT", "ENDMARKER"):
+ val = self.name
+ else:
+ val = self.value or ''
+ return (num, val)
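
To exercise totuple() and dumptree() in isolation, a self-contained sketch with a hypothetical stand-in for the token source (only current_line() is needed at construction time):

    from syntaxtree import SyntaxNode, TokenNode

    class FakeSource:
        """hypothetical stand-in providing just current_line()"""
        def current_line(self):
            return 1

    src = FakeSource()
    tree = SyntaxNode('expr_stmt', src,
                      TokenNode('NAME', src, 'x'),
                      TokenNode('=', src, None),
                      TokenNode('NUMBER', src, '1'))
    print tree.totuple()    # (symbol.expr_stmt, (token.NAME, 'x'), ...)
    print tree.dumpstr()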
Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_1.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_1.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1,3 @@
+
+x = y + 1
+
Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_2.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_2.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1,16 @@
+
+
+L = []
+print L[0:10]
+
+def f():
+ print 1
+    # lousy comment
+x = 1
+s = "asd"
+
+class A:
+ def f():
+ pass
+
+
Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_3.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_3.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1 @@
+a[1:]
Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_4.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_4.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1 @@
+a is not None
Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_comment.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_comment.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1,6 @@
+x = 0x1L # comment
+a = 1 # yo
+ # hello
+# world
+a = 2
+# end
Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_encoding_declaration.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_encoding_declaration.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1,2 @@
+# -*- coding: ISO-8859-1 -*-
+a = 1 # keep this statement for now (see test_only_one_comment.py)
Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_encoding_declaration2.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_encoding_declaration2.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1,3 @@
+#!/usr/bin/env python
+# coding: ISO_LATIN_1
+a = 1
Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_encoding_declaration3.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_encoding_declaration3.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1,5 @@
+
+
+# coding: ISO-8859-1
+# encoding on the third line <=> no encoding
+a = 1
Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_function_calls.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_function_calls.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1,11 @@
+f()
+f(a)
+f(a,)
+f(a,b)
+f(a, b,)
+f(*args)
+f(**kwargs)
+f(*args, **kwargs)
+f(a, *args, **kwargs)
+f(a, b, *args, **kwargs)
+a = 1
Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_generator.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_generator.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1,3 @@
+def f(n):
+ for i in range(n):
+ yield n
Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_import_statements.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_import_statements.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1,2 @@
+import os
+import os.path as osp
Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_list_comps.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_list_comps.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1,4 @@
+[i for i in range(10) if i%2 == 0]
+# same list on several lines
+[i for i in range(10)
+ if i%2 == 0]
Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_numbers.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_numbers.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1,8 @@
+a = 1
+a = -1
+a = 1.
+a = .2
+a = 1.2
+a = 1e3
+a = 1.3e4
+a = -1.3
Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_ony_one_comment.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_ony_one_comment.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1 @@
+# only one comment
Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_redirected_prints.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_redirected_prints.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1 @@
+print >> f
Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_samples.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_samples.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1,96 @@
+
+
+
+import os, os.path as osp
+import sys
+from ebnf import parse_grammar
+from python import python_parse, pypy_parse, set_debug
+from pprint import pprint
+import grammar
+grammar.DEBUG = False
+from symbol import sym_name
+
+
+def name(elt):
+ return "%s[%d]"% (sym_name.get(elt,elt),elt)
+
+def read_samples_dir():
+ return [osp.join('samples', fname) for fname in os.listdir('samples')
+ if fname.endswith('.py')]
+
+
+def print_sym_tuple( tup ):
+ print "\n(",
+ for elt in tup:
+ if type(elt)==int:
+ print name(elt),
+ elif type(elt)==str:
+ print repr(elt),
+ else:
+ print_sym_tuple(elt)
+ print ")",
+
+def assert_tuples_equal(tup1, tup2, curpos = (), disp=""):
+ if disp:
+ print "\n"+disp+"(",
+ for index, (elt1, elt2) in enumerate(zip(tup1, tup2)):
+ if disp and elt1==elt2 and type(elt1)==int:
+ print name(elt1),
+ if elt1 != elt2:
+ if type(elt1) is tuple and type(elt2) is tuple:
+ if disp:
+ disp=disp+" "
+ assert_tuples_equal(elt1, elt2, curpos + (index,), disp)
+ print
+ print "TUP1"
+ print_sym_tuple(tup1)
+ print
+ print "TUP2"
+ print_sym_tuple(tup2)
+
+ raise AssertionError('Found difference at %s : %s != %s' %
+ (curpos, name(elt1), name(elt2) ), curpos)
+ if disp:
+ print ")",
+
+def test_samples( samples ):
+ for sample in samples:
+ pypy_tuples = pypy_parse(sample)
+ python_tuples = python_parse(sample)
+ print "="*20
+ print file(sample).read()
+ print "-"*10
+ pprint(pypy_tuples)
+ print "-"*10
+ pprint(python_tuples)
+ try:
+ assert_tuples_equal( python_tuples, pypy_tuples, disp=" " )
+ assert python_tuples == pypy_tuples
+ except AssertionError,e:
+ print
+ print "python_tuples"
+ show( python_tuples, e.args[-1] )
+ print
+ print "pypy_tuples"
+ show( pypy_tuples, e.args[-1] )
+ raise
+
+
+def show( tup, idxs ):
+ for level, i in enumerate(idxs):
+ print " "*level , tup
+ tup=tup[i]
+ print tup
+
+if __name__=="__main__":
+ import getopt
+ opts, args = getopt.getopt( sys.argv[1:], "d:", [] )
+ for opt, val in opts:
+ if opt=="-d":
+ set_debug(int(val))
+ if args:
+ samples = args
+ else:
+ samples = read_samples_dir()
+
+ test_samples( samples )
Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_simple_assignment.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_simple_assignment.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1 @@
+x = 1
Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_simple_class.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_simple_class.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1,12 @@
+class A:
+
+ def with_white_spaces_before(self):
+ pass
+
+
+ def another_method(self, foo):
+ """with a docstring
+ on several lines
+ # with a sharpsign
+ """
+ self.bar = foo
Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_simple_for_loop.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_simple_for_loop.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1,3 @@
+for x in range(10):
+ pass
+
Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_simple_in_test.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_simple_in_test.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1 @@
+x in range(10)
Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_slice.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_slice.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1 @@
+a[1:]
Added: pypy/dist/pypy/module/parser/recparser/test/samples/test_whitespaces.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/samples/test_whitespaces.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1,2 @@
+l = []
+l . append ( 12 )
Added: pypy/dist/pypy/module/parser/recparser/test/test_pytokenizer.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/test_pytokenizer.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1,111 @@
+import unittest
+from python.lexer import PythonSource, py_number, g_symdef, g_string, py_name, \
+ py_comment, py_ws, py_punct
+
+class TokenValPair(tuple):
+ token = 'Override me'
+ def __new__(cls, val = None):
+ return tuple.__new__(cls, (cls.token, val))
+
+TokenMap = {
+ 'Equals' : "=",
+ 'NonePair' : None,
+ }
+ctx = globals()
+for classname in ('Number', 'String', 'EndMarker', 'NewLine', 'Dedent', 'Name',
+ 'Equals', 'NonePair', 'SymDef', 'Symbol'):
+ classdict = {'token' : TokenMap.get(classname, classname.upper())}
+ ctx[classname] = type(classname, (TokenValPair,), classdict)
+
+
+PUNCTS = [ '>=', '<>', '!=', '<', '>', '<=', '==', '*=',
+ '//=', '%=', '^=', '<<=', '**=', '|=',
+ '+=', '>>=', '=', '&=', '/=', '-=', ',', '^',
+ '>>', '&', '+', '*', '-', '/', '.', '**',
+ '%', '<<', '//', '|', ')', '(', ';', ':',
+ '@', '[', ']', '`', '{', '}',
+ ]
+
+
+BAD_SYNTAX_STMTS = [
+ # "yo yo",
+ """for i in range(10):
+ print i
+ print 'bad dedent here'""",
+ """for i in range(10):
+ print i
+ print 'Bad indentation here'""",
+ ]
+
+def parse_source(source):
+ lexer = PythonSource(source)
+ tokens = []
+ last_token = ''
+ while last_token != 'ENDMARKER':
+ last_token, value = lexer.next()
+ tokens.append((last_token, value))
+ return tokens
+
+
+NUMBERS = [
+ '1', '1.23', '1.', '0',
+ '1L', '1l',
+ '0x12L', '0x12l', '0X12', '0x12',
+ '1j', '1J',
+ '1e2', '1.2e4',
+ '0.1', '0.', '0.12', '.2',
+ ]
+
+BAD_NUMBERS = [
+ 'j', '0xg', '0xj', '0xJ',
+ ]
+
+class PythonSourceTC(unittest.TestCase):
+ """ """
+ def setUp(self):
+ pass
+
+ def test_empty_string(self):
+ """make sure defined regexps don't match empty string"""
+ rgxes = {'numbers' : py_number,
+ 'defsym' : g_symdef,
+ 'strings' : g_string,
+ 'names' : py_name,
+ 'punct' : py_punct,
+ }
+ for label, rgx in rgxes.items():
+ self.assert_(rgx.match('') is None, '%s matches empty string' % label)
+
+ def test_several_lines_list(self):
+ """tests list definition on several lines"""
+ s = """['a'
+ ]"""
+ tokens = parse_source(s)
+ self.assertEquals(tokens, [('[', None), ('STRING', "'a'"), (']', None),
+ ('NEWLINE', ''), ('ENDMARKER', None)])
+
+ def test_numbers(self):
+ """make sure all kind of numbers are correctly parsed"""
+ for number in NUMBERS:
+ self.assertEquals(parse_source(number)[0], ('NUMBER', number))
+ neg = '-%s' % number
+ self.assertEquals(parse_source(neg)[:2],
+ [('-', None), ('NUMBER', number)])
+ for number in BAD_NUMBERS:
+ self.assertNotEquals(parse_source(number)[0], ('NUMBER', number))
+
+ def test_hex_number(self):
+ tokens = parse_source("a = 0x12L")
+ self.assertEquals(tokens, [('NAME', 'a'), ('=', None),
+ ('NUMBER', '0x12L'), ('NEWLINE', ''),
+ ('ENDMARKER', None)])
+
+ def test_punct(self):
+ for pstr in PUNCTS:
+ tokens = parse_source( pstr )
+ self.assertEqual( tokens[0][0], pstr )
+
+
+if __name__ == '__main__':
+ unittest.main()
+
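
A note on the dynamically generated token classes above: each is a tuple subclass whose instances compare equal to the (token, value) pairs the lexer yields, so expected values read naturally in assertions. Roughly:

    Number = type('Number', (TokenValPair,), {'token': 'NUMBER'})
    assert Number('42') == ('NUMBER', '42')
    assert Equals() == ('=', None)    # token string overridden via TokenMap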
Added: pypy/dist/pypy/module/parser/recparser/test/test_samples.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/test_samples.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1,95 @@
+"""test module for CPython / PyPy nested tuples comparison"""
+
+import os, os.path as osp
+import sys
+from ebnf import parse_grammar
+from python import python_parse, pypy_parse, set_debug
+from pprint import pprint
+import grammar
+grammar.DEBUG = False
+from symbol import sym_name
+
+
+def name(elt):
+ return "%s[%s]"% (sym_name.get(elt,elt),elt)
+
+def read_samples_dir():
+ return [osp.join('samples', fname) for fname in os.listdir('samples') if fname.endswith('.py')]
+
+def print_sym_tuple(nested, level=0, limit=15, names=False, trace=()):
+ buf = []
+ if level <= limit:
+ buf.append("%s(" % (" "*level))
+ else:
+ buf.append("(")
+ for index, elt in enumerate(nested):
+ # Test if debugging and if on last element of error path
+ if trace and not trace[1:] and index == trace[0]:
+ buf.append('\n----> ')
+ if type(elt) is int:
+ if names:
+ buf.append(name(elt))
+ else:
+ buf.append(str(elt))
+ buf.append(', ')
+ elif type(elt) is str:
+ buf.append(repr(elt))
+ else:
+ if level < limit:
+ buf.append('\n')
+ buf.extend(print_sym_tuple(elt, level+1, limit,
+ names, trace[1:]))
+ buf.append(')')
+ return buf
+
+def assert_tuples_equal(tup1, tup2, curpos = ()):
+ for index, (elt1, elt2) in enumerate(zip(tup1, tup2)):
+ if elt1 != elt2:
+ if type(elt1) is tuple and type(elt2) is tuple:
+ assert_tuples_equal(elt1, elt2, curpos + (index,))
+ raise AssertionError('Found difference at %s : %s != %s' %
+ (curpos, name(elt1), name(elt2) ), curpos)
+
+from time import time, clock
+def test_samples( samples ):
+ time_reports = {}
+ for sample in samples:
+ print "testing", sample
+ tstart1, cstart1 = time(), clock()
+ pypy_tuples = pypy_parse(sample)
+ tstart2, cstart2 = time(), clock()
+ python_tuples = python_parse(sample)
+ time_reports[sample] = (time() - tstart2, tstart2-tstart1, clock() - cstart2, cstart2-cstart1 )
+ #print "-"*10, "PyPy parse results", "-"*10
+ #print ''.join(print_sym_tuple(pypy_tuples, names=True))
+ #print "-"*10, "CPython parse results", "-"*10
+ #print ''.join(print_sym_tuple(python_tuples, names=True))
+ print
+ try:
+ assert_tuples_equal(pypy_tuples, python_tuples)
+ except AssertionError,e:
+ error_path = e.args[-1]
+ print "ERROR PATH =", error_path
+ print "="*80
+ print file(sample).read()
+ print "="*80
+ print "-"*10, "PyPy parse results", "-"*10
+ print ''.join(print_sym_tuple(pypy_tuples, names=True, trace=error_path))
+ print "-"*10, "CPython parse results", "-"*10
+ print ''.join(print_sym_tuple(python_tuples, names=True, trace=error_path))
+ print "Failed on (%s)" % sample
+ # raise
+ pprint(time_reports)
+
+if __name__=="__main__":
+ import getopt
+ opts, args = getopt.getopt( sys.argv[1:], "d:", [] )
+ for opt, val in opts:
+ if opt == "-d":
+ set_debug(int(val))
+ if args:
+ samples = args
+ else:
+ samples = read_samples_dir()
+
+ test_samples( samples )
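
For reference, how the error-path mechanism behaves on a toy input (a sketch; the reported path points at the subtuple containing the first difference, which print_sym_tuple() then flags with '---->'):

    t1 = (1, (2, 'a'))
    t2 = (1, (2, 'b'))
    try:
        assert_tuples_equal(t1, t2)
    except AssertionError, e:
        print e.args[-1]    # (1,) -- index path to the differing subtuple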
Added: pypy/dist/pypy/module/parser/recparser/test/test_samples2.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/test/test_samples2.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1,70 @@
+"""test module for CPython / PyPy nested tuples comparison"""
+import os, os.path as osp
+from python import python_parse, pypy_parse
+from pprint import pprint
+import grammar
+grammar.DEBUG = False
+from symbol import sym_name
+
+def name(elt):
+ return "%s[%s]"% (sym_name.get(elt,elt),elt)
+
+def print_sym_tuple(nested, level=0, limit=15, names=False, trace=()):
+ buf = []
+ if level <= limit:
+ buf.append("%s(" % (" "*level))
+ else:
+ buf.append("(")
+ for index, elt in enumerate(nested):
+ # Test if debugging and if on last element of error path
+ if trace and not trace[1:] and index == trace[0]:
+ buf.append('\n----> ')
+ if type(elt) is int:
+ if names:
+ buf.append(name(elt))
+ else:
+ buf.append(str(elt))
+ buf.append(', ')
+ elif type(elt) is str:
+ buf.append(repr(elt))
+ else:
+ if level < limit:
+ buf.append('\n')
+ buf.extend(print_sym_tuple(elt, level+1, limit,
+ names, trace[1:]))
+ buf.append(')')
+ return buf
+
+def assert_tuples_equal(tup1, tup2, curpos = ()):
+ for index, (elt1, elt2) in enumerate(zip(tup1, tup2)):
+ if elt1 != elt2:
+ if type(elt1) is tuple and type(elt2) is tuple:
+ assert_tuples_equal(elt1, elt2, curpos + (index,))
+ raise AssertionError('Found difference at %s : %s != %s\n' %
+ (curpos, name(elt1), name(elt2) ), curpos)
+
+def test_samples():
+ samples_dir = osp.join(osp.dirname(__file__), 'samples')
+ for fname in os.listdir(samples_dir):
+ if not fname.endswith('.py'):
+ continue
+ abspath = osp.join(samples_dir, fname)
+ yield check_parse, abspath
+
+def check_parse(filepath):
+ pypy_tuples = pypy_parse(filepath)
+ python_tuples = python_parse(filepath)
+ try:
+ assert_tuples_equal(pypy_tuples, python_tuples)
+ except AssertionError, e:
+ error_path = e.args[-1]
+ print "ERROR PATH =", error_path
+ print "="*80
+ print file(filepath).read()
+ print "="*80
+ print "-"*10, "PyPy parse results", "-"*10
+ print ''.join(print_sym_tuple(pypy_tuples, names=True, trace=error_path))
+ print "-"*10, "CPython parse results", "-"*10
+ print ''.join(print_sym_tuple(python_tuples, names=True, trace=error_path))
+ assert False, filepath
+
Added: pypy/dist/pypy/module/parser/recparser/tools/tokenize.py
==============================================================================
--- (empty file)
+++ pypy/dist/pypy/module/parser/recparser/tools/tokenize.py Mon Apr 25 16:03:44 2005
@@ -0,0 +1,15 @@
+
+import sys
+from python.lexer import PythonSource
+
+
+def parse_file(filename):
+ f = file(filename).read()
+ src = PythonSource(f)
+ token = src.next()
+ while token!=("ENDMARKER",None) and token!=(None,None):
+ print token
+ token = src.next()
+
+if __name__ == '__main__':
+ parse_file(sys.argv[1])
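
Invoked from the recparser directory (a sketch; assumes the directory is on PYTHONPATH so the `python` package resolves), the script prints one (token, value) pair per line, e.g.:

    $ PYTHONPATH=. python tools/tokenize.py test/samples/test_simple_assignment.py
    ('NAME', 'x')
    ('=', None)
    ('NUMBER', '1')
    ('NEWLINE', '')
    ('NEWLINE', '')

(the second NEWLINE is emitted by end_of_file() before the final ENDMARKER).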