[Patches] [ python-Patches-1440601 ] Add col information to parse & ast nodes

SourceForge.net noreply at sourceforge.net
Wed Mar 1 21:54:33 CET 2006


Patches item #1440601, was opened at 2006-02-28 21:36
Message generated for change (Comment added) made by jpe
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1440601&group_id=5470

Category: Parser/Compiler
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Submitted By: John Ehresman (jpe)
Assigned to: Martin v. Löwis (loewis)
Summary: Add col information to parse & ast nodes

Initial Comment:
This adds fields to the parser to capture the column
where each token starts and where each ast node starts
(defined as the start of the node's initial token).  With
this it's reasonably easy to extract the text that ast
nodes are based on.
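As a rough sketch of what this enables, here is the same idea using the ast module in present-day Python 3. This is an assumption: the 2.5 patch under discussion may expose things differently. Each expression node carries a 1-based lineno and a col_offset, so the source text starting at a node can be sliced out of its line. The helper name text_from_start is hypothetical.

```python
import ast

def text_from_start(source: str, node: ast.AST) -> str:
    """Return the node's first line from its starting column onward.

    Hypothetical helper: with only a start position recorded, we can find
    where a node begins but not where it ends.
    """
    line = source.splitlines()[node.lineno - 1]  # lineno is 1-based
    # col_offset counts UTF-8 bytes, so slice the encoded line, not the str.
    return line.encode("utf-8")[node.col_offset:].decode("utf-8")

source = "x = foo(1) + bar\n"
binop = ast.parse(source).body[0].value  # the BinOp for `foo(1) + bar`
print(text_from_start(source, binop))        # foo(1) + bar
print(text_from_start(source, binop.right))  # bar
```

Note that the slice runs to the end of the line. Knowing only where a node starts is enough to locate it, but not to delimit it.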

The patch is incomplete, will probably change a bit,
and lacks tests, but I wanted to get feedback on a few
questions.

* The byte offset of the column position is what is
being recorded.  I wonder now if the Unicode character
position should be recorded instead.  This will slow things
down somewhat, but the performance loss may not be
significant.

* I changed the signatures of PyNode_AddChild and
PyParse_AddToken.  Is this permitted, or do new
functions need to be created so that the old signatures
are preserved?

* Where should I put a function that given an ast tree
and the source text will add the text that each node is
based on?  This will be a python function (I'm pretty
sure) so it's not easily put in the _ast module.
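The function asked about in the last question could look roughly like the following. This is a sketch against present-day Python 3.8+, which is an assumption. It relies on end_lineno/end_col_offset and ast.get_source_segment, which were added in 3.8, long after this patch, and which record end positions that this patch does not. The function name annotate_source_text and the source_text attribute it sets are hypothetical.

```python
import ast

def annotate_source_text(tree: ast.AST, source: str) -> ast.AST:
    """Attach a hypothetical `source_text` attribute to every positioned node."""
    for node in ast.walk(tree):
        # get_source_segment returns None for nodes without position info
        # (e.g. ast.Module or the Load/Store context objects); skip those.
        segment = ast.get_source_segment(source, node)
        if segment is not None:
            node.source_text = segment
    return tree

source = "total = price * rate + tax\n"
tree = annotate_source_text(ast.parse(source), source)
assign = tree.body[0]
print(assign.value.source_text)       # price * rate + tax
print(assign.value.left.source_text)  # price * rate
```

Setting a new attribute on the nodes, rather than building a side table, keeps the text next to the node it belongs to. The cost is that it mutates the tree.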

Note that generated files are omitted from the patch.

----------------------------------------------------------------------

>Comment By: John Ehresman (jpe)
Date: 2006-03-01 20:54

Message:
Logged In: YES 
user_id=22785

Updated patch that includes some tests and documentation. 
The slightly tricky part is the col_offset of an Attribute
node: it was being set to the start of the attribute name,
i.e. after the initial name.  Now it points to the start of
the initial name.  I think we need to wait for some use cases to
determine if any more positional information is needed.  I
suspect some uses may want the positions of each identifier,
which is not easily obtainable right now.
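In present-day Python 3 the Attribute behaviour described above is what you observe: each Attribute in an unparenthesized dotted chain starts at the initial name. The attribute identifier itself still has no position of its own. On 3.8+ it can be derived from end_col_offset, as in the hypothetical attr_name_offset helper below. This is a sketch that assumes ASCII identifiers, since the offsets count UTF-8 bytes.

```python
import ast

def attr_name_offset(node: ast.Attribute) -> int:
    # Hypothetical helper: the attribute name is the node's last token, so it
    # starts len(attr) bytes before end_col_offset on the node's end line
    # (assumes an ASCII identifier; requires Python 3.8+ end positions).
    return node.end_col_offset - len(node.attr)

expr = ast.parse("os.path.join", mode="eval").body
# expr is Attribute(value=Attribute(value=Name('os'), attr='path'), attr='join')
print(expr.col_offset)              # 0, the start of the initial name `os`
print(expr.value.col_offset)        # 0
print(attr_name_offset(expr))       # 8, where `join` starts
print(attr_name_offset(expr.value)) # 3, where `path` starts
```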

Also includes a change to asdl.py so that it returns attributes
in the order specified in the .asdl file.
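One visible effect of declaration order, at least in present-day Python 3.8+, is the _attributes tuple on each node class. It lists the position attributes in the order the .asdl file declares them. _attributes is underscore-prefixed, so treat it as an implementation detail. The position helper below is hypothetical.

```python
import ast

# Position attributes shared by all expression nodes, in declaration order.
print(ast.expr._attributes)
# ('lineno', 'col_offset', 'end_lineno', 'end_col_offset')

def position(node: ast.AST) -> tuple:
    # Hypothetical helper: read position attributes generically, in the
    # order the grammar declares them (missing ones come back as None).
    return tuple(getattr(node, name, None) for name in node._attributes)

name = ast.parse("x = 1").body[0].targets[0]  # the Name node for `x`
print(position(name))  # (1, 0, 1, 1)
```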

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2006-02-28 22:39

Message:
Logged In: YES 
user_id=21627

- the byte offset is actually a UTF-8 byte offset. That should be documented in 
the grammar, and perhaps elaborated on in libast.tex.
- changing the signatures is fine; it is unlikely that anybody calls this API, and if 
they do, the compiler will tell them.
- applications of the AST should go into Demo/parser.
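Because col_offset is a UTF-8 byte offset, which is still the case in present-day Python 3's ast module, a tool that wants a character column has to convert it. The following is a minimal sketch that assumes the source is available as a str. char_col is a hypothetical helper name. The conversion decodes only the bytes before the node, which is safe because a node always starts on a character boundary.

```python
import ast

def char_col(source_line: str, byte_col: int) -> int:
    """Convert a UTF-8 byte column (as stored in col_offset) to a character column."""
    return len(source_line.encode("utf-8")[:byte_col].decode("utf-8"))

source = 'name = "héllo" + suffix\n'
suffix = ast.parse(source).body[0].value.right  # the Name node for `suffix`
line = source.splitlines()[suffix.lineno - 1]
print(suffix.col_offset)                  # 18 (bytes: 'é' takes two)
print(char_col(line, suffix.col_offset))  # 17 (characters)
```

This is the conversion cost the original question worried about. It is only paid by callers who need character columns, not by the parser.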

----------------------------------------------------------------------


