Re: [code-quality] RedBaron, a bottom-up refactoring lib/tool for python

24 Nov 2014

2014-11-17 21:00 GMT+01:00 Laurent Peuch:
...
Hello everyone and thanks for your answers :)
...
...
...
I'm currently working on the integration of the lib2to3 parser into
Jedi. This would make refactoring really easy (I'm about 50% done with the
parser). It's also well tested and offers a few other advantages.
In a perfect world, we could now combine our projects :-) I will look at
RedBaron in detail on Monday.
David, we talked about this at the latest EuroPython, and I talked with
Laurent yesterday at the Capitole du Libre in Toulouse: IMO we could
start by extracting from lib2to3 "the" parser that could be used by all
tools like ours (refactoring, completion, static analysis...). It would be:
* lossless (comments, indentation...)
* accurate (e.g. start/end line numbers)
* fast
* version-agnostic within a reasonable range (e.g. 2.7 -> 3.4?)
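To make "lossless" concrete: lib2to3 attaches the whitespace and comments that precede each token to that token's leaf as a "prefix", so concatenating every prefix and value gives back the exact source. The sketch below imitates that idea using only the standard library `tokenize` module as a stand-in tokenizer. `leaves_with_prefix` is a hypothetical helper, not part of lib2to3 or Jedi, and edge cases such as f-strings tokenized on Python 3.12+ are ignored.

```python
import io
import tokenize

# Tokens that carry no text of their own in this scheme: comments, blank
# lines and indentation are folded into the prefix of the next real leaf.
_SKIPPED = {tokenize.COMMENT, tokenize.NL, tokenize.INDENT,
            tokenize.DEDENT, tokenize.ENDMARKER}


def leaves_with_prefix(source):
    """Split source into (prefix, value) pairs whose concatenation is source.

    Hypothetical helper illustrating lib2to3-style prefixes.
    """
    # Character offset at which each 1-based row starts, using the same
    # line splitting as the readline() fed to the tokenizer.
    line_starts = [0]
    for line in io.StringIO(source):
        line_starts.append(line_starts[-1] + len(line))

    leaves = []
    last = 0  # offset just past the previous leaf
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIPPED:
            continue
        row, col = tok.start
        start = line_starts[row - 1] + col
        # Everything between the previous leaf and this one is the prefix.
        leaves.append((source[last:start], tok.string))
        last = start + len(tok.string)
    # Whatever follows the last token (trailing comments, blank lines).
    leaves.append((source[last:], ""))
    return leaves
```

The round trip is the whole point: `"".join(p + v for p, v in leaves_with_prefix(src)) == src` should hold for ordinary code, which is what a refactoring tool needs in order to rewrite one node and leave the rest of the file byte-for-byte untouched.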
Yes, that's what I'm trying to create now (based on lib2to3). However, my
biggest problem is that I have to rewrite my evaluation engine as well,
because it depended on the old parser. I have two additional
constraints:
- decent error recovery
- memory efficiency
The "fast" part is something I'm very eager to implement. I have done this
before with my old parser. My approach is to re-parse only the parts of the
file that have changed.
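A rough sketch of "re-parse only the parts that changed", assuming a naive split of the module into top-level blocks at column-0 `def`/`class`/decorator lines and a cache keyed on each block's text. `split_blocks`, `parse_block` and `parse_incrementally` are hypothetical names; the stdlib `ast.parse` stands in for a real lossless, error-recovering parser, and the heuristic can be fooled (e.g. by a triple-quoted string containing `def ` at column 0).

```python
import ast
from functools import lru_cache

# Column-0 prefixes that (naively) start a new top-level block.
_BLOCK_STARTS = ("def ", "async def ", "class ", "@")


def split_blocks(source):
    """Split a module into top-level chunks (hypothetical, naive heuristic)."""
    blocks = [[]]
    after_decorator = False
    for line in source.splitlines(keepends=True):
        # Start a new chunk at def/class/@, but keep decorators glued to
        # the def/class they decorate.
        if line.startswith(_BLOCK_STARTS) and not after_decorator and blocks[-1]:
            blocks.append([])
        blocks[-1].append(line)
        if line.strip():
            after_decorator = line.startswith("@")
    return ["".join(block) for block in blocks if block]


@lru_cache(maxsize=1024)
def parse_block(block):
    """Parse one chunk; only runs again when the chunk's text changes."""
    # Stand-in parser. Line numbers in the result are relative to the chunk.
    return ast.parse(block)


def parse_incrementally(source):
    """Reuse cached trees for every chunk whose text is unchanged."""
    return [parse_block(block) for block in split_blocks(source)]
```

Editing the body of one function then costs one `parse_block` call instead of a full re-parse; a real implementation would also have to shift positions and re-link the cached subtrees into one module tree.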
My idea of "version agnostic" is that the parser and the evaluation engine
adhere to one Python version at a time. My goal is to give Jedi a Python
version number and have Jedi work according to that version. This would make
linting really cool, because you could run the linter for different Python
versions.
I would recommend that you wait until I have finished my parser (1-2
months), or at least until I can report back. You can then either take the
parser and fork it, or take the evaluation engine as well.
Well, instead of waiting 1-2 months for you to finish your parser and then
checking whether it also fits our needs, wouldn't it be a better idea to
discuss a bit what each of us needs and see whether we can agree on a
common syntax tree for all of us?
All right. I have cleaned it up a little bit:

https://github.com/davidhalter/jedi/tree/db76bbccc58729426cb39a6373e986139ea...

This is the latest parser branch. The parser itself is still working pretty
well. There's still a lot of old code lurking around that I'm not deleting
yet (so that I don't forget what I still need to add).

Few notes about the files:

- __init__.py contains the handlers. More about that later.
- pgen2 contains a pretty much unchanged lib2to3 parser.
- tree.py contains the whole business logic. It's all about searching the
tree. I'm quite open to adding more helpers; I just don't need more at the
moment.
- fast.py will contain a faster version of the parser (not working right
now). I'm going to do this by caching parts of the file.
- tokenize.py is Jedi's "old" tokenizer. I will probably replace pgen2's
tokenizer with this one to improve error recovery.
- grammar.txt etc. are the files with the Python grammar.
- user_context.py is Jedi-related and will be partially rewritten; it helps
with understanding messy code that the parser doesn't understand.

It's my goal to support multiple grammars at the same time. It should be
possible to parse 2.7 while still being able to parse 3.4 in the same
process (thread-safe). The same goal applies to Jedi: I want the user to be
able to choose the Python version for the evaluation of code as well.
Therefore the parser has a `grammar` argument:
`Parser(grammar, source, module_path=None, tokenizer=None)`.
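A minimal sketch of how grammars for several versions can coexist in one process: each grammar is built at most once, never mutated afterwards, and passed explicitly to the parser, so threads can share it without further locking. `Grammar`, `load_grammar` and the keyword sets are hypothetical placeholders (a real grammar would presumably be generated from files like grammar.txt by pgen2); only the `Parser(grammar, source, module_path=None, tokenizer=None)` signature comes from the message, and its toy `keywords_used` method is invented for illustration.

```python
import re
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """Hypothetical stand-in for a compiled grammar; immutable once built."""
    version: tuple
    keywords: frozenset


# Placeholder data: a few keywords, just enough to show that versions differ.
_KEYWORDS = {
    (2, 7): frozenset({"def", "class", "print", "exec"}),
    (3, 4): frozenset({"def", "class", "nonlocal"}),
}

_grammars = {}
_grammars_lock = threading.Lock()


def load_grammar(version):
    """Build each grammar at most once, even when called from many threads."""
    with _grammars_lock:
        grammar = _grammars.get(version)
        if grammar is None:
            grammar = Grammar(version, _KEYWORDS[version])
            _grammars[version] = grammar
        return grammar


class Parser:
    """Per-call parser state; the shared grammar is only ever read."""

    def __init__(self, grammar, source, module_path=None, tokenizer=None):
        self.grammar = grammar
        self.source = source
        self.module_path = module_path
        self.tokenizer = tokenizer  # unused in this sketch

    def keywords_used(self):
        # Toy stand-in for parsing: names that are keywords in *this* grammar.
        names = re.findall(r"[A-Za-z_]\w*", self.source)
        return {name for name in names if name in self.grammar.keywords}
```

The design choice being illustrated is that all mutable state lives on the `Parser` instance, while the version-specific part is a shared, read-only object; that is what makes "parse 2.7 and 3.4 in the same process" thread-safe.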

There are a few design decisions that I made:

- Like lib2to3, I only create nodes if they have more than one child.
This is very important for performance. There's one exception, though:
ExprStmt is always created (I might remove this "feature" again).
- As you can see in tree.py (and also in `jedi.parser.Parser._ast_mapping`),
there are classes for nodes like functions, classes, params and others, while
there are no classes for `xor_expr`, `and_expr` and so on. This has been a
very good solution for Jedi. It makes business logic possible for the
classes where we need it, but at the same time doesn't bloat tree.py. The
`children` attribute is available anyway. I might also add a `type`
attribute (class attribute) to all the classes.
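A minimal sketch of both design decisions together, assuming simplified `Leaf`/`Node` classes rather than Jedi's real ones. `convert` hands back the lone child instead of wrapping it (lib2to3's `pytree.convert` does the same), keeps `expr_stmt` as the one exception, and looks up a hypothetical `AST_MAPPING` (playing the role of `_ast_mapping`) to decide which symbols get a dedicated class with business logic.

```python
from dataclasses import dataclass


@dataclass
class Leaf:
    type: str
    value: str


class Node:
    """Generic node for symbols without business logic (xor_expr, and_expr, ...)."""

    def __init__(self, type_, children):
        self.type = type_
        self.children = children


class Function(Node):
    @property
    def name(self):
        # funcdef: 'def' NAME parameters ['->' test] ':' suite
        return self.children[1].value


class Class(Node):
    @property
    def name(self):
        # classdef: 'class' NAME ['(' [arglist] ')'] ':' suite
        return self.children[1].value


# Hypothetical mapping in the spirit of Parser._ast_mapping.
AST_MAPPING = {"funcdef": Function, "classdef": Class}

# Symbols that always get a node, even with a single child (the ExprStmt case).
ALWAYS_CREATE = {"expr_stmt"}


def convert(symbol, children):
    """Build the node for a finished grammar rule, lib2to3-style."""
    # A rule with a single child adds no information: return the child
    # itself instead of allocating a wrapper node.
    if len(children) == 1 and symbol not in ALWAYS_CREATE:
        return children[0]
    # Symbols with business logic get their own class; the rest stay generic.
    return AST_MAPPING.get(symbol, Node)(symbol, children)
```

For a bare name like `x`, the grammar passes through a long chain of single-child rules (`atom`, `power`, `factor`, `term`, ..., `xor_expr`); with this rule the whole chain collapses to the leaf itself, which is presumably where most of the performance gain comes from.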

Glad to hear any feedback. I'm also really happy to reverse design decisions
or change some things fundamentally. Just don't complain about the
"messiness" too much, that will get better :-)

~ Dave

PS: `Simple` is an old Jedi class. I will rename it to `BaseNode` later.
...
I can totally understand that discussing this right now might not be
the most appealing idea, but maybe think a bit about the benefits
we could get from a common AST for all of us:
* only one code base to maintain instead of everyone writing their
  own parser
* we could share our efforts on making it as good as possible
* more time spent on building the actual tools than on the backend
* a de facto reference for every Python developer who wants to join
  the field of tooling: more people -> more tools -> better Python in
  general
I really think that we should at least try to see if it's possible;
having this kind of tool would really benefit us and the Python
community in general.
What do you think?
--
Laurent Peuch -- Bram

Dave Halter