2014-11-17 21:00 GMT+01:00 Laurent Peuch
Hello everyone and thanks for your answers :)
I'm currently working on the integration of the lib2to3 parser into Jedi. This would make refactoring really easy (I'm about 50% done with the parser). It's also well tested and offers a few other advantages.
In a perfect world, we could now combine our projects :-) I will look in detail at Red Baron on Monday.
David, we talked about this at the latest EuroPython, and I talked with Laurent yesterday at the Capitole du Libre in Toulouse: IMO we could start by extracting from lib2to3 "the" parser that could be used by every tool like ours (refactoring, completion, static analysis...). It would be:
* loss-less (comments, indents...)
* accurate (e.g. from/to line numbers)
* fast
* version agnostic within a reasonable frame (e.g. 2.7 -> 3.4?)
Yes, that's what I'm trying to create now (based on lib2to3). However, my biggest problem is that I have to rewrite my evaluation engine as well, because it depended on the old parser. I have two additional constraints:
- decent error recovery
- memory efficient
The "fast" part is something I'm very eager to implement. I have done this before with my old parser. My approach is to re-parse only the parts of the file that have changed. My idea of "version agnostic" is to have the parser and the evaluation engine try to adhere to one version. My goal is to give Jedi a Python version number and Jedi would work according to that. This would make linting really cool, because you can run the linter for different Python versions.
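The "re-parse only the parts that have changed" idea can be illustrated with a minimal sketch. It is not Jedi's implementation: `split_top_level`, `ChunkCache` and the `parse_chunk` callable are hypothetical names, and the splitting rule (a new chunk starts at every non-indented line) is deliberately naive.

```python
def split_top_level(source):
    """Split source into chunks that each start at a non-indented line.

    Naive on purpose: a closing bracket or a multi-line string that
    continues at column 0 would be split in the wrong place.
    """
    chunks, current = [], []
    for line in source.splitlines(keepends=True):
        starts_block = bool(line.strip()) and not line[0].isspace()
        if starts_block and current:
            chunks.append("".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("".join(current))
    return chunks


class ChunkCache:
    """Re-parse only the chunks whose text is not already cached."""

    def __init__(self, parse_chunk):
        self._parse_chunk = parse_chunk  # hypothetical stand-in for the real parser
        self._trees = {}                 # chunk text -> parsed result
        self.parsed_count = 0            # number of chunks actually parsed

    def parse(self, source):
        trees = {}
        result = []
        for chunk in split_top_level(source):
            if chunk in self._trees:
                tree = self._trees[chunk]      # unchanged since the last run
            elif chunk in trees:
                tree = trees[chunk]            # duplicate chunk in this run
            else:
                tree = self._parse_chunk(chunk)
                self.parsed_count += 1
            trees[chunk] = tree
            result.append(tree)
        self._trees = trees  # forget chunks that no longer exist
        return result


cache = ChunkCache(parse_chunk=str.split)  # str.split stands in for a parser
cache.parse("a = 1\ndef f():\n    return a\n")
cache.parse("a = 2\ndef f():\n    return a\n")
print(cache.parsed_count)  # 3: only the edited first line was parsed again
```

A real implementation would also have to shift the line numbers stored in reused subtrees, which this sketch ignores.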
I would recommend that you wait until I have finished my parser (1-2 months) or can at least report back. You can then either take the parser and fork it or take the evaluation engine as well.
Well, instead of us waiting 1-2 months for you to finish your parser and then seeing whether it also fits our needs, wouldn't it be a better idea to discuss a bit what each one of us needs and see if we can agree on what could be a common ST for all of us?
All right. I have cleaned it up a little bit: https://github.com/davidhalter/jedi/tree/db76bbccc58729426cb39a6373e986139ea...
This is the latest parser branch. The parser itself is still working pretty well. There's still a lot of old code lurking around that I'm not deleting yet (so I don't forget what I still need to add). A few notes about the files:
- __init__.py contains the handlers. More about that later.
- pgen2 contains a pretty much unchanged lib2to3 parser.
- tree.py contains the whole business logic. It's all about searching the tree. I'm pretty open to adding more helpers. I just don't need more at the moment.
- fast.py will contain a faster version of the parser (not working right now). I'm going to do this by caching parts of the file.
- tokenize.py is Jedi's "old" tokenizer. I will probably replace pgen2's tokenizer with this one to improve error recovery.
- grammar.txt etc. are the files with the Python grammar.
- user_context.py is Jedi related, will be partially rewritten and helps with understanding messy code that the parser doesn't understand.
It's my goal to support multiple grammars at the same time. It should be possible to parse 2.7 while still being able to parse 3.4 in the same process (thread-safe). The same goal applies to Jedi: I want the user to be able to choose the Python version for the evaluation of code as well. Therefore the parser has a `grammar` argument: Parser(grammar, source, module_path=None, tokenizer=None).
There are a few design decisions that I made:
- Like lib2to3, I am creating nodes only if they have more than one child. This is very important for performance reasons. There's an exception though: ExprStmt is always created (I might remove this "feature" again).
- As you can see in tree.py (and also in jedi.parser.Parser._ast_mapping), there are classes for nodes like functions, classes, params and others, while there are no classes for `xor_expr`, `and_expr` and so on. This has been a very good solution for Jedi. It makes business logic possible for the classes where we need it, but at the same time doesn't bloat tree.py. The children attribute is available anyway. I might also add a type attribute (class attribute) to all the classes.
Glad to hear any feedback. Also really happy to reverse design decisions or change some things fundamentally. Just don't complain about the "messiness" too much, that will get better :-)
~ Dave
PS: `Simple` is an old Jedi class. I will rename it to `BaseNode` later.
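To make the two design decisions above concrete, here is a minimal sketch of how node creation could work under those rules. It is an illustration, not the code from the branch: `convert_node`, `AST_MAPPING` and the class bodies are assumptions, and only the grammar symbol names (`funcdef`, `expr_stmt`, `xor_expr`) come from the Python grammar.

```python
class Leaf:
    """A single token, e.g. a name or an operator."""

    def __init__(self, type, value):
        self.type = type
        self.value = value


class Node:
    """Generic node without business logic: only `type` and `children`."""

    def __init__(self, type, children):
        self.type = type
        self.children = children


class Function(Node):
    """Example of a node class that carries business logic."""

    @property
    def name(self):
        # Assumed child layout: 'def', NAME, parameters, ':', suite
        return self.children[1].value


class ExprStmt(Node):
    """Always created, even with a single child (the exception noted above)."""


# Hypothetical counterpart of the `_ast_mapping` mentioned above.
AST_MAPPING = {"funcdef": Function, "expr_stmt": ExprStmt}


def convert_node(type, children):
    """Build the tree object for one grammar symbol."""
    cls = AST_MAPPING.get(type)
    if cls is not None:
        return cls(type, children)   # dedicated class: always create it
    if len(children) == 1:
        return children[0]           # skip the wrapper around a single child
    return Node(type, children)      # generic node, e.g. for xor_expr


a = Leaf("NAME", "a")
print(convert_node("xor_expr", [a]) is a)  # True: no wrapper node was created
```

Skipping single-child wrappers keeps the tree shallow, since a plain name would otherwise sit under a long chain of expression nodes, while the mapping keeps dedicated classes limited to the symbols that need helpers.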
I can totally understand that discussing this right now might not be the most appealing idea, but maybe think a bit about the benefits that we could get from a common AST for all of us:
* only one code base to maintain instead of everyone writing their own parser on their side
* we could share our efforts on making it as good as possible
* more time spent on building the actual tools than on the backend
* a de facto reference for every python developer who wants to join the field of tooling: more people -> more tools -> better python in general
I really think that we should at least try to see if it's possible. Having this kind of tool would really benefit us and the python community in general.
What do you think?
--
Laurent Peuch -- Bram