RedBaron, a bottom-up refactoring lib/tool for python
Hi Laurent

Great to see somebody finally tackling refactoring.

I'm answering because I think we're working on the same issue, but we have finished two different parts: you have finished a refactoring implementation and I have finished the static analysis part. I'm the author of Jedi: https://github.com/davidhalter/jedi/

I'm currently working on integrating the lib2to3 parser into Jedi. This would make refactoring really easy (I'm about 50% done with the parser). It's also well tested and offers a few other advantages.

In a perfect world, we could now combine our projects :-) I will look at RedBaron in detail on Monday.

~ Dave

PS: If you want to use a tool for static analysis, please use either Jedi or astroid. I don't think rope still makes sense, because the project is inactive.

2014-11-14 13:05 GMT+01:00 Laurent Peuch <cortex@worlddomination.be>:
Hello everyone,
Someone suggested that I present the project I'm working on right now to this mailing list, because there is a good chance it will interest you.
This tool is an answer to a frustration I had while trying to build tools for Python and for projects I was working on. While Python already has good capabilities for analysing code (ast.py, astroid (although it wasn't out at that time)), the "write code that modifies source code" part was really missing (in my opinion and to my knowledge of the existing tools).
I wanted a very intuitive, easy-to-use library that lets me query and modify my source code only where I want to modify it, without touching the rest of the code. So I've built what can be described as "the BeautifulSoup of Python source code".
To do so, I've built what can be called "a lossless AST" for Python (designed to be used by humans), an AST that satisfies this equation:
source_code == ast_to_source(source_to_ast(source_code))
It produces JSON-serializable Python data structures (because data structures are easier to use and don't hide anything from you).
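To make the round-trip equation concrete, here is a minimal sketch of the same property at the token level, using only the standard library's `tokenize` module rather than Baron. The helper names `source_to_tokens` and `tokens_to_source` are hypothetical. The trick (lib2to3 uses a similar "prefix" idea) is to store with every token whatever text precedes it, so nothing is ever thrown away; Baron goes further and builds a full tree, whereas this is only a flat list.

```python
import io
import json
import tokenize
from typing import List, Tuple

# Each entry is (prefix, text): `prefix` is whatever sits between the previous
# token and this one (spaces, backslash continuations...), `text` is the token
# exactly as written in the source.
Token = Tuple[str, str]


def source_to_tokens(source: str) -> List[Token]:
    """Hypothetical helper: tokenize `source` without dropping a single character."""
    # Offset at which each line starts, split on "\n" exactly like the
    # readline that tokenize consumes, so (row, col) maps to a string index.
    line_starts = [0]
    for line in io.StringIO(source):
        line_starts.append(line_starts[-1] + len(line))
    line_starts.append(len(source))  # sentinel for ENDMARKER, one row past the end

    def to_offset(position: Tuple[int, int]) -> int:
        row, col = position
        return min(line_starts[row - 1] + col, len(source))

    tokens: List[Token] = []
    cursor = 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Slice by position instead of trusting tok.string, so nothing can be
        # lost or added (e.g. the empty NEWLINE token emitted at end of file).
        start = max(cursor, to_offset(tok.start))
        end = max(start, to_offset(tok.end))
        tokens.append((source[cursor:start], source[start:end]))
        cursor = end
    tokens.append((source[cursor:], ""))  # anything left after the last token
    return tokens


def tokens_to_source(tokens: List[Token]) -> str:
    return "".join(prefix + text for prefix, text in tokens)


src = "x  =  [1,\n      2]   # keep me\n"
tokens = source_to_tokens(src)
assert tokens_to_source(tokens) == src  # source == to_source(to_tokens(source))
assert tokens_to_source(json.loads(json.dumps(tokens))) == src  # plain JSON-friendly data
```

Because the concatenation of all prefixes and texts covers the input from start to end, the round trip holds whenever `tokenize` accepts the input; this sketch has not been checked against every token type (for instance f-strings on Python 3.12+).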
And now the part that should interest you more: on top of that AST, I've built a high-level "query and modification" library that wraps those AST nodes into objects. I've put a lot of effort into making this library intuitive and very easy to use, while relieving you of the burden of dealing with low-level details. This "BeautifulSoup of Python source code" is called RedBaron.
It looks like this:
    from redbaron import RedBaron

    # simple API
    # pass a string
    red = RedBaron("some_value = 42")

    # get the string back
    red.dumps()
Queries are like BeautifulSoup:
    red.find("int", value=4)
    red.find_all("def", name="stuff")
(You can pass lambdas, regexes, or a special syntax for globs/regexes, etc. to queries; they should be powerful enough for the vast majority of your needs.)
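For readers who want a feel for how this kind of BeautifulSoup-style filtering works, here is a minimal sketch on top of the standard library's `ast` module (which, unlike Baron, is not lossless). It is not RedBaron's implementation: `find`, `find_all` and `_matches` are hypothetical names, node types use the stdlib class names (`FunctionDef`, `Constant`) instead of RedBaron's (`def`, `int`), and the rule "a filter value may be a literal, a compiled regex or a callable" is an assumption modelled on the description above.

```python
import ast
import re
from typing import Any, List, Optional


def _matches(actual: Any, expected: Any) -> bool:
    """Compare one node attribute with a literal, a compiled regex or a callable."""
    if isinstance(expected, re.Pattern):
        return isinstance(actual, str) and expected.search(actual) is not None
    if callable(expected):
        return bool(expected(actual))
    return actual == expected


def find_all(tree: ast.AST, node_type: str, **filters: Any) -> List[ast.AST]:
    """Every node of class `node_type` whose attributes satisfy all `filters`."""
    return [
        node
        for node in ast.walk(tree)  # breadth-first, not strictly source order
        if type(node).__name__ == node_type
        and all(_matches(getattr(node, key, None), value) for key, value in filters.items())
    ]


def find(tree: ast.AST, node_type: str, **filters: Any) -> Optional[ast.AST]:
    """First result of find_all, or None."""
    matches = find_all(tree, node_type, **filters)
    return matches[0] if matches else None


source = "def stuff():\n    return 42\n\ndef stuff_more():\n    pass\n"
tree = ast.parse(source)
print([f.name for f in find_all(tree, "FunctionDef", name="stuff")])  # ['stuff']
print([f.name for f in find_all(tree, "FunctionDef", name=re.compile(r"^stuff"))])  # ['stuff', 'stuff_more']
print(find(tree, "Constant", value=lambda v: v == 42).value)  # 42
```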
Modifying nodes is very simple: just pass source code stored in a string and "voilà":
    red = RedBaron("some_value = 42")
    red[0].value = "1 + 1"
    # some_value = 1 + 1

    red = RedBaron("def stuff():\n    plop")
    red[0].value = "some_code"
    # def stuff():\n    some_code

    # notice that the input is correctly formatted and indented, and it
    # also takes care of not breaking the indentation of the next node
    # this also works with decorators and wherever you would expect it to
(It is also possible to pass it AST data structures or RedBaron objects.)
I've also made an abstraction on top of "lists of things", so you don't have to worry about when a separator is needed (e.g. a "," in a list):
    red = RedBaron("[1, 2, 3]")
    red[0].append("plop")
    # [1, 2, 3, plop]

    # want to add a new Django app to INSTALLED_APPS? just do:
    red.find("assignment", target=lambda x: x.dumps() == "INSTALLED_APPS").value.append("'another_app'")
    # notice that the formatting of the list is detected

    # want to add "@profile" to every function at the root level for
    # line_profiler?
    red('def', recursive=False).map(lambda x: x.decorators.insert(0, '@profile'))

    # and remove them
    red("decorator", lambda x: x.dumps() == "@profile").map(lambda x: x.parent.parent.decorators.remove(x))

    # convert every "print a" to "logger.debug(a)"
    red('print', value=lambda x: len(x) == 1).map(lambda x: x.replace('logger.debug(%s)' % x.value.dumps()))

    # and "print a, b, c" to 'logger.debug("%s %s %s" % (a, b, c))'
    red('print', value=lambda x: len(x) > 1).map(lambda x: x.replace('logger.debug("%s" %% (%s))' % (" ".join(["%s"] * len(x.value)), x.value.dumps())))
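The "list of things" abstraction, including the INSTALLED_APPS example where the list's formatting is reused, can be illustrated with a small, hypothetical `SeparatedList` class: callers only manipulate the items, while the separator and the whitespace around the items (assumed here to have already been detected from the existing source) are reused when rendering. RedBaron's real proxy lists handle far more cases; this only shows the idea.

```python
from typing import List


class SeparatedList:
    """Hypothetical sketch: a list literal that owns its separators."""

    def __init__(self, items: List[str], separator: str = ", ",
                 opening: str = "", closing: str = "") -> None:
        self.items = list(items)    # source text of each element
        self.separator = separator  # text observed between existing items
        self.opening = opening      # text observed right after "["
        self.closing = closing      # text observed right before "]"

    def append(self, item: str) -> None:
        # The caller never has to think about "," or indentation.
        self.items.append(item)

    def dumps(self) -> str:
        return "[" + self.opening + self.separator.join(self.items) + self.closing + "]"


inline = SeparatedList(["1", "2", "3"])
inline.append("plop")
print(inline.dumps())  # [1, 2, 3, plop]

# One item per line, with a trailing comma: the new item follows the same style.
apps = SeparatedList(["'django.contrib.admin'", "'blog'"],
                     separator=",\n    ", opening="\n    ", closing=",\n")
apps.append("'another_app'")
print(apps.dumps())
```

Keeping the separator as data, rather than recomputing it, is what lets an append preserve whatever style the original list used.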
Both libraries are fully tested (more than 2000 tests in total), fully *documented* (with lots of examples) and released under free software licences. I consider RedBaron to be in alpha stage: it is already very stable, but a significant number of edge cases are probably not handled yet.
Important point: RedBaron does not and will not do static analysis. I'm probably going to integrate with (or integrate RedBaron into) a tool that already does that, like astroid or rope.
Links:
* RedBaron tutorial: https://redbaron.readthedocs.org/en/latest/tuto.html
* RedBaron documentation: https://redbaron.readthedocs.org
* RedBaron source code: https://github.com/psycojoker/redbaron
* Baron (the AST) source code: https://github.com/psycojoker/baron
* Baron documentation: https://baron.readthedocs.org
I hope I have piqued your interest; I'm very interested in your feedback.
Have a nice day and thanks for your time,
PS: I've only been aware of lib2to3's capabilities for 2 months and was very unhappy to discover it so late (I spent months googling before deciding to start this project). I'll probably swap my parser for lib2to3's in the future.
--
Laurent Peuch -- Bram
_______________________________________________
code-quality mailing list
code-quality@python.org
https://mail.python.org/mailman/listinfo/code-quality
Hi guys,

I have also done some work in this area [1]. Well, I didn't get into refactoring yet... Anyway, I thought you might be interested.

My approach was to convert Python code to XML (and back to Python). This allows any generic XML tool to be used to manipulate the XML code, which can then be converted back to Python without the need to create specific tools.

The implementation uses Python's AST and the tokenize module to retrieve the information that is thrown away by the AST. This way I didn't need to implement my own Python parser. It ended up being much more work than I expected, but I still believe that's better than maintaining another parser.

[1] http://pythonhosted.org/pyRegurgitator/#py2xml-experimental
https://github.com/schettino72/pyRegurgitator

cheers,
Eduardo

On Mon, Nov 17, 2014 at 5:01 PM, Sylvain Thénault <sylvain.thenault@logilab.fr> wrote:
On 15 November 16:49, Dave Halter wrote:
Hi Laurent
Hi Laurent, David,
Great to see somebody finally tackling refactoring.
indeed!
I'm answering because I think we're working on the same issue, but we have finished two different parts: you have finished a refactoring implementation and I have finished the static analysis part. I'm the author of Jedi: https://github.com/davidhalter/jedi/
Could I ask what you mean by static analysis in the context of a completion library?
I'm currently working on the integration of the lib2to3 parser into Jedi. This would make refactoring really easy (I'm about 50% done with the parser). It's also well tested and offers a few other advantages.
In a perfect world, we could now combine our projects :-) I will look in detail at Red Baron on Monday.
David, we talked about this during the latest EuroPython, and I talked with Laurent yesterday at the Capitole du Libre in Toulouse: IMO we could start by extracting from lib2to3 "the" parser that could be used by every tool like ours (refactoring, completion, static analysis...). It would be:

* lossless (comments, indents...)
* accurate (e.g. from/to line numbers)
* fast
* version agnostic within a reasonable frame (e.g. 2.7 -> 3.4?)
I guess almost everyone on this list would be interested in such a parser, even if most would have to do a second pass on the generated tree to get a more "business-oriented" tree for their own project. In any case, we (the pylint guys) would be greatly interested.
--
Sylvain Thénault, LOGILAB, Paris (01.45.32.03.12) - Toulouse (05.62.17.16.42)
Python, Debian and Agile methods training: http://www.logilab.fr/formations
Custom software development: http://www.logilab.fr/services
CubicWeb, the semantic web framework: http://www.cubicweb.org
Hello Eduardo,
My approach was to convert Python code to XML (and back to Python). This allows any generic XML tool to be used to manipulate the XML code, which can then be converted back to Python without the need to create specific tools.
Funny, you aren't the first one I've found who has taken this approach. This guy is converting lib2to3 output to XML and throwing lxml at it: https://github.com/bukzor/RefactorLib I don't know what it's worth; he doesn't seem to have advertised it at all, and the code is not very big.
The implementation uses Python's AST and the tokenize module to retrieve the information that is thrown away by the AST. This way I didn't need to implement my own Python parser. It ended up being much more work than I expected, but I still believe that's better than maintaining another parser.
Oh, I understand you so well. I didn't know how much work parsing Python would take me; if I had known, I probably would never have started this project.

Cheers,
--
Laurent Peuch -- Bram
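Eduardo's idea of recovering from `tokenize` what the AST throws away can be sketched for the simplest case, comments. The hypothetical helper `comments_by_statement` below (Python 3.8+ for `end_lineno`) attaches every comment to the top-level statement whose lines contain it. Comments sitting on their own lines between statements are simply not attached in this sketch, which is exactly the kind of detail that makes the approach more work than expected.

```python
import ast
import io
import tokenize
from typing import Dict, List


def comments_by_statement(source: str) -> Dict[int, List[str]]:
    """Map the first line of each top-level statement to the comments inside it."""
    tree = ast.parse(source)  # the AST alone contains no comments at all
    comments = [
        (tok.start[0], tok.string)
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type == tokenize.COMMENT
    ]
    attached: Dict[int, List[str]] = {}
    for stmt in tree.body:
        attached[stmt.lineno] = [
            text for line, text in comments
            if stmt.lineno <= line <= stmt.end_lineno
        ]
    return attached


source = "x = 1  # one\ndef f():\n    # inside\n    return 2\n# trailing\n"
print(comments_by_statement(source))  # {1: ['# one'], 2: ['# inside']}
```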
Sylvain,

2014-11-17 10:01 GMT+01:00 Sylvain Thénault <sylvain.thenault@logilab.fr>:
Could I ask what you mean by static analysis in the context of a completion library?
Not the "linting" part: the name resolution / tuple assignment / function execution part. I assumed that's what he meant when he talked about using rope or astroid for static analysis. Jedi's static analysis is not much further along than what I showed at EuroPython, because I'm currently rewriting the parser.
David, we talked about this during the latest EuroPython, and I talked with Laurent yesterday at the Capitole du Libre in Toulouse: IMO we could start by extracting from lib2to3 "the" parser that could be used by every tool like ours (refactoring, completion, static analysis...). It would be:

* lossless (comments, indents...)
* accurate (e.g. from/to line numbers)
* fast
* version agnostic within a reasonable frame (e.g. 2.7 -> 3.4?)
Yes, that's what I'm trying to create now (based on lib2to3). However, my biggest problem is that I have to rewrite my evaluation engine as well, because it depended on the old parser. I have two additional constraints:

- decent error recovery
- memory efficiency

The "fast" part is something I'm very eager to implement. I have done this before with my old parser. My approach is to re-parse only the parts of the file that have changed.

My idea of "version agnostic" is to have the parser and the evaluation engine adhere to one version: my goal is to give Jedi a Python version number and have Jedi work according to that. This would make linting really cool, because you could run the linter for different Python versions.

I would recommend that you wait until I have finished my parser (1-2 months), or until I can at least report back. You can then either take the parser and fork it, or take the evaluation engine as well.

~ Dave
2014-11-17 21:00 GMT+01:00 Laurent Peuch <cortex@worlddomination.be>:
Hello everyone and thanks for your answers :)
Well, instead of waiting 1-2 months for you to finish your parser and then checking whether it also fits our needs, wouldn't it be a good idea to discuss a bit what each of us needs and see whether we can agree on what a common AST for all of us could be?
I can totally understand that discussing this right now might not be the most appealing idea, but think a bit about the benefits we could get from a common AST for all of us:
* only one code base to maintain, instead of everyone writing their own parser on their side
* we could share our efforts on making it as good as possible
* more time spent on building the actual tools than on the backend
* a de facto reference for every Python developer who wants to join the field of tooling: more people -> more tools -> better Python in general
I really think we should at least try to see if it's possible; having this kind of tool would really benefit us and the Python community in general.
What do you think?
I totally agree. However, Jedi's parser (parser branch) is currently in a very confusing state. There's still a lot of old code lingering around that doesn't help much in understanding it. I'm going to clean it up by the end of this week and then we can talk about it. The full 1-2 months would include Jedi being fast again and passing all tests, so I don't think we need to wait for that. I'm working full time on this project, so I'm progressing quite quickly.

~ Dave
--
Laurent Peuch -- Bram
2014-11-17 21:00 GMT+01:00 Laurent Peuch <cortex@worlddomination.be>:
Well, instead of waiting 1-2 months for you to finish your parser and then checking whether it also fits our needs, wouldn't it be a good idea to discuss a bit what each of us needs and see whether we can agree on what a common AST for all of us could be?
All right. I have cleaned it up a little bit: https://github.com/davidhalter/jedi/tree/db76bbccc58729426cb39a6373e986139ea...

This is the latest parser branch. The parser itself is already working pretty well. There's still a lot of old code lurking around that I'm not deleting yet (so I don't forget what I still need to add).

A few notes about the files:

- __init__.py contains the handlers. More about that later.
- pgen2 contains a pretty much unchanged lib2to3 parser.
- tree.py contains the whole business logic. It's all about searching the tree. I'm pretty open to adding more helpers; I just don't need more at the moment.
- fast.py will contain a faster version of the parser (not working right now). I'm going to do this by caching parts of the file.
- tokenize.py is Jedi's "old" tokenizer. I will probably replace pgen2's tokenizer with this one to improve error recovery.
- grammar.txt etc. are the files with the Python grammar.
- user_context.py is Jedi-related, will be partially rewritten, and helps with understanding messy code that the parser doesn't understand.

It's my goal to support multiple grammars at the same time. It should be possible to parse 2.7 while still being able to parse 3.4 in the same process (thread-safe). The same goal applies to Jedi: I want users to be able to choose the Python version for the evaluation of code as well. Therefore the parser has a `grammar` argument: Parser(grammar, source, module_path=None, tokenizer=None).

There are a few design decisions that I took:

- Like lib2to3, I am creating nodes only if they have more than one child. This is very important for performance reasons. There's an exception though: ExprStmt is always created (I might remove this "feature" again).
- As you can see in tree.py (and also in jedi.parser.Parser._ast_mapping), there are classes for nodes like functions, classes, params and others, while there are no classes for `xor_expr`, `and_expr` and so on. This has been a very good solution for Jedi. It makes business logic possible for the classes where we need it, but at the same time doesn't bloat tree.py. The children attribute is available anyway. I might also add a type attribute (class attribute) to all the classes.

Glad to hear any feedback. I'm also really happy to reverse design decisions or change some things fundamentally. Just don't complain about the "messiness" too much, that will get better :-)

~ Dave

PS: `Simple` is an old Jedi class. I will rename it to `BaseNode` later.
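Dave's first design decision (create a node only when it has more than one child, with an exception for expression statements) can be sketched as a pass over a raw concrete syntax tree. The tuple-based tree format and the `collapse` name are assumptions for illustration, not Jedi's or lib2to3's API; lib2to3 applies the same rule while building its tree rather than as a separate pass.

```python
from typing import FrozenSet, List, Tuple, Union

# A raw parse tree: leaves are token strings, inner nodes are
# (grammar_rule, children) pairs.
Tree = Union[str, Tuple[str, List["Tree"]]]


def collapse(node: Tree, keep: FrozenSet[str] = frozenset({"expr_stmt"})) -> Tree:
    """Replace every single-child inner node by its child, except rules in `keep`."""
    if isinstance(node, str):
        return node
    rule, children = node
    children = [collapse(child, keep) for child in children]
    if len(children) == 1 and rule not in keep:
        return children[0]
    return (rule, children)


# A bare name goes through a long chain of one-child grammar rules.
raw = ("expr_stmt", [("xor_expr", [("and_expr", [("atom", ["x"])])])])
print(collapse(raw))  # ('expr_stmt', ['x'])

raw_sum = ("arith_expr", [("term", [("atom", ["1"])]), "+", ("term", [("atom", ["2"])])])
print(collapse(raw_sum))  # ('arith_expr', ['1', '+', '2'])
```

The performance argument is visible in the first example: four nodes shrink to one, and in real code most expression-grammar levels have a single child.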
Hello Laurent,

My coding conventions are a bit unusual: for example, to me a function call is correct if every parameter is passed as a keyword argument and the "=" signs are aligned at the maximum level. I had looked into lib2to3 and found it terribly complex to work with visitors and to pick up enough state information, so I dropped my idea of making an automatic source code formatter for my preferred style.

I think RedBaron might be much better suited to achieve that, it seems.

However, over the past years, when I did these things, I have come to expose this sort of tree to users in a way that lets them make XPath queries. Can you make your tree lxml-walkable, and basically allow XML transformations with its API? That way, tests on the syntax tree could be expressed more easily.

In Nuitka, my Python compiler, I expose an XML dump of the final internal tree and use tests with XPath queries on it, e.g. to identify print statements of non-constants and declare these errors. While I don't do enough of these, to me that is now the natural way of selecting data from a tree.

The function calls I want to modify, and the values to use in them, would be pretty natural XPath queries, I suppose. I would then iterate in a for loop, determine the maximum identifier length of the keyword arguments, and update the spacing before the equals sign in the node tree, with more XPath queries to find children.

Performance might be a red herring there, but XPath queries that let me avoid touching nodes that are already properly formatted would probably also be faster. Most likely performance is not an issue at all. But XPath can, in principle, be compiled at run time.

Let me know what you think of that.

Yours,
Kay
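Kay's "XML dump plus XPath queries" workflow (and Eduardo's py2xml idea) can be approximated with the standard library alone: convert an `ast` tree into `xml.etree.ElementTree` elements and use ElementTree's limited XPath subset. This is a sketch, not Nuitka's or pyRegurgitator's format: the layout (node classes as tags, field names as wrapper elements, scalar fields as attributes) and the helper names are assumptions, and because it starts from `ast` it loses comments and formatting. The query reproduces Kay's example of flagging `print` calls with non-constant arguments (Python 3 `print()` calls, Python 3.8+ for `ast.Constant`).

```python
import ast
import xml.etree.ElementTree as ET
from typing import List


def ast_to_xml(node: ast.AST) -> ET.Element:
    """Convert an ast node into an Element: class name as tag, fields as children."""
    elem = ET.Element(type(node).__name__)
    if hasattr(node, "lineno"):
        elem.set("lineno", str(node.lineno))
    for field, value in ast.iter_fields(node):
        if isinstance(value, ast.AST):
            ET.SubElement(elem, field).append(ast_to_xml(value))
        elif isinstance(value, list):
            wrapper = ET.SubElement(elem, field)
            for item in value:
                if isinstance(item, ast.AST):  # plain strings (e.g. Global.names) are skipped
                    wrapper.append(ast_to_xml(item))
        elif value is not None:
            elem.set(field, str(value))  # scalar fields become attributes
    return elem


def non_constant_prints(source: str) -> List[int]:
    """Line numbers of print(...) calls with at least one non-constant argument."""
    root = ast_to_xml(ast.parse(source))
    lines = []
    # ElementTree's XPath subset supports [@attr='value'] predicates and '..' (parent).
    for call in root.findall(".//Call/func/Name[@id='print']/../.."):
        args = call.find("args")
        if args is not None and any(arg.tag != "Constant" for arg in args):
            lines.append(int(call.get("lineno")))
    return lines


code = 'print("hello")\nprint(name)\nprint(1 + 2)\nlen(items)\n'
print(non_constant_prints(code))  # [2, 3]
```

lxml would offer full XPath 1.0 on the same elements; ElementTree is used here only to keep the sketch dependency-free.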
Hello Kay,
For my coding conventions, which are a bit unusual, a function call is correct if every parameter is passed as a keyword argument and the "=" signs are aligned to the longest name. I had looked into lib2to3 and found it terribly complex to work with its visitors and pick up enough state information, so I dropped my idea of making an automatic source code formatter for my preferred style.
I think RedBaron might be much better suited to achieve that.
Yep, this is totally a use case for it. Actually, you can go one level lower and do that directly with Baron by writing a custom "dumper". As an example, I'm taking this very approach right now for pyfmt, a formatter for Python code (basically like gofmt). WARNING: the project is not ready yet. If you need inspiration on how to do that, you can find the code here:

Old approach (easier to understand, but less flexible): https://github.com/Psycojoker/pyfmt/blob/cf97f25138e866fb70fed7414c46801c92c...
New approach I'm working on: https://github.com/Psycojoker/pyfmt/blob/5299a3f2233264ced5158a29dd1cb1d1586...

The new approach uses this data structure, which describes the order and content of every node: https://github.com/Psycojoker/baron/blob/master/baron/render.py#L114
Documentation: https://baron.readthedocs.org/en/latest/technical.html
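The idea of a custom dumper can be sketched on a toy lossless tree. The node layout below (a `type` key plus explicit formatting strings such as `first_formatting`) only loosely imitates Baron's JSON-like output and is an assumption for illustration; Baron's real node descriptions are in the `render.py` file linked above. The `render` function and its `overrides` mapping are hypothetical: rendering without overrides reproduces the source exactly, while overriding one node type changes the layout of just those nodes.

```python
# Toy lossless tree for "f(a=1, bb=2)". Every piece of text, including
# whitespace, is stored in the tree, so rendering it verbatim gives the source
# back. This layout is a simplified assumption, not Baron's real schema.
TREE = {
    "type": "call", "name": "f",
    "value": [
        {"type": "call_argument", "target": "a", "first_formatting": "",
         "second_formatting": "", "value": "1"},
        {"type": "comma", "value": ",", "formatting": " "},
        {"type": "call_argument", "target": "bb", "first_formatting": "",
         "second_formatting": "", "value": "2"},
    ],
}


def render(node, overrides=None):
    """Render a node to source; `overrides` maps node types to custom renderers."""
    overrides = overrides or {}
    if node["type"] in overrides:
        return overrides[node["type"]](node)
    if node["type"] == "call":
        inner = "".join(render(child, overrides) for child in node["value"])
        return node["name"] + "(" + inner + ")"
    if node["type"] == "call_argument":
        return (node["target"] + node["first_formatting"] + "="
                + node["second_formatting"] + node["value"])
    if node["type"] == "comma":
        return node["value"] + node["formatting"]
    raise ValueError("unknown node type: %r" % node["type"])


def spaced_argument(node):
    """Custom rule: exactly one space on each side of '='."""
    return node["target"] + " = " + node["value"]


assert render(TREE) == "f(a=1, bb=2)"  # verbatim rendering round-trips
print(render(TREE, {"call_argument": spaced_argument}))  # f(a = 1, bb = 2)
```

Keeping the default rendering lossless and overriding only selected node types means everything the custom rule does not touch keeps its original formatting.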
[...] Performance might be a red herring there, but XPath queries that let me avoid touching nodes that are already properly formatted would probably also be faster. Most likely, performance is not an issue at all. But in principle, XPath can be compiled at run time.
Let me know what you think of that.
Since Baron (the AST) is lossless, as long as you do a lossless conversion between Baron and anything else, you can do whatever you want. I actually wanted to do that with XML (as a "joke", to be honest with you: I don't have a very high opinion of XML, although I understand the power of some of its tools) to demonstrate this property; I was not expecting people to actually request it.

So, my technical opinion on the subject:

* yes, this is totally possible and quite easy to do; be sure to use this data structure if you want to do it yourself: https://github.com/Psycojoker/baron/blob/master/baron/render.py#L114 (documentation: https://baron.readthedocs.org/en/latest/technical.html)
* yes, there is a good chance that performance will take a hit
* I will eventually do it (a Baron <-> XML converter lib), but this isn't very high on my to-do list
* as a practical solution, you will end up with a very low-level data structure (just like Baron), especially for modifications. Basically, you will face every low-level detail that I have abstracted away for you in RedBaron. If you want to follow this path, I really recommend reading the RedBaron documentation first to evaluate whether the price is worth paying.

Have a nice weekend,

--
Laurent Peuch -- Bram
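The "lossless conversion" argument can be checked mechanically: if converting a tree to XML and back yields an identical tree, anything rendered from it is unchanged too. A minimal sketch using `xml.etree.ElementTree`, assuming a simplified JSON-like tree in which every node has a `type`, string fields become XML attributes and list fields (including formatting) become child elements. The tree shape, `to_xml`, `from_xml` and `dumps` are hypothetical; this is neither Baron's schema nor the converter mentioned above.

```python
import xml.etree.ElementTree as ET

# Toy tree for "x =  1" (note the two spaces after '='). Simplified assumption.
TREE = {
    "type": "assignment",
    "operator": "",
    "target": [{"type": "name", "value": "x"}],
    "first_formatting": [{"type": "space", "value": " "}],
    "second_formatting": [{"type": "space", "value": "  "}],
    "value": [{"type": "int", "value": "1"}],
}


def to_xml(node):
    """Convert a JSON-like node (dict with a 'type' key) to an Element."""
    elem = ET.Element(node["type"])
    for key, value in node.items():
        if key == "type":
            continue
        if isinstance(value, str):
            elem.set(key, value)
        elif isinstance(value, list):
            # A <list name="..."> container keeps both the key and the order.
            container = ET.SubElement(elem, "list", name=key)
            container.extend([to_xml(child) for child in value])
        else:
            raise TypeError("unsupported value for %r: %r" % (key, value))
    return elem


def from_xml(elem):
    """Inverse of to_xml: attributes become strings, containers become lists."""
    node = {"type": elem.tag}
    node.update(elem.attrib)
    for container in elem:
        node[container.get("name")] = [from_xml(child) for child in container]
    return node


def dumps(node):
    """Render the toy tree back to source text."""
    if node["type"] == "assignment":
        def join(key):
            return "".join(dumps(child) for child in node[key])
        return (join("target") + join("first_formatting") + node["operator"]
                + "=" + join("second_formatting") + join("value"))
    return node["value"]  # leaf nodes: name, int, space


xml_text = ET.tostring(to_xml(TREE), encoding="unicode")
back = from_xml(ET.fromstring(xml_text))
assert back == TREE and dumps(back) == "x =  1"
```

The sketch only uses spaces in formatting values; a real converter would also have to make sure newlines and tabs survive attribute serialization, or store them as element text instead.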
Hello Laurent, you wrote:
I think RedBaron might be much better suited to achieve that.
Yep, this is totally a use case for it. Actually, you can go one level lower and do that directly with Baron by writing a custom "dumper".
Before reading this, I had already managed to get this to work pretty much immediately. I have now checked your pointers, and I don't think they would do what I want: I don't want to change the layout, and when I do, I need to be highly context-sensitive. Both

    f(
        a  = 1,
        bb = 2
    )

and

    f(a = 1, bb = 2)

are OK to me, but in one case exactly one space is always used, and in the other, vertical alignment is done. There are many more things where I would be context-sensitive. For example, I like comprehensions to be written like this:

    a = [
        some_operation(a)
        for a in range(b)
        if a != 2
    ]

except in cases where I don't do it, typically because there is no merit in reviewing it. So what I am doing is more like patching, or at least detecting, cases where the existing whitespace layout is inconsistent.
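Kay's rule (a call is either compact, with exactly one space around each "=", or vertically aligned, with the "=" signs in one column) can be expressed as a small checker that only flags inconsistent calls instead of reformatting them. A minimal sketch, assuming each keyword argument has already been extracted from the tree as a (name, whitespace before "=", whitespace after "=") triple and that the caller knows whether the call is laid out one argument per line; the extraction step and `classify_call` are hypothetical.

```python
def classify_call(args, multiline):
    """Return 'compact', 'aligned' or 'inconsistent' for one call.

    `args` is a list of (name, space_before_equal, space_after_equal) tuples
    for the keyword arguments of a single call; `multiline` says whether the
    call puts one argument per line.
    """
    if not args:
        raise ValueError("expected at least one keyword argument")
    if not multiline:
        # Single-line calls: exactly one space on each side of every '='.
        ok = all(before == " " and after == " " for _, before, after in args)
        return "compact" if ok else "inconsistent"
    # Multi-line calls: every '=' must sit in the same column, exactly one
    # space after the longest name, followed by exactly one space.
    width = max(len(name) for name, _, _ in args)
    columns = {len(name) + len(before) for name, before, _ in args}
    ok = columns == {width + 1} and all(after == " " for _, _, after in args)
    return "aligned" if ok else "inconsistent"


print(classify_call([("a", " ", " "), ("bb", " ", " ")], multiline=False))  # compact
print(classify_call([("a", "  ", " "), ("bb", " ", " ")], multiline=True))  # aligned
print(classify_call([("a", " ", " "), ("bb", " ", " ")], multiline=True))   # inconsistent
```

Because it only reports, a checker like this leaves calls alone whenever they already follow one of the accepted styles.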
Performance might be a red herring there, but XPath queries that let me avoid touching nodes that are already properly formatted would probably also be faster. Most likely, performance is not an issue at all. But in principle, XPath can be compiled at run time.
Let me know what you think of that.
Since Baron (the AST) is lossless, as long as you do a lossless conversion between Baron and anything else, you can do whatever you want. I actually wanted to do that with XML (as a "joke", to be honest with you: I don't have a very high opinion of XML, although I understand the power of some of its tools) to demonstrate this property; I was not expecting people to actually request it.
I appreciate the power of the XPath query language. It's really good at saying "find me nodes with attributes of that kind, and children with attributes of another kind, in a certain range". It doesn't technically ever have to be written as XML, but as a query API it's fantastic. I have used it on decoded binary data output by systems, even Wireshark captures, and on recorded internal data. Given a good tree, that's the thing to have for me. It's much like what SQL is to databases: a standard for querying. I am not an XML fanboy otherwise. :-)

* I will eventually do it (a Baron <-> XML converter lib), but this isn't very high on my to-do list
This is mainly an idea for you. I am OK with working with the Python-based API. I am mailing you about oddities as I encounter them, but it's absolutely workable and, for a Python programmer, not a big deal.
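Kay's description of XPath (nodes with attributes of one kind, children of another kind, positions in a range) maps fairly directly onto the predicate subset that Python's `xml.etree.ElementTree` supports. A sketch on a hypothetical XML dump where each child element's tag is its node type; the dump format is invented for illustration. ElementTree has no range predicates or XPath functions beyond `last()`, so the range is taken with a Python slice; full XPath 1.0 would need a library such as lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical dump of (Python 2 syntax):
#     def stuff():
#         print 1
#         print a
#         print "x"
DUMP = """
<module>
  <def name="stuff">
    <print line="2"><int value="1"/></print>
    <print line="3"><name value="a"/></print>
    <print line="4"><string value='"x"'/></print>
  </def>
</module>
"""

root = ET.fromstring(DUMP)

# A node with an attribute of a given kind: the function named "stuff".
func = root.find(".//def[@name='stuff']")

# Nodes with a child of another kind: print statements whose argument is a
# name (i.e. not a constant), the kind of check Kay runs in Nuitka.
non_constant = func.findall("print[name]")

# Positional predicates: the second print statement, and the last one.
second = func.find("print[2]")
last = func.find("print[last()]")

# Ranges: no position-range predicate in ElementTree, so slice in Python.
first_two = func.findall("print")[:2]

print([p.get("line") for p in non_constant])  # ['3']
print(second.get("line"), last.get("line"))   # 3 4
```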
* as a practical solution, you will end up with a very low-level data structure (just like Baron), especially for modifications. Basically, you will face every low-level detail that I have abstracted away for you in RedBaron. If you want to follow this path, I really recommend reading the RedBaron documentation first to evaluate whether the price is worth paying.
I am not sure why the XML tree would be all that different from what "node.help()" does. Surely the first, second, third, etc. formatting entries would simply become children that few people will look at.

Yours,
Kay
participants (5)
- Dave Halter
- Eduardo Schettino
- Kay Hayen
- Laurent Peuch
- Sylvain Thénault