Jedi 0.9.0 is now a static analysis library - code-quality


Dave Halter

30 Apr 2015, 4:24 a.m.

Hi!

This is an announcement to make you consider using Jedi as a static analysis library for your own projects. If you want to build a refactoring tool or a linter for Python, you should just be using Jedi; it does all the work for you.

I have recently released Jedi 0.9.0. This release marks a change in the way Jedi is structured. In the past, Jedi had only one objective: autocompletion. Now Jedi is a general-purpose, high-quality static analysis engine. Its static analysis capabilities are really good, but it's hard to describe exactly what it does. It's better than any other static analysis engine out there, for sure.

Jedi's parser is built on lib2to3 and is therefore able to create a "round-trip" representation of source code. After parsing and modifying the syntax tree, Jedi echoes the exact same source code with the changes included, which is neat for refactoring. The parser also has built-in error recovery, as well as support for different Python versions (the Python version is a parameter).

Jedi has become quite powerful. If you're considering Jedi, I'm very happy to get you started. Let's talk!

https://github.com/davidhalter/jedi/

~ Dave
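The "round-trip" property can be illustrated with the standard library alone. This is a swapped-in technique, not Jedi's parser (Jedi's 0.9.0 parser API was not public, and later releases moved parsing into the separate parso package): the sketch contrasts `ast`, which discards comments and spacing, with the token stream from `tokenize`, which can be turned back into the identical text. It assumes Python 3.9+ for `ast.unparse`; the helper names are hypothetical.

```python
import ast
import io
import tokenize


def ast_round_trip(source: str) -> str:
    """Parse with ast and print the tree back out (needs Python 3.9+)."""
    return ast.unparse(ast.parse(source))


def token_round_trip(source: str) -> str:
    """Tokenize and rebuild the text from the full 5-tuple tokens.

    untokenize() uses each token's start/end position to restore spacing,
    and comments are tokens too, so ordinary source comes back unchanged.
    """
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return tokenize.untokenize(tokens)


SOURCE = "x  =  f( 1 )  # keep me\n"
print(repr(ast_round_trip(SOURCE)))        # 'x = f(1)': spacing normalized, comment lost
print(token_round_trip(SOURCE) == SOURCE)  # True
```

Note that `untokenize` rebuilds the whitespace between tokens as spaces from the column positions, so a tab between two tokens would come back as a space; a parser that stores the exact prefix of every token avoids that.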


Kay Hayen

30 Apr 2015, 11:16 p.m.

Hello Dave,

Jedi has become quite powerful. If you're considering Jedi, I'm very happy to get you started. Let's talk!

Setting the static analysis aside, how does it compare to RedBaron when it comes to modifying and parsing source code? Does it provide a bounding box for code constructs?

And the README just says "would in theory support re-factoring". Should that be rephrased? I was looking for example code that does it...

Yours,
Kay

Dave Halter

1 May 2015, 12:28 a.m.

Hi Kay,

Glad you're interested!

2015-04-30 19:46 GMT+02:00 Kay Hayen:

Hello Dave,

Jedi has become quite powerful. If you're considering Jedi, I'm very happy to get you started. Let's talk!

Setting the static analysis aside, how does it compare to RedBaron when it comes to modifying and parsing source code?

I don't know RedBaron very well. Its documentation states that it's still alpha and that Python 3 is not fully supported. That's different in Jedi: it's no longer alpha quality, it's very well tested, and it supports Python 3 (not saying that RedBaron isn't well tested). Internally, Jedi uses a slightly modified lib2to3, which is very much battle-tested. Jedi also does error recovery. This is not something you __need__ for refactoring, but it's quite nice.

I think the two tools are very similar. The biggest difference is probably static analysis, which you definitely need for certain refactorings. However, Jedi definitely has fewer AST functions. The node/leaf objects of Jedi are quite simple at the moment. I'm willing to add functionality there, but only if it's used. Currently, there are only the functions that Jedi needs internally. To support the well-known refactorings (e.g. inline/extract name/function), we might need to add a few methods there.

Does it provide a bounding box for code constructs?

I'm not really sure what you mean. Jedi knows the exact positions of objects. At the moment there's no method like RedBaron's `bounding_box`. Relative positions could be easily calculated with the current parser. However, I don't know what such a BoundingBox would be doing.
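How a bounding box could be derived from leaf positions can be sketched as follows, assuming every leaf knows its text and start position, which is what the thread shows for Jedi's leaves (`start_pos`, `end_pos`). `Leaf`, `Node` and `bounding_box` are hypothetical names for illustration, not Jedi's API: a node's box simply runs from its first leaf's start to its last leaf's end.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

# (line, column): line is 1-based, column 0-based, like the
# start_pos/end_pos values Dave shows later in this thread.
Position = Tuple[int, int]


@dataclass
class Leaf:
    """Hypothetical stand-in for a parser leaf (not Jedi's class)."""
    value: str
    start_pos: Position

    @property
    def end_pos(self) -> Position:
        # Single-line leaves only, to keep the sketch short.
        line, column = self.start_pos
        return line, column + len(self.value)


@dataclass
class Node:
    """Hypothetical interior node: a type name and ordered children."""
    type: str
    children: List[Union["Node", Leaf]] = field(default_factory=list)


def first_leaf(node: Union[Node, Leaf]) -> Leaf:
    while isinstance(node, Node):
        node = node.children[0]
    return node


def last_leaf(node: Union[Node, Leaf]) -> Leaf:
    while isinstance(node, Node):
        node = node.children[-1]
    return node


def bounding_box(node: Union[Node, Leaf]) -> Tuple[Position, Position]:
    """A node spans from its first leaf's start to its last leaf's end."""
    return first_leaf(node).start_pos, last_leaf(node).end_pos


# 'a + b' modeled by hand, using the positions Dave reports later on.
expr = Node("arith_expr", [Leaf("a", (1, 0)), Leaf("+", (1, 2)), Leaf("b", (1, 4))])
print(bounding_box(expr))  # ((1, 0), (1, 5))
```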

And the README just says "would in theory support re-factoring". Should that be rephrased? I was looking for example code that does it...

You're probably right; I'm going to rephrase it. A few versions ago, refactoring was in an alpha state, but I removed support for it again because the parser wasn't good enough. Now that I have replaced the parser, it should be much easier to implement.

Hope that was what you wanted to hear!

~ Dave

Kay Hayen

3 May 2015, 3:10 p.m.

Hello Dave,

in my Python compiler Nuitka, I am using the "ast" module. It works fine for many things, but it lends itself badly to others. I am condemned to do my own static analysis, as it's intended for optimization. My interest is in reporting and also in auto-formatting source code. I would love to be able to report on source code easily for "profile guided optimization", or to annotate Nuitka's findings in HTML reports built from the generated output. And I want a coherent code base for my private and work projects, so I would like to be able to apply formatting, even changes to the function call style, automatically, on a programmatic basis.

I think the two tools are very similar. The biggest difference is probably static analysis, which you definitely need for certain refactorings.

Static analysis is great for auto-formatting indeed. For example, being able to tell that a method only exists to be overridden, because it raises a "must be overridden" exception, would otherwise be too hard.

However, Jedi definitely has fewer AST functions. The node/leaf objects of Jedi are quite simple at the moment. I'm willing to add functionality there, but only if it's used. Currently, there are only the functions that Jedi needs internally. To support the well-known refactorings (e.g. inline/extract name/function), we might need to add a few methods there.

With RedBaron, I can do this (bear with me on code quality, and by no means assume it's competent RedBaron usage):

    def updateString(string_node):
        # Skip doc strings for now.
        if string_node.parent.type in ("class", "def", None):
            return

        value = string_node.value

        def isQuotedWith(quote):
            return value.startswith(quote) and value.endswith(quote)

        for quote in "'''", '"""', "'", '"':
            if isQuotedWith(quote):
                break
        else:
            assert False, value

        real_value = value[len(quote):-len(quote)]
        assert quote + real_value + quote == value

        if "\n" not in real_value:
            # Single characters should be quoted with "'"
            if len(eval(value)) == 1:
                if real_value != "'":
                    string_node.value = "'" + real_value + "'"
            else:
                if '"' not in real_value:
                    string_node.value = '"' + real_value + '"'

And then call this (red being the RedBaron instance of the module):

    for node in red.find_all("StringNode"):
        try:
            updateString(node)
        except Exception:
            print("Problem with", node)
            node.help(deep=True, with_formatting=True)
            raise

It allows me to enforce some rules for strings that are not multi-line. My rules there are: do not use triple quotes without a newline; strings whose value has length 1 become 'a' or '\n'; all others use "" or "ab", except of course that "'" is valid, and 'some "quote"' would be too.

I do a bunch of these and would like to have more, e.g. to make sure that my multi-line calling convention lines up the "=" signs nicely, or to insert and remove whitespace in tuples, comma-separated lists, etc. To me, it's not about whether I should do this, but whether it can be done.
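The same quoting rules can be expressed without RedBaron, as a rough stdlib-only sketch: `tokenize` finds the plain string literals, a helper applies the rules, and `untokenize` rebuilds the file while keeping the original spacing. `normalize_literal` and `normalize_source` are hypothetical names. Unlike the RedBaron version, this sketch does not skip docstrings, it uses `ast.literal_eval` instead of `eval`, and it leaves prefixed literals (r'', b'', f-strings) alone.

```python
import ast
import io
import tokenize

QUOTES = ("'''", '"""', "'", '"')  # longest first, as in the RedBaron version


def normalize_literal(literal: str) -> str:
    """Apply the quoting rules to one plain (unprefixed) string literal."""
    if literal[0] not in "'\"":
        return literal  # prefixed literal such as r'', b'' or u'': leave alone
    quote = next(q for q in QUOTES
                 if literal.startswith(q) and literal.endswith(q)
                 and len(literal) >= 2 * len(q))
    body = literal[len(quote):-len(quote)]
    if "\n" in body:
        return literal  # genuinely multi-line: keep it as it is
    if len(ast.literal_eval(literal)) == 1:
        # One character: single quotes, except for "'" itself.
        return '"' + body + '"' if body == "'" else "'" + body + "'"
    if '"' not in body:
        return '"' + body + '"'
    return literal


def normalize_source(source: str) -> str:
    """Rewrite every STRING token; untokenize restores the original spacing."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING:
            tok = tok._replace(string=normalize_literal(tok.string))
        tokens.append(tok)
    return tokenize.untokenize(tokens)


print(normalize_source("x = 'abc'\ny = \"a\"\nz = '''q''' + w\n"))
```

One deliberate difference: `"""'"""` becomes `"'"` here, whereas the RedBaron version leaves it untouched.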

Does it provide a bounding box for code constructs?

I'm not really sure what you mean. Jedi knows the exact positions of objects. At the moment there's no method like RedBaron's `bounding_box`. Relative positions could be easily calculated with the current parser. However, I don't know what such a BoundingBox would be doing.

That is the caret, as typically used in stack traces, and it fits the concept of a cursor. For a report, however, I want the bounding box. When highlighting, e.g., a function, a call expression or an argument expression, I am interested not only in where it starts, but also in where it ends. For an XHTML report comparing Nuitka's performance with CPython's on the same code (which is my plan for this autumn, at about the time it starts to make actual sense), with the "ast" module and apparently "jedi", all I get is this:

    f( arg1(), arg2(), c ** d) + g()

I would like to mouse over or highlight it and know where the call to f() ends, where the third argument ends, and what the "+" operation covers. Without a bounding box, that falls apart. In fact, I would also want a position, like that of the "+" or the "(", to indicate which bounding box I mean. My problem there with the "ast" module boils down to this:

    >>> import ast
    >>> ast.parse("a+b").body[0].value.col_offset
    0
    >>> ast.dump(ast.parse("a+b").body[0])
    "Expr(value=BinOp(left=Name(id='a', ctx=Load()), op=Add(), right=Name(id='b', ctx=Load())))"
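Since Python 3.8 (later than this thread), `ast` nodes also carry `end_lineno` and `end_col_offset`, so a start/end bounding box is available from the standard library. The operator itself (`Add()`) still has no position, so the sketch below, with hypothetical helper names, finds the "+" by scanning the text between the two operands. It assumes ASCII source, because `ast` columns are UTF-8 byte offsets.

```python
import ast


def bounding_box(node: ast.AST):
    """((start_line, start_col), (end_line, end_col)) of an expression node.

    Needs Python 3.8+. Lines are 1-based; columns are 0-based UTF-8 byte
    offsets (equal to character offsets only for ASCII source).
    """
    return (node.lineno, node.col_offset), (node.end_lineno, node.end_col_offset)


def operator_position(source: str, binop: ast.BinOp):
    """Locate a BinOp's operator by scanning the gap between its operands.

    Sketch only: both operands must touch the same line, and whitespace
    and parentheses around the operands are skipped.
    """
    left, right = binop.left, binop.right
    if left.end_lineno != right.lineno:
        raise ValueError("multi-line operands are not handled in this sketch")
    line = source.splitlines()[left.end_lineno - 1]
    for col in range(left.end_col_offset, right.col_offset):
        if line[col] not in " \t()":
            return left.end_lineno, col
    raise ValueError("operator not found")


source = "f(arg1(), arg2(), c ** d) + g()\n"
expr = ast.parse(source).body[0].value  # BinOp: f(...) + g()
call = expr.left                        # the call to f()
print(bounding_box(expr))               # ((1, 0), (1, 31))
print(bounding_box(call))               # ((1, 0), (1, 25))
print(ast.get_source_segment(source, call.args[2]))  # c ** d
print(operator_position(source, expr))  # (1, 26)
```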

With RedBaron, I get to have a "bounding_box". I must admit I have not used it yet, but I aspire to. Even for Nuitka itself, I would love to have the position of the "+" for use in tracebacks, at least in an improved mode, as opposed to the position of the first argument. But performance and bugs keep me from considering any alternative to "ast" there. So yes, I am asking whether Jedi has that bounding box precisely to address this. With RedBaron, I can do this:

    >>> from redbaron import RedBaron
    >>> red = RedBaron("a+b")
    >>> red[0]
    a+b
    >>> red[0].second_formatting = " "
    >>> red
    0   a+ b

That of course also means, it knows the "+" location, or I can infer it:

    >>> red[0].second.bounding_box
    BoundingBox (Position (1, 1), Position (1, 1))
    >>> red[0].first.bounding_box
    BoundingBox (Position (1, 1), Position (1, 1))
    >>> red[0].bounding_box
    BoundingBox (Position (1, 1), Position (1, 5))

It seems bounding boxes are relative, but it also has "get_absolute_bounding_box_of_attribute". So eventually I am faced with the issue of producing run-time information from expressions, and then finding the same expression again in two different "ast" forms. But for both rendering and editing, RedBaron seems like it might work, with heavy fighting. I would love for you to provide a code example of how to use Jedi for editing like I did above, and to hear whether you think creating such reports could be based on Jedi's parsing.

My plan for Nuitka now probably entails identifying each expression uniquely by its path. Such a "uid" for code in a module is probably a new idea. The "uid" could be a run-time number or hash code, which could then be resolved by looking at the original code again in a new parse, even with another tool that provides more detail. Finding it again in RedBaron or Jedi may involve some normalization. The "ast" module hides some things from me, e.g. try/except/finally becomes nested statements, at least in Python 2.

Yours,
Kay
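The idea of identifying an expression by its path can be sketched with `ast`: record, for each expression node, the chain of field names and list indexes that leads to it from the module, then follow the same chain in a fresh parse. The helper names are hypothetical, and such paths are only stable while the code is unchanged and the parser produces the same tree shape; Python 2's nested try/except/finally is exactly the kind of shape difference that would break them across versions.

```python
import ast
from typing import Iterator, Optional, Tuple

# One step from a parent to a child: (field name, list index or None).
Step = Tuple[str, Optional[int]]
Path = Tuple[Step, ...]


def _children(node: ast.AST, path: Path) -> Iterator[Tuple[ast.AST, Path]]:
    """Direct child nodes of `node`, each with its extended path."""
    for name, value in ast.iter_fields(node):
        if isinstance(value, list):
            for index, item in enumerate(value):
                if isinstance(item, ast.AST):
                    yield item, path + ((name, index),)
        elif isinstance(value, ast.AST):
            yield value, path + ((name, None),)


def expression_paths(tree: ast.AST, path: Path = ()) -> Iterator[Tuple[Path, ast.expr]]:
    """Yield (path, node) for every expression below `tree`, in field order."""
    for child, child_path in _children(tree, path):
        if isinstance(child, ast.expr):
            yield child_path, child
        yield from expression_paths(child, child_path)


def resolve_path(tree: ast.AST, path: Path) -> ast.AST:
    """Follow a path produced by expression_paths() in a (re-)parsed tree."""
    node = tree
    for name, index in path:
        node = getattr(node, name)
        if index is not None:
            node = node[index]
    return node


def path_to_str(path: Path) -> str:
    """Readable form of a path, e.g. 'body[0].value.args[1]'."""
    return ".".join(name if index is None else f"{name}[{index}]" for name, index in path)


source = "x = f(a, b + 1)\n"
for path, node in expression_paths(ast.parse(source)):
    print(path_to_str(path), "->", ast.get_source_segment(source, node))

# Later, e.g. while building a report, find the same node in a fresh parse.
again = resolve_path(ast.parse(source), (("body", 0), ("value", None), ("args", 1)))
print(ast.get_source_segment(source, again))  # b + 1
```

A path such as `body[0].value.args[1]` could serve directly as the "uid", or be hashed into a compact run-time number.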

Dave Halter

4 May 2015, 9:27 p.m.

Hi Kay,

I will not write a very lengthy email here, but will just point you in the general direction of how Jedi handles things. First, I want you to know that Jedi doesn't have a public API for its parser, YET. This is intentional: as long as there are no real users of it, why make it public? So if you're interested only in a parser (not static analysis), RedBaron might be the better choice for now, because Jedi's parser API is not official yet. I'm pretty sure this will change, though.

2015-05-03 11:40 GMT+02:00 Kay Hayen:

With RedBaron, I can do this (bear with me on code quality, and by no means assume it's competent RedBaron usage):

    def updateString(string_node):
        # Skip doc strings for now.
        if string_node.parent.type in ("class", "def", None):
            return

        value = string_node.value

        def isQuotedWith(quote):
            return value.startswith(quote) and value.endswith(quote)

        for quote in "'''", '"""', "'", '"':
            if isQuotedWith(quote):
                break
        else:
            assert False, value

        real_value = value[len(quote):-len(quote)]
        assert quote + real_value + quote == value

        if "\n" not in real_value:
            # Single characters should be quoted with "'"
            if len(eval(value)) == 1:
                if real_value != "'":
                    string_node.value = "'" + real_value + "'"
            else:
                if '"' not in real_value:
                    string_node.value = '"' + real_value + '"'

And then call this (red being the RedBaron instance of the module):

    for node in red.find_all("StringNode"):
        try:
            updateString(node)
        except Exception:
            print("Problem with", node)
            node.help(deep=True, with_formatting=True)
            raise

It allows me to enforce some rules for strings that are not multi-line. My rules there are: do not use triple quotes without a newline; strings whose value has length 1 become 'a' or '\n'; all others use "" or "ab", except of course that "'" is valid, and 'some "quote"' would be too.

I do a bunch of these and would like to have more, e.g. to make sure that my multi-line calling convention lines up the "=" signs nicely, or to insert and remove whitespace in tuples, comma-separated lists, etc.

To me, it's not about whether I should do this, but whether it can be done.

I think you could do that in a very similar way in Jedi. E.g., with p = Parser(load_grammar(), SOURCE), this gets you all docstrings:

    >>> for scope in p.module.walk():
    ...     print(repr(scope.raw_doc))
    ...
    ''

With this you get all string leaves:

    >>> from jedi.parser import Parser, load_grammar, tree
    >>> def nested(node):
    ...     for child in node.children:
    ...         if isinstance(child, tree.Node):
    ...             nested(child)  # recurse into the child, not the same node
    ...         elif child.type == 'string':
    ...             print(child)  # You can change everything here.
    ...
    >>> nested(Parser(load_grammar(), SOURCE).module)

You can of course modify them. Look at the example below...

So yes, I am asking whether Jedi has that bounding box precisely to address this.

With RedBaron, I can do this:

    >>> from redbaron import RedBaron
    >>> red = RedBaron("a+b")
    >>> red[0]
    a+b
    >>> red[0].second_formatting = " "
    >>> red
    0   a+ b

That of course also means it knows the "+" location, or I can infer it:

    >>> red[0].second.bounding_box
    BoundingBox (Position (1, 1), Position (1, 1))
    >>> red[0].first.bounding_box
    BoundingBox (Position (1, 1), Position (1, 1))
    >>> red[0].bounding_box
    BoundingBox (Position (1, 1), Position (1, 5))

    >>> from jedi.parser import Parser, load_grammar
    >>> Parser(load_grammar(), 'a + b')
    >>> p = Parser(load_grammar(), 'a + b')
    >>> p.module.children
    [Node(simple_stmt, [Node(arith_expr, [, , ]), ]), ]
    >>> expression = p.module.children[0].children[0]
    >>> expression
    Node(arith_expr, [, , ])
    >>> b = p.module.children[0].children[0].children[2]
    >>> b
    >>> b.prefix
    ' '
    >>> b.start_pos
    (1, 4)
    >>> b.end_pos
    (1, 5)
    >>> expression.children[1]
    >>> expression.children[1].start_pos
    (1, 2)
    >>> b.prefix = ''
    >>> b.value = '3'  # Turn '+ b' into '+3'
    >>> p.module.get_code()
    'a +3'

So as you can see, Jedi is all about positions. Jedi knows the positions of all leaves, as well as their prefixes (whitespace and comments).
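The "prefix + value" model from the session above can be approximated with the standard library, as a rough sketch: every leaf stores the whitespace and comments in front of its token, so concatenating all leaves reproduces the file and an edit stays local. `Leaf`, `to_leaves` and `get_code` are hypothetical helpers, not Jedi's classes; the leaves form a flat list rather than a tree, and the sketch assumes `\n` line endings (f-strings on Python 3.12+ may need extra care).

```python
import io
import tokenize
from dataclasses import dataclass
from typing import List, Tuple

# Tokens whose text is folded into the prefix of the next leaf.
SKIPPED = (tokenize.COMMENT, tokenize.NL, tokenize.INDENT, tokenize.DEDENT)


@dataclass
class Leaf:
    """Hypothetical leaf: the text in front of a token plus the token itself."""
    prefix: str                 # whitespace, comments, blank lines, indentation
    value: str                  # the token text
    start_pos: Tuple[int, int]  # (line, column) as reported by tokenize


def to_leaves(source: str) -> List[Leaf]:
    # String index of the start of each line, to convert (line, col) positions.
    line_starts = [0]
    for line in source.splitlines(keepends=True):
        line_starts.append(line_starts[-1] + len(line))

    def offset(pos: Tuple[int, int]) -> int:
        line, column = pos
        return line_starts[line - 1] + column

    leaves: List[Leaf] = []
    previous_end = 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIPPED:
            continue  # its text ends up in the next leaf's prefix
        start = offset(tok.start)
        leaves.append(Leaf(source[previous_end:start], tok.string, tok.start))
        previous_end = offset(tok.end)
    return leaves


def get_code(leaves: List[Leaf]) -> str:
    return "".join(leaf.prefix + leaf.value for leaf in leaves)


leaves = to_leaves("a + b\n")
b = leaves[2]                    # the leaf for 'b'
print(repr(b.prefix), b.start_pos)  # ' ' (1, 4)
b.prefix, b.value = "", "3"      # turn '+ b' into '+3'
print(repr(get_code(leaves)))    # 'a +3\n'
```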

I would love for you to provide a code example how to use Jedi for editing like I did above, and if you think that creating such reports could be based on Jedi parsing.

I hope my example above is good enough. It shows how easy it is to play with code. Jedi nodes are really simple, but still powerful. Let me know if something is still unclear.

~ Dave
