Unsafe input analyzer
Hello,

While working on embedded Linux devices I ran into multiple cases of unsafe use of input data on the back-end. These back-ends used the Python Flask framework and were made without the help of commercial (and expensive) static analysis tools. I looked around for a tool to spot these vulnerabilities, but could not find anything available for free. So, I wrote a quick script at first and then made it more generic. The tool checks functions with the route decorator, it detects the request related input data and checks if the data is passed to a function without being checked by a known filter or validator.

I call the tool Python API parser and input analyzer (Papaia). I have the sources available at https://gitlab.com/melomaa/papaia

Do you think this tool could be helpful to others, and do you think it would fit under the PyCQA?

Best Regards,
Mikko Elomaa
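The check Mikko describes can be sketched roughly as follows. This is a minimal illustration written with the stdlib `ast` module, not Papaia's actual code: the handler source, `run_ping`, and the validator names `validate_port` and `sanitize_host` are all hypothetical, and only direct uses of request-derived names are reported.

```python
import ast

# Hypothetical Flask handler; it is parsed as text, never imported or run.
SOURCE = '''
@app.route("/ping")
def ping():
    host = request.args.get("host")
    port = validate_port(request.args.get("port"))
    run_ping(host, port)
'''

# Assumed allow-list of filter/validator names (both hypothetical here).
VALIDATORS = {"validate_port", "sanitize_host"}


def call_name(call):
    """Bare name of the called function: 'get' for request.args.get(...)."""
    func = call.func
    if isinstance(func, ast.Attribute):
        return func.attr
    if isinstance(func, ast.Name):
        return func.id
    return None


def is_route(dec):
    """Matches @<obj>.route and @<obj>.route(...)."""
    target = dec.func if isinstance(dec, ast.Call) else dec
    return isinstance(target, ast.Attribute) and target.attr == "route"


def reads_request(expr):
    """True if the expression mentions the global `request` object."""
    return any(isinstance(n, ast.Name) and n.id == "request" for n in ast.walk(expr))


def unchecked_inputs(src):
    findings = []
    for func in ast.walk(ast.parse(src)):
        if not isinstance(func, ast.FunctionDef):
            continue
        if not any(is_route(d) for d in func.decorator_list):
            continue
        # Names bound from request data without an outer validator call.
        raw = set()
        for node in ast.walk(func):
            if isinstance(node, ast.Assign) and reads_request(node.value):
                checked = isinstance(node.value, ast.Call) and call_name(node.value) in VALIDATORS
                if not checked:
                    raw.update(t.id for t in node.targets if isinstance(t, ast.Name))
        # Calls (other than validators) that receive one of those names directly.
        for node in ast.walk(func):
            if isinstance(node, ast.Call) and call_name(node) not in VALIDATORS:
                for arg in node.args:
                    if isinstance(arg, ast.Name) and arg.id in raw:
                        findings.append((func.name, call_name(node), arg.id))
    return findings


print(unchecked_inputs(SOURCE))  # [('ping', 'run_ping', 'host')]
```

Because only direct uses are checked, something like `cmd = "ping " + host; run(cmd)` would slip through this sketch; catching it requires tracking how data flows between variables.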
Hey,

On Tue, Oct 06, 2020 at 01:33:24PM -0400, Mikko Elomaa via code-quality wrote:
> I looked around for a tool to spot these vulnerabilities, but could not find anything available for free. So, I wrote a quick script at first and then made it more generic. The tool checks functions with the route decorator, it detects the request related input data and checks if the data is passed to a function without being checked by a known filter or validator.
FWIW I think GitHub's code scanning does some quite sophisticated analysis via CodeQL:

https://github.blog/2020-09-30-code-scanning-is-now-available/
https://github.com/github/codeql/tree/main/python/ql/src/semmle/python/web/f...
> I call the tool Python API parser and input analyzer (Papaia). I have the sources available at https://gitlab.com/melomaa/papaia
Looks interesting! I don't do much with Flask, so I can't say much more :)

From a quick look, you might want to consider using an ast (abstract syntax tree) module for parsing the code, rather than using regular expressions. Some examples:

- https://docs.python.org/3/library/ast.html
- https://github.com/davidhalter/parso
- https://github.com/PyCQA/baron / https://github.com/PyCQA/redbaron

(I've not used any of those myself, though)

Florian

--
me@the-compiler.org (Mail/XMPP) | https://www.qutebrowser.org
https://bruhin.software/ | https://github.com/sponsors/The-Compiler/
GPG: 916E B0C8 FD55 A072 | https://the-compiler.org/pubkey.asc
I love long mails! | https://email.is-not-s.ms/
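Florian's point about regular expressions versus an AST can be made concrete with a small comparison. The regex below is an illustrative assumption of what a quick decorator-matching script might use (it is not taken from Papaia), and the module text is hypothetical. The regex reports a commented-out route and misses a real one whose arguments contain nested parentheses, while `ast.parse` only sees actual decorators.

```python
import ast
import re

# Hypothetical module text; it is only parsed, never imported or run.
SOURCE = '''
# @app.route("/old")
def old():
    pass

@app.route("/list", defaults=dict(page=1))
def listing(page):
    pass
'''

# A naive pattern of the kind a quick script might start with (illustrative only).
ROUTE_RE = re.compile(r"@\w+\.route\([^)]*\)\s*def\s+(\w+)")


def handlers_by_regex(src):
    """Function names that textually follow something looking like @x.route(...)."""
    return ROUTE_RE.findall(src)


def handlers_by_ast(src):
    """Function names that really carry a @<obj>.route(...) decorator."""
    found = []
    for node in ast.walk(ast.parse(src)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for dec in node.decorator_list:
                target = dec.func if isinstance(dec, ast.Call) else dec
                if isinstance(target, ast.Attribute) and target.attr == "route":
                    found.append(node.name)
                    break
    return found


print(handlers_by_regex(SOURCE))  # ['old']  (commented-out route found, real one missed)
print(handlers_by_ast(SOURCE))    # ['listing']
```

The parser already knows what a comment, a string, and a decorator are, so the analysis code only has to deal with the tree.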
Hi Mikko,

welcome to the static analysis rabbit hole :-)

On 13.10.20 15:09, Florian Bruhin wrote:
> On Tue, Oct 06, 2020 at 01:33:24PM -0400, Mikko Elomaa via code-quality wrote:
>> I looked around for a tool to spot these vulnerabilities, but could not find anything available for free. So, I wrote a quick script at first and then made it more generic. The tool checks functions with the route decorator, it detects the request related input data and checks if the data is passed to a function without being checked by a known filter or validator.
There are PyT [1] (unmaintained) and Pysa [2], which both do this kind of taint analysis and may cover your needs. For some theoretical background, you should probably read the master's thesis behind PyT [5].
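The core idea of taint analysis fits in a few lines: values read from `request` are tainted, taint propagates through assignments, a sanitizer call clears it, and a tainted argument reaching a sink is reported. This is a deliberately tiny sketch for straight-line code inside one function; the handler is hypothetical, and treating `shlex.quote` as the only sanitizer and `os.system` as the only sink are illustrative assumptions. Tools such as PyT and Pysa go much further (control flow, calls across functions, configurable sources and sinks).

```python
import ast

# Hypothetical handler; it is parsed as text, never executed.
SOURCE = '''
def handler():
    host = request.args.get("host")
    cmd = "ping -c 1 " + host
    safe = shlex.quote(host)
    os.system(cmd)
    os.system("ping -c 1 " + safe)
'''

SOURCES = {"request"}    # names whose data is attacker-controlled
SANITIZERS = {"quote"}   # assumption: shlex.quote clears taint for shell sinks
SINKS = {"system"}       # assumption: os.system is the only sink we care about


def call_name(call):
    """Bare name of the called function, e.g. 'system' for os.system(...)."""
    func = call.func
    if isinstance(func, ast.Attribute):
        return func.attr
    if isinstance(func, ast.Name):
        return func.id
    return None


def is_tainted(expr, tainted):
    """Tainted if it reads a source or tainted name, unless its outermost call is a sanitizer."""
    if isinstance(expr, ast.Call) and call_name(expr) in SANITIZERS:
        return False
    return any(
        isinstance(node, ast.Name) and node.id in (tainted | SOURCES)
        for node in ast.walk(expr)
    )


def taint_flows(src):
    """Walk one function's straight-line body and report (line, sink) pairs."""
    func = ast.parse(src).body[0]
    tainted = set()
    findings = []
    for stmt in func.body:  # no branches or loops: statements run top to bottom
        if isinstance(stmt, ast.Assign):
            names = {t.id for t in stmt.targets if isinstance(t, ast.Name)}
            if is_tainted(stmt.value, tainted):
                tainted |= names
            else:
                tainted -= names  # overwritten with a clean value
        elif isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Call):
            call = stmt.value
            if call_name(call) in SINKS and any(is_tainted(a, tainted) for a in call.args):
                findings.append((stmt.lineno, call_name(call)))
    return findings


print(taint_flows(SOURCE))  # [(6, 'system')]
```

Note that `cmd` is flagged even though it never touches `request` directly, which a check limited to direct uses would miss. The sanitizer test only looks at the outermost call of each expression, a simplification real tools avoid.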
> From a quick look, you might want to consider using an ast (abstract syntax tree) module for parsing the code, rather than using regular expressions.
A very good point Florian makes there :-) Regular expressions are not capable of parsing Python code in the general case. More libs that could help you with parsing and making sense of Python code:

- Astroid [3], which is similar to the stdlib ast module but way nicer to work with
- Jedi [4], which is a higher-level interface built on top of parso that lets you do things like this:

```
import jedi

source = """
def fn(arg1, arg2):
    sink(arg1)
"""
script = jedi.Script(source, path="foo.py")
# List every name in all scopes, including references such as arg1 in sink(arg1).
for def_ in script.get_names(all_scopes=True, references=True):
    print(def_)
```

Cheers,
Martin

[1] https://github.com/python-security/pyt
[2] https://pyre-check.org/docs/pysa-basics/#taint-analysis
[3] http://pylint.pycqa.org/projects/astroid/en/latest/index.html
[4] https://github.com/davidhalter/jedi
[5] https://projekter.aau.dk/projekter/files/239563289/final.pdf
participants (3)
- Florian Bruhin
- Martin Vielsmaier
- Mikko Elomaa