[Python-ideas] pytaint: taint tracking in python

Tue Oct 15 11:58:41 CEST 2013

1. Please correct me if I misunderstand the Python project, but if the idea
is deemed 'good' by this list, a PEP can follow and the feature can be
included in Python 3? It is not necessary to have a Python 3 implementation
beforehand?
The existing Python 2.7.5 pytaint implementation is intended to be run by
users who need tainting in Python 2 but can also serve as a reference /
benchmark / proof-of-concept implementation for this discussion.

2. I haven't had time to publish benchmarks yet, but I plan to. Also, of
course, the CPython tests pass, and we added additional taint-tracking
tests. We also ran the internal tests of our Python codebase with the
pytaint interpreter. There were only a negligible number of failures, mostly
because some C extensions had not been recompiled to work with the redefined
string objects.

Regarding taint tracking as a feature for python:

First of all, taint tracking is a general language feature and can be
considered for additional applications besides security. When it comes to
the security community, taint tracking is certainly controversial.
Nevertheless, my pytaint announcement received 50 retweets and 30 favs from
a part of the security community, if that counts for something ;)

As Andrew and Bruce mention, there are other solutions to XSS and SQLi:
template systems and parameterized queries. A library solution also exists
for shell injection: pipes.quote. However, all these solutions require the
developer to pick the correct library and method. We have empirical
indicators that this works, but maybe only in 70% of cases. The rest of the
developers are introducing new vulnerabilities. Thus, an additional
language-based feature can help to mitigate the remaining 30% of cases. A
web app framework (or a python-developing company) can maintain and ship a
pytaint configuration which will throw a TaintError exception in those 30%
of cases and prevent the vulnerability from being exploited.
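
To make the TaintError idea concrete, here is a minimal pure-Python sketch.
It is not pytaint's implementation or its configuration format (pytaint
changes the interpreter's string objects). TaintedStr and run_query are
hypothetical names chosen for illustration, and only concatenation
propagates the taint in this sketch.

```python
class TaintError(Exception):
    """Raised when tainted data reaches a sensitive sink."""


class TaintedStr(str):
    """A str subclass marking data that came from an untrusted source."""

    def __add__(self, other):
        # tainted + anything stays tainted
        return TaintedStr(str.__add__(self, other))

    def __radd__(self, other):
        # anything + tainted stays tainted
        return TaintedStr(str.__add__(other, self))


def run_query(sql):
    """Hypothetical sink: refuses tainted input instead of executing it."""
    if isinstance(sql, TaintedStr):
        raise TaintError("tainted string reached the SQL sink")
    return "executed: " + sql


# Untrusted input concatenated into a query keeps its taint.
user_input = TaintedStr("1 OR 1=1")
query = "SELECT * FROM users WHERE id = " + user_input

try:
    run_query(query)
except TaintError as exc:
    print("blocked:", exc)
```

A str subclass like this loses the taint in str.join, str.format and
similar operations, which is one reason to track taint inside the
interpreter rather than in a library.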

This argument follows along the principle of defense-in-depth: why just
have one security feature (e.g. pipes.quote) if we can offer several
security features to the developer? This has previously worked well for
system security: ASLR, DEP, etc.

Regarding the relation to typing:

We are using Merits on purpose to be able to distinguish between different
forms of string cleaning. Today, most HTML template systems don't even make
a distinction between different escaping contexts. However, with a pytaint
Merit configuration for raw HTML, URLs, HTML attribute contents, CSS
attributes and JS strings, you would be able to make sure that your string
is cleaned for the specific context you're using it in. This can be
implemented for each template system individually but it would be easier to
just write a pytaint config.
If you don't clean strings based on browser context, you will run into
problems. For example, a string may be cleaned with HTML-entity encoding but
then used in an <iframe src> attribute. An attacker could trigger an XSS by
supplying javascript:alert(document.cookie).
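
The <iframe src> case can be sketched in plain Python. This is only an
illustration of per-context merits, not pytaint's API: the Cleaned class,
the cleaner functions and the iframe_src sink are hypothetical names, and
the "url" cleaner simply assumes that only http and https URLs are
acceptable.

```python
import html
from urllib.parse import urlsplit


class Cleaned(str):
    """A string tagged with the contexts ("merits") it was cleaned for."""

    def __new__(cls, value, merits=()):
        obj = super().__new__(cls, value)
        obj.merits = frozenset(merits)
        return obj


def clean_html(value):
    """Entity-encode for HTML text; grants only the "html" merit."""
    return Cleaned(html.escape(value), {"html"})


def clean_url(value):
    """Accept only http(s) URLs (an assumption); grants the "url" merit."""
    if urlsplit(value).scheme.lower() not in ("http", "https"):
        raise ValueError("disallowed URL scheme")
    return Cleaned(html.escape(value), {"url"})


def iframe_src(value):
    """Sink for an <iframe src> attribute; requires the "url" merit."""
    if "url" not in getattr(value, "merits", ()):
        raise ValueError("value was not cleaned for the URL context")
    return '<iframe src="%s"></iframe>' % value


payload = "javascript:alert(document.cookie)"

# Entity encoding leaves the payload unchanged, because it contains
# none of the characters & < > " ' that html.escape rewrites.
html_cleaned = clean_html(payload)

try:
    iframe_src(html_cleaned)  # cleaned, but for the wrong context
except ValueError as exc:
    print("rejected:", exc)
```

The sink checks for the merit that matches its context, so a string that
was cleaned only for HTML text is rejected before it reaches the attribute.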