[Python-ideas] pytaint: taint tracking in python

Thu Oct 17 11:36:56 CEST 2013

Sorry for quoting indirectly.

> Note that web frameworks, etc, are not in the stdlib. I am not sure that
> taints should be either.

In pytaint we decided to modify the interpreter (and provide a helper
module) for several reasons, the major one being performance. If you
just wrap or monkey-patch str/unicode, the performance impact is much
bigger, since many interpreter internals use str/unicode. Thus the
overall slowdown is high for a wrapper-based implementation.

https://github.com/felixgr/pytaint/commit/07254534810341b3552a8c8452bbf749fe2f30c9#diff-2
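To illustrate why a pure-library approach has to intercept so much, here
is a minimal sketch of wrapper-based tracking via a `str` subclass. It is
not pytaint's code, and `TaintedStr` and `is_tainted` are hypothetical
names. Each operation that should keep the taint needs its own
Python-level override, and anything not overridden silently returns a
plain `str`:

```python
class TaintedStr(str):
    """Hypothetical wrapper-based taint marker (not pytaint's code)."""

    # Every operation that should keep the taint needs a Python-level
    # override, which adds a Python function call to each such operation.
    def __add__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        return TaintedStr(str.__add__(self, other))

    def __radd__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        return TaintedStr(str.__add__(other, self))


def is_tainted(s):
    return isinstance(s, TaintedStr)


s = TaintedStr("user input")
print(is_tainted(s + "!"))     # True: __add__ is overridden
print(is_tainted("> " + s))    # True: __radd__ is overridden
print(is_tainted(s.upper()))   # False: str.upper() returns a plain str
print(is_tainted(s[1:]))       # False: slicing also returns a plain str
```

Covering every str method, formatting path and C-level consumer this way
is where the wrapper approach gets both leaky and slow.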

Therefore, I think the feature should be (1) part of the language and (2)
embedded in the core interpreter mechanics.

> That being said, with no investigation into the difficulties or costs of
> implementing taint tracking in PyPy, Jython, and IronPython, not to mention
> not-quite-implementations like Cython, there might be other arguments for
> that position.

I cannot speak for those projects, but a colleague has previously
implemented a similar feature for Java and Ruby. This at least hints that
an implementation for Jython is feasible.

http://www.youtube.com/watch?v=WmZvnKYiNlE

http://repo.staticsafe.ca/presentations/hitbsecconf2012/D1T2%20-%20Meder%20Kydyraliev%20-%20Defibrilating%20Web%20Security.pdf

> I'd be interested to hear why this feature isn't used in the languages
> that already have it

As mentioned earlier, we suggest a different form of taint tracking.
Pytaint-style taint tracking (a) is configurable company- or
framework-wide, (b) distinguishes between different kinds of taint and
cleaners, and (c) performs better than previous Python implementations.
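Points (a) and (b) can be sketched at library level as follows. This is
only an illustration under stated assumptions, not pytaint's API or
configuration format: `Tainted`, `html_escape`, `SINK_POLICY` and
`check_sink` are hypothetical names, and the taint kinds are examples. A
value carries a set of taint kinds, a cleaner removes only the kind it
handles, and one central policy table says which kinds each sink rejects:

```python
import html
from dataclasses import dataclass

# Hypothetical taint kinds; real pytaint configuration is not shown here.
XSS = "xss"
SQLI = "sqli"


@dataclass(frozen=True)
class Tainted:
    value: str
    # Untrusted input starts out tainted with every kind.
    taints: frozenset = frozenset({XSS, SQLI})


def html_escape(t: Tainted) -> Tainted:
    # Cleaner: HTML escaping makes the value safe for HTML only,
    # so it removes just the XSS taint and keeps the SQLi taint.
    return Tainted(html.escape(t.value), t.taints - {XSS})


# Hypothetical company/framework-wide policy: taints each sink refuses.
SINK_POLICY = {
    "render_html": frozenset({XSS}),
    "execute_sql": frozenset({SQLI}),
}


class TaintError(Exception):
    pass


def check_sink(sink, arg):
    """Return the plain value if the sink's policy allows it, else raise."""
    taints = arg.taints if isinstance(arg, Tainted) else frozenset()
    forbidden = SINK_POLICY[sink] & taints
    if forbidden:
        raise TaintError(f"{sink} received value tainted with {sorted(forbidden)}")
    return arg.value if isinstance(arg, Tainted) else arg


user = Tainted("<b>bob</b>")                  # untrusted input
safe_html = html_escape(user)                 # still SQLi-tainted
print(check_sink("render_html", safe_html))   # &lt;b&gt;bob&lt;/b&gt;
try:
    check_sink("execute_sql", safe_html)
except TaintError as e:
    print("blocked:", e)
```

In pytaint the propagation happens inside the interpreter; here it is
done by hand only to show how kinds, cleaners and sink policy interact.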

To expand on point (a) again, I think web app frameworks would benefit
greatly from pytaint. They could continue to provide both SQLi-safe
(parameterized) and SQLi-unsafe (raw string) APIs. If a user without
knowledge of the domain then used one of the unsafe APIs insecurely,
pytaint would catch it, while a user who is familiar with the problem
domain could continue to use the more flexible but unsafe API securely.
SQLi is just one example here; there are many other security issues that
can be mitigated with pytaint (see the examples on GitHub).
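A minimal sketch of that split, assuming a framework exposes one
parameterized and one raw query function. `TaintedStr`, `query_params`,
`query_raw` and `TaintError` are hypothetical names (not pytaint's or any
framework's API), and propagation is simulated only through `+`, whereas
pytaint would track it in the interpreter. The safe API accepts tainted
values as bound parameters; the raw API refuses SQL text that carries
taint but still runs trusted raw SQL:

```python
import sqlite3


class TaintedStr(str):
    """Hypothetical marker for untrusted input; propagates through + only."""

    def __add__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        return TaintedStr(str.__add__(self, other))

    def __radd__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        return TaintedStr(str.__add__(other, self))


class TaintError(Exception):
    pass


def query_params(conn, sql, params=()):
    # Safe API: values are bound as parameters, so tainted data is fine.
    # Plain str copies are passed to the driver.
    return conn.execute(sql, tuple(str(p) for p in params)).fetchall()


def query_raw(conn, sql):
    # Flexible but unsafe API: reject SQL text built from tainted input.
    if isinstance(sql, TaintedStr):
        raise TaintError("tainted string reached a raw SQL sink")
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice'), ('bob')")

name = TaintedStr("alice' OR '1'='1")  # attacker-controlled input

print(query_params(conn, "SELECT name FROM users WHERE name = ?", (name,)))  # []
print(query_raw(conn, "SELECT count(*) FROM users"))  # [(2,)]
try:
    query_raw(conn, "SELECT name FROM users WHERE name = '" + name + "'")
except TaintError as e:
    print("blocked:", e)
```

The raw API stays available for trusted SQL, so an experienced user loses
no flexibility; only the insecure combination is stopped.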

> A way to track the origins of tainted objects would also be a big winner

I agree it would be a cool additional optional feature of pytaint.
But let's focus this discussion on the currently proposed pytaint
design/implementation :)