
Hi,
I'd like to start a discussion on adding a security feature: taint tracking. As part of his internship, Marcin (cc) has been working on a patch to cpython-2.7.5 which is available online. We also published a design document and slides. https://github.com/felixgr/pytaint
The idea behind taint tracking (or taint checking) is that we mark ('taint') untrusted data and prevent the programmer from using it in sensitive places (called sinks). A standard use case would be in a web application, where data extracted from HTTP requests is tainted and a database connection is a sensitive sink. In other words: objects returned by an HTTP request have a property indicating taint, and when one of them is passed to a database connection, a TaintException is raised.
The idea itself is not new (Ruby and Perl have it; there are also some python libraries floating around) and pretty much no one uses it - however with a few improvements, it can be made viable. Firstly, we introduce different kinds of taint (motivation: a string may be an attack vector for many classes of attacks - e.g. XSS, SQLi - and each class needs different escaping). Secondly, we make it easy to apply to existing software - a programmer can simply write a config file specifying taint sources, sensitive sinks and taint cleaners, and enable tracking by adding one line to the app.
We think it's a very useful feature for developing most web apps and other security-sensitive applications in Python. Any thoughts on this?
Thanks, Felix
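To make the source/sink idea concrete, here is a minimal pure-Python sketch. It is only an illustration: pytaint patches the interpreter's string types rather than subclassing str, and the names Tainted, sink and run_query are hypothetical, not taken from the pytaint code.

```python
class TaintError(Exception):
    """Raised when tainted data reaches a sensitive sink."""


class Tainted(str):
    """A str subclass standing in for interpreter-level taint marks."""

    def __add__(self, other):
        # tainted + anything stays tainted
        return Tainted(str.__add__(self, other))

    def __radd__(self, other):
        # anything + tainted stays tainted
        return Tainted(str.__add__(other, self))


def sink(func):
    """Mark func as a sensitive sink: reject tainted positional arguments."""
    def wrapper(*args):
        for arg in args:
            if isinstance(arg, Tainted):
                raise TaintError("tainted value passed to %s" % func.__name__)
        return func(*args)
    return wrapper


@sink
def run_query(sql):
    # Stand-in for a database call.
    return "executed: " + sql


# A value as it might arrive from an HTTP request (the taint source).
user_input = Tainted("1 OR 1=1")
# Taint survives concatenation with a trusted literal.
query = "SELECT * FROM t WHERE id = " + user_input

try:
    run_query(query)
except TaintError as exc:
    print("blocked:", exc)
```

The subclass only propagates taint through `+`; other string operations (slicing, formatting, join) would silently drop it, which is part of why a wrapper-based approach is harder to get right than it first appears.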

On 14 October 2013 22:25, Felix Gröbert <felix@groebert.org> wrote:
We think it's a very useful feature for developing most web apps and other security-sensitive applications in Python. Any thoughts on this?
It's definitely an interesting idea, and the idea of pursuing it initially as a separate project to optionally harden Python 2 applications is a good one. Longer term, before it can be considered for inclusion as a language feature:
1. It needs to work with Python 3 (which has a substantially different text model), as Python 2 is no longer receiving new features.
2. The performance impact needs to be assessed when the feature is disabled (the default) and when various sources and sinks are defined.
The performance numbers from http://hg.python.org/benchmarks/ comparing vanilla CPython 2.7.5 and pytaint may also be of interest to potential users of the Python 2.7 version.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Oct 14, 2013, at 5:25, Felix Gröbert <felix@groebert.org> wrote:
The idea itself is not new (Ruby and Perl have it; there are also some python libraries floating around) and pretty much no one uses it - however with a few improvements, it can be made viable.
A good part of the reason no one uses it is that SQL injection is always given as the motivation for the idea, but it's not a very good solution for that problem, and there's already a well-known better solution: parameterized queries. SQL isn't the only case where you build executable strings--a document formatter might build Postscript code; a forum might build HTML (maybe even with embedded JS); a game might even read Python code from an in-game console or untrusted mod that's allowed to run in a different globals environment but not the main one; etc. Has anyone successfully used perl's long-standing taint mode for any such purposes? If not, can you demonstrate using it in python? I don't think that would be _necessary_ for a python taint mode implementation to be considered useful, but it would certainly help get attention to the idea.
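For readers who have not seen the parameterized-query approach mentioned above, a short sketch using the stdlib sqlite3 module follows. The table and the hostile value are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES (?)", ("alice",))

hostile = "alice' OR '1'='1"   # attacker-controlled value

# Unsafe: string concatenation lets the input rewrite the query.
unsafe_rows = conn.execute(
    "SELECT name FROM users WHERE name = '" + hostile + "'").fetchall()

# Safe: the driver binds the value, so it is only ever compared as data.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)).fetchall()

print(unsafe_rows)
print(safe_rows)
```

The concatenated query matches every row because the injected `OR '1'='1'` is always true, while the bound version looks for a user literally named `alice' OR '1'='1` and finds none.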

There's another good use case for tainting: HTML injection (XSS). There's a good solution for that too, but XSS is still prevalent because it's easy to build HTML by concatenating strings without escaping, and template systems make it too easy to inject strings without escaping (or put another way, they make it equally easy to inject escaped strings as unescaped strings). However, the issue is not just tainting but typing as well. When I have a string, I need to know if it's raw text or HTML text. If it's HTML text, I need to know if it's safe (generated by the program, or user input that's been carefully sanitized) or unsafe (raw user input). I'm not sure it isn't
--- Bruce

1. Please correct me if I misunderstand the Python project, but if the idea is deemed 'good' by this list, a PEP can follow and the feature can be included in Python 3? It is not necessary to have a Python 3 implementation beforehand? The existing Python 2.7.5 pytaint implementation is intended to be run by users who need tainting in Python 2 but can also serve as a reference / benchmark / proof-of-concept implementation for this discussion.
2. I haven't had the time to publish benchmarks yet but I plan to. Also, of course, the cpython tests pass and we added additional taint tracking tests. We also ran the internal tests of our python codebase with the pytaint interpreter. This had negligible failures, mostly because some C extensions hadn't been recompiled to work with the redefined string objects.
Regarding taint tracking as a feature for python: First of all, taint tracking is a general language feature and can be considered for additional applications besides security. When it comes to the security community, taint tracking is certainly controversial. Nevertheless, my pytaint announcement received 50 retweets and 30 favs from a part of the security community, if that counts for something ;)
As Andrew and Bruce mention, there are other solutions to XSS and SQLi: template systems and parameterized queries. Another library solution exists for shell injection: pipes.quote. However, all these solutions require the developer to pick the correct library and method. We have empirical indicators that this works, but maybe only in 70% of cases. The rest of the developers are introducing new vulnerabilities. Thus, an additional language-based feature can help to mitigate the remaining 30% of cases. A web app framework (or a python-developing company) can maintain and ship a pytaint configuration which will throw a TaintError exception in those 30% of cases and prevent the vulnerability from being exploited.
This argument follows along the principle of defense-in-depth: why just have one security feature (e.g. pipes.quote) if we can offer several security features to the developer? This has previously worked well for system security: ASLR, DEP, etc.
Regarding the relation to typing: We are using Merits on purpose to be able to distinguish between different forms of string cleaning. Today, most HTML template systems don't even make a distinction between different escaping contexts. However, with a pytaint Merit configuration for raw HTML, URLs, HTML attribute contents, CSS attributes and JS strings, you would be able to make sure that your string is cleaned for the specific context you're using it in. This can be implemented for each template system individually but it would be easier to just write a pytaint config. If you don't clean strings based on browser context, you will run into problems: a string is cleaned with HTML-entity encoding but used in an <iframe src> attribute. An attacker could trigger an XSS by supplying javascript:alert(document.cookie).
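The iframe example can be checked directly. The sketch below uses Python 3's stdlib html.escape and urllib.parse; the helper clean_url is a hypothetical URL-context cleaner written for this illustration, not part of pytaint.

```python
import html
from urllib.parse import urlsplit

payload = "javascript:alert(document.cookie)"

# HTML-entity encoding: the payload contains none of & < > " ',
# so the "cleaned" string is identical to the attack string.
escaped = html.escape(payload)
iframe = '<iframe src="%s"></iframe>' % escaped


def clean_url(value):
    """Hypothetical cleaner for the URL context: allow only http(s)."""
    if urlsplit(value).scheme not in ("http", "https"):
        raise ValueError("disallowed URL scheme in %r" % value)
    return value


print(escaped == payload)
try:
    clean_url(payload)
except ValueError as exc:
    print("rejected:", exc)
```

A cleaner that is correct for one context (element content) says nothing about another (URL attributes), which is the case for tracking separate merits per context.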

On Tue, Oct 15, 2013 at 2:58 AM, Felix Gröbert <felix@groebert.org> wrote:
FWIW having reviewed parts of this code as it was implemented by Marcin I'll state up front that porting this to Python 3 will mostly be a matter of mechanical work. Python 3's bytes (PyBytes) and str (PyUnicode) objects are not _that_ different in implementation in comparison to Python 2's str (PyString) and unicode (PyUnicode) objects for the purposes of adding and tracking taint. Besides, the code could use more eyeballs as would happen in any porting process. :)
Indeed. I like the taint merits system. It is much more powerful than what Perl 5 ever had with a single taint bit. The ability to configure taint properties "offline" via JSON files is also neat. You can effectively create taint merit and sink metadata for existing Python libraries without needing to modify them (similar to how Cython lets you specify types via an external file for it to apply its magic better to other libraries without needing to modify them). -gps
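As a rough illustration of configuring merits and sinks from outside a library, consider the sketch below. The JSON schema, the Tainted class and apply_config are all assumptions made up for this example; pytaint's actual config format and mechanics may differ, so check the repository for the real thing.

```python
import json
import types


class TaintError(Exception):
    pass


class Tainted(str):
    """Untrusted string carrying the set of merits it has earned."""

    def __new__(cls, value, merits=()):
        obj = super().__new__(cls, value)
        obj.merits = frozenset(merits)
        return obj


def escape_sql(s):
    return s.replace("'", "''")


def execute(sql):
    return "ran " + sql


# Stand-in for a third-party library that we do not want to edit.
lib = types.SimpleNamespace(escape_sql=escape_sql, execute=execute)

# Hypothetical schema, NOT pytaint's actual file format.
CONFIG = json.loads("""
{
  "cleaners": [{"name": "escape_sql", "merit": "sql"}],
  "sinks":    [{"name": "execute",    "merit": "sql"}]
}
""")


def _cleaner(func, merit):
    def wrapper(value):
        result = func(value)
        if isinstance(value, Tainted):
            # The cleaned value stays tainted but gains the merit.
            return Tainted(result, value.merits | {merit})
        return result
    return wrapper


def _sink(func, name, merit):
    def wrapper(value):
        if isinstance(value, Tainted) and merit not in value.merits:
            raise TaintError("%s requires merit %r" % (name, merit))
        return func(value)
    return wrapper


def apply_config(namespace, config):
    """Wrap the named functions in place; the library source is untouched."""
    for entry in config["cleaners"]:
        name = entry["name"]
        setattr(namespace, name, _cleaner(getattr(namespace, name), entry["merit"]))
    for entry in config["sinks"]:
        name = entry["name"]
        setattr(namespace, name, _sink(getattr(namespace, name), name, entry["merit"]))


apply_config(lib, CONFIG)
```

After apply_config runs, lib.execute(Tainted("...")) raises TaintError, while lib.execute(lib.escape_sql(Tainted("..."))) goes through because the cleaner granted the "sql" merit.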

On 10/15/2013 5:58 AM, Felix Gröbert wrote:
1. Please correct me if I misunderstand the Python project, but if the idea is deemed 'good' by this list,
This list is a discussion forum, not a decision-making body. An individual person can consider an idea 'good' in some sense without thinking that it should be included in the CPython distribution.
a PEP can follow and the feature can be included in Python 3?
A PEP must be discussed on the pydev (core developer) list and approved by GvR or a person delegated by him.
It is not necessary to have a Python 3 implementation beforehand?
A Python 3 implementation is necessary for inclusion. It may or may not be needed for PEP approval, depending on the pydev discussion and ultimately the PEP decider.
Making objects instances of classes with attributes is a general feature, which Python already has. From what I have seen posted, taint tracking is a particular implementation of a specialized subjective concept 'untrusted code text'. The concept is based on the unfortunate social-psychological fact that some people enjoy messing up other people's lives.
Right. Taints are not the only possible implementation that uses the same concept.
However, all these solutions require the developer to pick the correct library and method.
The same would be true of a taint library. Note that web frameworks, etc, are not in the stdlib. I am not sure that taints should be either. The idea of marking bytes (or strings) with their encoding (or source encoding) has been rejected. I don't think anything else should be added either. -- Terry Jan Reedy

On Oct 15, 2013, at 10:14, Terry Reedy <tjreedy@udel.edu> wrote:
The same would be true of a taint library. Note that web frameworks, etc, are not in the stdlib. I am not sure that taints should be either.
Well, some of the things that could benefit from taint checking _are_ in the stdlib--sqlite3.Cursor.execute, eval, etc. More importantly, it sounds like (at least this particular implementation of) tainted string tracking requires language support. So it seems to me that it has to be in the stdlib or not be at all. (I suppose you could add language support that allows for a variety of different taint libraries and not have any in the stdlib, but that seems even less likely to be acceptable than the larger suggestion.) So what you're suggesting really amounts to saying that this project should remain a fork of CPython. That being said, with no investigation into the difficulties or costs of implementing taint tracking in PyPy, Jython, and IronPython, not to mention not-quite-implementations like Cython, there might be other arguments for that position.

On 10/15/2013 1:30 PM, Andrew Barnert wrote:
Perhaps a security-oriented sql package could try to force use of parameterized queries, even though that would be less convenient for hard-coded queries. Or, the db2 interface standard could be augmented with a standard interface for tainted strings. (Or such an interface/protocol might be defined in a pep.) As for eval (and exec), a package module could easily provide a wrapper.
    def eval(code, glob, loc):
        # safe() and text() are assumed helpers supplied by the taint package
        if safe(code):
            return builtin_eval(text(code), glob, loc)
        else:
            raise TaintError("only eval safe strings")
It could even replace the binding in builtins. Note that in Python 3, exec is also a function, not a statement (and keyword), so that it too can be wrapped and masked.
More importantly, it sounds like (at least this particular implementation of) tainted string tracking requires language support.
If 'language support' means changing str
So what you're suggesting really amounts to saying that this project should remain a fork of CPython.
which 'fork' implies to me, then experience with an implementation for 3.3+, using the new FSR classes, is needed for any real discussion.
Good catch. I presume Jython and IronPython simply use Java and C# strings respectively. -- Terry Jan Reedy

On 15 Oct 2013 19:59, "Felix Gröbert" <felix@groebert.org> wrote:
1. Please correct me if I misunderstand the Python project, but if the idea is deemed 'good' by this list, a PEP can follow and the feature can be included in Python 3? It is not necessary to have a Python 3 implementation beforehand?
Sure. I was just pointing out that the significantly different str and bytes types and the removal of the implicit conversions between them in 3.x could complicate the eventual forward porting process. (Although GPS has indicated it shouldn't be a major problem in this case.)
We also ran the internal tests of our python codebase with the pytaint interpreter. This had negligible fails, mostly because some C extensions haven't had been recompiled to work with the redefined string objects.
Regarding taint tracking as a feature for python: First of all, taint tracking is a general language feature and can be considered for additional applications besides security. When it comes to the security community, taint tracking is certainly controversial. Nevertheless, my pytaint announcement received 50 retweets and 30 favs from a part of the security community, if that counts for something ;)
If you can provide a way to taint strings with an encoding assumption such that combining strings with conflicting encoding assumptions fails, that would be a big point in favour of the system. A way to track the origins of tainted objects would also be a big winner. While I assume tracking that would be too expensive to do by default, tracing the origin of bad data can be a genuinely hard debugging problem, so being able to fire up failing unit tests or vulnerability scans in a taint tracing mode could be very interesting.
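A rough sketch of what such an origin-tracing mode could look like in pure Python is shown below. This is not a pytaint feature; the Traced class and read_request_param are hypothetical, and capturing a stack on every creation is exactly the kind of cost that would make this unsuitable as a default.

```python
import traceback


class Traced(str):
    """Hypothetical tainted string that remembers where it was created."""

    def __new__(cls, value, origin=None):
        obj = super().__new__(cls, value)
        # Drop the last frame (this __new__) so the origin is the caller.
        obj.origin = origin or traceback.extract_stack()[:-1]
        return obj

    def __add__(self, other):
        # Derived strings inherit the origin of the tainted operand.
        return Traced(str.__add__(self, other), self.origin)

    def __radd__(self, other):
        return Traced(str.__add__(other, self), self.origin)


def read_request_param():
    # Stand-in for a taint source such as an HTTP request parameter.
    return Traced("<script>")


value = "Hello " + read_request_param() + "!"
source = value.origin[-1]   # innermost frame at the point of creation
print("bad data came from %s (%s:%s)" % (source.name, source.filename, source.lineno))
```

When a sink rejects `value`, the error message could include `source`, pointing straight at the function that introduced the untrusted data.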
As Andrew and Bruce mention, there are other solutions to XSS and SQLi: template systems and parameterized queries. Another library solution exists to shell injection: pipes.quote. However, all these solutions require the developer to pick the correct library and method. We have empirical indicators that this works, but maybe only in 70% of cases. The rest of the developers are introducing new vulnerabilities. Thus, an additional language-based feature can help to mitigate the remaining 30% of cases. A web app framework (or a python-developing company) can maintain and ship a pytaint configuration which will throw a TaintError exception in those 30% of cases and prevent the vulnerability from being exploited. This argument follows along the principle of defense-in-depth: why just have one security feature (e.g. pipes.quote) if we can offer several security features to the developer? This has previously worked well for system security: ASLR, DEP, etc.
Yes, the idea sounds interesting to me in principle. If it can be adapted to help with the "where did the bad string data come from?" problem more generally, then it becomes genuinely compelling :)
However, with a pytaint Merit configuration for raw HTML, URLs, HTML attribution contents, CSS attributes and JS strings, you would be able to make sure that your string is cleaned for the specific context you're using it in. This can be implemented for each template system individually but it would be easier to just write a pytaint config. If you don't clean strings based on browser context, you will run into problems: a string is cleaned with HTML-entity encoding but used in a <iframe src> attribute. An attacker could trigger a XSS by suppling javascript:alert(document.cookie).
It seems to me that viewing this as a parallel typing system for data strings is a potentially useful way of looking at things.
Cheers, Nick.

On Tue, Oct 15, 2013 at 01:15:30PM -0700, Mark Janssen <dreamingforward@gmail.com> wrote:
Too late to discuss -- it's become a well-known term: https://en.wikipedia.org/wiki/Taint_checking Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

On 10/14/13 8:25 AM, Felix Gröbert wrote:
I'd be interested to hear why this feature isn't used in the languages that already have it. That seems to be a strike against it. Your proposed changes sound like they make it a more complex feature, and therefore less likely to be used. --Ned.

On 16 Oct 2013 07:15, "Ned Batchelder" <ned@nedbatchelder.com> wrote:
On 10/14/13 8:25 AM, Felix Gröbert wrote:
The idea itself is not new (Ruby and Perl have it; there are also some python libraries floating around) and pretty much noone uses it - however with a few improvements, it can be made viable.
I'd be interested to hear why this feature isn't used in the languages that already have it. That seems to be a strike against it. Your proposed changes sound like they make it a more complex feature, and therefore less likely to be used.
At least the Perl one is a bit too simplistic for sophisticated cases, as it just divides the world into safe and unsafe strings. That approach is closer to the safe/unsafe marking mechanisms that Python web frameworks already tend to use for templating and other aspects of response generation.
Cheers, Nick.

On Oct 15, 2013, at 15:02, Nick Coghlan <ncoghlan@gmail.com> wrote:
Also keep in mind that we're talking about a perl 3 feature intended to solve SQL injection problems, and once parameterized SQL was invented it was no longer useful for that. (Yes, you can still embed strings directly into SQL statements and quote and escape them manually because you're sure you're too smart to ever make a mistake, or because you just haven't bothered to learn the language or domain--but the kind of person who does that also doesn't turn on taint mode.) A more flexible feature designed for other problems that haven't proven as amenable to an easy fix might find more use. Which is exactly why I suggested that the OP give better use cases than SQL injection--and he obliged.

Sorry for quoting indirectly.
Note that web frameworks, etc, are not in the stdlib. I am not sure that taints should be either.
In pytaint we decided to modify the interpreter (and provide a helper module) for several reasons, the major reason being performance. If you just do wrapping/monkey patching of str/unicode, the performance impact is much bigger since a lot of internals are using str/unicode. Thus the overall slowdown is high for a wrapper-based implementation. https://github.com/felixgr/pytaint/commit/07254534810341b3552a8c8452bbf749fe... Therefore, I think the feature should be (1) a part of the language and (2) embedded in the core interpreter mechanics.
That being said, with no investigation into the difficulties or costs of implementing taint tracking in PyPy, Jython, and IronPython, not to mention not-quite-implementations like Cython, there might be other arguments for that position.
I cannot speak for the projects but a colleague has previously implemented a similar feature for Java and Ruby. This, at least, hints towards a feasible implementation for Jython. http://www.youtube.com/watch?v=WmZvnKYiNlE http://repo.staticsafe.ca/presentations/hitbsecconf2012/D1T2%20-%20Meder%20K...
I'd be interested to hear why this feature isn't used in the languages that already have it
As it was mentioned earlier, we suggest a different form of taint tracking. Pytaint-style taint tracking (a) is company/framework-wide configurable, (b) distinguishes between different forms of taint and cleaners and (c) is more performant than previous python implementations. To expand on point (a) again, I think it would be very beneficial to web app frameworks to have pytaint. Web app frameworks could then continue to provide APIs which are SQLi-safe (parameterized) and SQLi-unsafe (raw strings). If a user without knowledge about the domain then used one of the unsafe APIs insecurely, pytaint would catch it. And a user who is familiar with the problem domain could still continue to use the more flexible but unsafe API securely. SQLi is just an example here; there are many other possible security issues which can be mitigated with pytaint (see the examples on github).
A way to track the origins of tainted objects would also be a big winner
I agree it would be a cool additional optional feature of pytaint. But let's focus this discussion on the currently proposed pytaint design/implementation :)

On 14 October 2013 22:25, Felix Gröbert <felix@groebert.org> wrote:
We think it's a very useful feature for developing most of webapps and other security-sensitive application in Python, any thoughts on this?
It's definitely an interesting idea, and the idea of pursuing it initially as a separate project to optionally harden Python 2 applications is a good one. Longer term, before it can be considered for inclusion as a language feature: 1. It needs to work with Python 3 (which has a substantially different text model), as Python 2 is no longer receiving new features. 2. The performance impact needs to be assessed when the feature is disabled (the default) and when various sources and sinks are defined. The performance numbers comparing http://hg.python.org/benchmarks/ between vanilla CPython 2.7.5 and pytaint may also be of interest to potential users of the Python 2.7 version. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Oct 14, 2013, at 5:25, Felix Gröbert <felix@groebert.org> wrote:
The idea itself is not new (Ruby and Perl have it; there are also some python libraries floating around) and pretty much noone uses it - however with a few improvements, it can be made viable.
A good part of the reason no one uses it is that SQL injection is always given as the motivation for the idea, but it's not a very good solution for that problem, and there's already a well-known better solution: parameterized queries. SQL isn't the only case where you build executable strings--a document formatter might build Postscript code; a forum might build HTML (maybe even with embedded JS); a game might even read Python code from an in-game console or untrusted mod that's allowed to run in a different globals environment but not the main one; etc. Has anyone successfully used perl's long-standing taint mode for any such purposes? If not, can you demonstrate using it in python? I don't think that would be _necessary_ for a python taint mode implementation to be considered useful, but it would certainly help get attention to the idea.

There's another good use case for tainting: html injection (XSS). There's a good solution for that too but XSS is still prevalent because it's easy to build html by concatenating strings without escaping and template systems make it too easy to inject strings without escaping (or put another way, they make it equally easy to inject escaped strings as unescaped strings). However, the issue is not just tainting but typing as well. When I have a string, I need to know if it's raw text or html text. If it's html text, I need to know if it's safe (generated by the program or user input that's been sanitized (carefully)) or unsafe (raw user input). I'm not sure it isn't --- Bruce I'm hiring: http://www.cadencemd.com/info/jobs Latest blog post: Alice's Puzzle Page http://www.vroospeak.com Learn how hackers think: http://j.mp/gruyere-security On Mon, Oct 14, 2013 at 9:07 AM, Andrew Barnert <abarnert@yahoo.com> wrote:

1. Please correct me if I misunderstand the Python project, but if the idea is deemed 'good' by this list, a PEP can follow and the feature can be included in Python 3? It is not necessary to have a Python 3 implementation beforehand? The existing Python 2.7.5 pytaint implementation is intended to be run by users who need tainting in Python 2 but can also serve as a reference / benchmark / proof-of-concept implementation for this discussion. 2. I haven't had the time to publish benchmarks yet but I plan to. Also, of course, the cpython tests pass and we added additional taint tracking tests. We also ran the internal tests of our python codebase with the pytaint interpreter. This had negligible fails, mostly because some C extensions haven't had been recompiled to work with the redefined string objects. Regarding taint tracking as a feature for python: First of all, taint tracking is a general language feature and can be considered for additional applications besides security. When it comes to the security community, taint tracking is certainly controversial. Nevertheless, my pytaint announcement received 50 retweets and 30 favs from a part of the security community, if that counts for something ;) As Andrew and Bruce mention, there are other solutions to XSS and SQLi: template systems and parameterized queries. Another library solution exists to shell injection: pipes.quote. However, all these solutions require the developer to pick the correct library and method. We have empirical indicators that this works, but maybe only in 70% of cases. The rest of the developers are introducing new vulnerabilities. Thus, an additional language-based feature can help to mitigate the remaining 30% of cases. A web app framework (or a python-developing company) can maintain and ship a pytaint configuration which will throw a TaintError exception in those 30% of cases and prevent the vulnerability from being exploited. 
This argument follows along the principle of defense-in-depth: why just have one security feature (e.g. pipes.quote) if we can offer several security features to the developer? This has previously worked well for system security: ALSR, DEP, etc. Regarding the relation to typing: We are using Mertis on purpose to be able to distinguish between different forms of string cleaning. Today, most HTML template systems don't even make a distinction between different escaping contexts. However, with a pytaint Merit configuration for raw HTML, URLs, HTML attribution contents, CSS attributes and JS strings, you would be able to make sure that your string is cleaned for the specific context you're using it in. This can be implemented for each template system individually but it would be easier to just write a pytaint config. If you don't clean strings based on browser context, you will run into problems: a string is cleaned with HTML-entity encoding but used in a <iframe src> attribute. An attacker could trigger a XSS by suppling javascript:alert(document.cookie).

On Tue, Oct 15, 2013 at 2:58 AM, Felix Gröbert <felix@groebert.org> wrote:
FWIW having reviewed parts of this code as it was implemented by Marcin I'll state up front that porting this to Python 3 will mostly be a matter of mechanical work. Python 3's bytes (PyBytes) and str (PyUnicode) objects are not _that_ different in implementation in comparison to Python 2's str (PyString) and unicode (PyUnicode) objects for the purposes of adding and tracking taint. Besides, the code could use more eyeballs as would happen in any porting process. :)
Indeed. I like the taint merits system. It is much more powerful than what Perl 5 ever had with a single taint bit. The ability to configure taint properties "offline" via JSON files is also neat. You can effectively create taint merit and sink metadata for existing Python libraries without needing to modify them (similar to how Cython lets you specify types via an external file for it to apply its magic better to other libraries without needing to modify them). -gps

On 10/15/2013 5:58 AM, Felix Gröbert wrote:
1. Please correct me if I misunderstand the Python project, but if the idea is deemed 'good' by this list,
This list is a discussion forum, not a decision-making body. An individual person can consider an idea 'good' in some sense without thinking that it should be included in the CPython distribution.
a PEP can follow and the feature can be included in Python 3?
A PEP must be discussed on the pydev (core developer) list and approved by GvR or a person delegated by him.
It is not necessary to have a Python 3 implementation beforehand?
A Python 3 implementation is necessary for inclusion. It may or may not be needed for PEP approval, depending on the pydev discussion and ultimately the PEP decider.
Making objects instances of classes with attributes is a general feature, which Python already has. From what I have seen posted, taint tracking is a particular implementation of a specialized subjective concept 'untrusted code text'. The concept is based on the unfortunate social-psychological fact that some people enjoy messing up other people's lives.
Right. Taints are not the only possible implementation that uses the same concept.
However, all these solutions require the developer to pick the correct library and method.
The same would be true of a taint library. Note that web frameworks, etc, are not in the stdlib. I am not sure that taints should be either. The idea of marking bytes (or strings) with their encoding (or source encoding) has been rejected. I don't think anything else should be added either. -- Terry Jan Reedy

On Oct 15, 2013, at 10:14, Terry Reedy <tjreedy@udel.edu> wrote:
The same would be true of a taint library. Note that web frameworks, etc, are not in the stdlib. I am not sure that taints should be either.
Well, some of the things that could benefit from taint checking _are_ in the stdlib--sqlite3.Cursor.execute, eval, etc. More importantly, it sounds like (at least this particular implementation of) tainted string tracking requires language support. So it seems to me that it has to be in the stdlib or not be at all. (I suppose you could add language support that allows for a variety of different taint libraries and not have any in the stdlib, but that seems even less likely to be acceptable than the larger suggestion.) So what you're suggesting really amounts to saying that this project should remain a fork of CPython. That being said, with no investigation into the difficulties or costs of implementing taint tracking in PyPy, Jython, and IronPython, not to mention not-quite-implementations like Cython, there might be other arguments for that position.

On 10/15/2013 1:30 PM, Andrew Barnert wrote:
Perhaps a security-oriented sql package could try to force use of parameterized queries, even though that would be less convenient for hard-coded queries. Or, the db2 interface standard could be augmented with a standard interface for tainted strings. (Or such an interface/protocol might be defined in a pep.) As for eval (and exec), a package module could easily provide a wrapper. def eval(code, glob, loc): if safe(code): builtin_eval(text(code), glob, loc) else: raise TaintError("only eval save strings") It could even replace the binding in builtins. Note that in Python 3, exec is also a function, not a statement (and keyword), so that it too can be wrapped and masked.
More importantly, it sounds like (at least this particular implementation of) tainted string tracking requires language support.
If 'language support' means changing str
So what you're suggesting really amounts to saying that this project should remain a fork of CPython.
which 'fork implies to me, then experience with an implementation for 3.3+, using the new FSR classes, is needed for any real discussion.
Good catch. I presume Jython and IronPython simply use Java and C# strings respectively. -- Terry Jan Reedy

On 15 Oct 2013 19:59, "Felix Gröbert" <felix@groebert.org> wrote:
1. Please correct me if I misunderstand the Python project, but if the
idea is deemed 'good' by this list, a PEP can follow and the feature can be included in Python 3? It is not necessary to have a Python 3 implementation beforehand? Sure. I was just pointing out that the significantly different str and bytes types and the removal of the implicit conversions between them in 3.x could complicate the eventual forward porting process. (Although GPS has indicated it shouldn't be a major problem in this case). pytaint interpreter. This had negligible fails, mostly because some C extensions haven't had been recompiled to work with the redefined string objects.
Regarding taint tracking as a feature for python:
First of all, taint tracking is a general language feature and can be considered for additional applications besides security. When it comes to the security community, taint tracking is certainly controversial. Nevertheless, my pytaint announcement received 50 retweets and 30 favs from a part of the security community, if that counts for something ;)
If you can provide a way to taint strings with an encoding assumption such that combining strings with conflicting encoding assumptions fails, that would be a big point in favour of the system. A way to track the origins of tainted objects would also be a big winner. While I assume tracking that would be too expensive to do by default, tracing the origin of bad data can be a genuinely hard debugging problem, so being able to fire up failing unit tests or vulnerability scans in a taint tracing mode could be very interesting.
As Andrew and Bruce mention, there are other solutions to XSS and SQLi: template systems and parameterized queries. Another library solution exists for shell injection: pipes.quote. However, all these solutions require the developer to pick the correct library and method. We have empirical indicators that this works, but maybe only in 70% of cases. The rest of the developers are introducing new vulnerabilities. Thus, an additional language-based feature can help to mitigate the remaining 30% of cases. A web app framework (or a python-developing company) can maintain and ship a pytaint configuration which will throw a TaintError exception in those 30% of cases and prevent the vulnerability from being exploited.
This argument follows along the principle of defense-in-depth: why just have one security feature (e.g. pipes.quote) if we can offer several security features to the developer? This has previously worked well for system security: ASLR, DEP, etc.
Yes, the idea sounds interesting to me in principle. If it can be adapted to help with the "where did the bad string data come from?" problem more generally, then it becomes genuinely compelling :)
that your string is cleaned for the specific context you're using it in. This can be implemented for each template system individually but it would be easier to just write a pytaint config.
If you don't clean strings based on browser context, you will run into problems: a string is cleaned with HTML-entity encoding but used in an <iframe src> attribute. An attacker could trigger an XSS by supplying javascript:alert(document.cookie).
It seems to me that viewing this as a parallel typing system for data strings is a potentially useful way of looking at things. Cheers, Nick.
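The iframe example above can be made concrete with a small sketch of per-context taint. Everything here is an assumption for illustration: `TaintedStr`, the context names `html_text` and `url_attr`, and the helper functions are invented and are not pytaint's API. The point is only that HTML-entity escaping leaves a `javascript:` URL untouched, so a string cleaned for one context must stay tainted for the other.

```python
import html


class TaintError(Exception):
    """Raised when a string is used in a context it was not cleaned for."""


class TaintedStr(str):
    """Hypothetical str subclass carrying the contexts it is NOT yet safe for."""

    def __new__(cls, value, taints=frozenset()):
        obj = super().__new__(cls, value)
        obj.taints = frozenset(taints)
        return obj


def from_request(value):
    # Source: request data starts out untrusted for every context.
    return TaintedStr(value, {"html_text", "url_attr"})


def clean_html_text(s):
    # Cleaner: entity-escaping removes only the html_text taint.
    taints = getattr(s, "taints", frozenset())
    return TaintedStr(html.escape(s), taints - {"html_text"})


def require_clean(s, context):
    # Sink check: refuse values still tainted for this context.
    if context in getattr(s, "taints", frozenset()):
        raise TaintError("string is not cleaned for context %r" % context)
    return str(s)


def render_paragraph(s):
    return "<p>" + require_clean(s, "html_text") + "</p>"


def render_iframe(s):
    return '<iframe src="' + require_clean(s, "url_attr") + '"></iframe>'


payload = clean_html_text(from_request("javascript:alert(document.cookie)"))
paragraph = render_paragraph(payload)  # allowed: cleaned for HTML text
# render_iframe(payload) raises TaintError: still tainted for url_attr
```

Tracking a set of remaining taints, rather than a single safe/unsafe bit, is what lets the paragraph sink accept the value while the iframe sink rejects it.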

On Tue, Oct 15, 2013 at 01:15:30PM -0700, Mark Janssen <dreamingforward@gmail.com> wrote:
Too late to discuss -- it's become a well-known term: https://en.wikipedia.org/wiki/Taint_checking Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

On 10/14/13 8:25 AM, Felix Gröbert wrote:
I'd be interested to hear why this feature isn't used in the languages that already have it. That seems to be a strike against it. Your proposed changes sound like they make it a more complex feature, and therefore less likely to be used. --Ned.

On 16 Oct 2013 07:15, "Ned Batchelder" <ned@nedbatchelder.com> wrote:
On 10/14/13 8:25 AM, Felix Gröbert wrote:
The idea itself is not new (Ruby and Perl have it; there are also some python libraries floating around) and pretty much no one uses it - however with a few improvements, it can be made viable.
I'd be interested to hear why this feature isn't used in the languages that already have it. That seems to be a strike against it. Your proposed changes sound like they make it a more complex feature, and therefore less likely to be used.
At least the Perl one is a bit too simplistic for sophisticated cases, as it just divides the world into safe and unsafe strings. That approach is closer to the safe/unsafe marking mechanisms that Python web frameworks already tend to use for templating and other aspects of response generation. Cheers, Nick.

On Oct 15, 2013, at 15:02, Nick Coghlan <ncoghlan@gmail.com> wrote:
Also keep in mind that we're talking about a perl 3 feature intended to solve SQL injection problems, and once parameterized SQL was invented it was no longer useful for that. (Yes, you can still embed strings directly into SQL statements and quote and escape them manually because you're sure you're too smart to ever make a mistake, or because you just haven't bothered to learn the language or domain--but the kind of person who does that also doesn't turn on taint mode.) A more flexible feature designed for other problems that haven't proven as amenable to an easy fix might find more use. Which is exactly why I suggested that the OP give better use cases than SQL injection--and he obliged.
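For readers who have not seen the two styles side by side, here is a minimal sketch of the difference between embedding a string in SQL and passing it as a parameter, using the standard library's sqlite3 module. The table, rows and input value are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 1), ("bob", 0)])

# Attacker-controlled value, e.g. taken from an HTTP request.
user_input = "nobody' OR '1'='1"

# Unsafe: the value becomes part of the SQL text, so the quote characters
# in it change the meaning of the WHERE clause and every row matches.
unsafe_sql = "SELECT name FROM users WHERE name = '" + user_input + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Parameterized: the value is passed separately from the SQL text and is
# only ever compared as data, so no row matches.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()

conn.close()
```

With this data the concatenated query returns both users, while the parameterized one returns none.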

Sorry for quoting indirectly.
Note that web frameworks, etc, are not in the stdlib. I am not sure that taints should be either.
In pytaint we decided to modify the interpreter (and provide a helper module) for several reasons, the major reason being performance. If you just do wrapping/monkey patching of str/unicode, the performance impact is much bigger since a lot of internals are using str/unicode. Thus the overall slowdown is high for a wrapper-based implementation. https://github.com/felixgr/pytaint/commit/07254534810341b3552a8c8452bbf749fe... Therefore, I think the feature should be a part of (1) the language and (2) embedded in the core interpreter mechanics.
that position.
I cannot speak for the projects but a colleague has previously implemented a similar feature for Java and Ruby. This, at least, hints towards a feasible implementation for Jython. http://www.youtube.com/watch?v=WmZvnKYiNlE http://repo.staticsafe.ca/presentations/hitbsecconf2012/D1T2%20-%20Meder%20K...
I'd be interested to hear why this feature isn't used in the languages that already have it
As it was mentioned earlier, we suggest a different form of taint tracking. Pytaint-style taint tracking is (a) company/framework-wide configurable, (b) distinguishes between different forms of taint and cleaners and (c) is more performant than previous python implementations. To expand on point (a) again, I think it would be very beneficial to web app frameworks to have pytaint. Web app frameworks could then continue to provide APIs which are SQLi-safe (parameterized) and SQLi-unsafe (raw strings). If a user without knowledge about the domain would then use one of the unsafe APIs insecurely, pytaint would catch it. And a user who is familiar with the problem domain could still continue to use the more flexible but unsafe API securely. SQLi is just an example here, there are many other possible security issues which can be mitigated with pytaint (see the examples on github).
A way to track the origins of tainted objects would also be a big winner
I agree it would be a cool additional optional feature of pytaint. But let's focus this discussion on the currently proposed pytaint design/implementation :)
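To illustrate what "configurable" could mean in point (a), here is a rough library-level sketch in which a single config dict names the sources, cleaners and sinks, and existing functions are wrapped without being edited. The config layout, the `Tainted` class and all function names are assumptions made up for this example; they are not pytaint's actual configuration format, and unlike pytaint this sketch does not propagate taint through string operations.

```python
import shlex
import types


class TaintError(Exception):
    """Raised when a tainted value reaches a configured sink."""


class Tainted(str):
    """Hypothetical marker: a str that remembers which taint kinds it carries."""

    def __new__(cls, value, kinds):
        obj = super().__new__(cls, value)
        obj.kinds = frozenset(kinds)
        return obj


def kinds_of(value):
    return getattr(value, "kinds", frozenset())


def as_source(func, kinds):
    def wrapper(*args, **kwargs):
        return Tainted(func(*args, **kwargs), kinds)
    return wrapper


def as_cleaner(func, kind):
    def wrapper(value):
        # The cleaned result keeps every taint kind except the one removed.
        return Tainted(func(value), kinds_of(value) - {kind})
    return wrapper


def as_sink(func, kind):
    def wrapper(*args, **kwargs):
        for arg in list(args) + list(kwargs.values()):
            if kind in kinds_of(arg):
                raise TaintError("%s received %s-tainted data" % (func.__name__, kind))
        return func(*args, **kwargs)
    return wrapper


def apply_config(namespace, config):
    """Rebind the configured names on namespace to their wrapped versions."""
    for name, kinds in config["sources"].items():
        setattr(namespace, name, as_source(getattr(namespace, name), kinds))
    for name, kind in config["cleaners"].items():
        setattr(namespace, name, as_cleaner(getattr(namespace, name), kind))
    for name, kind in config["sinks"].items():
        setattr(namespace, name, as_sink(getattr(namespace, name), kind))


# Stand-ins for existing, unmodified application code.
def read_param(name):
    return "report.txt; rm -rf /"


def run_command(arg):
    return "cat " + arg  # a real sink would execute this; here we only build it


app = types.SimpleNamespace(
    read_param=read_param, quote=shlex.quote, run_command=run_command
)

CONFIG = {
    "sources": {"read_param": ["shell"]},
    "cleaners": {"quote": "shell"},
    "sinks": {"run_command": "shell"},
}
apply_config(app, CONFIG)
```

After `apply_config`, calling `app.run_command(app.read_param("file"))` raises `TaintError`, while `app.run_command(app.quote(app.read_param("file")))` goes through, because the configured cleaner removed the `shell` kind.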
participants (9)
- Andrew Barnert
- Bruce Leban
- Felix Gröbert
- Gregory P. Smith
- Mark Janssen
- Ned Batchelder
- Nick Coghlan
- Oleg Broytman
- Terry Reedy