[Typing-sig] Re: Arbitrary Literal Strings

Aug. 9, 2021

I don't think the SQL query is a good use case; many non-trivial
queries (e.g. conditionally including clauses) require generating SQL
statements dynamically.

On Mon, 2021-08-09 at 22:23 +0000, gbleaney@gmail.com wrote:
...
...
The proposal is based on the assumption that a literal type is a
good proxy for “a value that has been validated as safe by the
caller”. That assumption seems tenuous. There are many ways the
caller can perform such validation. For example, it could use a
regex filter, compare against a table of known good values, scan
for dangerous character sequences, or pass it through an escape
transform. None of these techniques will produce a literal-typed
value.

You're 100% right! There are lots of ways to make sure a given string
is safe to use in a given command. The problem is, many of the ways
that you suggested (regex filtering, scanning for dangerous
characters, escape transforms) can have subtle implementation flaws
that allow an attacker to bypass them. Additionally, users can
entirely forget to make those checks. As a security engineer, I have
to assume that someone somewhere along the line is going to mess up
their ad-hoc regex check (or forget to write it) and let a special
character slip through. `Literal[str]` gives me another way, though.
It lets me create *opinionated* APIs that say the user *must* supply a
literal string, which the type checker can then tell me with 100%
certainty was not dynamically created with user-controlled input.
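
To illustrate the kind of subtle flaw meant here, below is a minimal sketch of an ad-hoc regex check that looks correct but is not. The function names are hypothetical and not from this thread. It relies on one documented behavior of Python's `re` module: without `re.MULTILINE`, `$` matches at the end of the string and also just before a trailing newline.

```python
import re


def looks_safe_flawed(value: str) -> bool:
    """Hypothetical validator meant to accept only lowercase identifiers."""
    # Subtle bug: "$" also matches just before a trailing newline,
    # so "users\n" passes even though it contains a newline.
    return re.match(r"^[a-z_]+$", value) is not None


def looks_safe_strict(value: str) -> bool:
    """Same intent, but the entire string has to match the pattern."""
    return re.fullmatch(r"[a-z_]+", value) is not None


print(looks_safe_flawed("users\n"))  # True: the newline slips through
print(looks_safe_strict("users\n"))  # False
```

Whether a stray newline is exploitable depends on where the value ends up; the point is only that the check does not do what its author likely intended.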
Let's make things concrete and talk about SQL injection. The
canonical way to prevent it is to use parameterized queries:
```
def get_data(value: str):
    # The SQL text is a fixed literal; the value travels separately as a parameter.
    SQL = "SELECT * FROM table WHERE col = %s"
    return conn.query(SQL, value)
```
The problem is, nothing stops a developer from inserting a
dynamically created SQL string into that first parameter and creating
a SQL injection vulnerability despite the availability of
parameterization:
```
def get_data(value: str):
    # Vulnerable: the value is interpolated into the SQL text itself.
    SQL = f"SELECT * FROM table WHERE col = '{value}'"
    return conn.query(SQL)
```
If I change the interface of `query` to require that the first
argument is a literal, I can prevent this SQL injection issue from
happening.
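
A rough sketch of what such an interface could look like follows. It rests on several assumptions: the `Literal[str]` spelling used in this thread is a proposal, so the sketch substitutes `typing.LiteralString`, which shipped later (Python 3.11, PEP 675); the `Connection` wrapper, the `items` table, and the use of `sqlite3` (whose placeholder is `?` rather than `%s`) are illustrative choices and not part of the original example.

```python
import sqlite3
from typing import Any, List, Tuple

try:
    from typing import LiteralString  # available in Python 3.11+
except ImportError:
    LiteralString = str  # runtime fallback only; gives no static checking


class Connection:
    """Hypothetical wrapper whose query text must be a literal string."""

    def __init__(self, raw: sqlite3.Connection) -> None:
        self._raw = raw

    def query(self, sql: LiteralString, *params: Any) -> List[Tuple[Any, ...]]:
        # Data is passed separately from the command text.
        return self._raw.execute(sql, params).fetchall()


raw = sqlite3.connect(":memory:")
raw.execute("CREATE TABLE items (col TEXT)")
raw.executemany("INSERT INTO items VALUES (?)", [("a",), ("b",)])
conn = Connection(raw)


def get_data(value: str) -> List[Tuple[Any, ...]]:
    # Accepted by a type checker: the SQL text is a literal.
    return conn.query("SELECT col FROM items WHERE col = ?", value)


# A checker that supports LiteralString should reject this call, because an
# f-string built from a plain `str` is not a literal string:
#     conn.query(f"SELECT col FROM items WHERE col = '{value}'")
```

The annotation has no effect at runtime; the enforcement comes entirely from running a type checker over the calling code.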
...
Python is a runtime-type-safe language, so it already prevents
format string attacks from accessing stack or heap locations
outside of the target object. Are you primarily concerned about
cases where Python code invokes code written in a different
language that potentially has vulnerabilities because of a lack of
runtime type safety?

My concerns have nothing to do with memory or type safety, and
everything to do with preventing the confusion of data and commands.
As you've alluded to, many APIs take a string and run it as code
(`pickle`, `eval`, etc. run Python code, SQL APIs run SQL code,
`os.system` runs shell commands, `python-ldap`'s APIs let you run
LDAP queries, etc.). Sometimes the commands they run need data (i.e., a
value to insert into the table). If the commands and data are a part
of the same string, injection vulnerabilities can occur. Using
`Literal[str]`, API designers can enforce the separation of commands
and data by requiring that the commands be literals within the Python
program, rather than coming from some external source that is
user-controlled data.
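
The same separation can be sketched for the shell-command case. This is an illustration under assumptions, not an API from this thread: `run_program` is a hypothetical helper, and `typing.LiteralString` (Python 3.11+) stands in for the `Literal[str]` spelling proposed here. The command name must be a literal, while data is passed as separate arguments that no shell ever parses.

```python
import subprocess

try:
    from typing import LiteralString  # available in Python 3.11+
except ImportError:
    LiteralString = str  # runtime fallback only; gives no static checking


def run_program(program: LiteralString, *data: str) -> str:
    """Hypothetical helper: run `program` with `data` as plain arguments."""
    # Passing a list (and no shell=True) means characters such as ";" or "|"
    # inside `data` are handed to the program verbatim, not interpreted.
    completed = subprocess.run(
        [program, *data],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout
```

A call such as `run_program("ls", user_supplied_path)` would type-check, whereas passing a dynamically built string as `program` should be flagged. This keeps the shell out of the picture but does not validate the data itself; a value beginning with `-` may still be read as an option by the program.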
...
It sounds like you are looking for some form of taint analysis, but
this doesn’t strike me as a good solution to that problem.

Full taint analysis is definitely useful, and I actually spend the
majority of my time working on Pysa, which is a taint analysis tool
built on top of Pyre. To me, the reasons to want this in a type
checker are:
1) It's way faster to run and give feedback to developers. Pysa will
take an hour or more to report an issue to a developer on a massive
codebase, whereas Pyre can do it in a second.
2) Taint analysis requires that you're able to track sources of
user-controlled data into the dangerous function. There is always a
risk of false negatives there, whereas I can't imagine a case of a
false negative coming from a type check for `Literal` (outside of
explicit lint suppression).


Paul Bryan