Why Is Escaping Data Considered So Magical?

Thu Jun 24 22:43:26 EDT 2010

On 2010-06-24 21:02:48 -0400, Roy Smith said:

> In article <i00t2k$l07$1 at lust.ihug.co.nz>,
>  Lawrence D'Oliveiro <ldo at geek-central.gen.new_zealand> wrote:
> 
>> I construct ad-hoc queries all the time. It really isnâ€™t that hard to do
>> safely. All you have to do is read the documentation
> 
> I get worried when people talk about how easy it is to do something
> safely.

First: I agree with this. While it's definitely possible to correctly 
escape a given SQL dialect under controlled conditions, it's not at all 
easy to get it right, and the real world is even more unfriendly than 
most people expect. Furthermore there's no reason to do it that way: 
Python's DB API spec effectively requires that placeholder parameters 
of *some* kind exist. Even if you feel the need to construct SQL, you 
can construct it with parameters almost as easily as you can construct 
it with the values baked in.

With that said...

> 2) Somebody runs your application with a different PYTHONPATH, which
> causes a different (i.e. malicious) urllib module to get loaded, which
> makes urllib.quote() do something you didn't expect.

Someone who can manipulate PYTHONPATH or otherwise add code to the 
runtime environment is already in a position to hose your database, 
independently of escaping-related issues. It's up to the sysadmin or 
user to ensure that their environment is sane, and it's on their head 
if they add broken code to a program's runtime environment.

Lawrence D'Oliveiro wrote:

> I'€™ve done this sort of thing for MySQL, for HTML and JavaScript (in both
> Python and JavaScript itself), and for Bash. Itâ€™s not hard to verify youâ€™ve
> done it correctly. It lets you easily create table-updating code like the
> following, which makes it so easy to update the code to track changes in the
> database structure:
> 
>      sql.cursor.execute \
>       (
>             "update items set "
>         +
>             ", ".join
>                 (
>                     tuple
>                         (
>                             "%(name)s = %(value)s"
>                         %
>                             {
>                                 "name" : field[0],
>                                 "value" : SQLString(Params.getvalue
>                                   (
>                                     "%s[%s]" % (field[1], 
> urllib.quote(modify_id))
>                                   ))
>                             }
>                         for field in
>                             (
>                                 ("class_name", "modify_class"),
>                                 ("make", "modify_make"),
>                                 ("model", "modify_model"),
>                                 ("details", "modify_details"),
>                                 ("serial_nr", "modify_serial"),
>                                 ("inventory_nr", "modify_invent"),
>                                 ("when_purchased", "modify_when_purchased"),
>                                 ... you get the idea ...
>                                 ("location_name", "modify_location"),
>                                 ("comment", "modify_comment"),
>                             )
>                         )
>                 +
>                     (
>                         "last_modified = %d" % int(time.time()),
>                     )
>                 )
>         +
>             " where inventory_nr = %s" % SQLString(modify_id)
>       )

Why would I write this when SQLAlchemy, even without using its ORM 
features, can do it for me? It even uses the placeholder-generating 
strategy I mentioned above, where possible.

Finally, it's worth noting that MySQL is (almost) the only mainstream 
database that uses escaping for parameterization. PostgreSQL, SQL 
Server, Oracle, DB2, and most other databases support parameters 
natively in their communication protocols: parameters aren't injected 
into the query string, but are sent separately and processed separately 
within the DBMS. This neatly avoids encoding-related and 
quoting-related problems entirely, and it means the type of the 
parameter can be preserved if it's useful.

-o