String substitution VS proper mysql escaping

Cameron Simpson cs at zip.com.au
Wed Aug 18 05:50:17 EDT 2010


On 18Aug2010 12:07, Nik Gr <nikos.the.gr33k at gmail.com> wrote:
|  Στις 18/8/2010 7:31 πμ, ο/η Cameron Simpson έγραψε:
| >On 17Aug2010 20:15, Νίκος<nikos.the.gr33k at gmail.com>  wrote:
| >| ===============================
| >| cursor.execute( ''' SELECT host, hits, date FROM visitors WHERE page =
| >| '%s' ORDER BY date DESC ''' % (page) )
| >| ===============================
| >|
| >| Someone told me NOT to do string substitution ("%") on SQL statements
| >| and to let MySQLdb do it
| >| for me, with proper escaping like the following
| >|
| >| ===============================
| >| cursor.execute('''SELECT host, hits, date FROM visitors WHERE page=%s
| >|         ORDER BY date DESC''', (page,))
| >| ===============================
| >|
| >| The difference is that if some external source can control "page",
| >| and
| >| they put in a value like
| >|         100 ; DELETE FROM visitors; SELECT * FROM visitors
| >| i will be losing my database table data.
| >
| >That other difference is that the mysql dialect support knows how to
| >correctly escape a string for insertion into an SQL statement. You may
| >not, or may forget to pre-escape the string, etc. Using the MySQLdb
| >stuff do it for you is reliable and robust.
| 
| Can you please tell me what escaping means by giving me an example
| of what is escaped and whats isn't?

In your plain substitution example above:

  cursor.execute( ''' SELECT host, hits, date FROM visitors WHERE page =
  '%s' ORDER BY date DESC ''' % (page) )

Supposing page is the string "100". This will produce the SQL statement:

  SELECT host, hits, date FROM visitors WHERE page = '100' ORDER BY date DESC

which looks ok. But suppose page was the string:

  bill o'reilly

Then your SQL statement looks like this:

  SELECT host, hits, date FROM visitors WHERE page = 'bill o'reilly' ORDER BY date DESC

To the SQL engine this looks like the string "bill o" followed by an SQL
instruction named "reilly", and then the opening quote for another string.
Invalid SQL.

The procedure used to avoid this problem (to insert an _arbitrary_
string into the SQL statement) is to "escape" problematic characters in
strings when placing them into SQL statements. In this case, the quote
character in the string is the SQL "end string" character. Therefore the
string must be modified in the SQL statement to be correctly expressed.

IIRC, SQL uses the quote doubling convention for strings, so this:

  SELECT host, hits, date FROM visitors WHERE page = 'bill o''reilly' ORDER BY date DESC

is how one would write the literal SQL for that.

The MySQLdb library will do this and a host of other equivalent things
automatically and correctly and consistently when you pass page as a
parameter to the execute() method, needing no special attention or
detailed syntactic knowledge on your part when you write your program.

[...snip...]
| >The "%" operator has special knowledge that is it is passed as string instead
| >of a list or tuple or other sequence then it should act _as_ _if_ it had been
| >passed a single element tuple containing the string.
| >
| %s and %d is behaving the same due to % expecting a string instead
| of an integer?

I haven't checked. I make a point of using the (page,) form (i.e.
always use a tuple, even with just one argument) these days. That way
there is no ambiguity.

| >Otherwise, because a string _is_ a sequence the "%" might want to treat
| >the string "foo" as the sequence:
| >
| >   ("f", "o", "o")
| cursor.execute('''SELECT host, hits, date FROM visitors WHERE
| page=%s ORDER BY date DESC''',  page)
| 
| But it alss might treat it an entity, i mean since 'page' is a
| variable containing a string why not just 'page' as it is expecting
| 'page' variable to give its value when asked?

A string is also a sequence of characters.

| >Run these three loops to see the difference:
| >
| >   for s in "foo":
| >     print s
| >   for s in ("foo"):
| >     print s
| >   for s in ("foo",):
| >     print s
| >
| >Cheers,
| >>> for s in "nikos":
|     print s
| 
| 
| n
| i
| k
| o
| s
| 
| # this handles the string "nikos" as a series of chars right?

Yes.

| >>> for s in ("nikos"):
|     print s
| 
| 
| n
| i
| k
| o
| s
| 
| # this handles the string "nikos" as a series of chars too but what
| si the difference with the above in htis with the parentheses? is
| "nikos" is handles still as string here?

It is exactly the same as the first loop. Just as:

  1 + 3

is exactly the same as:

  (1) + (3)

| >>> for s in ("nikos",):
|     print s
| 
| nikos
| 
| # Here yes it handles "nikos" as the 1st item of a tuple

Yep.

| >>> for s in ["nikos"]:
|     print s
| 
| nikos
| 
| # Here? why is it behaving fifferent than the above ("nikos") and is
| proccessign it all chars in one?

("nikos",) is a single element tuple.
["nikos"] is a single element list.
["nikos",] is also a single element list, just written like the tuple.

You don't see the ["nikos",] form very often because ["nikos"] is not
ambiguous. It is only because ("nikos") gets reduced to plain "nikos"
just like the arithmetic above that you see the ("nikos",) form - the
comma makes it a tuple.

| >>> for s in ["nikos",]:
|     print s
| 
| 
| nikos
| 
| # Here it handles "nikos" as the 1st item of a list right?

Yes.

Cheers,
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

Everything is gone;
Your life's work has been destroyed.
Squeeze trigger (yes/no)?
- Haiku Error Messages http://www.salonmagazine.com/21st/chal/1998/02/10chal2.html



More information about the Python-list mailing list