[Tutor] OT SQL (but through Python...)

Fri Feb 25 12:38:00 CET 2005

A light dawns, and I now understand how SQL code injection attacks can happen.

Looks like I'm going to have to rethink & re-examine some docs....

Cheers, 

Liam 

On Fri, 25 Feb 2005 06:15:05 -0500, Kent Johnson <kent37 at tds.net> wrote:
> Liam Clarke wrote:
> > Hi,
> >
> > Hope I don't annoy anyone by asking this here, if I do, let me know.
> >
> > When you're doing a SQL select statement, what would be better? Say
> > you're searching by name, should I do -
> > j = cx.execute
> > j('select * from foo where first == %s and last == %s') % (a,b)
> > q = cx.fetchall()
> > if not q:
> >     j('select * from foo where first like %s%% and last like %s%%') % (a,b)
> >
> > or just use
> >
> > j('select * from foo where first like %s%% and last like %s%%') % (a,b)
> >
> > straight off the bat?
> 
> First, a code correction. You should never substitute user values into your SQL. Use the database
> driver's capabilities instead. That will ensure that special characters are escaped correctly.
> 
> The syntax to do this varies for each database - the 'paramstyle' attribute of the database driver
> will tell you which style to use; the docs should also mention it.
> 
> IIRC you are using SQLite, which says,
>   >>> import sqlite
>   >>> sqlite.paramstyle
> 'pyformat'
> 
> The docs have this example:
>   >>> cu.execute("insert into test(u1, u2) values (%s, %s)", (u"\x99sterreich", u"Ungarn"))
> 
> so I think your code should be
>    j('select * from foo where first == %s and last == %s', (a,b))
> 
> (note the final '%' is now a comma and (a, b) is a parameter passed to j.)
> 
> Why should you care? Imagine a user searches for "Tim" "John'son". Then your SQL becomes
>    select * from foo where first == Tim and last == John'son
> which will probably give you a syntax errer.
> 
> Worse, this opens you up to malicious attacks. What if I search for "Tim" "Johnson; delete table
> foo"? Now your SQL is
>    select * from foo where first == Tim and last == Johnson; delete table foo
> 
> ...oops
> 
> Finally, some databases will optimize repeated queries. If you substitute the string yourself you
> don't get this benefit because the query string is different each time.
> 
> Two more minor notes about the SQL - I use =, not == - actually I'm surprised == works. And if you
> are going to have string literals in your SQL put them in 'quotes'.
> 
> >
> > The first method gives me direct match, and searches for alternatives
> > if none are found,
> > the second may not give me a direct match if there is one.
> >
> > i.e
> >
> > first             last
> >
> > Tim           Johns
> > Timothy    Johnston
> >
> > the 1st method will find the 1st row for a='Tim' b = 'Johns's, but the
> > 2nd method will find both.
> >
> > I'm asking more from a user perspective really, would the quicker
> > action of the first outweigh the inconstant UI resulting from a
> > search?
> 
> This isn't really a database question but a UI question. Who are the users? Can you ask them what
> they would prefer? If it is just for you, what do you prefer?
> 
> Maybe you want an "Exact match" checkbox in your GUI?
> 
> I wouldn't decide on the basis of speed, for any decent database you won't notice the difference.
> 
> Kent
> 
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> http://mail.python.org/mailman/listinfo/tutor
> 

-- 
'There is only one basic human right, and that is to do as you damn well please.
And with it comes the only basic human duty, to take the consequences.