[Tutor] OT SQL (but through Python...)

Liam Clarke cyresse at gmail.com
Fri Feb 25 12:38:00 CET 2005


A light dawns, and I now understand how SQL code injection attacks can happen.

Looks like I'm going to have to rethink & re-examine some docs....

Cheers, 

Liam 

On Fri, 25 Feb 2005 06:15:05 -0500, Kent Johnson <kent37 at tds.net> wrote:
> Liam Clarke wrote:
> > Hi,
> >
> > Hope I don't annoy anyone by asking this here, if I do, let me know.
> >
> > When you're doing a SQL select statement, what would be better? Say
> > you're searching by name, should I do -
> > j = cx.execute
> > j('select * from foo where first == %s and last == %s') % (a,b)
> > q = cx.fetchall()
> > if not q:
> >     j('select * from foo where first like %s%% and last like %s%%') % (a,b)
> >
> > or just use
> >
> > j('select * from foo where first like %s%% and last like %s%%') % (a,b)
> >
> > straight off the bat?
> 
> First, a code correction. You should never substitute user values into your SQL. Use the database
> driver's capabilities instead. That will ensure that special characters are escaped correctly.
> 
> The syntax to do this varies for each database - the 'paramstyle' attribute of the database driver
> will tell you which style to use; the docs should also mention it.
> 
> IIRC you are using SQLite, which says,
>   >>> import sqlite
>   >>> sqlite.paramstyle
> 'pyformat'
> 
> The docs have this example:
>   >>> cu.execute("insert into test(u1, u2) values (%s, %s)", (u"\x99sterreich", u"Ungarn"))
> 
> so I think your code should be
>    j('select * from foo where first == %s and last == %s', (a,b))
> 
> (note the final '%' is now a comma and (a, b) is a parameter passed to j.)
> 
> Why should you care? Imagine a user searches for "Tim" "John'son". Then your SQL becomes
>    select * from foo where first == Tim and last == John'son
> which will probably give you a syntax errer.
> 
> Worse, this opens you up to malicious attacks. What if I search for "Tim" "Johnson; delete table
> foo"? Now your SQL is
>    select * from foo where first == Tim and last == Johnson; delete table foo
> 
> ...oops
> 
> Finally, some databases will optimize repeated queries. If you substitute the string yourself you
> don't get this benefit because the query string is different each time.
> 
> Two more minor notes about the SQL - I use =, not == - actually I'm surprised == works. And if you
> are going to have string literals in your SQL put them in 'quotes'.
> 
> >
> > The first method gives me direct match, and searches for alternatives
> > if none are found,
> > the second may not give me a direct match if there is one.
> >
> > i.e
> >
> > first             last
> >
> > Tim           Johns
> > Timothy    Johnston
> >
> > the 1st method will find the 1st row for a='Tim' b = 'Johns's, but the
> > 2nd method will find both.
> >
> > I'm asking more from a user perspective really, would the quicker
> > action of the first outweigh the inconstant UI resulting from a
> > search?
> 
> This isn't really a database question but a UI question. Who are the users? Can you ask them what
> they would prefer? If it is just for you, what do you prefer?
> 
> Maybe you want an "Exact match" checkbox in your GUI?
> 
> I wouldn't decide on the basis of speed, for any decent database you won't notice the difference.
> 
> Kent
> 
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> http://mail.python.org/mailman/listinfo/tutor
> 


-- 
'There is only one basic human right, and that is to do as you damn well please.
And with it comes the only basic human duty, to take the consequences.


More information about the Tutor mailing list