[DB-SIG] How can I reliably detect whether an SQL statement is a Query?

Chris Clark Chris.Clark at ingres.com
Tue Aug 3 20:32:59 CEST 2010


Andy Dustman wrote:
> On Mon, Aug 2, 2010 at 5:57 AM, Vernon Cole <vernondcole at gmail.com> wrote:
>   
>> ...1) I am not confident in my ability to understand Regular Expression
>> strings.
>> ....
>> It seems to me that this code could be confused by the substring 'SELECT'
>> being included as part of a longer string, or in a string literal. Am
>> reading it wrong?
>>     
>
> The way your expression is written, it only matches SELECT at the
> beginning of the line (after any whitespace).
>   

You're both correct :-)

Consider the following:

--- Cut here ---
CREATE VIEW AS
SELECT .....
--- Cut here ---

This is not a row returning statement :-)

Also consider:
--- Cut here ---
/* This is the best piece of SQL ever! */ SELECT .....
--- Cut here ---

This should be a row returning statement :-)

It is possible to handle this with a regex but it would need to be 
slightly more complicated. Regexs are great if you are able to create 
them quickly as they are easy to use, but they tend to run a lot slower 
than a hand written piece of code (the hand written code will take 
longer to write though).

>> It also seems to me that I should be able to detect a query by the fact that
>> the first token in the command will be either 'SELECT' or 'WITH, but would
>> that still be true for other dialects of SQL?'
>>
>> I am thinking of using something like:
>> <code>
>> def is_query(operation):
>>     return operation.split(' ')[0].upcase in ['SELECT','WITH']
>> </code>
>>
>> Good idea, or Bad idea?
>>     


This won't handle newlines, or tabs. So in its current form I prefer the 
original regex.


>> Any comments appreciated.
>>     
>
> What are the consequences of using ExecuteReader() when there is
> nothing to read? If none, i.e. you get an empty set of results, then I
> would say to use that all the time, and don't bother to examine your
> SQL at all.
>   

I think this is worth checking out, if you can avoid parsing (or partial 
parsing) that is likely to be a better solution (assuming this doesn't 
cause additional database IO which can be costly across a network). The 
DBMS already does this sort of work so if the driver can avoid it, that 
is a good thing. It may not work out but it is worth testing.

If you want to play with regex I can recommend 
http://kodos.sourceforge.net/ it is an excellent GUI to play with regex, 
it can even generate Python code :-)

There is a pyparsing SQL parser that you could use instead, this would 
handle your requirement of; "reliably detect whether an SQL statement is 
a Query". The regex and split above are not reliable but would probably 
handle 80-90% of the SQL you throw at it.

Vernon - thanks for looking into this, it will be nice to have a 
supported/maintained IronPython dbi .NET driver.

Chris



More information about the DB-SIG mailing list