Confusion with string.replace()

Steve Holden sholden at holdenweb.com
Fri Oct 26 12:49:36 EDT 2001


"Joseph Wilhelm" <jwilhelm at outsourcefinancial.com> wrote in ...
> I'm getting some really strange behaviour from string.replace(), and I was
> wondering if somebody could help explain this to me. All I'm trying to do
is
> escape single quotes in a string for a SQL query, but here's an example of
> what I'm getting:
>
> >>> import string
> >>> a = "a'b"
> >>> b = string.replace( a, "'", "\'" )
> >>> b
> "a'b"
> >>> b = string.replace( a, "'", "\\'" )
> >>> b
> "a\\'b"
> >>>
>
> I just can't seem to wrap my brain around why it's doing that. Can
somebody
> explain that, or perhaps provide an easier option for preparing a SQL
> statement?
>
The major difficulty you are experiencing is understanding the interactive
iterpreter's behavior. However, we also need to talk about SQL, which we'll
do next. Here's a session that might help you to understand a little better:

>>> s1 = "\\"
>>> print s1
\
>>> s1
'\\'
>>> print len(s1)
1
>>> print "repr():", repr(s1), "str:", str(s1)
repr(): '\\' str: \
>>>

When you enter an expression into the interpreter it evaluates it and then
prints out the value (unless it is None) using the __repr__() method. repr()
usually attempts to produce something that would yield the correct value is
given as an argument to eval(), so it puts quotes around strings and uses
backslash quoting on any special characters, like backslashes!

So, replace() is doing exactly what you want it to: in the first example,
the backslash quotes the single-quote, and so you simply replace single
quotes with single-quotes. In the second example the backslash escapes
another backslash, so you replace single quotes with a backslash and a
single quote. When you ask the interpreter to evaluate the resulting string,
however, it uses the repr() of the string, so you see surrounding double
quotes and doubled backslashes. If you take the length of the string you
will find there is actually only one backslash in it.

Why should you not do this to build SQL statements?

Well, first of all, some databases require you to escape single quotes in
string values by putting two single quotes in a row. So, in some SQLs,
rather than 'a\'b' you should use 'a''b' to represent the three character
string containing a single quote between a letter "a" and an letter "b".

Secondly the DB API allows you to use paramterized SQL, and have the
database module put the values in to the SQL statement dynamically, with no
need to do anything except pass them through as Python expressions. So, you
might write:

    name = "Steve O'Reilly"
    address = "12, Dingle's Hill Road"
    age = 35
    sql = "INSERT INTO ctbl (custname, custaddr, custage) VALUES (?, ?, ?)"
    cursor.execute(sql, (name, address, age))

The execute() call substitutes the tuple elements for the parameter markers
(in this case, question marks, but there are five possible parameterization
styles) in the SQL statement. This way of doing things has two advantages:

1. You don't have to pratt about replacing single quotes in string values.
In fact you don't have to bother much with any kind of conversion, since the
DB modules do their best to coerce the values you provide into the
appropriate types to be stored in the database.

2. The DB module will prepare the SQL statement, and if you use the same
statement repreatedly it will not need to repeat the preaparation, gaining a
good deal of performance in the process.

Hope this helps.

regards
 Steve
--
http://www.holdenweb.com/








More information about the Python-list mailing list