[ python-Feature Requests-1285086 ] urllib.quote is too slow

SourceForge.net noreply at sourceforge.net
Sat Sep 10 05:45:52 CEST 2005


Feature Requests item #1285086, was opened at 2005-09-08 11:37
Message generated for change (Comment added) made by rhettinger
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1285086&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
>Category: Python Library
>Group: None
Status: Open
Resolution: None
>Priority: 2
Submitted By: Tres Seaver (tseaver)
Assigned to: Nobody/Anonymous (nobody)
Summary: urllib.quote is too slow

Initial Comment:
'urllib.quote' delegates to '_fast_quote' for the common
case that the user has passed no 'safe' argument.  However,
'_fast_quote' isn't really very fast, especially for
the case that it doesn't need to quote anything.

Zope (and presumably other web frameworks) can end up
calling 'quote' dozens, hundreds, even thousands of times
to render a page, which makes this a potentially big win
for them.

I will attach a speed test script which demonstrates the
speed penalty, along with a patch which implements the
speedup.

----------------------------------------------------------------------

>Comment By: Raymond Hettinger (rhettinger)
Date: 2005-09-09 22:45

Message:
Logged In: YES 
user_id=80475

Checked in a speed-up for Py2.5.
See Lib/urllib.py 1.169.

The check-in provides fast-quoting for all cases (not just
for the default safe argument).  Even the fast path is
quicker.  With translation for both safe and unsafe
characters, it saves len(s) trips through the eval loop,
computes the non-safe replacements just once, and eliminates
the if-logic.  The new table is collision free and has no
failed lookups, so each lookup requires exactly one probe. 
On my machine, timings improved by a factor of two to three
depending on the length of input and number of escaped
characters.

The check-in also simplifies and speeds up quote_plus() by
using str.replace() instead of a split.
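For illustration, the two styles might look like the following.  This
is a sketch under assumptions, not the checked-in code: it uses
urllib.parse.quote from current Python, and the function names
quote_plus_split and quote_plus_replace are made up here.

```python
from urllib.parse import quote  # Python 3 location of quote()


def quote_plus_split(s, safe=""):
    """Split-based variant: quote each space-separated piece, rejoin with '+'."""
    return "+".join(quote(part, safe) for part in s.split(" "))


def quote_plus_replace(s, safe=""):
    """Replace-based variant: keep ' ' unquoted, then swap it for '+'."""
    if " " in s:
        return quote(s, safe + " ").replace(" ", "+")
    return quote(s, safe)
```

The replace-based variant makes a single call to quote() regardless of
how many spaces the input contains.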

Leaving this SF report open because the OP's idea may
possibly provide further improvement -- the checkin itself
was done because it is a clear win over the existing version.

The OP's patch uses regexps to short-circuit when no changes
are needed.  Unless the regexp is cheap and short-circuits
often, the cost of testing will likely exceed the average
amount saved.
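A rough sketch of the short-circuit idea, not the OP's actual patch
(which is attached to the tracker item and not reproduced here).  The
pattern, the name quote_short_circuit, and the use of
urllib.parse.quote are assumptions made for illustration:

```python
import re
from urllib.parse import quote  # Python 3 location of quote()

# Matches any character outside letters, digits, '_.-' and the default safe '/'.
_needs_quoting = re.compile(r"[^A-Za-z0-9_.\-/]")


def quote_short_circuit(s):
    """Return s unchanged when the regexp finds nothing to escape."""
    if _needs_quoting.search(s) is None:
        return s  # short-circuit: nothing to do
    return quote(s)  # otherwise pay for the test and the full quoting
```

Inputs that do need escaping pay for both the search and the quoting,
which is the cost being weighed above.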

Determining whether the regexp is cheaper than the
checked-in version just requires a few timings.  But,
determining the short-circuit percentage requires collecting
statistics from real programs with real data.  For the idea
to be a winner, regexps have to be much faster than the
map/lookup/join step AND the short-circuit case must occur
frequently.
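Both measurements could be gathered along these lines.  This is only a
sketch: the helper names, the regexp, and the sample strings are
invented for illustration and are not real-world statistics.

```python
import re
import timeit
from urllib.parse import quote  # Python 3 location of quote()

# Same illustrative pattern: anything outside letters, digits, '_.-/'.
_needs_quoting = re.compile(r"[^A-Za-z0-9_.\-/]")


def short_circuit_rate(samples):
    """Fraction of inputs that would be returned unchanged."""
    if not samples:
        return 0.0
    clean = sum(1 for s in samples if _needs_quoting.search(s) is None)
    return clean / len(samples)


def time_call(func, arg, number=10000):
    """Total seconds for 'number' calls of func(arg)."""
    return timeit.timeit(lambda: func(arg), number=number)


# Made-up sample; real programs would log the strings they actually quote.
samples = ["index.html", "a b", "images/logo.png", "q=1&r=2"]
rate = short_circuit_rate(samples)
regexp_cost = time_call(_needs_quoting.search, "images/logo.png", number=1000)
quote_cost = time_call(quote, "images/logo.png", number=1000)
```

The short-circuit only pays off if regexp_cost is well below quote_cost
and rate is high for the application's real data.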

Am lowering the priority until a better patch is received
along with timings and statistical evidence demonstrating a
significant improvement.  Also, reclassifying as a Feature
Request because the existing code is functioning as
documented and passing tests.


----------------------------------------------------------------------

Comment By: Tres Seaver (tseaver)
Date: 2005-09-08 21:35

Message:
Logged In: YES 
user_id=127625

Note that the speed test script shows equivalent speedups for
both 2.3 and 2.4, ranging from 90% (for the empty string) down
to 73% (for a string with a single character).  The more
"normal" cases range from 82% to 89% speedups.

----------------------------------------------------------------------

Comment By: Tres Seaver (tseaver)
Date: 2005-09-08 21:30

Message:
Logged In: YES 
user_id=127625

I'm attaching a patch against 2.4's version.

----------------------------------------------------------------------

Comment By: Jeff Epler (jepler)
Date: 2005-09-08 20:01

Message:
Logged In: YES 
user_id=2772

Tested on Python 2.4.0.  The patch fails on the first chunk
because the list of imports doesn't match.

The urllib_fast_quote_speed_test.py doesn't run once urllib
has been patched.

I reverted the patch to urllib.py and re-ran.  I got
"faster" values from 0.758 to 0.964.

----------------------------------------------------------------------
