Curious to see alternate approach on a search/replace via regex

Wed Feb 6 22:04:39 EST 2013

On Wed, 06 Feb 2013 13:55:58 -0800, Demian Brecht wrote:

> Well, an alternative /could/ be:
> 
> from urlparse import urlparse
> 
> parts = urlparse('http://alongnameofasite1234567.com/q?sports=run&a=1&b=1')
> print '%s%s_%s' % (parts.netloc.replace('.', '_'),
>     parts.path.replace('/', '_'),
>     parts.query.replace('&', '_').replace('=', '_') )
> 
> 
> Although with the result of:
> 
> alongnameofasite1234567_com_q_sports_run_a_1_b_1
>          1288 function calls in 0.004 seconds
> 
> 
> Compared to regex method:
> 
> 498 function calls (480 primitive calls) in 0.000 seconds
> 
> I'd prefer the regex method myself.

I dispute those results. I think you are mostly measuring the time to
print the result, and I/O is quite slow. In my tests the urlparse version
takes about 33% less time than the regex version, and it is far more
understandable and maintainable.

py> from urlparse import urlparse
py> def mangle(url):
...     parts = urlparse(url)
...     return '%s%s_%s' % (parts.netloc.replace('.', '_'),
...             parts.path.replace('/', '_'),
...             parts.query.replace('&', '_').replace('=', '_')
...             )
... 
py> import re
py> def u2f(u):
...     nx = re.compile(r'https?://(.+)$')
...     u = nx.search(u).group(1)
...     ux = re.compile(r'([-:./?&=]+)')
...     return ux.sub('_', u)
... 
py> s = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'
py> assert u2f(s) == mangle(s)
py> 
py> from timeit import Timer
py> setup = 'from __main__ import s, u2f, mangle'
py> t1 = Timer('mangle(s)', setup)
py> t2 = Timer('u2f(s)', setup)
py> 
py> min(t1.repeat(repeat=7))
7.2962000370025635
py> min(t2.repeat(repeat=7))
10.981598854064941
py>
py> (10.98-7.29)/10.98
0.33606557377049184

(Timings done using Python 2.6 on my laptop -- your speeds may vary.)
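
For anyone who wants to rerun the comparison as a script rather than at
the prompt, here is a minimal sketch. The two functions are the same as
in the session above; the import fallback, the compare() helper and its
default number/repeat values are my own assumptions, chosen so that it
runs on either Python 2 or Python 3 and finishes quickly. Nothing is
printed inside the timed code, so I/O stays out of the measurement.

```python
import re
from timeit import Timer

try:
    from urlparse import urlparse      # Python 2, as in the session above
except ImportError:
    from urllib.parse import urlparse  # Python 3 home of the same function

s = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'

def mangle(url):
    # Split the URL into components, then replace the separators in each.
    parts = urlparse(url)
    return '%s%s_%s' % (parts.netloc.replace('.', '_'),
                        parts.path.replace('/', '_'),
                        parts.query.replace('&', '_').replace('=', '_'))

def u2f(u):
    # Strip the scheme, then collapse every run of separators into '_'.
    nx = re.compile(r'https?://(.+)$')
    u = nx.search(u).group(1)
    ux = re.compile(r'([-:./?&=]+)')
    return ux.sub('_', u)

def compare(number=10000, repeat=3):
    # Hypothetical helper: best-of-`repeat` time in seconds for `number`
    # calls of each function. Only the calls are timed, no printing.
    t1 = Timer(lambda: mangle(s))
    t2 = Timer(lambda: u2f(s))
    return (min(t1.repeat(repeat=repeat, number=number)),
            min(t2.repeat(repeat=repeat, number=number)))

if __name__ == '__main__':
    assert u2f(s) == mangle(s)
    t_parse, t_regex = compare()
    print('urlparse: %.4f s   regex: %.4f s' % (t_parse, t_regex))
```

The lambda wrappers add the same small call overhead to both timings, so
the absolute numbers will not match the ones above, and the ratio may
well differ between Python versions and machines.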

-- 
Steven