Curious to see alternate approach on a search/replace via regex
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Wed Feb 6 22:04:39 EST 2013
On Wed, 06 Feb 2013 13:55:58 -0800, Demian Brecht wrote:
> Well, an alternative /could/ be:
>
> from urlparse import urlparse
>
> parts = urlparse('http://alongnameofasite1234567.com/q?sports=run&a=1&b=1')
> print '%s%s_%s' % (parts.netloc.replace('.', '_'),
>                    parts.path.replace('/', '_'),
>                    parts.query.replace('&', '_').replace('=', '_'))
>
>
> Although with the result of:
>
> alongnameofasite1234567_com_q_sports_run_a_1_b_1
> 1288 function calls in 0.004 seconds
>
>
> Compared to regex method:
>
> 498 function calls (480 primitive calls) in 0.000 seconds
>
> I'd prefer the regex method myself.
I dispute those results. I think you are mostly measuring the time to
print the result, and I/O is quite slow. My tests show that using urlparse
takes about 33% less time than using regexes, and it is far more
understandable and maintainable.
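The point about I/O can be checked by timing the string building alone and then again with the result printed. The following is only a rough sketch under some assumptions: it targets Python 3 (where urlparse lives in urllib.parse), the helper names compute_only and compute_and_print are hypothetical, and the output goes to os.devnull, so it understates what printing to a real terminal would cost.

```python
import os
from timeit import timeit
from urllib.parse import urlparse  # assumption: Python 3 location of urlparse


def mangle(url):
    """Turn a URL into an underscore-separated name using urlparse."""
    parts = urlparse(url)
    return '%s%s_%s' % (parts.netloc.replace('.', '_'),
                        parts.path.replace('/', '_'),
                        parts.query.replace('&', '_').replace('=', '_'))


s = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'


def compute_only():
    # Hypothetical helper: build the string, no output at all.
    return mangle(s)


n = 10000  # number of calls per measurement
with open(os.devnull, 'w') as sink:
    def compute_and_print():
        # Hypothetical helper: build the string and print it.
        # Writing to os.devnull is far cheaper than a real terminal.
        print(mangle(s), file=sink)

    t_compute = timeit(compute_only, number=n)
    t_print = timeit(compute_and_print, number=n)

print('compute only:    %.4f s' % t_compute)
print('compute + print: %.4f s' % t_print)
```

If the two figures differ noticeably even with a null device as the target, that suggests a profile that includes the print is not measuring the string manipulation alone.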
py> from urlparse import urlparse
py> def mangle(url):
...     parts = urlparse(url)
...     return '%s%s_%s' % (parts.netloc.replace('.', '_'),
...                         parts.path.replace('/', '_'),
...                         parts.query.replace('&', '_').replace('=', '_')
...                         )
...
py> import re
py> def u2f(u):
...     nx = re.compile(r'https?://(.+)$')
...     u = nx.search(u).group(1)
...     ux = re.compile(r'([-:./?&=]+)')
...     return ux.sub('_', u)
...
py> s = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'
py> assert u2f(s) == mangle(s)
py>
py> from timeit import Timer
py> setup = 'from __main__ import s, u2f, mangle'
py> t1 = Timer('mangle(s)', setup)
py> t2 = Timer('u2f(s)', setup)
py>
py> min(t1.repeat(repeat=7))
7.2962000370025635
py> min(t2.repeat(repeat=7))
10.981598854064941
py>
py> (10.98-7.29)/10.98
0.33606557377049184
(Timings done using Python 2.6 on my laptop -- your speeds may vary.)
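For anyone wanting to repeat the comparison on a current interpreter, here is a minimal sketch of the same test as a standalone script. It assumes Python 3, where urlparse is imported from urllib.parse; the best_of helper and the repeat/number values are my own hypothetical choices, picked to keep the run short. No timings are quoted, since they depend on the machine and Python version.

```python
import re
from timeit import Timer
from urllib.parse import urlparse  # assumption: Python 3 location of urlparse


def mangle(url):
    """urlparse-based version, as in the session above."""
    parts = urlparse(url)
    return '%s%s_%s' % (parts.netloc.replace('.', '_'),
                        parts.path.replace('/', '_'),
                        parts.query.replace('&', '_').replace('=', '_'))


def u2f(u):
    """Regex-based version, as in the session above."""
    nx = re.compile(r'https?://(.+)$')
    u = nx.search(u).group(1)        # drop the scheme
    ux = re.compile(r'([-:./?&=]+)')
    return ux.sub('_', u)            # collapse separator runs to '_'


s = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'

# Both approaches must agree before the timings mean anything.
assert u2f(s) == mangle(s)


def best_of(func, repeat=5, number=20000):
    # Hypothetical helper: minimum of several timeit runs, which is the
    # least noisy figure. The lambda adds the same overhead to both.
    return min(Timer(lambda: func(s)).repeat(repeat=repeat, number=number))


t_mangle = best_of(mangle)
t_u2f = best_of(u2f)
print('mangle: %.4f s' % t_mangle)
print('u2f:    %.4f s' % t_u2f)
print('fraction of time saved by mangle: %.3f' % ((t_u2f - t_mangle) / t_u2f))
```

The last line computes the same ratio as the (10.98-7.29)/10.98 expression above, so a positive value means the urlparse version took less time on that run.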
--
Steven