regex for url parameter

Wed Dec 8 00:13:48 EST 2004

"Robert Brewer" <fumanchu at amor.org> wrote in message
news:mailman.7337.1102459902.5135.python-list at python.org...
Andreas Volz wrote:
> I'm trying to extract an http target from a URL that is given as a
> parameter. urlparse couldn't really help me. I tried it like this:
>
> url="http://www.example.com/example.html?url=http://www.example.org/example.html"
>
> p = re.compile( '.*url=')
> url = p.sub( '', url)
> print url
> > http://www.example.org/example.html
>
> This works, but if there are more parameters it doesn't work:
>
> url2="http://www.example.com/example.html?url=http://www.example.org/example.html&param=1"
>
> p = re.compile( '.*url=')
> url2 = p.sub( '', url2)
> print url2
> > http://www.example.org/example.html&param=1
>
> I played with regex to find one that also matches the second case with
> multiple parameters. I think it's easy, but I don't know how
> to do it. Can you help me?

I'd go back to urlparse if I were you.

>>> import urlparse
>>> url="http://www.example.com/example.html?url=http://www.example.org/example.html"
>>> urlparse.urlparse(url)
('http', 'www.example.com', '/example.html', '', 'url=http://www.example.org/example.html', '')
>>> query = urlparse.urlparse(url)[4]
>>> params = [p.split("=", 1) for p in query.split("&")]
>>> params
[['url', 'http://www.example.org/example.html']]
>>> urlparse.urlparse(params[0][1])
('http', 'www.example.org', '/example.html', '', '', '')
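
For anyone reading this on Python 3, where the urlparse module was folded
into urllib.parse, the same steps might be sketched roughly like this. The
helper name extract_param is made up for illustration and is not part of any
library; it returns the raw value without any percent-decoding.

```python
from urllib.parse import urlsplit

def extract_param(url, name):
    """Return the raw value of query parameter `name`, or None if absent.

    Hypothetical helper for illustration; no percent-decoding is done.
    """
    query = urlsplit(url).query          # everything after the '?'
    for pair in query.split("&"):        # one "key=value" string per parameter
        key, sep, value = pair.partition("=")   # split at the first '=' only
        if key == name:
            return value
    return None

outer = ("http://www.example.com/example.html"
         "?url=http://www.example.org/example.html&param=1")
inner = extract_param(outer, "url")
print(inner)                   # the embedded target, without "&param=1"
print(urlsplit(inner).netloc)  # host part of the embedded URL
```

Because the query is split on "&" before the name is looked up, the trailing
"&param=1" that tripped up the regex never ends up in the result.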

<< Added by Paul>>

Robert Brewer's params list comprehension may be a bit much to swallow all
at once for someone new to Python, but it is a very slick example, and it
works for multiple parameters.
    [p.split("=", 1) for p in query.split("&")]

First of all, the variable query comes from urlparse and contains everything
in the original url after the '?' mark.  Inside the list comprehension,
'query.split("&")' returns a list of strings, one for each individual
parameter assignment.  'for p in query.split("&")' iterates over this list,
binding the temporary variable 'p' to each parameter in turn.  For example,
[p for p in query.split("&")] is sort of a nonsense list comprehension; it
just rebuilds the list returned from query.split("&").  But instead, Robert
splits each 'p' at its first equals sign (the 1 in split("=", 1) limits it to
one split), so for each parameter we get a 2-element list: the parameter name
and its assigned value.  Using a list comprehension does all of this
iteration and list building in one single, compact statement.

A long, spelled-out version would look like:
    allparams = query.split("&")
    params = []
    for p in allparams:
        params.append( p.split("=",1) )
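
If you want to convince yourself that the two forms really build the same
list, a small self-contained check along these lines should do. The query
string is just the sample from this thread with the extra "param=1" tacked
on, and params_long is a name picked only for this comparison.

```python
query = "url=http://www.example.org/example.html&param=1"

# Compact form: one list comprehension
params = [p.split("=", 1) for p in query.split("&")]

# Spelled-out form: explicit loop and append
allparams = query.split("&")
params_long = []
for p in allparams:
    params_long.append(p.split("=", 1))

print(params == params_long)   # both forms build the same list of pairs
print(params)
```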

Now if we make a slight change to Robert Brewer's "params = [p.split..." line
and construct a dictionary using dict():
    params = dict( [p.split("=", 1) for p in query.split("&")] )
this will create a dictionary for you (the dict() constructor will accept a
list of pairs, and interpret them as key-value entries into the dictionary).
Then you can reference the params by name.  Here's the example, with more
than one param in the url.

>>> url="http://www.example.com/example.html?url=http://www.example.org/example.html&url2=http://www.xyzzy.net/zork.html"
>>> print urlparse.urlparse(url)
('http', 'www.example.com', '/example.html', '', 'url=http://www.example.org/example.html&url2=http://www.xyzzy.net/zork.html', '')
>>> query = urlparse.urlparse(url)[4]
>>> params = dict([p.split("=", 1) for p in query.split("&")])
>>> print params
{'url': 'http://www.example.org/example.html', 'url2': 'http://www.xyzzy.net/zork.html'}
>>> print params.keys()
['url', 'url2']
>>> print params['url']
http://www.example.org/example.html
>>> print params['url2']
http://www.xyzzy.net/zork.html
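
One caveat with the dict() approach: a parameter with no "=" at all (a bare
flag, say) splits into a 1-element list, and dict() then raises ValueError.
As a possible alternative, and assuming Python 3 (the session above is
Python 2), the standard library's urllib.parse.parse_qs does the splitting
for you. This is only a sketch of that route, not what Robert's code does:

```python
from urllib.parse import urlsplit, parse_qs

url = ("http://www.example.com/example.html"
       "?url=http://www.example.org/example.html"
       "&url2=http://www.xyzzy.net/zork.html")

query = urlsplit(url).query
params = parse_qs(query)   # maps each name to a LIST of values

print(params["url"][0])
print(params["url2"][0])
```

Note that parse_qs gives a list for every name, since the same parameter may
appear more than once in a query string, and that it also decodes
percent-escapes in the values.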

List comprehensions are another powerful tool to put in your Python toolbox.

Keep pluggin' away, Andreas!

-- Paul