[Tutor] urllib problem

Steven D'Aprano steve at pearwood.info
Tue Oct 12 16:51:16 CEST 2010


On Tue, 12 Oct 2010 11:58:03 pm Steven D'Aprano wrote:
> On Tue, 12 Oct 2010 11:40:17 pm Roelof Wobben wrote:
> > Hi,
> >
> > I have this program:
> >
> > import urllib
> > import re
> > f = urllib.urlopen(
> >     "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=6")
> > inhoud = f.read()
> > f.close()
> > nummer = re.search('[0-9]', inhoud)
> > volgende = int(nummer.group())
> > teller = 1
> > while teller <= 3 :
> >       url = "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=" + str(volgende)
> >       f = urllib.urlopen(url)
> >       inhoud = f.read()
> >       f.close()
> >       nummer = re.search('[0-9]', inhoud)
> >       print "nummer is", nummer.group()
> >       volgende = int(nummer.group())
> >       print volgende
> >       teller = teller + 1
> >
> > The url changes now, but volgende does not.
> > What have I done wrong?
>
> Each time through the loop, you set volgende to the same result:
>
> nummer = re.search('[0-9]', inhoud)
> volgende = int(nummer.group())
>
> Since inhoud never changes, and the search never changes, the search
> result never changes, and volgende never changes.

Wait, sorry, inhoud should change... I missed the line inhoud = f.read()

My mistake, sorry about that. However, I can now see what is going 
wrong. Your regular expression only looks for a single digit:

re.search('[0-9]', inhoud)

If you want any number of digits, you need '[0-9]+' instead.


Starting from the first URL:

>>> f = urllib.urlopen(
... "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=6")
>>> inhoud = f.read()
>>> f.close()
>>> print inhoud
and the next nothing is 87599


but:

>>> nummer = re.search('[0-9]', inhoud)
>>> nummer.group()
'8'

See, you only get the first digit. Then looking up the page with
nothing=8 gives a number whose first digit is 5, and from then on you
are stuck on 5 forever:

>>> urllib.urlopen(
... "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=8").read() 
'and the next nothing is 59212'
>>> urllib.urlopen(
... "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=5").read() 
'and the next nothing is 51716'
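
If you want to watch the loop get stuck without hitting the web site, 
here is a rough sketch. The pages dict and the follow function are 
names I made up for illustration; the three response strings are copied 
from the sessions above and stand in for urllib.urlopen(url).read(). It 
is written so that it should run on either Python 2 or Python 3.

```python
import re

# Canned responses copied from the interpreter sessions above, keyed by
# the "nothing" value. This is a stand-in for fetching the real pages.
pages = {
    6: "and the next nothing is 87599",
    8: "and the next nothing is 59212",
    5: "and the next nothing is 51716",
}

def follow(pattern, start, steps):
    """Return the successive values of volgende found with pattern."""
    volgende = start
    seen = []
    for _ in range(steps):
        inhoud = pages[volgende]
        # Same extraction step as in the original program.
        volgende = int(re.search(pattern, inhoud).group())
        seen.append(volgende)
    return seen

print(follow('[0-9]', 6, 4))   # [8, 5, 5, 5]
```

After the first two steps the value stays at 5, because the page for 
nothing=5 again starts its number with a 5.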


You need to add a + to the regular expression, which means "one or more 
digits" instead of "a single digit".
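
As a small sketch of the fix, something like this should do. The helper 
name next_nothing is only my suggestion, and the sample text is the 
response shown earlier, so nothing here needs a network connection.

```python
import re

def next_nothing(inhoud):
    """Return the first run of digits in inhoud as an int, or None."""
    # The + means "one or more of the preceding item", so this matches
    # the whole number rather than just its first digit.
    nummer = re.search('[0-9]+', inhoud)
    if nummer is None:
        return None
    return int(nummer.group())

inhoud = "and the next nothing is 87599"   # response text from above
print(re.search('[0-9]', inhoud).group())  # 8
print(next_nothing(inhoud))                # 87599
```

Checking for None is optional, but it saves you from an AttributeError 
if a page ever comes back with no digits in it at all.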



-- 
Steven D'Aprano

