[Tutor] urllib problem
Steven D'Aprano
steve at pearwood.info
Tue Oct 12 16:51:16 CEST 2010
On Tue, 12 Oct 2010 11:58:03 pm Steven D'Aprano wrote:
> On Tue, 12 Oct 2010 11:40:17 pm Roelof Wobben wrote:
> > Hi,
> >
> > I have this program:
> >
> > import urllib
> > import re
> > f = urllib.urlopen("http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=6")
> > inhoud = f.read()
> > f.close()
> > nummer = re.search('[0-9]', inhoud)
> > volgende = int(nummer.group())
> > teller = 1
> > while teller <= 3 :
> >     url = "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=" + str(volgende)
> >     f = urllib.urlopen(url)
> >     inhoud = f.read()
> >     f.close()
> >     nummer = re.search('[0-9]', inhoud)
> >     print "nummer is", nummer.group()
> >     volgende = int(nummer.group())
> >     print volgende
> >     teller = teller + 1
> >
> > but although the URL changes, volgende does not.
> > What have I done wrong?
>
> Each time through the loop, you set volgende to the same result:
>
> nummer = re.search('[0-9]', inhoud)
> volgende = int(nummer.group())
>
> Since inhoud never changes, and the search never changes, the search
> result never changes, and volgende never changes.
Wait, sorry, inhoud should change... I missed the line inhoud = f.read() inside the loop.
My mistake, sorry about that. However, I can now see what is going
wrong. Your regular expression only looks for a single digit:
re.search('[0-9]', inhoud)
If you want the whole number rather than just its first digit, you need '[0-9]+' instead.
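Here is a small offline sketch of the difference, using the text of the nothing=6 page (quoted in the session below) as a literal string so no network access is needed. It is written in Python 3 syntax, unlike the Python 2 code in this thread:

```python
import re

# Text returned by the nothing=6 page, copied from the session below.
inhoud = "and the next nothing is 87599"

single = re.search('[0-9]', inhoud)    # matches exactly one digit
several = re.search('[0-9]+', inhoud)  # matches one or more digits in a row

print(single.group())   # 8
print(several.group())  # 87599
```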
Starting from the first URL:
>>> f = urllib.urlopen(
... "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=6")
>>> inhoud = f.read()
>>> f.close()
>>> print inhoud
and the next nothing is 87599
but:
>>> nummer = re.search('[0-9]', inhoud)
>>> nummer.group()
'8'
See, you only get the first digit. Looking up the page with nothing=8
then gives a number whose first digit is 5, and from then on you are
stuck on 5 forever:
>>> urllib.urlopen(
... "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=8").read()
'and the next nothing is 59212'
>>> urllib.urlopen(
... "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=5").read()
'and the next nothing is 51716'
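To watch the loop get stuck without hitting the server, here is a rough simulation of the original program's logic. The dictionary PAGES is a hypothetical stand-in for the website, filled only with the three responses shown above, and fetch and follow are illustrative helpers, not part of urllib:

```python
import re

# Hypothetical stand-in for the website: only the three responses quoted above.
PAGES = {
    6: "and the next nothing is 87599",
    8: "and the next nothing is 59212",
    5: "and the next nothing is 51716",
}

def fetch(nothing):
    """Illustrative replacement for urllib: look the page up in PAGES."""
    return PAGES[nothing]

def follow(start, steps, pattern):
    """Follow the chain `steps` times, returning every number extracted."""
    seen = []
    volgende = start
    for _ in range(steps):
        nummer = re.search(pattern, fetch(volgende))
        volgende = int(nummer.group())
        seen.append(volgende)
    return seen

print(follow(6, 4, '[0-9]'))  # [8, 5, 5, 5]: stuck on 5
```

With '[0-9]+' the first step gives 87599 instead, after which this little table runs out (raising KeyError), because the page for 87599 is not quoted here; the real server would carry on from there.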
You need to add a + to the regular expression: '[0-9]+' means "one or
more digits" instead of "a single digit".
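For reference, here is a sketch of the program with the + added, rewritten for Python 3 (where urllib.urlopen became urllib.request.urlopen and read() returns bytes). The names fetch_page and follow_chain, and the fetch parameter, are illustrative choices rather than anything from the original; passing fetch in just makes it possible to try the loop without a network connection:

```python
import re
import urllib.request

BASE_URL = "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing="

def fetch_page(nothing):
    """Download one page of the chain and return its text."""
    with urllib.request.urlopen(BASE_URL + str(nothing)) as f:
        return f.read().decode("utf-8")

def follow_chain(start, steps=3, fetch=fetch_page):
    """Follow the chain `steps` times from `start`; return the last number."""
    volgende = start
    for teller in range(1, steps + 1):
        inhoud = fetch(volgende)
        nummer = re.search('[0-9]+', inhoud)  # one or more digits
        if nummer is None:
            # No number on this page: stop instead of crashing on .group().
            print("no number found:", inhoud)
            break
        volgende = int(nummer.group())
        print("step", teller, "->", volgende)
    return volgende
```

Calling follow_chain(6, steps=4) makes the same number of requests as the original program: one before the loop plus three inside it.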
--
Steven D'Aprano
More information about the Tutor mailing list