[Tutor] Pictures
louis leichtnam
l.leichtnam at gmail.com
Thu May 5 04:45:19 CEST 2011
Hello,
I tried it and it works, but if I change the web site from
http://finance.blog.lemonde.fr to http://www. ...something else, it doesn't
work.
Do I have to change the '([\S]+)' in x=re.findall(r"<img\s+src='([\S]+)'",page)? If so, into what?
Thanks a lot
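The pattern r"<img\s+src='([\S]+)'" only matches when src is the first attribute of the tag, written in lower case and wrapped in single quotes. A page that writes <IMG SRC=...> or <img alt="..." src="..."> is therefore skipped. Below is a minimal sketch of a more tolerant pattern. It assumes the other site's images are ordinary quoted <img> tags. IMG_SRC and find_img_sources are hypothetical names, and unquoted values such as <img src=a.jpg> are still not handled:

```python
import re

# Hypothetical, more tolerant replacement for r"<img\s+src='([\S]+)'":
# - other attributes may come before src (<img alt="x" src="...">)
# - the value may use single or double quotes; \1 requires the same closing quote
# - tag and attribute names are matched case-insensitively (<IMG SRC=...>)
# - \s before src keeps attributes such as data-src from matching
# Limitation: unquoted values (<img src=a.jpg>) are not matched.
IMG_SRC = re.compile(r"""<img\b[^>]*?\ssrc\s*=\s*(["'])(.*?)\1""", re.IGNORECASE)


def find_img_sources(page):
    """Return the src values of all quoted <img> tags in an HTML string."""
    return [m.group(2) for m in IMG_SRC.finditer(page)]
```

For example, on a string containing <IMG SRC="http://example.com/a.jpg"> and <img alt='logo' src='http://example.com/b.png' />, find_img_sources returns both URLs, while the original pattern finds neither. Regular expressions stay brittle on real-world HTML, so an HTML parser is usually the safer choice.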
2011/4/28 naheed arafat <naheedcse at gmail.com>
> Observing the page source, I think:
>
> import re
> import urllib
>
> page = urllib.urlopen('http://finance.blog.lemonde.fr').read()
>
> x = re.findall(r"<img\s+src='([\S]+)'", page)
> # matches image sources of the pattern like:
> # <img src='http://finance.blog.lemonde.fr/filescropped/7642_300_400/2011/04/1157.1301668834.jpg'
> y = re.findall(r"<img\s+src=\"([\S]+)\"", page)
> # matches image sources of the pattern like:
> # <img src="http://s2.lemde.fr/image/2011/02/16/87x0/1480844_7_87fe_bandeau-lycee-electrique.jpg"
> x.extend(y)
> x = list(set(x))  # drop duplicate URLs
> for img in x:
>     # keep only URLs whose extension is jpg
>     image = img.split('.')[-1]
>     if image == 'jpg':
>         print img
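Another common reason a scraper written for one site fails on another is markup that the regexes above do not expect. This includes attribute order, upper-case tags, or relative src values such as "/images/x.jpg". Below is a minimal sketch that uses the standard library's html.parser instead of regular expressions. It assumes Python 3, since the quoted code is Python 2 only (urllib.urlopen and the print statement). ImgSrcCollector, jpg_sources and fetch_jpg_sources are hypothetical names:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit
from urllib.request import urlopen


class ImgSrcCollector(HTMLParser):
    """Collect the src of every <img> tag, regardless of quoting, case or attribute order."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names and strips the quotes
        # around values, so <IMG SRC="..."> and <img alt='x' src='...'> look alike here.
        if tag != "img":
            return
        for name, value in attrs:
            if name == "src" and value:
                # Resolve relative paths like "/img/x.jpg" against the page URL.
                self.sources.append(urljoin(self.base_url, value))

    # The default handle_startendtag calls handle_starttag, so <img ... /> is covered too.


def jpg_sources(html, base_url):
    """Return the unique .jpg image URLs found in an HTML string, in page order."""
    parser = ImgSrcCollector(base_url)
    parser.feed(html)
    parser.close()
    result = []
    for src in parser.sources:
        # Check the extension on the URL path only, so "a.jpg?v=2" still counts.
        if urlsplit(src).path.lower().endswith(".jpg") and src not in result:
            result.append(src)
    return result


def fetch_jpg_sources(url):
    """Download a page (needs network access) and return its .jpg image URLs."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return jpg_sources(html, url)
```

jpg_sources does the parsing and can be tried on any HTML string. fetch_jpg_sources('http://finance.blog.lemonde.fr') downloads the page first. Because every src goes through urljoin, relative and absolute image paths both come back as full URLs.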