Re: How to grab HTML files behind authentication

DirkK d_krause at pixelpark.com
Thu Jun 28 05:56:35 EDT 2001


To be honest, I didn't try this (though it's obvious). On the other hand,
my approach is a bit more transparent, because you control the headers
yourself. Do you also have an easy solution for cookies (that would have
been the next thing I needed)?

Dirk

-----Original Message-----
From: Oleg Broytmann [mailto:phd at phd.fep.ru]
Sent: Thursday, June 28, 2001 11:51
To: Dirk Krause
Cc: python-list at python.org
Subject: Re: How to grab HTML files behind authentication


Thank you. But did you know that urllib can do the same thing even more simply?

urllib.urlretrieve("http://myName:myPassword@www.something.com/secret/index.html")
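On Python 3, urllib.urlretrieve and the credentials-in-URL trick belong to the legacy Python 2 urllib; as far as I can tell, urllib.request does not act on a user:password@ prefix by itself. A minimal sketch under that assumption (the helper name request_from_credential_url is hypothetical) splits the credentials out with urllib.parse.urlsplit and sends them as an HTTP Basic Authorization header, which is what the old urllib did internally:

```python
import base64
import urllib.request
from urllib.parse import unquote, urlsplit, urlunsplit


def request_from_credential_url(url):
    """Build a urllib Request from 'http://user:password@host/path'.

    Hypothetical helper: the credentials are removed from the URL and
    sent instead as an HTTP Basic 'Authorization' header.
    """
    parts = urlsplit(url)
    # Drop the 'user:password@' prefix but keep host, port and IPv6 brackets.
    netloc = parts.netloc.rpartition("@")[2]
    clean_url = urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment))
    req = urllib.request.Request(clean_url)
    if parts.username is not None:
        # urlsplit leaves %-escapes in place, so decode them first.
        creds = "%s:%s" % (unquote(parts.username), unquote(parts.password or ""))
        token = base64.b64encode(creds.encode("utf-8")).decode("ascii")
        req.add_header("Authorization", "Basic " + token)
    return req
```

Fetching the page would then be `urllib.request.urlopen(request_from_credential_url("http://myName:myPassword@www.something.com/secret/index.html")).read()`. Sending the header up front skips the 401 round trip that urllib.request.HTTPBasicAuthHandler would otherwise need.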

On 28 Jun 2001, Dirk Krause wrote:
>   I've put together some code the Python community might find useful.
> You can use this script to automatically spider web pages that sit behind
> the WWW-Authenticate dialog box (HTTP Basic authentication).
>
> ---snip---
> import httplib, string, base64
>
> # How to grab HTML files behind authentication
> # author: Dirk Krause, 06/28/2001
> # change these entries below!!
>
> base = 'www.something.com'   # host name only, without 'http://'
> path = '/secret/index.html'
>
> u_name = 'myName'
> u_pwd  = 'myPassword'
>
>
> # ok, here goes
>
> hlink = httplib.HTTP(base)
> hlink.putrequest('GET', path)   # httplib appends ' HTTP/1.0' itself
> hlink.putheader('Host', base)
>
> hlink.putheader('Accept', 'text/html')
> hlink.putheader('Accept', 'text/plain')
>
> temp = "%s:%s" % (u_name,u_pwd)
> temp = string.replace(base64.encodestring(temp), '\n', '')   # encodestring wraps long output
> temp = "Basic %s" % string.strip(temp)
> hlink.putheader("Authorization",temp)
>
> hlink.endheaders()
>
> errcode, errmsg, header = hlink.getreply()
> content = hlink.getfile().read()
>
> print content
> print errcode, header

Oleg.
----
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.
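In Python 3, httplib became http.client and the old HTTP compatibility class is gone, so Dirk's script no longer runs as written. A rough sketch of the same request follows; the function names basic_auth_header and fetch_with_basic_auth are hypothetical, and the code assumes plain HTTP (HTTPS would need http.client.HTTPSConnection) and ASCII or UTF-8 credentials.

```python
import base64
import http.client


def basic_auth_header(user, password):
    """Build the value of an HTTP Basic 'Authorization' header."""
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8"))
    # b64encode never inserts line breaks, unlike the old encodestring().
    return "Basic " + token.decode("ascii")


def fetch_with_basic_auth(host, path, user, password, timeout=30):
    """GET http://<host><path> with Basic auth; return (status, headers, body)."""
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        # http.client adds the Host header and the HTTP version itself.
        conn.request("GET", path, headers={
            "Accept": "text/html, text/plain",
            "Authorization": basic_auth_header(user, password),
        })
        resp = conn.getresponse()
        # Read the body before the connection is closed in 'finally'.
        return resp.status, resp.getheaders(), resp.read()
    finally:
        conn.close()
```

Called as `fetch_with_basic_auth("www.something.com", "/secret/index.html", "myName", "myPassword")`, it returns the status code, the header list and the raw body bytes, mirroring errcode, header and content in the original script.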