[Tutor] python myspace module?

Andrew P grouch at gmail.com
Mon Oct 31 07:44:40 CET 2005


Probably the most useful library from the wwwsearch page Danny pointed
you to is cookielib, which has been part of the standard library since
Python 2.4. It's the most useful because you can't scrape by without it
on most sites, and cookies are really no fun to handle manually. Login
and forms can largely be fudged, but it's really nice to have cookies
handled transparently.
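To make "handled transparently" concrete, here is a small offline sketch in Python 3, where cookielib was renamed http.cookiejar and urllib2 was split into urllib.request. HTTPCookieProcessor essentially makes two calls for you: extract_cookies() on every response and add_cookie_header() on every outgoing request. FakeResponse, cookie_round_trip, the example.com URLs, and the cookie value are made up for illustration so that no network access is needed.

```python
import http.cookiejar
import urllib.request
from email.message import Message


class FakeResponse:
    """Hypothetical stand-in for a server response.

    CookieJar.extract_cookies() only calls response.info() and reads the
    Set-Cookie headers from it, so an email.message.Message is enough.
    """

    def __init__(self, set_cookie):
        self._headers = Message()
        self._headers["Set-Cookie"] = set_cookie

    def info(self):
        return self._headers


def cookie_round_trip(set_cookie, first_url, next_url):
    """Store a cookie from one response; return the Cookie header sent next."""
    jar = http.cookiejar.CookieJar()

    # Response side: the server sets a cookie on the first page we fetch.
    first = urllib.request.Request(first_url)
    jar.extract_cookies(FakeResponse(set_cookie), first)

    # Request side: the jar attaches any matching cookie to the next request.
    following = urllib.request.Request(next_url)
    jar.add_cookie_header(following)
    return following.get_header("Cookie")  # None if no cookie matched


print(cookie_round_trip("sessionid=abc123; Path=/",
                        "http://www.example.com/login",
                        "http://www.example.com/home"))
# prints: sessionid=abc123
```

A cookie set by www.example.com is not sent to some other host, which is the bookkeeping you'd otherwise have to get right by hand.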

You'll want to do something like:

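import cookielib
import urllib2

url = "http://www.myspace.com"
cj = cookielib.CookieJar()                 # holds the cookies sites set
myURLOpen = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
myURLOpen.addheaders = [('User-agent', 'Mozilla/5.0')]  # look like a browser
urlopen = myURLOpen.open                   # shorthand for the cookie-aware opener

html = urlopen(url).read()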

I think this came straight from the Python docs originally, where it's
explained a bit more fully. But this will do two things: 1) set up a
client-side cookie jar and handle all the cookie bookkeeping for you,
and 2) change your User-Agent header to 'Mozilla/5.0', completing the
illusion that you are a browser. Many sites don't like to see user
agents like "Python-urllib/2.4" :)
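If you're on Python 3, the same opener can be built from http.cookiejar and urllib.request (the Python 3 homes of cookielib and urllib2). This sketch only builds the opener and doesn't fetch anything; build_browser_opener is a hypothetical helper name.

```python
import http.cookiejar
import urllib.request


def build_browser_opener(user_agent="Mozilla/5.0"):
    """Return (opener, jar): keeps cookies and sends a browser-like User-Agent."""
    jar = http.cookiejar.CookieJar()
    # HTTPCookieProcessor stores Set-Cookie headers in `jar` and
    # sends matching cookies back on later requests.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Replaces the default ("User-agent", "Python-urllib/3.x") header.
    opener.addheaders = [("User-agent", user_agent)]
    return opener, jar


opener, jar = build_browser_opener()
# html = opener.open("http://www.myspace.com").read()  # needs network access
```

Instead of aliasing opener.open, urllib.request.install_opener(opener) makes the module-level urlopen() use these settings everywhere, at the cost of being global state.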

The rest is just understanding how forms work.  I've never used a
library, just POSTed/GETted the data directly to login and search
sites.  So I can't say how much easier it would be.  Probably quite a
bit.
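On the GET side, a search form usually just means putting the urlencoded fields into the query string. A minimal Python 3 sketch (urllib.parse holds urlencode there); build_get_request, the example.com URL, and the "q" field name are made up for illustration, and nothing is actually sent.

```python
import urllib.parse
import urllib.request


def build_get_request(base_url, fields):
    """Build (but do not send) a GET request with fields in the query string."""
    # Spaces become '+', and characters like '&' are %-escaped.
    query = urllib.parse.urlencode(fields)
    return urllib.request.Request(base_url + "?" + query)


req = build_get_request("http://www.example.com/search", {"q": "python tutor"})
print(req.get_method())  # GET
print(req.full_url)      # http://www.example.com/search?q=python+tutor
```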

You can POST data by passing it as the data argument; supplying data is
what makes urllib2 send a POST rather than a GET:

urlopen("https://www.whatever.com/signin.dll", data=authPairs).close()

where authPairs is something like:

import urllib
authPairs = urllib.urlencode({'userid':'andrew', 'password':'secret1234'})
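For comparison, roughly the same login POST in Python 3, using the same placeholder URL and credentials. In Python 3 the urlencoded body has to be bytes, hence the encode(). build_login_request is a hypothetical helper, and the sketch only builds the request so you can inspect it.

```python
import urllib.parse
import urllib.request


def build_login_request(url, fields):
    """Build (but do not send) a POST request carrying form fields."""
    # urlencode gives "key=value&key2=value2"; the request body must be bytes.
    body = urllib.parse.urlencode(fields).encode("ascii")
    # Supplying data is what makes urllib send a POST instead of a GET.
    return urllib.request.Request(url, data=body)


req = build_login_request("https://www.whatever.com/signin.dll",
                          {"userid": "andrew", "password": "secret1234"})
print(req.get_method())  # POST
print(req.data)          # b'userid=andrew&password=secret1234'
# opener.open(req).close() would actually send it (network needed).
```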

You'll have to dig around in the HTML to see whether each form uses
POST or GET (its method attribute), where it submits to (its action
attribute), and which field names it sends. But that should be enough
to get you started: logged in, and able to fetch pages and fill out
forms by hand.
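Some of that digging can be automated with the standard library's HTML parser. This is a rough Python 3 sketch: FormFinder and the sample page are invented for illustration, it only collects input/select/textarea names, and forms built by JavaScript won't show up this way. A form with no method attribute submits with GET, which is why that is the default below.

```python
from html.parser import HTMLParser


class FormFinder(HTMLParser):
    """Collect method, action and field names for every <form> on a page."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._in_form = True
            self.forms.append({
                # HTML forms default to GET when method is missing.
                "method": (attrs.get("method") or "get").upper(),
                "action": attrs.get("action", ""),
                "fields": [],
            })
        elif self._in_form and tag in ("input", "select", "textarea"):
            if attrs.get("name"):  # unnamed fields (e.g. submit buttons) aren't sent
                self.forms[-1]["fields"].append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


page = """
<form action="/signin.dll" method="post">
  <input type="text" name="userid">
  <input type="password" name="password">
  <input type="submit" value="Log in">
</form>
<form action="/search"><input name="q"></form>
"""

finder = FormFinder()
finder.feed(page)
for form in finder.forms:
    print(form["method"], form["action"], form["fields"])
# POST /signin.dll ['userid', 'password']
# GET /search ['q']
```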

Good luck,

Andrew

On 10/25/05, Ed Hotchkiss <edhotchkiss at gmail.com> wrote:
> Looking to write a MySpace wrapper/module. What is the best way (hopefully
> using the stdlib, not an outside module) to connect to a website and (if
> possible, otherwise I'll have to code it?) access forms with GET, POST, blah
> blah blah...
>
> thanks!
>
> --
> edward hotchkiss