traversing yahoogroups group messages

Peter Hansen peter at engcorp.com
Sun Sep 12 18:07:39 EDT 2004


lothar wrote:

> i want to traverse a set of messages in a Yahoogroups group from a Python
> program.
> 
> to get to the messages of the group, one must log in.
> 
> this presents, i think, two problems,
> 1) handling the form element for the login, which has a javascript submit
> routine,
> 2) keeping login state with cookies.
> 
> to someone who knows something about the issues here, my questions are:
> 
> 1) is it possible to do this in Python?

Yes.

> 2) if so, how do i handle the form and the javascript?

There are several approaches. Which one fits depends on the platform
you are using (e.g. Win32, Linux, other?) and on how sophisticated
and flexible you want the result to be.
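
One platform-independent approach, sketched here under stated
assumptions: a script never runs the page's JavaScript, so it sends
the same POST the browser would have sent. The sketch uses today's
Python 3 standard library (html.parser, urllib) to read the field
names out of a form and build that request. The sample form, the URL
and the field names `login` and `passwd` are made up for illustration.
If the real hash() routine rewrites a field before submitting, that
step would have to be ported to Python by hand.

```python
from html.parser import HTMLParser
import urllib.parse
import urllib.request


class FormFields(HTMLParser):
    """Collect the action URL and <input> name/value pairs of the first form."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self._in_form and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def build_login_request(page_url, html, username, password):
    parser = FormFields()
    parser.feed(html)
    parser.close()
    fields = dict(parser.fields)  # hidden fields are echoed back unchanged
    # Hypothetical field names: use whatever the real form calls them.
    # Any transformation done by the page's JavaScript (e.g. hash())
    # would have to be reproduced here before encoding.
    fields["login"] = username
    fields["passwd"] = password
    action = urllib.parse.urljoin(page_url, parser.action or "")
    data = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(action, data=data, method="POST")


# Invented sample form, standing in for the real login page.
SAMPLE_FORM = """
<form method="post" action="/config/login" onsubmit="return hash(this)">
  <input type="hidden" name="challenge" value="abc123">
  <input type="text" name="login">
  <input type="password" name="passwd">
</form>
"""

req = build_login_request("https://login.example.com/", SAMPLE_FORM, "lothar", "secret")
print(req.full_url)  # https://login.example.com/config/login
print(req.data)      # b'challenge=abc123&login=lothar&passwd=secret'
```

Parsing the form rather than hard-coding it means hidden fields (such
as a per-visit challenge value) are picked up from the live page each
time.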

> 3) does Java Python have a javascript engine and do i need Java Python here?

Do you realize that Java has absolutely nothing to do with JavaScript
except forming part of its name?  And no, you don't need Jython here.

> 4) if i need to use cookies, how do i know what to name and what to set into
> a cookie?

By making requests to the server and watching the cookies that come
back from it: the server chooses the names and values, and your
program just sends them back.  The ClientCookie module would
presumably help.  You could also just turn off cookies in your
browser, access the site, and see if it still works... maybe you
don't need them at all.
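
A minimal sketch of "watching the cookies", assuming Python 3, where
ClientCookie's functionality now lives in the standard library as
http.cookiejar. An opener built with HTTPCookieProcessor stores
whatever cookies the server sets and replays them on later requests,
so the script never has to pick names or values itself. To keep the
sketch runnable offline, a hypothetical FakeResponse object stands in
for a real server reply; the host, cookie name and value are invented.

```python
import email.message
import http.cookiejar
import urllib.request


def make_session():
    """Return an opener and the CookieJar it fills.

    Requests made through the opener store any cookies the server sets
    and send them back on later requests to the same site.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


class FakeResponse:
    """Hypothetical offline stand-in for an HTTP response (demo only)."""

    def __init__(self, headers):
        self._headers = headers

    def info(self):
        return self._headers


opener, jar = make_session()

# Simulate the server's reply to a login request (made-up cookie).
headers = email.message.Message()
headers["Set-Cookie"] = "session=abc123; Path=/"
login = urllib.request.Request("http://groups.example.com/login")
jar.extract_cookies(FakeResponse(headers), login)

# The server picked the name and value; we only inspect them.
for cookie in jar:
    print(cookie.name, cookie.value)  # session abc123

# A later request to the same host gets the cookie attached automatically.
page = urllib.request.Request("http://groups.example.com/messages/1")
jar.add_cookie_header(page)
print(page.get_header("Cookie"))  # session=abc123
```

Against the live site one would instead pass the login request to
opener.open() and fetch the message pages through that same opener,
which calls extract_cookies() and add_cookie_header() on each
exchange.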

> for context, i include the form element below.
> the submit routine hash() is javascript.

There have been similar questions and many responses on this subject
in the past.  I suggest using Google Groups to check the newsgroup
archives, using search words such as "web scraping", possibly paying
close attention to any threads with responses by Cameron Laird or
John J Lee.  ;-)

-Peter


