How to Encode Parameters into an HTML Parsing Script

SMERSH009X at gmail.com SMERSH009X at gmail.com
Fri Jun 22 04:37:07 CEST 2007


I've written a Script that navigates various urls on a website, and
fetches the contents.
The Url's are being fed from a list "urlList". Everything seems to
work splendidly, until I introduce the concept of encoding parameters
for a certain url.
So for example if I wanted to navigate to an encoded url
http://online.investools.com/landing.iedu?signedin=true rather than
just http://online.investools.com/landing.iedu   How would I do this?
How can I modify the script to urlencode these parameters:
{signedin:true}     and to associate them with a specific url from the
urlList
 Thank you!


import datetime, time, re, os, sys, traceback, smtplib, string,
urllib2, urllib, inspect
from urllib2 import build_opener, HTTPCookieProcessor, Request
opener = build_opener(HTTPCookieProcessor)
from urllib import urlencode

def urlopen2(url, data=None, user_agent='urlopen2'):
    """Opens Our URLS """
    if hasattr(data, "__iter__"):
        data = urlencode(data)
        headers = {'User-Agent' : user_agent}  # User-Agent for
Unspecified Browser
    return opener.open(Request(url, data, headers))

def badCharCheck(host,url):
    try:
        page = urlopen2("http://"+host+".investools.com/"+url+"", ())
        pageRead= page.read()
        print "Loading:",url
        #print pageRead
    except:
        print "Failed: ", traceback.format_tb(sys.exc_info()[2]),'\n'


if __name__ == '__main__':
    host= "online"
    urlList = ["landing.iedu","sitemap.iedu"]
    print "\n","***** Begin BadCharCheck for", host
    for url in urlList:
        badCharCheck(host,url)

    print'***** TEST FINISHED! Total Runs:'
    sys.exit()

OUTPUT:
***** Begin BadCharCheck for online
Loading: landing.iedu
Loading: sitemap.iedu
***** TEST FINISHED! Total Runs:




More information about the Python-list mailing list