Access to database other web sites

Cameron Laird claird at lairds.com
Sat Sep 27 11:24:21 EDT 2003


In article <87n0crbi8z.fsf at pobox.com>, John J. Lee <jjl at pobox.com> wrote:
>tibi87 at hanmail.net (Jenny) writes:
>
>> I am doing research about the relationship between sales rates and
>> discounted prices or recommendation frequency. To do this, I need to
>> access the database of commercial web sites via the Internet. I think this
>> is possible because it is similar to the work of price comparison
>> sites and web robots.
>
>IIUYC, what you're contemplating is called "web scraping" -- at least,
>it is by Cameron Laird, and I like the name.  Others might know it as
>"web client programming".  Cameron wrote an article about this a while
>back (Unix Review?) which you might like if you're a newbie -- Google
>for it (but note that the Perl book he mentions has actually been
>replaced by a newer one by Sean Burke, also from O'Reilly).
>
>
>> I am studying Python these days because I think it is a good language
>> for the work.
>[...]
>
>I think so too.
			.
		[excellent and detailed
		technical advice]
			.
			.
Also filling a niche in this territory is PyCurl <URL: http://pycurl.sf.net >.
The references at <URL: http://wiki.tcl.tk/WebScraping > are likely to be at
least inspirational.
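
For a newcomer, here is a minimal sketch of the parsing half of "scraping",
using only the Python standard library.  The markup, the class name "price",
and the names PriceParser, extract_prices, and SAMPLE are all assumptions made
up for illustration; real commercial pages are far messier.  Fetching the page
(with PyCurl or anything else) is deliberately left out so the sketch runs
offline.

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collect the text of every element whose class attribute is 'price'."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        if dict(attrs).get("class") == "price":
            self._in_price = True

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the price text.
        # This is enough for flat markup like the sample below.
        self._in_price = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_price and text:
            self.prices.append(text)


def extract_prices(html):
    """Return the price strings found in an HTML document."""
    parser = PriceParser()
    parser.feed(html)
    parser.close()
    return parser.prices


# Hypothetical markup, standing in for a page already downloaded.
SAMPLE = """
<ul>
  <li>Widget <span class="price">$9.99</span></li>
  <li>Gadget <span class="price">$24.50</span></li>
</ul>
"""

if __name__ == "__main__":
    print(extract_prices(SAMPLE))
```

The hard part in practice is that each site needs its own rules for locating
the data, and those rules break whenever the site changes its layout.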

I'm ... reserved about the prospects for the proposed research.  The commercial
sites you want to study are, in my experience, some of the most difficult to
"scrape".  Complementing that difficulty is the poverty of inference I
anticipate you'll be able to ground on what you find there; their commerce has
a lot more noise than signal, as I see it.  'Twould be great, though, for you
to uncover something real.  Good luck.
-- 

Cameron Laird <Cameron at Lairds.com>
Business:  http://www.Phaseit.net
Personal:  http://phaseit.net/claird/home.html



