retrieving ATOM/RSS feeds
_spitFIRE
timid.gentoo at gmail.com
Mon Aug 13 06:07:14 EDT 2007
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Lawrence Oluyede wrote:
> If the content producer doesn't provide the full article via RSS/ATOM
> there's no way you can get it from there. Search for full content feeds
> if any, otherwise get the article URL and feed it to BeautifulSoup to
> scrape the content.
>
For the same feed (where the content producer doesn't provide the full
article!) I was able to see the complete post in other RSS aggregators (like
Blam). I want to know how they were able to get the full content!
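One thing worth checking before scraping (a guess on my part, not something
I have confirmed for Blam): some RSS 2.0 feeds carry a short <description>
and also the full article in a <content:encoded> element from the RSS
"content" module, so a reader that looks at the second element shows the
whole post. A minimal sketch using only the standard library; the sample
feed and the function name item_bodies are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Namespace of the RSS "content" module, which defines <content:encoded>.
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"

# Hypothetical sample feed, invented for this example.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example blog</title>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
      <description>Short teaser...</description>
      <content:encoded><![CDATA[<p>The whole article body.</p>]]></content:encoded>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second</link>
      <description>Only a teaser here.</description>
    </item>
  </channel>
</rss>"""


def item_bodies(rss_text):
    """Return a (title, body, is_full) tuple for each <item> in the feed."""
    root = ET.fromstring(rss_text)
    results = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        full = item.findtext(CONTENT_NS + "encoded")
        if full is not None:
            # The feed ships the full article next to the summary.
            results.append((title, full, True))
        else:
            # Only the summary is available; the rest would need scraping.
            summary = item.findtext("description", default="")
            results.append((title, summary, False))
    return results


if __name__ == "__main__":
    for title, body, is_full in item_bodies(SAMPLE_RSS):
        print(title, "full" if is_full else "summary only", body)
```

If every item comes back as "summary only", the feed really is partial and
the article page is the only place left to look.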
I know for sure that you can't do screen scraping separately for each and
every blog, so there has to be a standard way, or at least blogs must
maintain a standard template for rendering posts. I mean, if each site
only offered partial content and the rest had to be scraped from the page,
and the page had a non-standard structure (which is more likely), then
it would become impossible IMHO for any aggregator to aggregate feeds!
I shall for now try with BeautifulSoup, though I'm still doubtful about it.
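For the scraping route, here is a rough sketch of what pulling the post out
of the article page could look like. To keep it runnable without extra
installs it uses the standard library's html.parser as a stand-in for
BeautifulSoup, so it is not BeautifulSoup's API. The sample page, the
"post-body" id and the names ContainerText and extract_post are all
hypothetical:

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

# Hypothetical article page, invented for this example.
SAMPLE_PAGE = """<html><body>
<div id="sidebar">Links and ads</div>
<div id="post-body"><p>First paragraph.</p><p>Second<br>paragraph.</p></div>
<div id="footer">Copyright</div>
</body></html>"""


class ContainerText(HTMLParser):
    """Collect the text inside the first element whose id matches."""

    def __init__(self, container_id):
        super().__init__()
        self.container_id = container_id
        self.depth = 0       # 0 means we are outside the container
        self.done = False    # set once the container has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif not self.done and dict(attrs).get("id") == self.container_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_post(html, container_id):
    """Return the whitespace-normalised text of the chosen container."""
    parser = ContainerText(container_id)
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


if __name__ == "__main__":
    print(extract_post(SAMPLE_PAGE, "post-body"))
```

Note that the container id has to be supplied per site, which is exactly
the part I am doubtful about: nothing here generalises across blogs with
different templates.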
- --
_ _ _]{5pitph!r3}[_ _ _
__________________________________________________
“I'm smart enough to know that I'm dumb.”
- Richard P Feynman
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFGwC1SA0th8WKBUJMRAs4eAJ0bLJVzEZls1JtE6e8MUrqdapXGPwCfVO02
yYzezvhJFY1SDHUGxrJdR5M=
=rfLo
-----END PGP SIGNATURE-----