[Tutor] Get variable values [Introduction to Planet RSS news aggregator]
Danny Yoo
dyoo at hkn.eecs.berkeley.edu
Fri Feb 2 00:35:52 CET 2007
> Yes, i've read the two docs.
> But my problem is more related with Python.
>
> If you read my previous post with the links to the code, my doubt is how
> to get the values for "url", "content", "name", etc.
Hi Mario,
After taking a much closer look at the code you mentioned here:
http://pastebin.com/872998
it looks like that code expects you to already have a "NewsItem" in hand.
Ok, wait. I think I have a better idea of what you're trying to do.
Let me try to dissuade you from doing what you're doing. *grin*
There should be no reason to muck around in Planet's implementation to
make it do what you want: you should be able to treat Planet as a
library and use it from your own code. If I understand what you're trying
to do, you almost certainly should not touch the internals of
get_content(): they're private to Planet's implementation, and editing
them is a very bad approach to code reuse.
Rather than hacking at NewsItem.get_content() to make it insert into a
database, it's probably much better to leave Planet unmodified and
instead write new programs that use Planet. Respect the library and treat
it as a shared resource. If Scott James Remnant and Jeff Waugh update
their code at:
http://www.planetplanet.org/
or fix bugs in it, you don't want to have to manually re-patch your own
copy the same way.
Concretely, if we want to take a feed and print out all the titles, we
should not modify the get_title() method of these news items in a
private copy of the Planet library. Rather, we can simply use Planet
as an external library:
#################################################
>>> import planet
>>> import ConfigParser
>>> config = ConfigParser.ConfigParser()
>>> p = planet.Planet(config)
>>> c = planet.Channel(p,
...     "http://hashcollision.blogspot.com/feeds/posts/default")
>>> c.update()
>>> len(c.items())
25
#################################################
Ok, there are 25 items here. Let's take a look at the titles:
##################################################
>>> for item in c:
... print item.title
...
latex
in summation...
heresy
debugging test-case
new year
[text output truncated]
##################################################
Let's look at a particular item in the channel.
####################################################
>>> firstItem = c.items()[0]
>>> firstItem.title
'how not to write xml'
>>> firstItem.id
'tag:blogger.com,1999:blog-18302393.post-116249176169366001'
>>> firstItem.link
'http://hashcollision.blogspot.com/2006/11/how-not-to-write-xml.html'
>>> firstItem.summary
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.4/site-packages/planet/cache.py", line 279, in __getattr__
    raise AttributeError, key
AttributeError: summary
####################################################
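Since a missing field raises AttributeError (that's what the traceback
shows), the built-in getattr() with a default value is a simple way to
read optional fields without writing a try/except each time. Here's a
sketch; FakeCachedItem is a hypothetical stand-in that imitates that
behaviour, not Planet's actual cache class.

```python
class FakeCachedItem(object):
    """Hypothetical stand-in: a missing field raises AttributeError,
    mimicking the behaviour seen in the traceback."""

    def __init__(self, **fields):
        self._fields = fields

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails.
        try:
            return self.__dict__["_fields"][key]
        except KeyError:
            raise AttributeError(key)


def optional_field(item, name, default=None):
    """Return item.<name>, or default if the feed did not supply it."""
    # getattr() catches the AttributeError and hands back the default.
    return getattr(item, name, default)


if __name__ == "__main__":
    item = FakeCachedItem(title="how not to write xml")
    print(optional_field(item, "title"))
    print(optional_field(item, "summary", "(no summary)"))
```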
Ok, so some fields aren't defined. That's to be expected: a feed doesn't
have to supply every field. Which fields are defined for my news item?
####################################################
>>> firstItem.keys()
['updated', 'subtitle', 'title', 'author', 'author_name', 'order',
'content', 'link', 'published', 'date', 'id_hash', 'id']
>>> firstItem.author
'Danny Yoo'
####################################################
(It really is _my_ news item. *wink*)
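If you want every defined field in one place, for example to decide
which ones to store, you can combine keys() with getattr(). A sketch,
again using a hypothetical FakeNewsItem stand-in in place of a real
NewsItem:

```python
class FakeNewsItem(object):
    """Hypothetical stand-in offering keys() and attribute access, like the
    NewsItem in the session above."""

    def __init__(self, **fields):
        self._fields = fields

    def keys(self):
        return list(self._fields)

    def __getattr__(self, key):
        # Missing fields raise AttributeError, as in the traceback.
        try:
            return self.__dict__["_fields"][key]
        except KeyError:
            raise AttributeError(key)


def item_to_dict(item):
    """Map every field name reported by item.keys() to its value."""
    return dict((key, getattr(item, key)) for key in item.keys())


if __name__ == "__main__":
    item = FakeNewsItem(title="how not to write xml", author="Danny Yoo")
    for key, value in sorted(item_to_dict(item).items()):
        print("%s: %r" % (key, value))
```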
According to the NewsItem documentation, you can usually expect to see
the following fields:
#####################################################################
id           Channel-unique identifier for this item.
id_hash      Relatively short, printable cryptographic hash of id
date         Corrected UTC-Normalised update time, for sorting.
order        Order in which items on the same date can be sorted.
hidden       Item should be hidden (True if exists).
title        One-line title (*).
link         Link to the original format text (*).
summary      Short first-page summary (*).
content      Full HTML content.
modified     Date the item claims to have been modified (*).
issued       Date the item claims to have been issued (*).
created      Date the item claims to have been created (*).
expired      Date the item claims to expire (*).
author       Name of the author (*).
publisher    Name of the publisher (*).
category     Category name (*).
comments     Link to a page to enter comments (*).
license      Link to the licence for the content (*).
source_name  Name of the original source of this item (*).
source_link  Link to the original source of this item (*).
#####################################################################
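The date, order and hidden fields are meant for this kind of
bookkeeping. Here's a sketch that drops hidden items and sorts the rest
newest first. FakeNewsItem is a hypothetical stand-in, and the types are
my assumptions, not something the table spells out: date is taken to be
a UTC time tuple (which compares chronologically) and order any
comparable value. hasattr() works for the hidden check because a missing
field raises AttributeError, as in the traceback above.

```python
class FakeNewsItem(object):
    """Hypothetical stand-in with the date/order/hidden fields from the table.

    Assumption: date is a UTC time tuple such as (2007, 2, 1, 0, 35, 52),
    and order is a comparable value.
    """

    def __init__(self, title, date, order, hidden=False):
        self.title = title
        self.date = date
        self.order = order
        if hidden:
            # Per the table, an item is hidden if the field exists at all.
            self.hidden = True


def visible_newest_first(items):
    """Drop hidden items, then sort by (date, order), newest first."""
    visible = [item for item in items if not hasattr(item, "hidden")]
    visible.sort(key=lambda item: (item.date, item.order), reverse=True)
    return visible


if __name__ == "__main__":
    items = [
        FakeNewsItem("older", (2006, 11, 2, 18, 0, 0), 0),
        FakeNewsItem("newer", (2007, 1, 5, 9, 30, 0), 0),
        FakeNewsItem("draft", (2007, 2, 1, 0, 0, 0), 0, hidden=True),
    ]
    for item in visible_newest_first(items):
        print(item.title)
```

Whether Planet itself breaks ties by reversing on order exactly like
this is also an assumption; check planet/__init__.py if it matters.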
To see the help documentation for planet, use the help() function at
the interactive prompt:
#################
>>> import planet
>>> help(planet)
#################
Planet's documentation is aimed mostly at developers: the authors expect
you to already know Python before touching Planet, so you may have some
rough going at first.
Does this help you get started? Please ask more questions if you have
them.