[Tutor] Get variable values [Introduction to Planet RSS news aggregator]

Danny Yoo dyoo at hkn.eecs.berkeley.edu
Fri Feb 2 00:35:52 CET 2007



> Yes, I've read the two docs.
> But my problem is more related with Python.
>
> If you read my previous post with the links to the code, my doubt is how 
> to get the values for "url", "content", "name", etc.

Hi Mario,


After taking a much closer look at the code you mentioned here:

     http://pastebin.com/872998

it looks like you're supposed to have a "NewsItem" in hand.


Ok, wait.  I think I have a better idea of what you're trying to do.

Let me try to dissuade you from doing what you're doing.  *grin*


There should be no reason to muck around in Planet's implementation to 
make it do what you want: you should be able to treat Planet as a 
library and simply use it.  If I understand what you're trying to do, you 
should almost certainly not touch the internals of get_content(): that 
method is private to Planet's implementation, and editing it is a very 
bad approach to code reuse.


Rather than hacking at NewsItem.get_content() to make it insert into a 
database, it's probably a lot better to leave Planet unmodified and 
instead write new programs that use Planet.  Respect the library and treat 
it as an external resource.  If Scott James Remnant and Jeff Waugh update 
their code at:

     http://www.planetplanet.org/

or fix bugs in it, you do not want to have to patch your own private copy 
by hand to pick up the same changes.
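To make "use Planet, don't patch it" concrete, here is a minimal sketch of a separate program that stores news items in a database without touching Planet at all.  It assumes only that each item has `id`, `title` and `link` attributes, as the NewsItem objects in the session that follows do.  The `store_items` function, the `StubItem` stand-in and the `items` table schema are all hypothetical, and sqlite3 stands in for whatever database you actually use.

```python
import sqlite3
from collections import namedtuple

# Hypothetical stand-in for a Planet NewsItem so the sketch runs without
# Planet installed; it has only the three attributes this example reads.
StubItem = namedtuple("StubItem", ["id", "title", "link"])


def store_items(conn, items):
    """Copy id, title and link from each item into an 'items' table.

    Planet is not modified: this function only reads attributes from
    whatever item objects it is handed.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items "
        "(id TEXT PRIMARY KEY, title TEXT, link TEXT)"
    )
    # INSERT OR REPLACE means fetching the same feed twice updates the
    # existing rows instead of failing on a duplicate id.
    conn.executemany(
        "INSERT OR REPLACE INTO items (id, title, link) VALUES (?, ?, ?)",
        [(item.id, item.title, item.link) for item in items],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
store_items(conn, [StubItem("tag:example,1", "latex", "http://example.com/1")])
print(conn.execute("SELECT title FROM items").fetchall())
```

With Planet itself, the call would presumably be `store_items(conn, c.items())` on a `planet.Channel` that has already been `update()`d.  All the database logic lives in your own program, so a new Planet release can be dropped in without re-applying any patches.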



Concretely, if we want to take a feed and print out all the titles, we 
should not modify the get_title() method of these news items in a 
private copy of the Planet library.  Rather, we can simply use Planet 
as an external library:

#################################################
>>> import planet
>>> import ConfigParser
>>> config = ConfigParser.ConfigParser()
>>> p = planet.Planet(config)
>>> c = planet.Channel(p,
...     "http://hashcollision.blogspot.com/feeds/posts/default")
>>> c.update()
>>> len(c.items())
25
#################################################


Ok, there are 25 items here.  Let's take a look at the titles:

##################################################
>>> for item in c:
...     print item.title
...
latex
in summation...
heresy
debugging test-case
new year
[text output truncated]
##################################################


Let's look at a particular item in the channel.

####################################################
>>> firstItem = c.items()[0]
>>> firstItem.title
'how not to write xml'
>>> firstItem.id
'tag:blogger.com,1999:blog-18302393.post-116249176169366001'
>>> firstItem.link
'http://hashcollision.blogspot.com/2006/11/how-not-to-write-xml.html'
>>> firstItem.summary
Traceback (most recent call last):
   File "<stdin>", line 1, in ?
   File "/usr/lib/python2.4/site-packages/planet/cache.py", line 279, in __getattr__
     raise AttributeError, key
AttributeError: summary
####################################################



Ok, so some things are not defined.  That's to be expected.  What things 
are defined for my news item?


####################################################
>>> firstItem.keys()
['updated', 'subtitle', 'title', 'author', 'author_name', 'order', 
'content', 'link', 'published', 'date', 'id_hash', 'id']
>>> firstItem.author
'Danny Yoo'
####################################################

(It really is _my_ news item.  *wink*)
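Because a missing field raises AttributeError (as summary did above), the built-in getattr() with a default value is one way to read a field that may or may not be there.  The `SparseItem` class below is a hypothetical stand-in that imitates the behaviour seen in that traceback; it is not Planet's actual cache class.

```python
class SparseItem(object):
    """Hypothetical stand-in that, like the NewsItem above, raises
    AttributeError for any field the feed did not supply."""

    def __init__(self, **fields):
        self._fields = fields

    def keys(self):
        return list(self._fields.keys())

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails.
        fields = self.__dict__.get("_fields", {})
        if key in fields:
            return fields[key]
        raise AttributeError(key)


item = SparseItem(title="how not to write xml", author="Danny Yoo")

# getattr() with a third argument returns the default instead of raising.
print(getattr(item, "summary", "(no summary)"))  # (no summary)
print(getattr(item, "title", "(no title)"))      # how not to write xml

# Alternatively, check keys() first, as in the session above.
if "summary" in item.keys():
    print(item.summary)
```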


According to the documentation for NewsItem, you can usually expect to 
see the following fields:

#####################################################################
id              Channel-unique identifier for this item.
id_hash         Relatively short, printable cryptographic hash of id
date            Corrected UTC-Normalised update time, for sorting.
order           Order in which items on the same date can be sorted.
hidden          Item should be hidden (True if exists).
title           One-line title (*).
link            Link to the original format text (*).
summary         Short first-page summary (*).
content         Full HTML content.
modified        Date the item claims to have been modified (*).
issued          Date the item claims to have been issued (*).
created         Date the item claims to have been created (*).
expired         Date the item claims to expire (*).
author          Name of the author (*).
publisher       Name of the publisher (*).
category        Category name (*).
comments        Link to a page to enter comments (*).
license         Link to the licence for the content (*).
source_name     Name of the original source of this item (*).
source_link     Link to the original source of this item (*).
#####################################################################
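If what you are after is several of these values at once (say, the link, content and author for one database row), you can loop over a list of field names and fall back to None for any the feed left out.  The `FIELDS` selection, the `item_to_row` helper and the `StubItem` class are illustrative assumptions, not part of Planet.

```python
# A hypothetical selection of field names taken from the table above.
FIELDS = ["id", "title", "link", "summary", "content", "author"]


def item_to_row(item, fields=FIELDS):
    """Return a dict of the requested fields; missing ones become None."""
    return dict((name, getattr(item, name, None)) for name in fields)


class StubItem(object):
    """Hypothetical stand-in with only some fields set, like a real item."""

    def __init__(self, **fields):
        self.__dict__.update(fields)


row = item_to_row(StubItem(id="tag:example,1", title="latex", author="Danny Yoo"))
print(row["title"])    # latex
print(row["summary"])  # None
```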


To see the help documentation for Planet, use the help() function at the 
interactive prompt:

#################
>>> import planet
>>> help(planet)
#################


The documentation on Planet is aimed mostly at developers: the authors 
expect you to already know Python before touching Planet, so you may find 
it rough going at first.


Does this help you get started?  Please ask more questions if you have 
them.

