Getting Newsgroup Headers
R. David Murray
rdmurray at bitdance.com
Thu Apr 16 22:13:24 EDT 2009
aslkoi fdsda <pythonzine at gmail.com> wrote:
> I would like to read just the headers out of a newsgroup.
> Being a Python newbie, I was wondering if this is possible and how difficult
> it would be for a novice Python programmer.
> Thanks for any reply!
It's not hard at all. I've pulled some bits and pieces out of the
self-written minimalist newsreader I'm responding to your post with,
and added some example usage code. It should point you in the right
direction, and there's no advanced Python involved here:
--------------------------------------------------------------
from email.parser import FeedParser
from nntplib import NNTP
from rfc822 import mktime_tz, parsedate_tz


class Article:

    def __init__(self):
        self.num = None
        self.subject = None
        self.poster = None
        self.date = None
        self.id = None
        self.references = []
        self.size = 0
        self.lines = 0
        self.newsgroups = []

    def loadFromOverview(self, overview):
        # An overview entry is (number, subject, poster, date, id,
        # references, size, lines); the article number is returned.
        (self.subject, self.poster, self.date, self.id,
            self.references, self.size, self.lines) = overview[1:]
        try:
            self.date = mktime_tz(parsedate_tz(self.date))
        except (TypeError, ValueError):
            # parsedate_tz returns None for a date it cannot parse,
            # and mktime_tz then fails with TypeError.
            print "ERROR in date parsing (%s)" % self.date
            self.date = None
        return overview[0]

    def loadMessage(self, server):
        # Fetch headers and body separately and feed both to one parser.
        msgparser = FeedParser()
        resp, num, id, lines = server.head(self.id)
        msgparser.feed('\n'.join(lines)+'\n\n')
        resp, num, id, lines = server.body(self.id)
        msgparser.feed('\n'.join(lines)+'\n')
        self.message = msgparser.close()

server = NNTP('news.gmane.org')
resp, count, first, last, name = server.group('gmane.comp.python.ideas')
resp, headersets = server.xover(str(int(last)-100), last)
articles = []
for h in headersets:
    a = Article()
    artnum = a.loadFromOverview(h)
    articles.append(a)

anarticle = articles[0]
anarticle.loadMessage(server)
print dir(anarticle.message)
for header in anarticle.message.keys():
    print "%s: %s" % (header, anarticle.message[header])
--------------------------------------------------------------
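
If only the headers are wanted, the body fetch in loadMessage can be
skipped. Here is a minimal sketch of that idea, written for Python 3
rather than the Python 2 of the code above. It parses a list of header
lines (text, without line endings), such as the list server.head()
returns above, using the email module's HeaderParser. The function name
parse_header_lines and the sample lines are made up for illustration,
and no server connection is involved:

```python
from email.parser import HeaderParser


def parse_header_lines(lines):
    """Parse a list of header lines (text, no line endings) into a Message."""
    # A blank line marks the end of the header section.
    text = '\n'.join(lines) + '\n\n'
    return HeaderParser().parsestr(text)


# Hypothetical sample data standing in for a server response.
sample = [
    'From: someone@example.com',
    'Subject: Getting Newsgroup Headers',
    'Newsgroups: comp.lang.python',
]
msg = parse_header_lines(sample)
for name in msg.keys():
    print("%s: %s" % (name, msg[name]))
```

The returned object supports the same keys() and msg[header] lookups
used in the example loop above.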
Heh, looking at this I remember it is several-years-old code and really
needs to be revisited and updated...so I'm not going to claim
that this is the best code that could be written for this task :)
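
One part that has aged is the rfc822 import: that module was removed in
Python 3, where the same two functions live in email.utils. A small
sketch of the date handling under that assumption (Python 3; the helper
name parse_date is hypothetical), which checks for an unparseable date
explicitly instead of relying on an exception:

```python
from email.utils import mktime_tz, parsedate_tz


def parse_date(value):
    """Return a UTC timestamp for an RFC 2822 date string, or None."""
    parsed = parsedate_tz(value)
    if parsed is None:
        # parsedate_tz signals a date it cannot parse by returning None.
        return None
    return mktime_tz(parsed)


print(parse_date('Thu, 16 Apr 2009 22:13:24 -0400'))
print(parse_date('not a date'))
```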
Oh, and there's more involved in actually printing the headers if you
need to deal with non-ASCII characters ("encoded words") in the headers.
(That's in the docs for the email module, though it took me a bit to
figure out how to do it right.)
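
For reference, one common way to decode encoded words uses
decode_header and make_header from email.header. This is only a sketch
for Python 3, not necessarily how the newsreader mentioned above does
it; the helper name decode_header_value is made up for illustration,
and a header naming an unknown charset may still raise an error:

```python
from email.header import decode_header, make_header


def decode_header_value(raw):
    """Turn a raw header value, possibly with encoded words, into text."""
    # decode_header splits the value into (bytes-or-text, charset) pairs;
    # make_header joins them back into one Header, and str() renders it.
    return str(make_header(decode_header(raw)))


print(decode_header_value('=?utf-8?q?Caf=C3=A9?='))
print(decode_header_value('Plain ASCII subject'))
```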
--
R. David Murray http://www.bitdance.com