Getting Newsgroup Headers

R. David Murray rdmurray at bitdance.com
Thu Apr 16 22:13:24 EDT 2009


aslkoi fdsda <pythonzine at gmail.com> wrote:
> I would like to read just the headers out of a newsgroup.
> Being a Python newbie, I was wondering if this is possible and how difficult
> it would be for a novice Python programmer.
> Thanks for any reply!

It's not hard at all.  I've pulled some bits and pieces out of the
minimalist newsreader I wrote myself (and am using to reply to your
post), and added some example usage code.  It should point you in the
right direction, and there's no advanced Python involved here:

--------------------------------------------------------------
from email.parser import FeedParser
from nntplib import NNTP
from rfc822 import mktime_tz, parsedate_tz

class Article:

    def __init__(self):
        self.num = None
        self.subject = None
        self.poster = None
        self.date = None
        self.id = None
        self.references = []
        self.size = 0
        self.lines = 0
        self.newsgroups = []
        self.message = None     # filled in by loadMessage

    def loadFromOverview(self, overview):
        (self.subject, self.poster, self.date, self.id,
            self.references, self.size, self.lines) = overview[1:]
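        # parsedate_tz returns None for a date it cannot parse, which makes
        # mktime_tz raise TypeError, so catch that as well as ValueError.
        try:
            self.date = mktime_tz(parsedate_tz(self.date))
        except (TypeError, ValueError):
            print "ERROR in date parsing (%s)" % self.date
            self.date = None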
        return overview[0]


    def loadMessage(self, server):
        msgparser = FeedParser()
        resp, num, id, lines = server.head(self.id)
        msgparser.feed('\n'.join(lines)+'\n\n')
        resp, num, id, lines = server.body(self.id)
        msgparser.feed('\n'.join(lines)+'\n')
        self.message = msgparser.close()



server = NNTP('news.gmane.org')
resp, count, first, last, name = server.group('gmane.comp.python.ideas')
resp, headersets = server.xover(str(int(last)-100), last)
articles = []
for h in headersets:
    a = Article()
    artnum = a.loadFromOverview(h)
    articles.append(a)

anarticle = articles[0]
anarticle.loadMessage(server)
print dir(anarticle.message)
for header in anarticle.message.keys():
    print "%s: %s" % (header, anarticle.message[header])

--------------------------------------------------------------
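
Since you only asked about headers, here is a minimal sketch of the
header-only half of loadMessage that runs without a news server.  The
sample lines and the parse_headers name are made up for illustration;
real input would be the list of lines that server.head() returns (on
Python 3 those lines are bytes and would need decoding to text first).

```python
from email.parser import FeedParser

# Hypothetical header lines standing in for what server.head() returns;
# the values are invented for this example.
sample_head_lines = [
    "From: Example Poster <poster@example.invalid>",
    "Subject: Header parsing test",
    "Message-ID: <12345@example.invalid>",
    "Newsgroups: example.test",
]


def parse_headers(lines):
    """Parse a list of header lines (text) into an email Message object."""
    parser = FeedParser()
    # The blank line at the end tells the parser the header block is done.
    parser.feed("\n".join(lines) + "\n\n")
    return parser.close()


msg = parse_headers(sample_head_lines)
for name in msg.keys():
    print("%s: %s" % (name, msg[name]))
```

Skipping the server.body() call this way means only the headers ever
cross the wire, which is probably what you want for a quick scan.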

Heh, looking at this I remember it is several-years-old code and really
needs to be revisited and updated...so I'm not going to claim
that this is the best code that could be written for this task :)
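
One thing I would likely change in an update: the rfc822 module is gone
in Python 3, but parsedate_tz and mktime_tz are also available from
email.utils.  A rough sketch of the date handling on that basis (the
parse_date helper is my own name for illustration, not something from
the newsreader):

```python
from email.utils import mktime_tz, parsedate_tz


def parse_date(datestr):
    """Return a UTC timestamp for an RFC 2822 date string, or None."""
    parsed = parsedate_tz(datestr)
    if parsed is None:
        # parsedate_tz signals an unparseable date by returning None.
        return None
    return mktime_tz(parsed)


print(parse_date("Thu, 16 Apr 2009 22:13:24 -0400"))
print(parse_date("not a date"))
```

Checking for None explicitly avoids relying on an exception to detect
a bad Date header.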

Oh, and there's more involved in actually printing the headers if you
need to deal with non-ASCII characters ("encoded words") in the headers.
(That's in the docs for the email module, though it took me a bit to
figure out how to do it right.)
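
As a starting point, something along these lines should cope with the
common cases.  It is only a sketch built on email.header.decode_header;
decode_header_value is a name I made up, and it assumes the charset
named in the header is one Python knows about (an unknown charset would
raise LookupError here).

```python
from email.header import decode_header


def decode_header_value(raw):
    """Decode RFC 2047 encoded words in a header value to text."""
    pieces = []
    for chunk, charset in decode_header(raw):
        # Encoded parts come back as bytes plus a charset name; a header
        # without encoded words may come back as plain text already.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii", "replace")
        pieces.append(chunk)
    return "".join(pieces)


print(decode_header_value("=?utf-8?q?Caf=C3=A9?="))
print(decode_header_value("Plain subject"))
```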

--
R. David Murray             http://www.bitdance.com



