Performance Issues with Threaded Python Network Server

Tue Jan 15 15:23:50 EST 2002

"Joao Prado Maia" <JMaia at lexgen.com> wrote ...
>
> Hi,
>
> Before even starting with the description of my problem, let me explain my
> situation. I'm trying to create a custom Python based NNTP server to serve
> as a gateway of sorts to some web message boards. This way people can
> connect to the News server and get messages / reply and such.
>
Good idea...

> Please be aware that this is my first real-world python project, so please
> be gentle if you see something stupid in my code ;)
>
Hey, this is c.l.python, not one of those other groups ;-)

> Anyway, I read 'Programming Python' a little bit on the Network programming
> chapter and decided to use threads (aka SocketServer.ThreadingTCPServer) on
> my NNTP server. Everything works great but I have been experiencing some
> heavy CPU usage on the server.
>
> A little bit more information - the heavy CPU load is triggered when a
> user tries to download all 1500 messages / articles of one message board.
> The way NNTP works and the way Outlook Express (the newsreader in this case)
> works is that it will download all the headers for the articles at once, and
> then request the actual body of the articles one by one.
>
What news clients do (from actual network traces against my news server) is
a query on the group, which returns a tuple (resp, estimate, first, last,
name), followed by an XOVER, which is documented in the nntplib
documentation. I'm not sure what OE would do if it got no response to an
XOVER, but this is certainly the fastest way to work.
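
To make the GROUP-then-XOVER exchange concrete, here is a minimal sketch of the two steps as plain string handling, written for Python 3. The helper names parse_group_reply and xover_command are made up for illustration and are not part of nntplib or of your server.

```python
# Sketch only: both helpers are hypothetical, not nntplib functions.

def parse_group_reply(line):
    """Split a GROUP status line such as '211 1500 1 1500 comp.lang.python'."""
    parts = line.strip().split(None, 4)
    if len(parts) != 5 or parts[0] != "211":
        raise ValueError("unexpected GROUP reply: %r" % line)
    code, estimate, first, last, name = parts
    return code, int(estimate), int(first), int(last), name


def xover_command(first, last):
    """Build the single XOVER request that covers the whole article range."""
    return "XOVER %d-%d\r\n" % (first, last)


reply = "211 1500 1 1500 comp.lang.python\r\n"
code, estimate, first, last, name = parse_group_reply(reply)
print(xover_command(first, last).strip())
```

The point is that one request covers the whole range, so the client does not need a round trip per article just to build its message list.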

> What this means is that the server will write to the 'wfile' file descriptor
> to send the response (the headers and bodies of the articles) to the
> newsreader.
>
In the case of an XOVER the news server responds with fairly abbreviated
information about the available articles, the whole point being to remove
the need to transfer all headers and bodies down the wire just so the client
knows what's available. OE (and other news clients) will be fairly
intelligent about not asking for articles it's already got details for.
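
For the server side, an XOVER reply is one tab-separated line per article, in the field order number, subject, author, date, message-id, references, byte count, line count. A rough Python 3 sketch of producing such a reply follows; the function names and the sample row are invented for illustration, and your backend will have its own way of fetching the rows.

```python
# Sketch only: overview_line and xover_response are hypothetical helpers.

def overview_line(number, subject, author, date, message_id,
                  references, size, lines):
    """Format one article as a tab-separated overview record."""
    fields = [str(number), subject, author, date, message_id,
              references, str(size), str(lines)]
    # Tabs or line breaks inside a field would corrupt the record,
    # so flatten them to spaces.
    clean = [f.replace("\t", " ").replace("\r", " ").replace("\n", " ")
             for f in fields]
    return "\t".join(clean)


def xover_response(rows):
    """Wrap the overview records in the 224 status line and the final dot."""
    out = ["224 Overview information follows"]
    out.extend(overview_line(*row) for row in rows)
    out.append(".")
    return "\r\n".join(out) + "\r\n"


rows = [(1, "Hello", "joe@example.com", "Tue, 15 Jan 2002 15:23:50 -0500",
         "<1@example.com>", "", 1234, 20)]
print(xover_response(rows))
```

Each record is a few hundred bytes at most, which is why this is so much cheaper than shipping 1500 full sets of headers.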

> The problem here is that whenever this happens, the CPU usage of the NNTP
> server goes to about 35% and continues increasing slowly while the
> newsreader is receiving all the message headers and bodies.
>
Too much data! I'm guessing you aren't using XOVER?

> My guess right now is that I'm doing something bad in the routines that spit
> the information to the newsreader somehow and because of this the server is
> consuming a lot of CPU. Since I'm not an expert in performance tweaks or
> even in possible bottlenecks, I would love it if someone could take a look
> and maybe get some insight into what could be wrong.
>
At the foot of this response is a program I've run against my NNTP server to
download the XOVER data. It's really quick, so I presume your server should
also be if it has the required information in a relational store. Don't know
whether this will help or not, but you can try it just to see whether it
gives you any useful information.

> The NNTP server gets its information from a MySQL database (and no, MySQL is
> not the bottleneck as far as I know, since 'top' shows the NNTP server
> consuming 35% of CPU, not MySQL), formats the output by using string
> replacement (aka "%s %s <%s@%s>" % (v,x,z,y)) and writes to the 'wfile' file
> descriptor.
>
Your do_XOVER code doesn't look outrageously bad.
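
One thing that might be worth trying, and this is a guess rather than a diagnosis: as far as I know the handler's wfile is unbuffered by default (wbufsize = 0 in StreamRequestHandler), so every write becomes its own send. The Python 3 sketch below formats all the lines first and writes once. send_lines is a made-up helper, io.BytesIO stands in for self.wfile, the rows are invented, and dot-stuffing is left out for brevity.

```python
import io

# Sketch only: send_lines is hypothetical; BytesIO stands in for self.wfile.

def send_lines(wfile, lines):
    """Join every response line and hand the result to wfile in one write."""
    payload = "\r\n".join(lines) + "\r\n.\r\n"
    wfile.write(payload.encode("utf-8"))


wfile = io.BytesIO()
rows = [(1, "hello", "joe", "example.com"),
        (2, "world", "ann", "example.org")]
# Same "%s %s <%s@%s>" style of formatting as described in the post.
lines = ["%s %s <%s@%s>" % row for row in rows]
send_lines(wfile, lines)
print(wfile.getvalue())
```

Whether this makes any difference to your CPU figures is something only a measurement will tell you.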

> The source code for the NNTP server can be found below directly from CVS:
> http://cvs.phpbrasil.com/chora/co.php/papercut/papercut.py?r=1.18
>
> The source code for the MySQL backend interface that does most of the
> formatting of the information coming from the database can also be found
> here:
>
> http://cvs.phpbrasil.com/chora/co.php/papercut/backends/phorum_mysql.py?r=1.3
>
> Any insights would be very much appreciated.
>
#!/usr/bin/python

import nntplib, cStringIO, rfc822, sys

SRVR = "xxxxxxxxxxxxxxxxxxxxx"      # Your news server
USER = "xxxxxxxxxxxxxxxxxxxxxxxx"   # Your account name
PASS = "xxxxxxxxx"                  # Your password
newsgroup = "comp.lang.python"      # Group of your choice

def inpdflt(s, d):
    resp = raw_input("%s [%s]: " % (s, d))
    return resp or d

news = nntplib.NNTP(SRVR, user=USER, password=PASS)
resp, estimate, first, last, name = news.group(newsgroup)

#
# The program should track all unseen articles with numbers higher
# than "first". When a formerly missing range is expired by "first"
# moving up the program can stop asking for it.
#
if estimate == '0':
    sys.exit("No messages in "+newsgroup)

print estimate, first, last, name

first = inpdflt("First", first)
last = inpdflt("Last", last)

artnum = first # better: cached from the previous run

#
# Get (article number, subject, poster, date, id, references, size, lines)
# for each of the articles between first and last
#
xover = news.xover(first, last)

# loop through articles, extracting headers
for x in xover[1]:
    # x == (article number, subject, poster, date, id, references, size, lines)
    try:
        hdrs = news.head(x[0])[3]
        mesg = rfc822.Message(cStringIO.StringIO("\r\n".join(hdrs)))
        print '%s\n+++%s' % (mesg.getheader("from"),
                             mesg.getheader("subject"))
        # Need not now retrieve message just to show poster & subject ...
    except nntplib.NNTPError:
        pass
news.quit()

Hope this helps. If not, you might want to try using the profiler to see
where the execution time is going.
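
In case it is useful, here is a minimal sketch of what a profiling run could look like, using cProfile and pstats from the Python 3 standard library. format_rows is only a stand-in for whatever routine you suspect, and profile_call is a made-up helper. Note that the profiler only sees the thread it runs in, so with a threading server you would wrap the call inside the request handler.

```python
import cProfile
import io
import pstats

# Sketch only: format_rows stands in for the routine under suspicion.

def format_rows(rows):
    """Format each row the way the post describes, one string per article."""
    return ["%s %s <%s@%s>" % row for row in rows]


def profile_call(func, *args):
    """Run func under the profiler; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)  # ten costliest entries
    return result, stream.getvalue()


rows = [(n, "subject %d" % n, "user", "example.com") for n in range(1500)]
result, report = profile_call(format_rows, rows)
print(report)
```

Sorting by cumulative time puts the functions that account for most of the run at the top, which is usually the quickest way to see where to look next.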

regards
 Steve
--
http://www.holdenweb.com/