OT: why do web BBS's and blogs get so slow?

Sat Jan 31 17:56:15 EST 2004

Lots of times, when a blog (that supports user comments) or a web BBS
gets heavily loaded, it slows down horribly, even when it's hosted at
an ISP on a fast computer with plenty of net bandwidth.  I'm wondering
what those programs are doing that makes them bog down so badly.
Anyone know what the main bottlenecks are?  I'm just imagining them
doing a bunch of really dumb things.

I'm asking this on clpy because thinking about the problem naturally
made me wonder how I'd write a program like that myself, which of
course would mean using Python.

FWIW, here's how I'd do it:

1) It would be a single-threaded web server (asyncore, twistedmatrix)
with a select loop talking to a socket, either on port 80 directly or
to a proxy web server running mod_gzip, SSL, and so forth.

2) It might use MySQL for infrequent operations like user info lookup
at login time or preference updates, but not for frequent operations
like reading and posting messages.  User session info and preferences
would be in ram during a session, in a Python dict indexed by a
browser session cookie.

3) The message store would be two files, one for metadata and one for
message text.  Both of these would be mmap'd into memory.  There would
be a fixed length of metadata for each message, so getting the
metadata for message #N would be a single array lookup.  The metadata
would contain the location in the text file where the message text is
and its length, so getting the text would take just one memcpy.  The
box would have enough ram to hold all recently active messages in ram
almost all the time.  Paging to disk is left to the host OS's virtual
memory system.  From the application's point of view everything is
always in ram.  Digging up old messages might take some page faults,
but doing that should be relatively rare.  New messages are always
appended to the files, keeping memory and paging behavior fairly
localized.  There might be a third file for an activity log, which is
append-only (serial access).  Ideally that would be on a separate disk
from the message disk, to reduce head contention.

4) Finding all N messages in an active discussion thread might require
chasing N pointers in the metadata file, but that would usually be at
most a few thousand small lookups, all in ram, and the thread info
would be cached in ram in the application once found.  Fancier disk
structures could speed this up but probably aren't needed.
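
To make points 3 and 4 concrete, here is a rough sketch of the two-file
store. Everything specific in it is my own assumption for illustration:
the 16-byte record layout, the file names meta.bin and text.bin, the
preallocated sizes, and the idea that each record points at the previous
message in its thread. It also keeps the message count only in memory,
which a real version would have to persist.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-length metadata record: offset of the text, length of
# the text, and the number of the previous message in the same thread
# (-1 for the first message of a thread).
RECORD = struct.Struct("<QIi")


class MessageStore:
    """Append-only message store backed by two preallocated, mmap'd files."""

    def __init__(self, directory, max_messages=1000, text_capacity=1 << 20):
        self.meta_file = open(os.path.join(directory, "meta.bin"), "w+b")
        self.text_file = open(os.path.join(directory, "text.bin"), "w+b")
        # Preallocate so each file can be mapped once and never remapped.
        self.meta_file.truncate(max_messages * RECORD.size)
        self.text_file.truncate(text_capacity)
        self.meta = mmap.mmap(self.meta_file.fileno(), 0)
        self.text = mmap.mmap(self.text_file.fileno(), 0)
        self.count = 0          # messages stored so far (in memory only)
        self.text_end = 0       # next free byte in the text file
        self.thread_cache = {}  # last message number -> list of numbers

    def append(self, body, prev=-1):
        """Append a message and return its number."""
        offset = self.text_end
        self.text[offset:offset + len(body)] = body
        self.text_end += len(body)
        RECORD.pack_into(self.meta, self.count * RECORD.size,
                         offset, len(body), prev)
        self.count += 1
        return self.count - 1

    def read(self, n):
        """Metadata for message n is one array lookup; the text is one copy."""
        offset, length, _prev = RECORD.unpack_from(self.meta, n * RECORD.size)
        return self.text[offset:offset + length]

    def thread(self, last):
        """Chase prev pointers back from `last`; cache the result."""
        if last not in self.thread_cache:
            chain = []
            n = last
            while n != -1:
                chain.append(n)
                n = RECORD.unpack_from(self.meta, n * RECORD.size)[2]
            chain.reverse()
            self.thread_cache[last] = chain
        return self.thread_cache[last]

    def close(self):
        self.meta.close()
        self.text.close()
        self.meta_file.close()
        self.text_file.close()


with tempfile.TemporaryDirectory() as directory:
    store = MessageStore(directory)
    first = store.append(b"why so slow?")
    other = store.append(b"unrelated post")
    reply = store.append(b"probably the database", prev=first)
    thread_ids = store.thread(reply)
    thread_texts = [store.read(n) for n in thread_ids]
    store.close()

print(thread_ids, thread_texts)
```

The cache here is keyed by the last message number, so it goes stale as
soon as someone replies; a real version would need to invalidate or
extend the cached list on each append.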

A site like Slashdot gets maybe 20(?) page views per second at busy
times, and around 10,000 new messages a day of maybe 1 kbyte each, a
mere 10 MB per day of message text, no problem to keep in ram.  I'd
like to think that the above scheme could handle Slashdot's traffic
level pretty easily on one or two typical current PC's with IDE disks,
one PC for the BBS application and the other (if needed) for a proxy
server running gzip and caching static pages.  Am I being naive and/or
missing something important?  Slashdot itself uses a tremendous amount
of hardware by comparison.
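
For what it's worth, here is the back-of-envelope arithmetic behind
those numbers. The traffic figures are the guesses from the paragraph
above; the 16-byte metadata record is purely my own assumption about
how big a fixed-length record might be.

```python
# Guessed traffic figures from the post, not measurements.
MESSAGES_PER_DAY = 10_000
BYTES_PER_MESSAGE = 1_000       # "maybe 1 kbyte each"
PAGE_VIEWS_PER_SECOND = 20

# Hypothetical size of one fixed-length metadata record.
METADATA_BYTES_PER_MESSAGE = 16

SECONDS_PER_DAY = 24 * 60 * 60

text_bytes_per_day = MESSAGES_PER_DAY * BYTES_PER_MESSAGE
metadata_bytes_per_day = MESSAGES_PER_DAY * METADATA_BYTES_PER_MESSAGE
text_gb_per_year = text_bytes_per_day * 365 / 1e9

# Time a single-threaded server can spend on each page, on average,
# before it falls behind at the busy-time rate.
ms_per_page_view = 1000 / PAGE_VIEWS_PER_SECOND

posts_per_second = MESSAGES_PER_DAY / SECONDS_PER_DAY

print("message text per day (MB):", text_bytes_per_day / 1e6)
print("metadata per day (kB):", metadata_bytes_per_day / 1e3)
print("message text per year (GB):", text_gb_per_year)
print("time budget per page view (ms):", ms_per_page_view)
print("new posts per second:", round(posts_per_second, 3))
```

So the message text comes to 10 MB a day and well under 4 GB a year,
and the single-threaded server has about 50 ms per page view at peak.
Writes are a tiny fraction of the load compared with reads.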