Can PyApache slow things down?!

mbf2y at my-dejanews.com
Mon Apr 19 18:17:10 EDT 1999


Short version of question:

I have a project where I am using SGMLParser to parse an HTML document I fetch
from another site.  The machine I'm using is slow - a DEC Alpha with (get
this) 20MB of RAM.  (Someone must have cannibalized the memory to boost
another machine at some point.  The lack of RAM makes the machine crawl.)
I'm also sharing this machine with other people.  Bottom line is that my
queries take 5-7 seconds of wall-clock time for one query, and 3-5 seconds
for the other.  In an attempt to speed things up, I downloaded and built
Apache with PyApache included and added the "AddHandler" line.  I can tell
that PyApache is running properly because using "top" I can see that "httpd"
is the process doing all the work, whereas previously "myscript.py" was
doing the work.  Problem: I noticed a slowdown in wall-clock time.  After
much pondering, I decided that since my machine is so low on RAM and the
httpd binary nearly doubled in size (to just over a meg), maybe I'm doing
more context switches.  So instead of starting 5 httpd's, I dropped to 2.

Still, even when I'm the only user on the webserver, the queries were slower
than the "normal" way.  I ended up backing out the change: I reverted to the
old Apache binary and kept the number of webservers at 2; this did speed
things up a touch (should have thought of that sooner).  Anyway, I'm
wondering if this is normal, and if not, what could I be doing wrong?  I used
Python 1.5.2b2.  However, I also tried this exact same solution with Python
1.5.2 on a machine with more RAM (128MB, but slower chip - an old Sparc 5).
The scripts ran slower with PyApache than without... this makes no sense to
me, as at minimum I should be saving time by having fewer context switches...
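
One way to narrow this down is to time the fetch and the parse separately.
If most of the wall-clock time goes to waiting on the remote site, embedding
the interpreter in httpd would not be expected to help much.  A minimal
sketch, assuming hypothetical "fetch" and "parse" callables that stand in
for the script's own functions (neither name comes from the actual script),
and using time.perf_counter from modern Python rather than 1.5.2:

```python
import time


def timed(func, *args):
    """Run func(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


def profile_query(fetch, parse, query):
    """Time the fetch and parse stages of one query separately.

    fetch and parse are hypothetical stand-ins: fetch(query) returns the
    page text, parse(page) returns the list of hits.
    """
    page, fetch_secs = timed(fetch, query)
    hits, parse_secs = timed(parse, page)
    return hits, {"fetch": fetch_secs, "parse": parse_secs}
```

Running the same query both ways (with and without PyApache) and comparing
the two numbers should show whether the slowdown is in the local work or
somewhere else.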

Thanks for any help (and if you have an extra second, could you read the
P.S.?)
-Fred
(I don't ever check dejanews mail... if you want to e-mail me instead of
posting here, my address is fred-at-cs-dot-umd-dot-edu)


P.S.  I'm a graduate student working on a project analyzing search engine
usage.  My project presents the user with a type-in box just like the
"real" search engines.  I then take the query and pass it on to the "real"
engine that the user chose (either HotBot or AltaVista).  I get the page
back from AltaVista/HotBot and then use a class derived from SGMLParser to
parse the page and extract the hits.  I then present to the user the
hitlist, free from all the advertising junk present on the "real" search
engine sites.  I also have it set up so that whenever the user clicks on a
URL, I write something like "CLICK-ON #47" to a file.  My hope is to analyze
usage patterns and try to generate some sort of metric which can indicate a
user's satisfaction with the hitlist based on the user behavior (what
numbered hits they clicked on, etc.)  I am trying to collect as many query
sessions as I can over the next 2 weeks or so.  If you ever use AltaVista or
HotBot, could you please travel to http://www.cs.umd.edu/~fred/search/ and
bookmark my site?  Then the next time (or few times) you have to run a search
engine query, could you use the site? If you have privacy concerns or want a
more detailed description of the research goals, answers can be found at that
site.
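
The parse-and-log step described above can be sketched roughly as follows.
This is an illustration under stated assumptions, not the project's code:
it uses html.parser.HTMLParser from current Python as a stand-in for
SGMLParser, the names "HitParser", "extract_hits" and "click_record" are
made up for the example, and it collects every link on the page, whereas
picking out only the real hits depends on each engine's page layout.

```python
from html.parser import HTMLParser


class HitParser(HTMLParser):
    """Collect (url, link text) pairs from anchor tags that have an href."""

    def __init__(self):
        super().__init__()
        self.hits = []
        self._href = None   # href of the anchor currently open, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.hits.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_hits(page):
    """Return the list of (url, text) pairs found in an HTML string."""
    parser = HitParser()
    parser.feed(page)
    parser.close()
    return parser.hits


def click_record(hit_number):
    """Format the log line written when the user clicks a numbered hit."""
    return "CLICK-ON #%d" % hit_number
```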

Thanks,
-Fred
