[Mailman-Users] Mailman/htdig problem

Richard Barrett R.Barrett at ftel.co.uk
Wed Apr 30 16:31:07 CEST 2003


At 14:06 30/04/2003, Timothy Arnold wrote:
>Hi Richard,
>
>Thanks for this. I followed the instructions detailed in Operational
>Information in the install notes and it has now created the indices.
>However, mmsearch still fails when I try to conduct a search.

The first thing to check is the $prefix/archives/private/<listname>/htdig directory.

If it looks something like this then nightly_htdig's execution of rundig 
worked approximately as planned:

mailman@mailman2:/mailman/run/archives/private/rbtest2/htdig> ls
db.docdb       db.wordlist  rbtest2.conf
db.docs.index  db.words.db  rundig_last_run

The db.* files are generated by the htdig programs called by rundig. If 
these are missing then something went wrong with rundig's execution.

The rundig_last_run file is a zero length file whose modification time is 
set by nightly_htdig.

The <listname>.conf file is the per-list htdig configuration file.

There should also be a symlink to <listname>.conf in the directory
$prefix/archives/htdig/
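
The checks above could be scripted roughly as follows. This is only a sketch: the helper name check_htdig_dir and its arguments are made up for illustration, and the db.* names are simply those in the listing above, so another htdig version may produce a different set.

```python
import os

# File names taken from the directory listing above; other htdig
# versions may produce a different set.
EXPECTED_DB_FILES = ["db.docdb", "db.docs.index", "db.wordlist", "db.words.db"]


def check_htdig_dir(prefix, listname):
    """Return a list of problems found with a list's htdig files.

    Hypothetical helper, not part of Mailman or the htdig patches.
    """
    problems = []
    htdig_dir = os.path.join(prefix, "archives", "private", listname, "htdig")
    conf_name = listname + ".conf"

    # The db.* files, the per-list conf file and the marker file
    for name in EXPECTED_DB_FILES + [conf_name, "rundig_last_run"]:
        path = os.path.join(htdig_dir, name)
        if not os.path.exists(path):
            problems.append("missing: " + path)

    # The per-list conf file should also be reachable through a symlink
    # in $prefix/archives/htdig/
    link = os.path.join(prefix, "archives", "htdig", conf_name)
    if not os.path.islink(link):
        problems.append("not a symlink: " + link)
    elif not os.path.exists(link):
        problems.append("dangling symlink: " + link)
    return problems
```

On my test system, check_htdig_dir("/mailman/run", "rbtest2") should come back with an empty list if everything is in place.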

You didn't say whether the list's TOC page has an index updated date/time 
in it after running nightly_htdig. If nightly_htdig worked correctly the 
date/time group on the list TOC page should match the modification time on 
rundig_last_run for the list. If the TOC page hasn't been updated then 
rundig did not run cleanly and an error should have been printed out by 
nightly_htdig.
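
To compare the two by eye, something like the following would print the marker file's modification time. It is a minimal sketch; the function name is my own invention and the time format is arbitrary, so it will not necessarily match the layout used on the TOC page.

```python
import os
import time


def rundig_last_run_time(prefix, listname):
    """Return rundig_last_run's modification time as a string, or None if absent.

    Hypothetical helper for comparing against the date/time on the TOC page.
    """
    marker = os.path.join(prefix, "archives", "private", listname,
                          "htdig", "rundig_last_run")
    try:
        mtime = os.path.getmtime(marker)
    except OSError:
        # No marker file: nightly_htdig has not completed for this list
        return None
    # Local time, in an arbitrary format chosen for readability
    return time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime))
```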

Things you can try:

1. try running nightly_htdig from the command line with the -v (verbose) 
option. This should print out the name of each list and whether or not it 
is running rundig. If it says it is skipping a list because of no recent 
posts this is because the modification time of a list's rundig_last_run is 
after the last recorded post to the list.

2. you can "force" nightly_htdig to run for a list by deleting the list's 
$prefix/archives/private/<listname>/htdig/rundig_last_run file.
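
The skip rule in item 1 and the "force" in item 2 amount to roughly the following. Treat it as an illustration of the rule as described, not as the code in nightly_htdig: the function names are hypothetical, and the caller has to supply the time of the last post since how that is recorded is not shown here.

```python
import os


def marker_path(prefix, listname):
    """Path of the list's rundig_last_run marker file."""
    return os.path.join(prefix, "archives", "private", listname,
                        "htdig", "rundig_last_run")


def would_skip(prefix, listname, last_post_time):
    """Mimic the skip rule: True if the marker is newer than the last post.

    last_post_time is seconds since the epoch and must be supplied by
    the caller (an assumption made for this sketch).
    """
    try:
        return os.path.getmtime(marker_path(prefix, listname)) > last_post_time
    except OSError:
        # No marker file, so there is nothing to justify skipping the list
        return False


def force_reindex(prefix, listname):
    """Delete the marker file; return True if a file was removed."""
    try:
        os.remove(marker_path(prefix, listname))
        return True
    except OSError:
        return False
```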

Is anything unexpected being output when you run nightly_htdig with the -v 
option?

If you are sure you have everything configured correctly then we can try 
and explore how mmsearch is running htsearch by proceeding as follows:

1. create a small test script, putting it anywhere convenient and making 
sure that the user/group your web server runs as can access and execute the 
script:

test.py --------------------------------------
#! /usr/local/bin/python
# make sure the line above has the correct path for your python installation

import cgi
cgi.test()

end of test.py -------------------------------

2. in your mm_cfg file, temporarily substitute the path to this test.py 
script as the value of HTDIG_HTSEARCH_PATH

3. do a search from the search form for a list which you believe has all 
its indexes etc. You should get back a web page telling you about the 
environment that test.py found itself in when it was run by mmsearch; the 
purpose is to verify that htsearch will find itself in the correct 
environment when it is run. Some things of interest are near the top of the web 
page returned. For example, I get the following on my test system; the 
MiniFieldStorage tells me the names and values of the query; the config 
field should contain the name of the list being searched and the words 
field should contain what you entered in the search form. The CONFIG_DIR 
value should point to $prefix/archives/htdig/.

extract for page returned by test.py----------------------

Command Line Arguments:
['/usr/local/httpd/cgi-bin/test.py']
Form Contents:
config: <type 'instance'>
MiniFieldStorage('config', 'rbtest2')
format: <type 'instance'>
MiniFieldStorage('format', 'short')
method: <type 'instance'>
MiniFieldStorage('method', 'and')
sort: <type 'instance'>
MiniFieldStorage('sort', 'score')
words: <type 'instance'>
MiniFieldStorage('words', 'htdig search for words')
Shell Environment:
CONFIG_DIR

/mailman/run/archives/htdig
end extract for page returned by test.py------------------
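
One caveat: the cgi module that test.py relies on was removed from the standard library in Python 3.13. If the python on your shebang line is that recent, something along these lines might stand in for it. It is a sketch under the usual CGI conventions (QUERY_STRING for GET, CONTENT_LENGTH bytes on stdin for POST); the function names are mine, and the output is plain text, not the HTML page that cgi.test() produces.

```python
import os
import sys
from urllib.parse import parse_qs


def read_query(environ, stdin):
    """Return the CGI query fields as a dict of name -> list of values."""
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        # POST: the form data arrives on stdin, CONTENT_LENGTH bytes long
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length).decode("ascii", "replace")
    else:
        # GET: the form data is in QUERY_STRING
        raw = environ.get("QUERY_STRING", "")
    return parse_qs(raw)


def report(environ, stdin):
    """Build a plain-text CGI response showing the fields and CONFIG_DIR."""
    lines = ["Content-Type: text/plain", ""]
    for name, values in sorted(read_query(environ, stdin).items()):
        lines.append("%s: %r" % (name, values))
    lines.append("CONFIG_DIR: %r" % environ.get("CONFIG_DIR"))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sys.stdout.write(report(os.environ, sys.stdin.buffer))
```

The things to look for are the same as before: config should name the list, words should hold the search terms, and CONFIG_DIR should point to $prefix/archives/htdig/.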

If that gives the expected results then mmsearch is doing the right thing 
but htsearch is failing for some reason I've not yet seen.

The exit status being returned by htsearch is the one shown in MM's 
$prefix/logs/error entry written when the failure occurs. What value are 
you seeing?
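
If the error log is large, a few lines of Python can pull out the htsearch entries. This sketch assumes the entries look like the two you quoted in your first message (timestamp, pid in brackets, then "htsearch for list: ..." followed by comma-separated "key: value" pairs); the function name is hypothetical and the pattern may need adjusting for other entries.

```python
import re

# Pattern modelled on the log lines quoted earlier in this thread, e.g.
#   Apr 30 10:56:39 2003 (12488) htsearch for list: test5, existatus: 1
ENTRY = re.compile(
    r"^(?P<when>\w{3} +\d+ [\d:]+ \d{4}) \((?P<pid>\d+)\) "
    r"htsearch for list: (?P<listname>[^,]+), (?P<rest>.*)$"
)


def htsearch_entries(lines):
    """Return one dict per htsearch entry found in the given log lines."""
    entries = []
    for line in lines:
        match = ENTRY.match(line.strip())
        if not match:
            continue
        entry = {
            "when": match.group("when"),
            "pid": match.group("pid"),
            "listname": match.group("listname"),
        }
        # The remainder is a series of "key: value" pairs
        for part in match.group("rest").split(", "):
            key, sep, value = part.partition(": ")
            if sep:
                entry[key] = value
        entries.append(entry)
    return entries
```

Feeding it the lines of $prefix/logs/error, for example via open(path).readlines(), gives a list of dicts that can be filtered by listname.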

What version of htdig are you using? The htdig integration patches are 
tested using htdig 3.1.6 stable, not the 3.2 beta. There is no particular 
reason that 3.2 should not work, but some people have reported problems 
with it, which are not MM specific.

Try the above and get back to me.


>Mmsearch reports:
>
>Htdig Archives Access Failure
>
>Search failed -12-
>
>The messages that I reported last time are still being reported in the
>syslog. Any ideas what -12- means?

mmsearch can fail for a number of reasons, which are numbered 1, 2, ...

Error 12 is when execution of htsearch fails.

If things are working as planned users should not see these cryptic 
diagnostics. They just tie the failure, definitively, to a point in the 
mmsearch source code.

>Thanks,
>Tim.
>
>-----Original Message-----
>From: Richard Barrett [mailto:R.Barrett at ftel.co.uk]
>Sent: 30 April 2003 13:51
>To: Timothy Arnold; mailman-users at python.org
>Subject: Re: [Mailman-Users] Mailman/htdig problem
>
>
>At 11:18 30/04/2003, Timothy Arnold wrote:
> >Hiya,
> >
> >I have recently installed mailman 2.1.2 with the relevant htdig patches
> >and I still cannot get it to work.
> >
> >For a given list, the search box is displayed and the configuration
> >file for the list is generated but the search functionality doesn't work
> >- I look in the error log and I find
> >
> >Apr 30 10:56:39 2003 (12488) htsearch for list: test5, existatus: 1
> >Apr 30 10:56:39 2003 (12488) htsearch for list: test5, cause: htsearch, detail: -12-
> >
> >On the archive page, it reports that the search index has yet to be
> >built - do I need to run nightly_htdig for each list when it is
> >created?
>
>Until nightly_htdig is first run there will be no per-list htdig search
>indexes for htsearch to use. It is usually a good idea to send an initial
>message to a list which is being archived, which will cause htdig to be
>set up for the list (per-list htdig.conf created and search form added to
>the list's TOC page), and then run nightly_htdig from the command line (as
>the mailman uid) to generate an initial set of htdig indexes for the list.
>
>Just copy the command for running nightly_htdig in the modified mailman
>crontab generated by the htdig integration patch
>
>Bear in mind that, in common with most search engines, htdig does not
>search the web pages when a user submits a search query; rather some part
>of the search engine's suite of programs (rundig in the case of htdig)
>builds search indexes for the material and it is from these indexes that a
>user's search query is satisfied. That is also why the search form has the
>date/time indexing was last run for the list embedded in it with the
>caution that more recent archive material will not be found.
>
>btw: this issue is covered in the third paragraph under the heading
>"Operational Information" in INSTALL.htdig-mm.
>
> >Thanks,
> >Tim.
> >
> >--
> >Timothy Arnold, Server & Network Infrastructure Support Officer,
> >Internet Services,
> >Becta, Coventry, CV4 7JJ, UK                      Voice: +44 24 7684 7169
> >email: timothy.arnold at becta.org.uk                Fax:   +44 24 7641 1418
> >
> >
> >**********************************************************************
> >This email and any files transmitted with it are confidential and
> >intended solely for the use of the individual or entity to whom they
> >are addressed. If you have received this email in error please notify
> >the system manager. This footnote also confirms that this email message
> >has been swept by MIMEsweeper for the presence of computer viruses.
> >www.mimesweeper.com
> >**********************************************************************




