[Mailman-Users] Trouble getting htdig to work

Richard Barrett r.barrett at openinfo.demon.co.uk
Sat Feb 8 19:23:00 CET 2003

At 16:53 08/02/2003, Paul Kleeberg wrote:
>I am attempting to get htdig to work on a RedHat 8.0 system with Apache 
>2.0, htdig 3.2.0 and Mailman 2.1
>I installed the 4 patches (668685, 661138, 444879 & 444884) to Mailman 2.1 
>to create the searchable archives for Mailman with htdig, and then 
>reinstalled mailman.  Created the link:
>   ln -s /var/mailman/archives/htdig /etc/htdig-mailman

As long as htdig was configured with /etc as the default directory to 
contain htdig configuration files, this should be OK.

>and in mm_cfg.py, to make this compatible with RH8 I added:
>  HTDIG_RUNDIG_PATH  = '/usr/bin/rundig'

If that's where the Red Hat RPM installed rundig, that's OK.

>Then added:
>  USE_HTDIG = 1
>to mm_cfg.py
>and then ran the indexing engine:
>   /var/mailman/cron/nightly_htdig -v
>and I get:
>   /usr/bin/rundig: line 48:  1104 Aborted       $BINDIR/htnotify $opts
>   htfuzzy: Unable to open word database /var/lib/htdig/db.words.db
>but I would think htfuzzy should look in:
>   /var/mailman/archives/private/<listname>/htdig/db.words.db

Have you checked out the section under heading "htdig Permissions 
Considerations" in the file INSTALL.htdig-mm which patch 444884 installs in 
$build? Some of the htdig 'databases' generated by the components called by 
rundig can be safely shared between lists while others need to be list 
specific to avoid information leakage from one list's indexes into another's.

<quote from=INSTALL.htdig-mm>
htdig Permissions Considerations

Python scripts added by this patch (nightly_htdig and its relatives) run
the htdig rundig script identified by HTDIG_RUNDIG_PATH to build search
indices for Mailman archives. Code added by this patch generates per
list htdig configuration files which are passed as a parameter to the
rundig script. These configuration files identify a list specific
directory ($prefix/archives/private/<listname>/htdig) in which list
specific data files generated by and used by htdig are to be placed.

However, the rundig script identified by HTDIG_RUNDIG_PATH may attempt
to generate some files in htdig's COMMON_DIR when it is first run by
nightly_htdig; the files concerned are likely to be root2word.db,
word2root.db, synonyms.db and possibly some others generated by htdig's
htfuzzy program. The standard rundig script generates these files
selectively if they do not already exist. Depending on how you have
installed htdig and how the rundig script is first run, there may be a
permissions problem when nightly_htdig executes rundig under the mailman
UID if it tries to generate these files.
</quote>

Basically, you may have to change permissions on the htdig common 
directory. For instance, on my internal test system I have the following setup:

mailman@mailman2:/opt/www/htdig> ls -l
total 16
drwxr-xr-x    2 root     root         4096 Jan 13 16:28 bin
drwxrwxr-x    2 root     mailman      4096 Jan 14 17:19 common
drwxr-xr-x    2 root     root         4096 Jan 14 17:22 conf
drwxrwxr-x    2 root     mailman      4096 Jan 14 17:19 db
mailman@mailman2:/opt/www/htdig> ls -l db
total 0
mailman@mailman2:/opt/www/htdig> ls -l common/
total 6248
-rw-rw-r--    1 root     mailman        84 Jan 13 16:28 bad_words
-rw-rw-r--    1 root     mailman    923308 Jan 13 16:28 english.0
-rw-rw-r--    1 root     mailman      5756 Jan 13 16:28 english.aff
-rw-rw-r--    1 root     mailman       190 Jan 13 16:28 footer.html
-rw-rw-r--    1 root     mailman       877 Jan 13 16:28 header.html
-rw-rw-r--    1 root     mailman       194 Jan 13 16:28 long.html
-rw-rw-r--    1 root     mailman      1390 Jan 13 16:28 nomatch.html
-rw-rw-r--    1 mailman  mailman   2285568 Jan 14 17:19 root2word.db
-rw-rw-r--    1 root     mailman        67 Jan 13 16:28 short.html
-rw-rw-r--    1 root     mailman     14481 Jan 13 16:28 synonyms
-rw-rw-r--    1 mailman  mailman     90112 Jan 14 17:19 synonyms.db
-rw-rw-r--    1 root     mailman      1261 Jan 13 16:28 syntax.html
-rw-rw-r--    1 mailman  mailman   3022848 Jan 14 17:19 word2root.db
-rw-rw-r--    1 root     mailman      1087 Jan 13 16:28 wrapper.html
mailman@mailman2:/opt/www/htdig>

As you can see, 3 of the files in common (root2word.db, synonyms.db and 
word2root.db) were written by the mailman userid when nightly_htdig first 
ran rundig. You will have to tweak things to suit your htdig installation 
setup.
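
As a rough sketch of the permissions check described above (not part of the
htdig patches), the following Python tests whether a directory has the group
write bit set and, if needed, sets it. The function names are hypothetical,
and the paths are simply the ones from my listing; substitute your own htdig
common and db directories. It assumes the directory's group is already the
one Mailman runs under; changing group ownership is a separate step that
normally needs root.

```python
import os
import stat


def is_group_writable(path):
    """Return True if the group write bit is set on path."""
    return bool(os.stat(path).st_mode & stat.S_IWGRP)


def add_group_write(path):
    """Set the group write bit on path, leaving the other mode bits alone."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IWGRP)


if __name__ == "__main__":
    # Assumed locations, copied from the listing above; adjust to your install.
    for directory in ("/opt/www/htdig/common", "/opt/www/htdig/db"):
        if os.path.isdir(directory):
            print(directory, "group-writable:", is_group_writable(directory))
```

Running it as the mailman user only reports the current state; calling
add_group_write() on a root-owned directory has to be done as root.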

>In addition, when I look at the source for the search form on an archive 
>page I see <form method="post" action="/cgi-bin/htsearch">. But on my 
>system, htsearch exists in /usr/bin.

The htsearch program has to be available to the web server in a directory 
from which the server is prepared to run cgi programs.

Remember that execution of htdig's components is in two parts. The indexing 
of the material is typically done by a cron script running htdig components 
under some user id from whatever was set up as htdig's bin directory, for 
example /opt/www/htdig/bin.

The 'search' operation, i.e. looking things up in the search indexes with 
htsearch, is run as a cgi-bin program under the auspices of the User/Group 
your web server is configured to run as.

I think it is usual for the htdig installation process to involve copying 
htsearch into the cgi-bin directory under whatever is configured by the 
ServerRoot directive in your web server's httpd.conf file (or wherever its 
ScriptAlias directive maps /cgi-bin/).

Personally, I build htdig from source and in any event run SuSE Linux, so I 
do not know how the Red Hat RPMs have been configured.

If all else fails, as root, copy htsearch into the web server's cgi-bin 
directory and make sure that it is readable and executable but not writable 
by owner, group and other.
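
A minimal sketch of that copy step in Python, for anyone who prefers to
script it. The helper name install_cgi is hypothetical. The source path
/usr/bin/htsearch is the one reported in the question, while the
destination /var/www/cgi-bin in the usage comment is only an assumption
about where the web server looks for CGI programs; check your httpd.conf
before using it.

```python
import os
import shutil


def install_cgi(src, cgi_dir):
    """Copy src into cgi_dir and make the copy r-xr-xr-x (mode 0555)."""
    dest = os.path.join(cgi_dir, os.path.basename(src))
    shutil.copyfile(src, dest)  # copies the file contents only, not the mode
    os.chmod(dest, 0o555)       # read and execute for all, write for nobody
    return dest


# Example call, to be run as root (the destination directory is an assumption):
# install_cgi("/usr/bin/htsearch", "/var/www/cgi-bin")
```

Mode 0555 corresponds to "readable and executable but not writable by owner,
group and other".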

>What am I overlooking?
>Paul Kleeberg
>paul at fpen.org

Let me know if you continue to have problems.
