htdig-mailman patch(es) - blank page results only so far

Hiya Mailman Users,
This list is so heavily trafficked, and this question is so specific to the htdig-mailman patches, that it took a real puzzle to get me to actually sign up and post :)
I am running Mailman 2.1.12 with the following 2.1.12 patches applied (archiver index control/htdig integration):
http://www.openinfo.co.uk/mm/patches/444879/index.html
http://www.openinfo.co.uk/mm/patches/444884/index.html
My build method and build results were a little rockier than I had hoped for. I mainly use FC10 systems, so I wanted to make an FC12 RPM with the patches. I found that the 'Fedora way' of building mailman already involves a fairly complex set of patches, and it would take a lot of effort, through repeated rounds of diff and patch, to work out how to package this in a Fedora manner, so I followed this course of action:
Changed a FC10 mailman 2.1.11 spec to only patch these two patches, removing all Fedora specific patches.
Changed some symlink behavior to preserve the precious /etc/mailman link *to* a /var/lib/mailman/data/sitelist.cfg target (the FC10 makes the symlink the other direction).
Substituted a different init script (the FC10 /etc/init.d/mailman), which does an 'install' of the cron script and then a 'python mailmanctl -s -q start' to start.
The configure options in the FC10 spec file look like this (mmdir=/usr/lib/mailman):
./configure --libdir=%{_libdir} --prefix=%{mmdir} --with-var-prefix=%{varmmdir} --with-config-dir=%{configdir} --with-lock-dir=%{lockdir} --with-log-dir=%{logdir} \
    --with-pid-dir=%{piddir} --with-queue-dir=%{queuedir} --with-python=%{__python} --with-mail-gid=%{mailgroup} --with-cgi-id=%{cgiuser} \
    --with-cgi-gid=%{cgigroup} --with-mailhost=localhost.localdomain --with-urlhost=localhost.localdomain --without-permcheck
So after I got it built I had missed a few things:
The %{mmdir}/archives/htdig folder needed to be created to match my mm_cfg.py goodies:
USE_HTDIG = 1
HTDIG_HTSEARCH_PATH = '/usr/bin/htsearch'
HTDIG_RUNDIG_PATH = '/usr/bin/rundig'
HTDIG_CONF_LINK_DIR = '/var/lib/mailman/archives/htdig'
OWNERS_CAN_DELETE_THEIR_OWN_LISTS = Yes
MTA = 'Postfix'
SHORTCUT_ICON = 'xxxxx.png'
WEB_HEADER_COLOR = '#3399FF'
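The path settings above are easy to get subtly wrong, so a quick sanity check can help. The following is only a minimal sketch and is not part of Mailman or the htdig patches: `check_htdig_settings` is a hypothetical helper that takes the three path settings as a plain dict (it does not import mm_cfg.py) and reports which ones do not point at something usable.

```python
import os


def check_htdig_settings(settings):
    """Return a list of problems found in the htdig path settings.

    `settings` is a plain dict keyed by the mm_cfg.py names. This helper is
    hypothetical; it only inspects the filesystem.
    """
    problems = []
    # The two program paths must exist and be executable.
    for key in ("HTDIG_HTSEARCH_PATH", "HTDIG_RUNDIG_PATH"):
        path = settings.get(key)
        if not path:
            problems.append("%s is not set" % key)
        elif not (os.path.isfile(path) and os.access(path, os.X_OK)):
            problems.append("%s is not an executable file: %s" % (key, path))
    # The config link directory must exist as a directory.
    conf_dir = settings.get("HTDIG_CONF_LINK_DIR")
    if not conf_dir:
        problems.append("HTDIG_CONF_LINK_DIR is not set")
    elif not os.path.isdir(conf_dir):
        problems.append("HTDIG_CONF_LINK_DIR is not a directory: %s" % conf_dir)
    return problems
```

Called with the three values listed above, an empty list means all three paths look plausible; it says nothing about whether htdig itself is configured correctly.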
Then I ran this to create indices (and htdig dbs/symlinks):
#!/bin/bash
# Rebuild the archive for every list found in the lists directory.
LISTS=$(ls /var/lib/mailman/lists)
for list in $LISTS; do
    /usr/lib/mailman/bin/arch "$list"
done
I made a symlink from /var/www/htdig to /usr/share/htdig, where FC10 puts the htdig (3.2.0-0.3.b6.fc10) common files, and added this to my Apache VirtualHost definition:
Alias /htdig/ /usr/share/htdig/
Then I ran '/usr/bin/python -v /usr/lib/mailman/cron/nightly_htdig' and great, I was up and had a per-list (even private) search form with all my search fields available.
I then went to do a search and all I got back was a blank page, with no errors in either the mailman log or the Apache error log.
I tried passing mail through a public and private list to see if that would 'prime the pump' as it were, but to no avail.
Greatly appreciate any insight the group can offer here!
Thanks! Jeremy Capps

Capps, John M wrote:
I don't know what else there is, but John Dennis' original RedHat FHS patch is at <http://mail.python.org/pipermail/mailman-developers/2004-October/017343.html>.
Changed a FC10 mailman 2.1.11 spec to only patch these two patches, removing all Fedora specific patches. Changed some symlink behavior to preserve the precious /etc/mailman link *to* a /var/lib/mailman/data/sitelist.cfg target (the FC10 makes the symlink the other direction)
Note that sitelist.cfg is not actually used for anything by Mailman. It is intended ONLY as suggested input to bin/config_list for configuring the 'mailman' site list since the default new list configuration is probably not appropriate for that list.
Some of those config options rely on RedHat patches to configure.
You probably should have included the --wipe option to bin/arch, and a more robust script is:
#!/bin/bash
for list in $(/usr/lib/mailman/bin/list_lists --bare); do
    /usr/lib/mailman/bin/arch --wipe $list
done
The action for the search form should be to post to a url like http://www.example.com/mailman/mmsearch/listname.
If you just go to that URL in a browser, you should get a response like:
htdig Archives Access Failure CGI problem. -5-Field count -4- fields:
If you want to make another attempt to access a list archive then go via the list users information page.
If this problem persists then please e-mail the following information to the mailman@example.com:
Referer not known
/mailman/mmsearch/listname
If that is all working correctly, the problem is in htdig. mmsearch just sets CONFIG_DIR in the environment to your HTDIG_CONF_LINK_DIR setting and then opens a pipe to and from the command in your HTDIG_HTSEARCH_PATH setting, writes the search parameters to the pipe and reads and displays the result.
It detects a bad status and a null response. It logs a bad status in Mailman's error log and should display either an error message or the non-null response.
What happens if you run '/usr/bin/htsearch' by hand?
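The mechanism described above (set CONFIG_DIR, pipe the parameters to the search program, read back the result, then look at the status and the output) can be sketched roughly as follows. This is not the patch's actual code: `run_search` and `diagnose` are hypothetical names, and the exact form of the payload that mmsearch writes to the pipe is an assumption.

```python
import os
import subprocess


def run_search(command, conf_dir, payload):
    """Run `command` with CONFIG_DIR set, feed it `payload` on stdin.

    Returns (exit_status, stdout_bytes). `command` is an argument list such
    as ["/usr/bin/htsearch"]; `payload` is bytes.
    """
    env = dict(os.environ)
    env["CONFIG_DIR"] = conf_dir  # mirrors the HTDIG_CONF_LINK_DIR setting
    result = subprocess.run(
        command,
        input=payload,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        env=env,
    )
    return result.returncode, result.stdout


def diagnose(status, output):
    """Separate the two failure cases named above from a usable response."""
    if status != 0:
        return "bad status %d" % status
    if not output.strip():
        return "null response"
    return "ok"
```

Running something like `run_search(["/usr/bin/htsearch"], "/var/lib/mailman/archives/htdig", b"words=recentword")` outside the web server and passing the result to `diagnose` helps separate an htsearch problem from a CGI or Apache problem; the `words=...` payload shown here is illustrative only.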
I tried passing mail through a public and private list to see if that would 'prime the pump' as it were, but to no avail.
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

Thanks Mark,
Yeah, the FHS patch is there, and you are indeed correct there are options missing without this PATCH applied, options like lock dir, etc... Not sure which one may be causing an issue with htdig, but will make the attempt to at least get that patch in conjunction with the indexing and the htdig patch.
Other patches in addition to FHS are (ctypo,selinux,Unicode,privurl,mmcfg,fhsinit).
Thanks for the --wipe and scripting suggestion, seems more appropriate indeed.
The url http://mailmansite.com/mailman/mmsearch/listname does in fact produce the desired result (referrer not known).
<snip> If that is all working correctly, the problem is in htdig. mmsearch just sets CONFIG_DIR in the environment to your HTDIG_CONF_LINK_DIR setting and then opens a pipe to and from the command in your HTDIG_HTSEARCH_PATH setting, writes the search parameters to the pipe and reads and displays the result. ... What happens if you run '/usr/bin/htsearch' by hand? <snip>
If I run it from bash, specifying a config file with -c in the config directory, it also returns blank results (specifying text format).
Like so:
/usr/bin/htsearch -c /var/lib/mailman/archives/htdig/listname.conf
Enter value for words: recentword
Content-type: text/html
Enter value for format: text
#
Could it be the config file written?
<snip>
database_dir: /var/lib/mailman/archives/private/listname/htdig
start_url: http://mailmansite.com/mailman/htdig/listname/
limit_urls_to: ${start_url}
local_urls: http://mailmansite.com/mailman/htdig/listname/=/var/lib/mailman/archives/pri...
local_urls_only: true
url_part_aliases: http://mailmansite.com/mailman/htdig/listname/ *mm-htdig*
script_name: http://mailmansite.com/mailman/mmsearch/bta_developers
noindex_end: <!--/htdig_noindex-->
noindex_start: <!--htdig_noindex-->
exclude_urls: /cgi-bin/ .cgi
<snip>
(the following template content seems appropriate)
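To check a generated config file like the one above without reading it by eye, the 'name: value' lines can be pulled into a dict. This is a simplified sketch only: `parse_htdig_conf` is a hypothetical helper, and it ignores backslash line continuations and include directives, which a real htdig configuration file may use.

```python
def parse_htdig_conf(text):
    """Parse 'name: value' lines into a dict.

    Blank lines and lines starting with '#' are skipped. Only the first
    colon separates name from value, so URLs in values survive intact.
    """
    attrs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition(":")
        if sep:
            attrs[name.strip()] = value.strip()
    return attrs
```

With the result in hand, it is straightforward to confirm, for example, that `database_dir` names an existing directory and that the list name in `script_name` matches the one used in `start_url`.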
Well, I might go back to patch fun, but you do seem to have narrowed the problem down to htdig itself... perhaps some way it is creating the dbs?
I notice that the /var/lib/mailman/archives/private/listname/htdig directory does not have permissions appropriate for the Apache host to access....
Thanks!
Jeremy (John) Capps
Software Engineer
DHG - Consumer Electronics

-----Original Message-----
From: Mark Sapiro [mailto:mark@msapiro.net]
Sent: Thursday, December 03, 2009 9:25 AM
To: Capps, John M; mailman-users@python.org
Subject: Re: [Mailman-Users] htdig-mailman patch(es) - blank page results only so far
Capps, John M wrote:
I don't know what else there is, but John Dennis' original RedHat FHS patch is at <http://mail.python.org/pipermail/mailman-developers/2004-October/017343.html>.
Changed a FC10 mailman 2.1.11 spec to only patch these two patches, removing all Fedora specific patches. Changed some symlink behavior to preserve the precious /etc/mailman link *to* a /var/lib/mailman/data/sitelist.cfg target (the FC10 makes the symlink the other direction)
Note that sitelist.cfg is not actually used for anything by Mailman. It is intended ONLY as suggested input to bin/config_list for configuring the 'mailman' site list since the default new list configuration is probably not appropriate for that list.
Some of those config options rely on RedHat patches to configure.
You probably should have included the --wipe option to bin/arch, and a more robust script is:
#!/bin/bash
for list in $(/usr/lib/mailman/bin/list_lists --bare); do
    /usr/lib/mailman/bin/arch --wipe $list
done
The action for the search form should be to post to a url like http://www.example.com/mailman/mmsearch/listname.
If you just go to that URL in a browser, you should get a response like:
htdig Archives Access Failure CGI problem. -5-Field count -4- fields:
If you want to make another attempt to access a list archive then go via the list users information page.
If this problem persists then please e-mail the following information to the mailman@example.com:
Referer not known
/mailman/mmsearch/listname
If that is all working correctly, the problem is in htdig. mmsearch just sets CONFIG_DIR in the environment to your HTDIG_CONF_LINK_DIR setting and then opens a pipe to and from the command in your HTDIG_HTSEARCH_PATH setting, writes the search parameters to the pipe and reads and displays the result.
It detects a bad status and a null response. It logs a bad status in Mailman's error log and should display either an error message or the non-null response.
What happens if you run '/usr/bin/htsearch' by hand?
I tried passing mail through a public and private list to see if that would 'prime the pump' as it were, but to no avail.
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

Capps, John M wrote:
The url http://mailmansite.com/mailman/mmsearch/listname does in fact produce the desired result (referrer not known).
Just to be clear, it should produce a page like
htdig Archives Access Failure CGI problem. -5-Field count -4- fields:
If you want to make another attempt to access a list archive then go via the list users information page.
If this problem persists then please e-mail the following information to the mailman@example.com:
Referer not known
/mailman/mmsearch/listname
It looks like that's what you got. That says at least Mailman's mmsearch CGI is accessible and probably working.
format should be 'short' or 'long'
What do you get if you enter 'short' instead of 'text'? It probably doesn't matter. You should get an HTML search results page either way.
I assume bta-developers is what is listname elsewhere
As does all the above.
Well, I might go back to patch fun, but you do seem to have narrowed the problem down to htdig itself... perhaps some way it is creating the dbs?
Possibly. Here are the contents of archives/private/mailman/htdig:
-rw-rw-r-- 1 mark mailman  24576 Sep 19 03:30 db.docdb
-rw-rw-r-- 1 mark mailman  24576 Sep 19 03:30 db.docs.index
-rw-rw-r-- 1 mark mailman  49152 Sep 19 03:30 db.excerpts
-rw-rw-r-- 1 mark mailman  49152 Sep 19 03:30 db.metaphone.db
-rw-rw-r-- 1 mark mailman  49152 Sep 19 03:30 db.soundex.db
-rw-rw-r-- 1 mark mailman 126976 Sep 19 03:30 db.words.db
-rw-rw-r-- 1 mark mailman  16384 Sep 19 03:30 db.words.db_weakcmpr
-rw-rw-r-- 1 root mailman   2070 Dec 26 2008 mailman.conf
-rw-rw-r-- 1 mark mailman      0 Sep 19 03:30 rundig_last_run
and here is a larger, more active list:
-rw-rw-r-- 1 mark mailman 22503424 Dec 7 03:29 db.docdb
-rw-rw-r-- 1 mark mailman  2678784 Dec 7 03:29 db.docs.index
-rw-rw-r-- 1 mark mailman 57622528 Dec 7 03:29 db.excerpts
-rw-rw-r-- 1 mark mailman  5914624 Dec 7 03:30 db.metaphone.db
-rw-rw-r-- 1 mark mailman  5668864 Dec 7 03:30 db.soundex.db
-rw-rw-r-- 1 mark mailman 74221568 Dec 7 03:29 db.words.db
-rw-rw-r-- 1 mark mailman    16384 Dec 7 03:29 db.words.db_weakcmpr
-rw-rw-r-- 1 root mailman     2056 Dec 26 2008 gpc-talk.conf
-rw-rw-r-- 1 mark mailman        0 Dec 7 03:30 rundig_last_run
How do those compare with yours?
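One way to make that comparison is to list which of the database files shown above are missing or empty in a given list's htdig directory. This is a minimal sketch: the file names are taken from the listings above, `summarize_htdig_dir` and `suspicious_files` are hypothetical helpers, and the idea that a missing or zero-length database could explain blank results is only a guess to be checked.

```python
import os

# Database file names as they appear in the directory listings above.
EXPECTED_DB_FILES = (
    "db.docdb",
    "db.docs.index",
    "db.excerpts",
    "db.metaphone.db",
    "db.soundex.db",
    "db.words.db",
    "db.words.db_weakcmpr",
)


def summarize_htdig_dir(path):
    """Map each expected file name to its size in bytes, or None if missing."""
    sizes = {}
    for name in EXPECTED_DB_FILES:
        full = os.path.join(path, name)
        sizes[name] = os.path.getsize(full) if os.path.isfile(full) else None
    return sizes


def suspicious_files(sizes):
    """Return the sorted names of files that are missing or zero bytes long."""
    return sorted(name for name, size in sizes.items() if not size)
```

For example, `suspicious_files(summarize_htdig_dir("/var/lib/mailman/archives/private/listname/htdig"))` returning a non-empty list would suggest looking at how rundig built the databases.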
I notice that the /var/lib/mailman/archives/private/listname/htdig directory does not have permissions appropriate for the Apache host to access....
That's as it should be. The directory should be g+rws and Mailman's group, because the CGIs that access it are run through SETGID wrappers that set Mailman's group. This is true of all Mailman's data.
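The permission requirement can be checked mechanically. The sketch below reads 'g+rws' as the ls-style group display rws (group read, write and execute plus the setgid bit), which is an interpretation rather than something stated above; `missing_group_bits` and `check_dir` are hypothetical helpers, and the group name to compare against depends on the installation.

```python
import grp
import os
import stat


def missing_group_bits(mode):
    """Return the names of the group bits (r, w, x, setgid) that `mode` lacks."""
    wanted = (
        ("group read", stat.S_IRGRP),
        ("group write", stat.S_IWGRP),
        ("group execute", stat.S_IXGRP),
        ("setgid", stat.S_ISGID),
    )
    return [name for name, bit in wanted if not mode & bit]


def check_dir(path, group_name):
    """Return a list of problems with the directory's group and mode."""
    st = os.stat(path)
    problems = missing_group_bits(st.st_mode)
    try:
        actual = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        # The gid has no entry in the group database; report the number.
        actual = str(st.st_gid)
    if actual != group_name:
        problems.append("group is %s, expected %s" % (actual, group_name))
    return problems
```

An empty result from `check_dir("/var/lib/mailman/archives/private/listname/htdig", "mailman")` would mean the directory matches the description above, even though the Apache user itself has no direct access.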
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

participants (2)
- Capps, John M
- Mark Sapiro