How to debug HyperKitty Django Q tasks?

Hi Mailman developers!
We have faced an issue with HyperKitty's Django Q tasks and are now trying to dig deeper into the technology.
The issue was initially noticed as a very high CPU load when running the periodic tasks.
There are no errors in any log files (mailmanweb.log and uwsgi-*.log), but if we check the web URL http://lists.server.tld/admin/django_q/success/, there have been no successful tasks for a month (all other pages in the DJANGO Q section are also empty).
We are now trying to debug the issue, but it seems that errors occurring in the tasks are not logged anywhere.
Adding "DEBUG = True" to settings_local.py changes nothing, and it's not clear how to change the "verbosity" parameter used here: https://gitlab.com/mailman/hyperkitty/-/blob/master/hyperkitty/search_indexe...
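For concreteness: would a logging override in settings_local.py along these lines be the right direction? This is only a sketch from my side; the handler path and the logger names 'hyperkitty', 'haystack', and 'django_q' are my guesses, not something I've confirmed against the installed versions.

```python
# settings_local.py -- sketch only; logger names and the log path
# are assumptions that need to be verified against the actual install.
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'tasks_file': {
            'class': 'logging.FileHandler',
            'filename': '/var/log/mailman-web/tasks-debug.log',
        },
    },
    'loggers': {
        'hyperkitty': {'handlers': ['tasks_file'], 'level': 'DEBUG'},
        'haystack': {'handlers': ['tasks_file'], 'level': 'DEBUG'},
        'django_q': {'handlers': ['tasks_file'], 'level': 'DEBUG'},
    },
}
```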
Do you have any advice on how to debug those tasks?
I appreciate any help you can provide!
With kind regards, Danil Smirnov

Danil Smirnov writes:
The issue was initially noticed as a very high CPU load when running the periodic tasks.
I may have seen this but no longer have access to the system where it happened, and there were other issues so I have no confidence that the high loads were due to the problem described below. This was during the initial import of a very large number (~5000) of archives, and running mailmanweb update_index for the whole thing. The problem was that with qcluster running, the whole archive indexing and the periodic indexing stepped on each other's locks.
We solved this by telling people archive search would be available randomly over the next few days (i.e., when the archiving process completed that list's archive ;-) and stopping the periodic process (just renaming the index task in qcluster's queue does this) before running the index-everything task.
I suggest doing this to get your archives close to up to date so the periodic indexing task doesn't get overwhelmed and fail to complete within its interval. I have no confidence this will solve your problem completely, but the indexing task may get in the way of solving it.
I also recommend getting in touch with the Django community as they're more likely to know things like how to set logging verbosity in these low-level components.
Steve

Thank you Steve for your comments, but I am still interested in knowing how Mailman developers would debug issues with HyperKitty periodic tasks.
If no logging is possible for those tasks, I'd consider this an issue to fix. But I'm not a Python developer; I'm looking at the application from the infrastructure side, so I'm not sure whether it's possible, or whether it is an issue at all. I appreciate any clarification you can provide.
Danil
On Fri, Sep 29, 2023 at 6:30 PM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:

Danil Smirnov writes:
I'd start by checking if the relevant process (probably qcluster) is actually running -- in my experience the most common reason for ZERO logs is the process isn't running.
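For example (a quick sketch; the exact process and service names vary between installs, so treat these as starting points rather than exact commands):

```shell
# Does a process named qcluster exist?  pgrep -x matches the exact
# process name, so this check won't match itself.
pgrep -x qcluster || echo "no process named qcluster"
# The bracket trick keeps grep from matching its own command line.
ps aux | grep "[q]cluster" || echo "no qcluster in the process table"
```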
I'll tell you what I'd do after that *because* a developer should track this stuff down, but if all else failed, purely as a matter of debugging I'd just change the default in the update_index function to verbosity=1, then change it back (or up to 2, etc.) once you have whatever information that gives you. That's what I recommend you do once you've confirmed the process is running.
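As a toy illustration of why that one default matters (this is a stand-in I wrote for this mail, not HyperKitty's actual code; the real function lives in hyperkitty/search_indexes.py):

```python
def update_index(verbosity=0):
    """Stand-in for the real task: with the default verbosity=0 the
    underlying command stays completely silent, which is exactly why
    nothing about these tasks shows up anywhere."""
    report = []
    if verbosity >= 1:
        report.append("update_index: starting")
    if verbosity >= 2:
        report.append("update_index: per-list progress would go here")
    return report

# Changing only the default from 0 to 1 makes the task start talking:
print(update_index())             # silent with the shipped default
print(update_index(verbosity=1))
```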
Then, while waiting on the user channels, I'd grep for 'search_indexes' in import statements to see where search_indexes.update_index is called and how that 'verbosity' parameter gets set. It may be ignored (defaulted to 0) or hardcoded, but there may also be an option that can be set in the LOGGING parameter of settings.py.
If that doesn't work, I'd trace back through the imports and scan source to try to find alternative logging options for django_haystack, since you're not actually that interested in the HyperKitty function which does no real work.
The catch is that phrase "no real work". Even the verbosity parameter just delegates the logging to third-party code. The work here is being done by third-party code in Django and django_haystack, plus whatever indexer you're using (Whoosh or Xapian). Probably all of these have many options for logging.
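A generic way to probe those third-party loggers without patching anything is to raise their levels at runtime. The names below follow the usual module-name convention ('haystack', 'whoosh', 'django_q'), but that's an assumption; confirm them against logging.root.manager.loggerDict in a live Django shell:

```python
import logging

# Route all records to stderr and open up the suspected loggers.
logging.basicConfig(level=logging.DEBUG)
for name in ("haystack", "whoosh", "django_q"):
    logging.getLogger(name).setLevel(logging.DEBUG)

# Any DEBUG record those libraries emit now reaches the root handler.
logging.getLogger("haystack").debug("haystack logger opened for debugging")
```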
If no logging is possible for those tasks, I'd consider this an issue to fix.
Sure, but it's going to be a "patches welcome" HyperKitty issue because all HyperKitty can tell you is the command was called. Haystack can tell you that too, and a lot more. django_q (which calls the HyperKitty tasks) probably can too.
OTOH, we should document third-party logging better. Note that this doesn't require Mailman developers; non-developer contributors can do it too.

participants (2):
- Danil Smirnov
- Stephen J. Turnbull