Clean the index after purging old versions

Hello,

I have a devpi server where we have recently deleted a lot of old package versions, using devpi-client. The data directory now looks like:

      7.6 GiB [##########] /.indices
    956.5 MiB [#         ]  .sqlite
    461.4 MiB [          ] /+files

Is this expected? Is there any command to run to rebuild/purge the index?

Thanks,
--
Patrick Mézard

On 15 Jan 2020, at 9:00, Patrick Mézard wrote:
Do you have a lot of documentation uploaded? If not, then this seems way too big.

Could it be that it is still indexing? Check that first by looking at the status page, top, the log and whether the files in .indices are changing. With recent versions (web >=4 and server 5.x) you can also check the json status output for some internals with

    curl -H "Accept: application/json" http://.../+status

What kind of files are in .indices? For me there is a toc, a writelock and some seg files of varying size for a total of ~450MB.

With devpi-web 4.x there is the devpi-clear-search-index command. For older devpi-web there is the --recreate-search-index option for devpi-server. For both you need to shut down devpi-server and start it again after running the command.

Regards,
Florian Schulze
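One way to check "whether the files in .indices are changing" is to list the files modified within the last few minutes. The following is only a sketch: the function name `recently_modified` and the ten-minute default are assumptions made for illustration, not part of devpi.

```python
import os
import time


def recently_modified(directory, max_age_seconds=600, now=None):
    """Return sorted paths under `directory` modified within `max_age_seconds`.

    Hypothetical helper, not part of devpi. `now` can be passed explicitly
    to make the result reproducible.
    """
    now = time.time() if now is None else now
    recent = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                # The indexer may delete a file between listing and stat.
                continue
            if now - mtime <= max_age_seconds:
                recent.append(path)
    return sorted(recent)
```

Running it twice a few minutes apart on the .indices directory and comparing the results should give a rough idea of whether the indexer is still writing.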

On 15/01/2020 09:23, Florian Schulze wrote:
Unfortunately, we do not have a lot of documentation, uploaded or not.
I am not sure what to look at on the status page. Recent logs contain:

```
2020-01-15 08:44:56,813 INFO [NOTI] [Rtx38939] Queuing projects for index update
2020-01-15 09:13:30,074 INFO [NOTI] [Rtx38940] Queuing projects for index update
2020-01-15 09:13:32,336 INFO [IDX] Indexer queue size ~ 1
```

So I believe it is not indexing massively, just handling new publications generated by our CI.

Some information from the status endpoint:

```
"versioninfo": {
    "devpi-server": "5.2.0",
    "devpi-web": "4.0.0"
},
...
"serial": 38939,
"last-commit-timestamp": 1579077895.4344556,
"event-serial": 38939,
"event-serial-timestamp": 1579077896.8356566,
"event-serial-in-sync-at": 1579077896.8370607,
"metrics": [
    [ "devpi_web_whoosh_index_queue_size", "gauge", 0 ],
    [ "devpi_web_whoosh_index_error_queue_size", "gauge", 8589 ],
```

The indexing queue looks empty.
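For reference, the search-index metrics can be pulled out of the status output with a few lines of Python. This is a minimal sketch: the function names are made up for illustration, the `devpi_web_whoosh_` prefix is taken from the two metric names in the output above, and the lookup of a wrapping "result" key is an assumption about the JSON layout that may not hold on every version.

```python
import json
import urllib.request


def fetch_status(base_url):
    """Fetch <base_url>/+status as JSON (same request as the curl command)."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/+status",
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def whoosh_metrics(status):
    """Return {metric name: value} for metrics named devpi_web_whoosh_*."""
    # Assumption: the payload may or may not be wrapped in a "result" key.
    body = status.get("result", status)
    return {
        name: value
        for name, _kind, value in body.get("metrics", [])
        if name.startswith("devpi_web_whoosh_")
    }
```

With the values shown above this would report a queue size of 0 and an error queue size of 8589, so both numbers are worth watching, not only the first one.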
> What kind of files are in .indices? For me there is a toc, a writelock and some seg files of varying size for a total of ~450MB.
First, there is:

    5.3 GiB [ 69.6%] /project.tmp

Containing 551945 .ctmp files. I checked yesterday, and they have been modified recently, like this year.

In .indices, ignoring project.tmp:

    551952 .trm
    551950 .pst
        19 .seg
         1 .toc

And a project_WRITELOCK.

Note the server and data directory are fairly old, they have been used for development for at least 5 years. They have been upgraded several times, maybe reverting from one configuration to another. I would not be surprised if deprecated files have been left around.
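A per-extension breakdown like the one above can be produced with a short script. This is a sketch only; `summarize_by_extension` is a hypothetical helper, and files without an extension (such as project_WRITELOCK) are grouped under "(none)".

```python
import os
from collections import Counter


def summarize_by_extension(directory):
    """Return (file counts, total bytes) per file extension under `directory`."""
    counts = Counter()
    sizes = Counter()
    for root, _dirs, files in os.walk(directory):
        for name in files:
            ext = os.path.splitext(name)[1] or "(none)"
            try:
                size = os.path.getsize(os.path.join(root, name))
            except OSError:
                # Skip files removed while the directory is being scanned.
                continue
            counts[ext] += 1
            sizes[ext] += size
    return counts, sizes
```

Calling `counts.most_common()` on the first result lists the extensions with the most files first, which makes a large number of leftover files easy to spot.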
> With devpi-web 4.x there is the devpi-clear-search-index command. For older devpi-web there is the --recreate-search-index option for devpi-server. For both you need to shut down devpi-server and start it again after running the command.
Thank you, I will look into that.

--
Patrick Mézard
