On September 10, 2015 at 8:48:05 AM, David Wilson (firstname.lastname@example.org) wrote:
On Thu, Sep 10, 2015 at 03:07:14PM +0300, Ionel Cristian Mărieș wrote:
Wouldn't it be better if you'd just build an external search service? Getting a list of packages and descriptions should be possible, no? (just asking, not 100% sure)
That would be the idea. In fact preferably not build a service at all, just pay someone $50/mo for hosted ElasticSearch, rip out the guts of the old thing and write a small sync cron job similar to the one existing in the Bitbucket repo I linked.
The old PostgreSQL-based system has been gone for a while, and we already have ElasticSearch with a small cron job that runs every 3 hours to index the data.
When we moved the database to Heroku, this cron job started taking 6+ hours to complete because we were fetching data in chunks that were too small, which didn't actually hurt when the script and the database were running close to each other. That got "fixed" a day or two ago by increasing the size of the chunks we pull from 1000 to 10000 and by switching to a SERIALIZABLE READ ONLY DEFERRABLE transaction, so that we only need to hold a lock open right at the very beginning. The job now finishes in 40 minutes. I suspect further improvements to the indexing speed will require moving the script into EC2 to get it closer to the PostgreSQL instance.
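For readers following along, here is a minimal sketch of the chunked read described above. It is not the actual indexing script: the function names, the query argument, and the use of a generic DB-API connection are all assumptions made for illustration. The only details taken from the thread are the chunk size of 10000 and the SERIALIZABLE READ ONLY DEFERRABLE transaction mode.

```python
from typing import Any, Iterator, List

# The thread mentions raising the chunk size from 1000 to 10000.
CHUNK_SIZE = 10000


def iter_chunks(cursor: Any, chunk_size: int = CHUNK_SIZE) -> Iterator[List[Any]]:
    """Yield lists of up to chunk_size rows from an already-executed cursor.

    Assumption: the driver streams rows from the server (for example via a
    server-side cursor), so each fetchmany() call costs about one network
    round trip. Fewer, larger chunks then mean fewer round trips.
    """
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            return
        yield list(rows)


def read_snapshot(conn: Any, query: str, chunk_size: int = CHUNK_SIZE) -> Iterator[List[Any]]:
    """Read query results in chunks inside one read-only snapshot.

    Assumption: conn is a DB-API connection to PostgreSQL that is NOT in
    autocommit mode, so the driver opens a transaction before the first
    statement and SET TRANSACTION applies to that transaction.
    """
    cur = conn.cursor()
    try:
        # Must be the first statement in the transaction. DEFERRABLE lets
        # PostgreSQL wait for a safe snapshot up front instead of tracking
        # serialization conflicts for the whole read.
        cur.execute(
            "SET TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY DEFERRABLE"
        )
        cur.execute(query)
        yield from iter_chunks(cur, chunk_size)
        conn.commit()
    finally:
        cur.close()
```

With a high-latency link between the script and the database, the number of round trips dominates: under the streaming assumption, going from 1000 to 10000 rows per fetch cuts the fetch calls by roughly a factor of ten for the same data.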
Given that these problems seem to be *new* since the database moved to Heroku, I don't think either the shape of our data in Elasticsearch or the query we're using (which hasn't changed) is at fault, so I've been trying to figure out what else we might have changed in the transition that would have caused them.
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA