Hi there,
My 2.5-year-old offer to retrofit the old codebase with a new search system still stands[1]. :) There is no reason for this to be a complex affair; the prototype built back then took only a few hours to complete.
No doubt the long-term answer is probably "Warehouse fixes this", but Warehouse seems no nearer a reality than it did in 2013.
David
[1] https://groups.google.com/forum/#!search/%22david$20wilson%22$20search$20pyp...
On Thu, Sep 10, 2015 at 12:35:04AM +0200, Giovanni Cannata wrote:
Hi, sorry to bother you again, but the search problem on PyPI is still present after several weeks and it's very annoying. I've just released a new version of my ldap3 project and it doesn't show up when searching by name. For my project (and I suppose for other emerging projects, especially those related to Python 3), it's vital to be easily found by other developers, who use pip and PyPI as THE only repository for Python packages and use the number of downloads as a ranking of a project's popularity.
If search can't be fixed, there should at least be a warning on the PyPI homepage to let users know that search is broken and that searching with Google could help find more packages.
Bye, Giovanni
Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Wouldn't it be better if you just built an external search service? Getting a list of packages and descriptions should be possible, no? (Just asking, not 100% sure.)
I doubt the maintainers are just going to come out and say "ok, this guy has waited long enough, let's take his contribution in". If they didn't care about the search 2.5 years ago, why would they care now?
Sorry for being snide here, but my impression is that Warehouse could have been shipped a while ago instead of getting rewritten several times. I'm not saying that's bad, it's just that there's a mismatch in goals here.
Thanks, -- Ionel Cristian Mărieș
On Thu, Sep 10, 2015 at 03:07:14PM +0300, Ionel Cristian Mărieș wrote:
Wouldn't it be better if you just built an external search service? Getting a list of packages and descriptions should be possible, no? (Just asking, not 100% sure.)
That would be the idea. In fact, preferably not build a service at all: just pay someone $50/mo for hosted ElasticSearch, rip out the guts of the old thing, and write a small sync cron job similar to the one in the Bitbucket repo I linked.
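For illustration only, here is a minimal sketch of the payload such a sync job might send. It is not the code from the Bitbucket repo. It assumes the newline-delimited JSON format of the Elasticsearch bulk API; the index name "packages", the field names, and the build_bulk_body helper are hypothetical, and the HTTP upload itself is left out.

```python
import json
from typing import Dict, Iterable


def build_bulk_body(packages: Iterable[Dict[str, str]], index: str = "packages") -> str:
    """Return a newline-delimited JSON body in the bulk API layout.

    Each package becomes two lines: an action line saying where the
    document goes, followed by the document itself. The index name and
    the "name"/"summary" fields are assumptions made for this sketch.
    """
    lines = []
    for pkg in packages:
        # Action line: index this document under the package name.
        lines.append(json.dumps({"index": {"_index": index, "_id": pkg["name"]}}))
        # Source line: the searchable fields.
        lines.append(json.dumps({"name": pkg["name"], "summary": pkg["summary"]}))
    # Every line, including the last one, is terminated by a newline.
    return "".join(line + "\n" for line in lines)
```

A cron job could call this with the current package list and POST the result to the hosted cluster's bulk endpoint.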
David
The old PostgreSQL-based system has been gone for a while, and we already have Elasticsearch with a small cron job that runs every 3 hours to index the data.
When we moved the database to Heroku, this cron job started taking 6+ hours to complete, because we were fetching data in chunks that were too small. That didn't actually hurt when the script and the database were running close to each other. It got "fixed" a day or two ago by increasing the size of the chunks we pull from 1000 to 10000 and by switching to a SERIALIZABLE READ ONLY DEFERRABLE transaction, so that we only need to hold a lock right at the very beginning. The job now finishes in 40 minutes. I suspect further improvements to indexing speed will require running the script in EC2 to get it closer to the PostgreSQL instance.
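The chunked fetch described above could look roughly like the sketch below. It is an assumption-laden illustration, not the actual indexing script: iter_chunks and demo_chunk_sizes are hypothetical helpers, the demo runs against an in-memory SQLite table so it is self-contained, and the PostgreSQL statement is only defined as a string, never executed here. Against PostgreSQL, each chunk only costs one round trip if the driver's cursor fetches from the server on demand.

```python
import sqlite3
from typing import Iterator, List, Tuple

# PostgreSQL-only statement matching the transaction mode named above. It
# would be issued at the start of the transaction; it is not executed in
# this SQLite-based demo.
SNAPSHOT_SQL = "SET TRANSACTION ISOLATION LEVEL SERIALIZABLE, READ ONLY, DEFERRABLE"


def iter_chunks(cursor, chunk_size: int = 10000) -> Iterator[List[Tuple]]:
    """Yield lists of up to chunk_size rows until the cursor is exhausted."""
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            return
        yield rows


def demo_chunk_sizes(total_rows: int, chunk_size: int) -> List[int]:
    """Build a throwaway table and report how many rows each chunk held."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE packages (name TEXT)")
    conn.executemany(
        "INSERT INTO packages VALUES (?)",
        ((f"pkg{i}",) for i in range(total_rows)),
    )
    cursor = conn.execute("SELECT name FROM packages ORDER BY name")
    sizes = [len(chunk) for chunk in iter_chunks(cursor, chunk_size)]
    conn.close()
    return sizes
```

Raising chunk_size from 1000 to 10000 cuts the number of fetches by a factor of ten, which matters most when every fetch pays network latency between the script and the database.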
Given that these problems seem to be *new* since the move of the database to Heroku, I don't think the shape of our data in Elasticsearch or the actual query we're using (which hasn't changed) is at fault, so I've been trying to figure out what else we might have changed in the transition that could have caused them.
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On Thu, Sep 10, 2015 at 09:31:13AM -0400, Donald Stufft wrote:
The old PostgreSQL-based system has been gone for a while, and we already have Elasticsearch with a small cron job that runs every 3 hours to index the data.
That's awesome news. :)
David
Just curious, are we re-indexing the whole thing each time, or does it take 40 minutes to update the index for 3 hours' worth of changes?
Dan Poirier, Developer
dpoirier@caktusgroup.com www.caktusgroup.com
Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig