[Catalog-sig] Flamenco queries

Phillip J. Eby pje at telecommunity.com
Mon Apr 9 05:28:28 CEST 2007


At 05:17 AM 4/9/2007 +0200, Martin v. Löwis wrote:
> > It looks like the __init__ builds a data structure and then
> > get_matches() and list_choices() run off this structure.  It would
> > certainly be worth fixing this!
>
>I have now committed a fix in PyPI (r441) which performs the computation
>of selected entries in SQL, considerably improving browsing if
>categories are selected.
>
>If no categories are selected, browsing is still slow. This could be
>improved by caching a tally table, as the standard browse page does
>not need to report any package names (just a tally).
>
>Is there any efficient way to compute a tally in PostgreSQL on the
>fly?

Perhaps this?

select rc.trove_id, count(*)
   from releases r, release_classifiers rc
  where r.name=rc.name and r.version=rc.version
    and r._pypi_hidden=FALSE
group by rc.trove_id

I'm basing this strictly off the other query you posted and with no real 
knowledge of the schema, so I could be way off here.  But it seems like 
this should be very efficient if there is an index on (trove_id, name, 
version) in the release classifiers table and one on (name, version, 
_pypi_hidden) in the releases table.  That would allow the query to be 
executed entirely on the indexes without needing any table contents.  Of 
course, given the relatively small space of trove identifiers compared to 
releases, some other scan pattern might be equally efficient I suppose.



More information about the Catalog-sig mailing list