[Catalog-sig] search queries in PyPI

Tarek Ziade tarek.ziade at ingeniweb.com
Wed May 14 21:06:09 CEST 2008


2008/5/14 Noah Kantrowitz <kantrn at rpi.edu>:

> Tarek Ziade wrote:
>
>> Hi,
>>
>> I was wondering how the search works in PyPI (I didn't have time to dig into
>> the code).
>>
>> I was unable to do specific queries. For instance, how do I get the
>> packages that have both the word "nose" and the word "plugin" in their
>> short descriptions?
>>
>> I tried 'nose AND plugin', 'nose+plugin', etc., without success.
>>
>> I tried '"nose plugin"' and got back a package that had this sequence of
>> words, but also a package that has nothing to do with it (z3c.sampledata
>> 0.1.0 <http://pypi.python.org/pypi/z3c.sampledata/0.1.0>).
>>
>>
> Try "nose%plugin". That's the syntax used in the XML-RPC API, at least.


Ah! Interesting, that worked, thanks!

I have also dug into the code to see how it is done.

Here's the pseudocode:

def search(query):
    results = {}

    terms = query.split()  # one term per whitespace-separated word
    for term in terms:
        for field in ('name', 'description', 'summary'):
            for result in store.query_packages(term):
                # ... some score calculation if result.name == field
                results[result.name] = result

    return results

Basically, there is one request to the storage (database) for each word
entered in the query.

'AND' is not treated as an operator; it is even removed, because it is
listed as a stop word.
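To see why 'nose AND plugin' ends up behaving like an OR search, here is a
small self-contained model of the per-word loop. The package data, the
STOP_WORDS set and the substring matching are hypothetical stand-ins for
PyPI's store, not its actual code. Each word is looked up separately and all
hits are merged into one dict, so the result is the union of the per-word
matches, which is how an unrelated package can show up.

```python
# Hypothetical stand-ins for PyPI's package table and stop-word list.
PACKAGES = {
    "nose-foo": "a plugin for nose",
    "nose": "nicer testing for python",
    "pytest-bar": "a plugin for pytest",
}
STOP_WORDS = {"and", "or", "the", "a", "for"}


def search(query):
    results = {}
    # Stop words such as "and" are dropped before any lookup happens.
    terms = [t for t in query.lower().split() if t not in STOP_WORDS]
    for term in terms:
        # One lookup per word; hits for every word go into the same dict,
        # so a package matching *any* word is returned (OR semantics).
        for name, summary in PACKAGES.items():
            if term in name or term in summary:
                results[name] = summary
    return results


print(sorted(search("nose AND plugin")))  # ['nose', 'nose-foo', 'pytest-bar']
```

Here "pytest-bar" is returned only because it matches "plugin".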

So Noah's query, using %, is not split into words: it is sent directly to
the DB as a single string in an SQL LIKE clause, where % is a wildcard that
matches any sequence of characters.
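A quick way to check what the % trick does is the standard-library sqlite3
module. The packages table below is hypothetical, and wrapping the term in %
on both sides (a "contains" search) is an assumption, not something verified
against PyPI's code. Note that the match is order-sensitive: "nose%plugin"
only finds "nose" followed later by "plugin".

```python
import sqlite3

# In-memory table standing in for PyPI's packages table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE packages (name TEXT, summary TEXT)")
conn.executemany(
    "INSERT INTO packages VALUES (?, ?)",
    [
        ("nose-foo", "A nose plugin for coverage"),
        ("bar", "plugin system that works with nose"),
        ("baz", "unrelated package"),
    ],
)


def like_search(term):
    # The whole term, % wildcards included, is sent as ONE pattern.
    # Assumption: it is wrapped in % so it matches anywhere in the summary.
    pattern = "%" + term + "%"
    rows = conn.execute(
        "SELECT name FROM packages WHERE summary LIKE ? ORDER BY name",
        (pattern,),
    )
    return [name for (name,) in rows]


print(like_search("nose%plugin"))  # ['nose-foo']
print(like_search("plugin%nose"))  # ['bar']
```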

However, store.query_packages() *has* a feature to do AND and OR
searches:

def query_packages(query, operator='and'):
   ...
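For illustration only, here is a sketch of how an operator argument like this
can turn several words into a single SQL statement. It is not the body of
store.query_packages: the function takes a list of terms rather than a query
string, and the sqlite3 table and its data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE packages (name TEXT, summary TEXT)")
conn.executemany(
    "INSERT INTO packages VALUES (?, ?)",
    [
        ("nose-foo", "a nose plugin"),
        ("nose", "nose: nicer testing"),
        ("pytest-bar", "a pytest plugin"),
    ],
)


def query_packages(terms, operator="and"):
    """Hypothetical sketch: match every term (AND) or any term (OR)
    against the summary with one SQL statement."""
    if operator not in ("and", "or"):
        raise ValueError("operator must be 'and' or 'or'")
    if not terms:
        return []
    # One "summary LIKE ?" clause per term, joined by AND or OR.
    # The operator is validated above, so interpolating it is safe.
    where = f" {operator.upper()} ".join("summary LIKE ?" for _ in terms)
    params = ["%" + t + "%" for t in terms]
    sql = f"SELECT name FROM packages WHERE {where} ORDER BY name"
    return [name for (name,) in conn.execute(sql, params)]


print(query_packages(["nose", "plugin"]))        # ['nose-foo']
print(query_packages(["nose", "plugin"], "or"))  # ['nose', 'nose-foo', 'pytest-bar']
```

The terms are passed as bound parameters, so only the validated operator is
spliced into the SQL text.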


I think it wouldn't cost much to change the webui interface to use the
store.py features. It would also make searches faster, since only one
database query would be needed per search.

I still need to install a PyPI instance for a patch I wanted to propose to
make PyPI permissive about nonexistent classifiers, so maybe I could try a
patch for this in the meantime?

The change could take the words AND and OR into account to build the proper
query.
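On the web UI side, such a change could start with something like the
hypothetical parse_query helper below, which turns the search box text into a
list of terms plus an operator that a single store query could then use. The
rules are assumptions: only uppercase AND/OR count as operators, AND is the
default, any OR makes the whole search an OR search, and STOP_WORDS is a
made-up list.

```python
# Hypothetical stop words; the real PyPI list is not reproduced here.
STOP_WORDS = {"and", "or", "the", "a", "for"}


def parse_query(query):
    """Split a search string into (terms, operator)."""
    operator = "and"  # assumed default: all terms must match
    terms = []
    for word in query.split():
        if word == "OR":
            operator = "or"  # any explicit OR switches the whole search
        elif word == "AND":
            continue  # AND is already the default
        elif word.lower() not in STOP_WORDS:
            terms.append(word.lower())
    return terms, operator


print(parse_query("nose AND plugin"))  # (['nose', 'plugin'], 'and')
print(parse_query("nose OR plugin"))   # (['nose', 'plugin'], 'or')
```

Mixed expressions such as "a AND b OR c" would need a real parser; this
sketch deliberately does not handle precedence.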

Tarek




>
>
> --Noah
>
>
>


-- 
Tarek Ziadé - Technical Director
INGENIWEB (TM) - SAS 50000 Euros - RC B 438 725 632
Bureaux de la Colline - 1 rue Royale - Building D - 9th floor
92210 Saint Cloud - France
Phone : 01.78.15.24.00 / Fax : 01 46 02 44 04
http://www.ingeniweb.com - a company of the Alter Way group

