[Catalog-sig] start on static generation, and caching - apache config.

Phillip J. Eby pje at telecommunity.com
Tue Jul 10 16:15:05 CEST 2007


At 07:33 AM 7/10/2007 +0200, Martin v. Löwis wrote:
> > Yes, in order to find the correct spelling for a package's name.  If a
> > user types, say "pylons" when the package is listed on PyPI as "Pylons",
> > setuptools looks at the root after the lookup of /pypi/pylons fails.
>
>I don't understand. How does it help to look at /pypi in this case?

It doesn't.  It looks at /pypi/ (note the trailing /) -- which lists 
all packages.


>The right spelling of Pylons is not listed there, unless there was
>a release of Pylons recently.
>
>If you want to correct the spelling, you need to look at
>
>http://cheeseshop.python.org/pypi?%3Aaction=index

Which is also spelled /pypi/ - the advantage of this is that a purely 
static index consisting of Apache directory indexes produces an 
equally useful result for setuptools.


> > A case-insensitive match by safe_name would be ideal, and could also be
> > used to prevent users from registering packages whose names differ only
> > by case or punctuation.
>
>Would it be acceptable to do an HTTP redirect in that case, ie.
>redirect /pypi/pylons/0.9.5 to /pypi/Pylons/0.9.5?

Yes, although setuptools at the moment looks at /pypi/pylons/ 
(again, with a trailing /) and does not go to individual version 
pages unless the base page contains only links to individual version pages.

It will handle a redirect correctly, as far as interpreting relative 
links on result pages.
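
For illustration, here is a minimal Python sketch of how such a redirect 
target could be computed on the server side.  The names safe_key() and 
redirect_target() are hypothetical, the path is assumed to be already 
URL-decoded, and the normalization is assumed to be the safe_name rule 
discussed in this thread (lower-cased); it is not PyPI's actual code.

```python
import re

def safe_key(name):
    # Assumed normalization: runs of characters other than letters,
    # digits and '.' collapse to a single '-', then lower-case.
    return re.sub(r"[^A-Za-z0-9.]+", "-", name).lower()

def redirect_target(path, registered_names):
    """Return the canonical path to redirect to, or None if no
    redirect is needed or no matching package exists."""
    prefix = "/pypi/"
    if not path.startswith(prefix):
        return None
    # Split "/pypi/pylons/0.9.5" into "pylons", "/", "0.9.5".
    requested, sep, rest = path[len(prefix):].partition("/")
    by_key = {safe_key(n): n for n in registered_names}
    canonical = by_key.get(safe_key(requested))
    if canonical is None or canonical == requested:
        return None
    return prefix + canonical + sep + rest

print(redirect_target("/pypi/pylons/", ["Pylons"]))
print(redirect_target("/pypi/pylons/0.9.5", ["Pylons"]))
```

Both the trailing-slash form and the per-version form map onto the 
registered spelling, so relative links on the result page resolve 
against the canonical URL.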


>  I would not
>want to have multiple URLs to render the same page, in general
>(I know it already does that in some cases).
>
>I can see how lower-casing helps; I'm doubtful about replacing
>spaces. I.e. why is it better to look for
>
>python-ftp-server-library--pyftpdlib-

That '--' would actually just be one '-'

>than
>
>Python FTP server library (pyftpdlib)

It's not much better; however, there are a lot of packages with 
shorter names for which it does help.  Mainly, though, setuptools 
just uses this for purposes of determining distribution filenames.
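
As a rough sketch of the normalization being discussed (assuming 
safe_name behaves as described above, i.e. each run of characters other 
than letters, digits and '.' becomes a single '-'), with lookup_key() as 
a hypothetical helper that adds the lower-casing:

```python
import re

def safe_name(name):
    # Each run of characters other than letters, digits and '.'
    # is replaced by exactly one '-'.
    return re.sub(r"[^A-Za-z0-9.]+", "-", name)

def lookup_key(name):
    # Hypothetical helper: lower-case the safe name so that
    # comparisons ignore case as well as punctuation.
    return safe_name(name).lower()

print(lookup_key("Python FTP server library (pyftpdlib)"))
# prints: python-ftp-server-library-pyftpdlib-
print(lookup_key("Gnosis Utilities") == lookup_key("gnosis_utilities"))
# prints: True
```

Note that " (" collapses to one '-', which is why the long name above 
contains no '--'.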


>IOW, if you have a mis-spelling of the latter, what are the
>chances that it is so misspelled that the safe_name is still
>the former? Shouldn't the package owner just correct the
>package name, to pyftpdlib, and put the other string into
>the summary?
>
>In any case, if it were postgres 8.1 or later, I could simply do
>
>select name from packages where
>regexp_replace(lower(name),'[^a-z0-9.]','-')='gnosis-utilities';
>
>to do the lookup; with 7.4, I would have to download all names
>and do the safe matching myself.

I think this will work instead:

    select name from packages where name ~* 'gnosis[^a-z0-9.]+utilities'

i.e., replace all '-' in the safe_name() with the appropriate 
regex.  '~*' is the case-insensitive regular expression match 
operator, according to:

    http://www.postgresql.org/docs/7.4/interactive/functions-matching.html
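
A small Python sketch of that substitution follows.  The function name 
safe_name_to_regex() is hypothetical, and the '^' and '$' anchors are an 
added assumption (not part of the query above) so that only whole names 
match.  Python's re module is used here only to try the pattern out; the 
same pattern string would be passed to '~*'.

```python
import re

def safe_name_to_regex(safe):
    # Escape each literal piece (so '.' stays a literal dot) and join
    # the pieces with the "one or more separator characters" class.
    parts = [re.escape(p) for p in safe.lower().split("-")]
    # Anchors are an assumption added here to reject partial matches.
    return "^" + "[^a-z0-9.]+".join(parts) + "$"

pattern = safe_name_to_regex("gnosis-utilities")
print(pattern)

# re.IGNORECASE stands in for the case-insensitivity of '~*'.
for name in ["Gnosis Utilities", "gnosis_utilities", "gnosis3utilities"]:
    print(name, bool(re.match(pattern, name, re.IGNORECASE)))
```

With the regex approach "gnosis3utilities" is rejected directly, since 
'3' is not in the separator class.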

Of course, it may also suffice to do:

    select name from packages where lower(name) like 'gnosis_%utilities'

i.e. replace all '-' in the safe_name with '_%', which is sort of 
like '.+' in a regex.  You would still have to postprocess the result 
to catch the difference between say, "gnosis-utilities" and 
"gnosis3utilities" or some such, but there should be very few such matches.
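
The pattern building and the postprocessing step might look roughly 
like this in Python.  like_pattern() and postfilter() are hypothetical 
names, and the candidate list stands in for whatever rows the "like" 
query returns.

```python
import re

def safe_name(name):
    # Same rule as discussed earlier: runs of other characters -> '-'.
    return re.sub(r"[^A-Za-z0-9.]+", "-", name)

def like_pattern(safe):
    # In LIKE, '_' matches exactly one character and '%' any run of
    # characters, so '_%' means "one or more of anything".
    return safe.lower().replace("-", "_%")

def postfilter(candidates, safe):
    # Keep only rows whose own safe name equals the requested one;
    # this drops over-matches such as "gnosis3utilities".
    wanted = safe.lower()
    return [n for n in candidates if safe_name(n).lower() == wanted]

print(like_pattern("gnosis-utilities"))
print(postfilter(["Gnosis Utilities", "gnosis3utilities"], "gnosis-utilities"))
```

Since a safe name never contains '_' or '%' itself, the pattern needs no 
escaping.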

The "like" query may be easier for postgres to use an index on - an 
expression index on lower(name) would do the trick.  Of course, I'm 
used to trying to optimize much larger databases than PyPI - with 
only a few thousand entries, a non-index query here may be just fine.

In any case, this query should also be used to check for uniqueness 
when adding packages.
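
For illustration only, the uniqueness check could be sketched in Python 
as below.  NameRegistry is a hypothetical in-memory stand-in for the 
packages table, not PyPI's actual registration code, and the key is 
assumed to be the lower-cased safe name.

```python
import re

def safe_key(name):
    # Assumed key: safe_name() of the package name, lower-cased.
    return re.sub(r"[^A-Za-z0-9.]+", "-", name).lower()

class NameRegistry:
    """Hypothetical stand-in for the packages table."""

    def __init__(self):
        self._by_key = {}

    def register(self, name):
        key = safe_key(name)
        existing = self._by_key.get(key)
        if existing is not None and existing != name:
            # Differs from an existing name only by case or punctuation.
            raise ValueError("%r conflicts with %r" % (name, existing))
        self._by_key[key] = name
        return key

registry = NameRegistry()
registry.register("Gnosis Utilities")
try:
    registry.register("gnosis_utilities")
except ValueError as exc:
    print("rejected:", exc)
```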


