[Catalog-sig] start on static generation, and caching - apache config.
Phillip J. Eby
pje at telecommunity.com
Tue Jul 10 16:15:05 CEST 2007
At 07:33 AM 7/10/2007 +0200, Martin v. Löwis wrote:
> > Yes, in order to find the correct spelling for a package's name. If a
> > user types, say "pylons" when the package is listed on PyPI as "Pylons",
> > setuptools looks at the root after the lookup of /pypi/pylons fails.
>
>I don't understand. How does it help to look at /pypi in this case?
It doesn't. It looks at /pypi/ (note the trailing /) -- which lists
all packages.
>The right spelling of Pylons is not listed there, unless there was
>a release of Pylons recently.
>
>If you want to correct the spelling, you need to look at
>
>http://cheeseshop.python.org/pypi?%3Aaction=index
Which is also spelled /pypi/ - the advantage of this is that a purely
static index consisting of Apache directory indexes produces an
equally useful result for setuptools.
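To illustrate how a client can recover the correct spelling from the full /pypi/ listing, here is a minimal Python sketch. It assumes the list of project names has already been extracted from the index page (fetching and parsing the HTML is left out). It reimplements setuptools' safe_name() rule locally instead of importing it. find_spelling() is a hypothetical helper, not a setuptools API.

```python
import re
from typing import Iterable, Optional


def safe_name(name: str) -> str:
    # Same rule as setuptools' safe_name(): each run of characters other
    # than letters, digits and '.' collapses into a single '-'.
    return re.sub(r"[^A-Za-z0-9.]+", "-", name)


def find_spelling(index_names: Iterable[str], requested: str) -> Optional[str]:
    """Return the index's spelling of `requested`, or None if it is absent.

    Hypothetical helper: compares names case-insensitively after
    safe_name() normalization.
    """
    wanted = safe_name(requested).lower()
    for name in index_names:
        if safe_name(name).lower() == wanted:
            return name
    return None


# Names as they might appear on the /pypi/ index page (illustrative data).
names = ["Pylons", "Paste", "Gnosis Utilities"]
print(find_spelling(names, "pylons"))            # Pylons
print(find_spelling(names, "gnosis-utilities"))  # Gnosis Utilities
```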
> > A case-insensitive match by safe_name would be ideal, and could also be
> > used to prevent users from registering packages whose names differ only
> > by case or punctuation.
>
>Would it be acceptable to do an HTTP redirect in that case, ie.
>redirect /pypi/pylons/0.9.5 to /pypi/Pylons/0.9.5?
Yes, although setuptools at the moment looks at /pypi/pylons/
(again, with a trailing /) and does not go to individual version
pages unless the base page contains only links to individual version pages.
It will handle a redirect correctly, including resolving relative
links on the resulting pages.
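Why the redirect target and the trailing slash both matter for relative links can be sketched with Python's standard urllib.parse.urljoin(), which resolves relative references per RFC 3986. The host and the relative link "0.9.5" are illustrative assumptions, and this is not setuptools' own code.

```python
from urllib.parse import urljoin

# After a redirect, relative links must be resolved against the final URL.
final_url = "http://cheeseshop.python.org/pypi/Pylons/"

# With the trailing slash, a relative link stays inside the package page.
with_slash = urljoin(final_url, "0.9.5")
print(with_slash)     # http://cheeseshop.python.org/pypi/Pylons/0.9.5

# Without it, the last path segment is replaced instead.
without_slash = urljoin(final_url.rstrip("/"), "0.9.5")
print(without_slash)  # http://cheeseshop.python.org/pypi/0.9.5
```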
> I would not
>want to have multiple URLs to render the same page, in general
>(I know it already does that in some cases).
>
>I can see how lower-casing helps; I'm doubtful about replacing
>spaces. I.e. why is it better to look for
>
>python-ftp-server-library--pyftpdlib-
That '--' would actually just be one '-'
>than
>
>Python FTP server library (pyftpdlib)
It's not much better; however, there are a lot of packages with
shorter names for which it does help. Mainly, though, setuptools
just uses this to determine distribution filenames.
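The single '-' comes from safe_name() collapsing each *run* of separator characters. A minimal Python sketch, reimplementing the safe_name() rule locally. per_char_name() is a hypothetical per-character variant included only for contrast, not anything setuptools provides.

```python
import re


def safe_name(name: str) -> str:
    # setuptools' rule: each run of characters outside [A-Za-z0-9.]
    # becomes one '-'.
    return re.sub(r"[^A-Za-z0-9.]+", "-", name)


def per_char_name(name: str) -> str:
    # Hypothetical contrast: replaces every such character individually,
    # which turns " (" into "--".
    return re.sub(r"[^A-Za-z0-9.]", "-", name)


title = "Python FTP server library (pyftpdlib)"
print(per_char_name(title).lower())  # python-ftp-server-library--pyftpdlib-
print(safe_name(title).lower())      # python-ftp-server-library-pyftpdlib-
```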
>IOW, if you have a mis-spelling of the latter, what are the
>chances that it is so misspelled that the safe_name is still
>the former? Shouldn't the package owner just correct the
>package name, to pyftpdlib, and put the other string into
>the summary?
>
>In any case, if it were postgres 8.1 or later, I could simply do
>
>select name from packages where
>regexp_replace(lower(name),'[^a-z0-9.]','-')='gnosis-utilities';
>
>to do the lookup; with 7.4, I would have to download all names
>and do the safe matching myself.
I think this will work instead:
select name from packages where name ~* '^gnosis[^a-z0-9.]+utilities$'
i.e., replace each '-' in the lower-cased safe_name() with that character
class, and anchor the pattern with ^ and $ so that longer names that
merely contain it don't match. '~*' is the case-insensitive regular expression match
operator, according to:
http://www.postgresql.org/docs/7.4/interactive/functions-matching.html
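Building such a pattern from a requested name can be sketched in Python. This is a minimal illustration under the assumption that the result is used as the right-hand side of a `name ~* ...` comparison, preferably passed as a query parameter rather than pasted into the SQL text. ci_regex_for() is a hypothetical helper. The test uses Python's re with IGNORECASE as a stand-in for PostgreSQL's matcher, which behaves the same for this simple syntax.

```python
import re


def safe_name(name: str) -> str:
    # setuptools' rule: runs of characters outside [A-Za-z0-9.] become '-'.
    return re.sub(r"[^A-Za-z0-9.]+", "-", name)


def ci_regex_for(requested: str) -> str:
    """Build a pattern for PostgreSQL's case-insensitive '~*' operator.

    Each '-' in the lower-cased safe name becomes '[^a-z0-9.]+', literal
    parts are escaped (so '.' only matches a dot), and the pattern is
    anchored so that longer names containing the key do not match.
    """
    parts = safe_name(requested).lower().split("-")
    return "^" + "[^a-z0-9.]+".join(re.escape(p) for p in parts) + "$"


pattern = ci_regex_for("gnosis-utilities")
print(pattern)  # ^gnosis[^a-z0-9.]+utilities$
```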
Of course, it may also suffice to do:
select name from packages where lower(name) like 'gnosis_%utilities'
i.e. replace each '-' in the lower-cased safe_name with '_%', which is roughly
like '.+' in a regex. You would still have to postprocess the result
to catch the difference between, say, "gnosis-utilities" and
"gnosis3utilities", but there should be very few such matches.
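The LIKE variant and its postprocessing step can be sketched in Python. This assumes the pattern is matched against lower(name) and that `rows` stands in for the names the query returns (illustrative data). like_pattern_for() and postprocess() are hypothetical helpers.

```python
import re
from typing import Iterable, List


def safe_key(name: str) -> str:
    # safe_name() rule (runs of non-[A-Za-z0-9.] -> '-') plus lower-casing.
    return re.sub(r"[^A-Za-z0-9.]+", "-", name).lower()


def like_pattern_for(requested: str) -> str:
    # Each '-' becomes '_%' ("one or more characters"). A safe name never
    # contains '_', '%' or '\', so nothing else needs escaping.
    return safe_key(requested).replace("-", "_%")


def postprocess(candidates: Iterable[str], requested: str) -> List[str]:
    # LIKE also accepts letters or digits where a separator belongs
    # (e.g. "gnosis3utilities"), so keep only exact safe-name matches.
    wanted = safe_key(requested)
    return [name for name in candidates if safe_key(name) == wanted]


print(like_pattern_for("gnosis-utilities"))  # gnosis_%utilities
rows = ["Gnosis Utilities", "gnosis3utilities"]  # illustrative query result
print(postprocess(rows, "gnosis-utilities"))  # ['Gnosis Utilities']
```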
Postgres may find it easier to use an index for the "like" query - an
expression index on lower(name) would do the trick. Of course, I'm
used to trying to optimize much larger databases than PyPI - with
only a few thousand entries, a non-index query here may be just fine.
In any case, this query should also be used to check for uniqueness
when adding packages.
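The same normalization can back the uniqueness check at registration time. A minimal in-memory Python sketch, assuming names are compared by lower-cased safe_name(). NameRegistry is a hypothetical class, not PyPI's actual code. In the database, the equivalent would be running one of the queries above before inserting a new package.

```python
import re
from typing import Dict, Optional


def safe_key(name: str) -> str:
    # safe_name() rule followed by lower-casing.
    return re.sub(r"[^A-Za-z0-9.]+", "-", name).lower()


class NameRegistry:
    """Hypothetical registry that rejects names differing only by case or
    punctuation from an existing one."""

    def __init__(self) -> None:
        # Maps normalized key -> spelling as originally registered.
        self._by_key: Dict[str, str] = {}

    def clash(self, name: str) -> Optional[str]:
        # Return the existing spelling that conflicts with `name`, if any.
        existing = self._by_key.get(safe_key(name))
        return existing if existing is not None and existing != name else None

    def register(self, name: str) -> None:
        other = self.clash(name)
        if other is not None:
            raise ValueError(f"{name!r} clashes with existing {other!r}")
        self._by_key[safe_key(name)] = name


registry = NameRegistry()
registry.register("Gnosis Utilities")
print(registry.clash("gnosis_utilities"))  # Gnosis Utilities
print(registry.clash("PasteDeploy"))       # None
```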