Case sensitivity of package names

On Jul 12, 2007, at 2:43 PM, Phillip J. Eby wrote:
At 02:15 PM 7/12/2007 -0400, Jim Fulton wrote:
I want to make sure I understand this. I would hope that searching would be case insensitive and otherwise flexible wrt names.
PyPI's searching is indeed case insensitive, and is a substring/keyword search as well.
Is there any reason we can't expect URLs and requirement specifications to be precisely spelled? That is, if someone names their package "sPaM", I see no reason why PyPI needs to support anything other than http://www.python.org/pypi/sPaM as the one URL of the package. Someone should be able to use the search UI to search for "spam" and see a result that includes "sPaM". From then on, they should be able to type the name "sPaM". Or am I missing something?
You're missing that the subject is about similarity of names.
A typo of say, 'SPam' shouldn't return me some package *other* than the one I'm looking for.
No, I understand that part. I understand the desire to avoid conflicts that cause problems down the road. I would prefer to "disallow" this by rejecting new package names that are too similar to already-registered packages.
It'd be nice if the resulting page said something besides "Not Found", too... like "there's no SPam, but here are a bunch of packages whose name contains 'spam'".
I think this would be fine in a human interface.
If it did that, setuptools would be able to find the right page without hitting the main index, too. But redirection, as proposed by Martin, also accomplishes the same thing.
I really don't like this for setuptools. My preference is that setuptools should be required to ask for a package with precise spelling.
And again, all this helps human direct users of the index, too.
I think it encourages humans to do bad things. If someone misspells ZODB3 as zodb3 and is able to install it with easy_install, then they'll be tempted to use the name "zodb3" in their requirements specifications. That is a bad thing IMO. We're talking about technical users and I think it is reasonable to expect them to be precise in their specifications. I could live with case-insensitive package names if we (for some definition of we, possibly being Guido) decided we want them, but I'd prefer they be case sensitive. I'd still be in favor of avoiding confusing duplicates. If we stick with case-sensitive package names, then I'd prefer that the interaction of setuptools with the index be case sensitive. I wouldn't object to setuptools giving people help. So, for example, if I type "zodb3", I wouldn't object to setuptools letting the user know that maybe they should use ZODB3. Jim -- Jim Fulton mailto:jim@zope.com Python Powered! CTO (540) 361-1714 http://www.python.org Zope Corporation http://www.zope.com http://www.zope.org
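[Editor's sketch] The idea of rejecting new names that are "too similar" to registered ones could look roughly like the following. This is only an illustration: the thread does not define "too similar", so the rule used here (ignore case and collapse runs of punctuation) is an assumption, and comparison_key and conflicting_names are hypothetical helpers, not part of PyPI or setuptools.

```python
import re

def comparison_key(name):
    """Collapse case and punctuation runs (an assumed similarity rule)."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")

def conflicting_names(new_name, registered):
    """Return registered names that are spelled differently from new_name
    but collapse to the same comparison key."""
    key = comparison_key(new_name)
    return [n for n in registered if n != new_name and comparison_key(n) == key]

# Hypothetical registry contents, using names mentioned in the thread.
registered = ["sPaM", "ZODB3", "zope.interface"]
print(conflicting_names("SPam", registered))  # ['sPaM']
print(conflicting_names("eggs", registered))  # []
```

Under this rule a registration of "SPam" would be refused because "sPaM" already exists, while exact-name URLs and requirement strings stay case sensitive.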

At 03:02 PM 7/12/2007 -0400, Jim Fulton wrote:
We're talking about technical users and I think it is reasonable to expect them to be precise in their specifications.
IMO, "technical users" is a wider range of people than you seem to be thinking of. In any case, this is a separate topic from disallowing too-similar names -- which you agree we should do. Whether to then also introduce case-sensitivity into various parts of easy_install is another subject that doesn't really matter to the catalog-sig. Please note, however, that it is not a minor change by any means -- case-insensitivity exists throughout pkg_resources and setuptools to handle operating system filename case-insensitivity, not just for index lookups. In fact, I believe the index lookups *are* case-sensitive; IIRC it's only link parsing that is case-insensitive.

On Jul 12, 2007, at 3:26 PM, Phillip J. Eby wrote: ...
Whether to then also introduce case-sensitivity into various parts of easy_install is another subject that doesn't really matter to the catalog-sig.
I'm not sure we agree on what matters to the catalog sig. :) (I still need to respond to your note on that topic.)
Please note, however, that it is not a minor change by any means -- case-insensitivity exists throughout pkg_resources and setuptools to handle operating system filename case-insensitivity, not just for index lookups. In fact, I believe the index lookups *are* case-sensitive; IIRC it's only link parsing that is case-insensitive.
I'm not suggesting that you shouldn't deal with file-system case insensitivity. If I were to change setuptools to match my opinion, I would probably just change the code that tries to get a package listing to look for close matches to print a suggestion and stop rather than guessing a package name and continuing. Jim
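[Editor's sketch] The "print a suggestion and stop" behaviour described above might be sketched as below. It is not setuptools code: suggest_exact_names and require_exact are hypothetical names, the index is modelled as a plain list of registered names, and the 0.8 similarity cutoff is an arbitrary assumption.

```python
import difflib

def suggest_exact_names(requested, index_names, limit=3):
    """Return registered names that closely match `requested`, ignoring case."""
    if requested in index_names:
        return []
    by_lower = {}
    for name in index_names:
        by_lower.setdefault(name.lower(), []).append(name)
    close = difflib.get_close_matches(
        requested.lower(), list(by_lower), n=limit, cutoff=0.8
    )
    return [name for key in close for name in by_lower[key]]

def require_exact(requested, index_names):
    """Accept only an exact spelling; otherwise stop with a hint, never guess."""
    if requested in index_names:
        return requested
    message = "No package named %r." % requested
    hints = suggest_exact_names(requested, index_names)
    if hints:
        message += " Did you mean: %s?" % ", ".join(hints)
    raise LookupError(message)

index_names = ["ZODB3", "sPaM", "zope.interface"]
try:
    require_exact("zodb3", index_names)
except LookupError as exc:
    print(exc)
```

The point of the design is that the tool helps the user find the right spelling but refuses to continue with the wrong one, so "zodb3" never ends up working in a requirements specification.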

I really don't like this for setuptools. My preference is that setuptools should be required to ask for a package with precise spelling.
I think the way setuptools currently works is this: Every name gets converted to its lower-case safe-name equivalent. All dependencies, file names, resource identifications etc are based on that version of the name, *not* the "true" name of the package. Then, when setuptools tries to find a package whose "true" name is in mixed-case, it uses the lower-cased safe-named version, and PyPI reports that the package does not exist. Then, setuptools queries the entire package list, trying to find out the original spelling of the package. I'm sure Phillip will correct me if I'm wrong.
I could live with case-insensitive package names if we (for some definition of we, possibly being Guido) decided we want them, but I'd prefer they be case sensitive. I'd still be in favor of avoiding confusing duplicates. If we stick with case-sensitive package names, then I'd prefer that the interaction of setuptools with the index be case sensitive.
See above - I believe setuptools package names are case insensitive today. Regards, Martin

At 11:38 PM 7/12/2007 +0200, Martin v. Löwis wrote:
I really don't like this for setuptools. My preference is that setuptools should be required to ask for a package with precise spelling.
I think the way setuptools currently works is this:
Every name gets converted to its lower-case safe-name equivalent. All dependencies, file names, resource identifications etc are based on that version of the name, *not* the "true" name of the package.
Object comparisons are done case-insensitively, but the objects themselves keep the case-insensitive forms ('key' attributes) separate from the originally-input names ('project_name' attributes).
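[Editor's sketch] The split between a case-insensitive comparison form and the originally-input name can be illustrated as follows. ProjectRef is a hypothetical class written for this note; only the attribute names 'key' and 'project_name' come from the message above, and using a plain lower() for the key is an assumption.

```python
class ProjectRef:
    """Keeps the name as typed, but compares and hashes on a lowered key."""

    def __init__(self, project_name):
        self.project_name = project_name   # the name as originally input
        self.key = project_name.lower()    # case-insensitive comparison form

    def __eq__(self, other):
        if not isinstance(other, ProjectRef):
            return NotImplemented
        return self.key == other.key

    def __hash__(self):
        return hash(self.key)

a = ProjectRef("ZODB3")
b = ProjectRef("zodb3")
print(a == b)                          # True
print(a.project_name, b.project_name)  # ZODB3 zodb3
```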
Then, when setuptools tries to find a package whose "true" name is in mixed-case, it uses the lower-cased safe-named version, and PyPI reports that the package does not exist. Then, setuptools queries the entire package list, trying to find out the original spelling of the package.
This is almost correct, except that it actually tries to look up whatever the user actually input, then the safe_name() form of that. For index lookups, it does not actually change the case of what was entered, so if the user enters something that exactly matches what's on PyPI, they'll have a better chance of getting everything in one request... unless there are multiple versions listed, of course.
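[Editor's sketch] The lookup order described in this exchange (the name as typed, then its safe_name() form, then a scan of the full package list) could be sketched like this. It is a simplified model, not the setuptools implementation: find_project_page, fetch_page and list_all_names are hypothetical, the local safe_name is a stand-in that replaces runs of characters other than letters, digits and '.' with '-', and matching the scan case-insensitively is an assumption.

```python
import re

def safe_name(name):
    """Stand-in for safe_name(): punctuation runs become '-', case is kept."""
    return re.sub(r"[^A-Za-z0-9.]+", "-", name)

def find_project_page(requested, fetch_page, list_all_names):
    """Try the typed name, then its safe_name form, then scan the whole index."""
    tried = []
    for candidate in (requested, safe_name(requested)):
        if candidate in tried:
            continue
        tried.append(candidate)
        page = fetch_page(candidate)
        if page is not None:
            return candidate, page
    # Fallback: look through the full list for the original spelling.
    wanted = safe_name(requested).lower()
    for name in list_all_names():
        if safe_name(name).lower() == wanted:
            page = fetch_page(name)
            if page is not None:
                return name, page
    return None, None

# A fake in-memory index standing in for PyPI.
pages = {"ZODB3": "<ZODB3 page>"}
requests = []

def fetch_page(name):
    requests.append(name)
    return pages.get(name)

print(find_project_page("ZODB3", fetch_page, lambda: list(pages)))
print(find_project_page("zodb3", fetch_page, lambda: list(pages)))
print(requests)  # ['ZODB3', 'zodb3', 'ZODB3']
```

An exactly spelled name is resolved in a single request, while a misspelled case costs an extra request plus the scan of the full list.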
participants (3)
- "Martin v. Löwis"
- Jim Fulton
- Phillip J. Eby