[Distutils] Proposal: Restrict the characters in a project name

Donald Stufft donald at stufft.io
Thu May 16 02:00:53 CEST 2013


PyPI already enforces case-insensitive uniqueness and treats - and _ as the same character in uniqueness checks.
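
For anyone who wants to experiment, here is a rough sketch of that kind of uniqueness check, together with the "[a-zA-Z0-9_.-]" filter from the quoted thread below. This is not PyPI's actual code; the function names (uniqueness_key, is_allowed, find_collisions) are made up for illustration, and the folding rule is only an assumption based on the description above.

```python
import re

# Assumption: the character set proposed in the quoted thread.
ALLOWED_NAME = re.compile(r"[a-zA-Z0-9_.-]+")


def uniqueness_key(name):
    """Fold case and treat '-' and '_' as the same character."""
    return name.lower().replace("_", "-")


def is_allowed(name):
    """Return True if the name uses only characters from the proposed filter."""
    return ALLOWED_NAME.fullmatch(name) is not None


def find_collisions(names):
    """Group names that share a uniqueness key; keep groups with 2+ members."""
    groups = {}
    for name in names:
        groups.setdefault(uniqueness_key(name), []).append(name)
    return {key: members for key, members in groups.items() if len(members) > 1}
```

With this sketch, "Foo_Bar" and "foo-bar" map to the same key, and a name containing a space such as "Twisted Flow" is rejected by is_allowed.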

On May 15, 2013, at 12:08 PM, James Carpenter <nawkboy at gmail.com> wrote:

> While you're at it, you might consider not allowing variation in case and dash vs. underscore when specifying a dependency. A project should have only one concrete name, without fuzziness. A fuzzy match should result in a match failure. Fuzzy matching for a manual search is a different thing.
> 
> 
> On Wed, May 15, 2013 at 9:31 AM, Daniel Holth <dholth at gmail.com> wrote:
>> How to avoid confusables.
>> 
>> These scripts are recommended for use in identifiers:
>> http://www.unicode.org/reports/tr31/#Table_Recommended_Scripts
>> 
>> This report details a confusables detection algorithm:
>> http://www.unicode.org/reports/tr39/#Confusable_Detection
>> 
>> And ICU implements it:
>> http://www.icu-project.org/apiref/icu4c/uspoof_8h.html (see also
>> PyICU).
>> 
>> The package index would enforce uniqueness of the "skeleton" of each
>> registered package, which is just an internal normalization based on
>> confusability. If skeleton(identifier1) == skeleton(identifier2), then
>> identifier1 and identifier2 are confusable.
>> 
>> The tooling could get away with a simpler rule like
>> re.sub(r"[^\w\d.]+", "_", distribution, flags=re.UNICODE)
>> 
>> As a bonus to including the world, this should be able to prevent
>> people from exchanging zeroes for capital O.
>> 
>> On Wed, May 15, 2013 at 7:17 AM, Eric V. Smith <eric at trueblade.com> wrote:
>> > On 05/15/2013 07:10 AM, Donald Stufft wrote:
>> >>>>> Anyone want to run a scan over the PyPI package set to see
>> >>>>> how many packages would cause problems for a "[a-zA-Z0-9_.-]"
>> >>>>> only filter?
>> >>>>
>> >>>> See my previous email where I did queries against my local DB.
>> >>>> It's 225 total projects that wouldn't be allowed.
>> >>>
>> >>> Can you send the list of those projects?
>> >>>
>> >>> Eric.
>> >>>
>> >>
>> >> Here you go: https://gist.github.com/dstufft/5583225
>> >> I used a Python one-liner and the PyPI API so others can reproduce
>> >> it easily if they wish.
>> >
>> > Perfect. Thanks.
>> >
>> > It looks like space causes most of the issues. I'm not sure how
>> > "Twisted Flow >= 1.0" would be expected to parse.
>> >
>> > Eric.
>> >
>> >
>> > _______________________________________________
>> > Distutils-SIG maillist  -  Distutils-SIG at python.org
>> > http://mail.python.org/mailman/listinfo/distutils-sig