[Catalog-sig] safe_names again

Phillip J. Eby pje at telecommunity.com
Sat Jul 8 20:07:31 CEST 2006


At 11:02 PM 7/8/2006 +1000, richardjones at optusnet.com.au wrote:
>[sorry for the terrible email quoting / formatting - I'm stuck in webmail 
>ATM. I'm also having trouble following the discussion PJE and Jim are 
>having - and that's a concern to me because some of the stuff I'm reading 
>worries me.]
>
>I have created a branch for this and have begun the slow process of 
>working in the setuptools name mangling. It's going to take some time 
>since package names are used all over the place and form an integral part 
>of the database referential structure.
>
>My "plan" for implementation of this is as follows:
>
>1. Convert meta-data supplied by end users using safe_name. We call this 
>"name" internally (replacing the current use of that column). The original 
>name is retained for display purposes, stored as "display_name" on the 
>packages table.
>
>2. All user input of names for filtering must be mangled before searching 
>is performed.
>
>3. Find all places where we display names and convert them to use the 
>"display_name" column.

It's just a suggestion, but you might find it easier to phase in by simply:

1. Reject creating a package if its safe-name conflicts with another package
2. Provide search facilities that search on mangled name (which is used by 
#1 to verify a package that's about to be added).
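
As a rough sketch of those two steps (not PyPI's actual code): the 
safe_name() below is written out by hand on the assumption that it mirrors 
the setuptools mangling, i.e. each run of characters other than letters, 
digits and "." becomes a single "-".  find_conflict() is a hypothetical 
helper name, and existing_names stands in for whatever the package table 
returns.

```python
import re


def safe_name(name):
    """Mangle a project name: each run of characters other than
    letters, digits and '.' is replaced by a single '-'.

    Assumption: this mirrors the setuptools mangling under discussion.
    """
    return re.sub(r"[^A-Za-z0-9.]+", "-", name)


def find_conflict(new_name, existing_names):
    """Return an existing name whose mangled form matches new_name's,
    or None if there is no such name.

    Hypothetical helper: this is the "search on mangled name" of step 2,
    used by step 1 to decide whether to reject a new package.
    """
    target = safe_name(new_name)
    for existing in existing_names:
        # An identical name is the same package, not a conflicting one.
        if existing != new_name and safe_name(existing) == target:
            return existing
    return None
```

Registration would then refuse the new package whenever find_conflict() 
returns something other than None.  Note that this computes the mangled 
form on the fly for every existing name, so it needs no schema change.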

This seems like a more conservative route with a lot less effort involved 
if you want to ease into it.  In the simplest case it requires no database 
schema changes.  If performance is an issue, adding a mangled name column 
for the searches would be useful, but it's not strictly necessary.  More 
to the point, these two changes do not require wholesale refactoring of PyPI.

Of course, if you feel that all searches in PyPI should work this way, then 
your plan makes more sense.  I wasn't sure if you felt that way or not.  On 
the other hand, by leaving "name" alone, and having a "safe_name" column 
for searching, you can make the change more gradual, if that's a 
concern.  You could make safe_name be uniquely indexed, while leaving name 
as the primary key, and just trap the error when inserting a conflicting 
safe_name.
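
To illustrate the unique-index idea, here is a minimal sketch using 
Python's sqlite3 module purely so that it is self-contained; the table 
layout, make_db() and add_package() are assumptions made up for the 
example, not PyPI's real schema or code.  safe_name() is again a 
hand-written stand-in for the setuptools mangling.

```python
import re
import sqlite3


def safe_name(name):
    # Assumed mangling: runs of anything other than letters, digits
    # and '.' collapse to a single '-'.
    return re.sub(r"[^A-Za-z0-9.]+", "-", name)


def make_db():
    """Create an in-memory table where name stays the primary key and
    safe_name carries its own unique constraint (hypothetical layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE packages ("
        " name TEXT PRIMARY KEY,"
        " safe_name TEXT NOT NULL UNIQUE)"
    )
    return conn


def add_package(conn, name):
    """Insert a package; return False if the database rejects it because
    the name or its mangled form is already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO packages (name, safe_name) VALUES (?, ?)",
                (name, safe_name(name)),
            )
    except sqlite3.IntegrityError:
        # Trap the constraint violation instead of checking beforehand.
        return False
    return True
```

With this layout, adding "my_package" succeeds and a later attempt to add 
"my-package" is rejected by the database itself, while everything that 
keys off name is left untouched.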

Anyway, just some random thoughts to see if there's an easier way for you 
to do this.


