[Catalog-sig] Some pypi mirrors not up to date?

ken cochrane kencochrane at gmail.com
Mon Jul 2 14:37:20 CEST 2012


I started looking at this the other day, but I haven't had a chance to fix
it because the Amazon datacenter outage took all of my time the past few
days.

Here is what I found out.

b.pypi.python.org runs on GAE and is currently stuck. Looking at the logs,
I figured out what went wrong, but I'm not sure how to fix it. See the log
snippet [3] at the end of this email.

Basically there is a Python package called '__past__' (see [0] below)
that breaks the sync process: we use the project name as the key_name for
the Product model [1], and a GAE model key_name can't be of the form
__*__, i.e. it can't both start and end with two underscores [2].

I'm not sure how to fix the issue without possibly breaking other things.
My first thought was to remove the underscores, but that might break
something else, or conflict with another project with a similar name. I
wrote to Martin who gave me the following advice.

From "Martin v. Löwis":

> Renaming/escaping sounds good. I'd check if there is any string that
> can be used in a GAE key name, but not be used in a PyPI package name.
> If not, standard escaping needs to be applied: a prefix of "dunder"
> is added to any package whose name starts and ends with __, as well
> as to any package whose name starts with "dunder".

> When looking at all child nodes, remove "dunder" from any name;
> when doing lookups by name, escape as above.

> If you do find a character/string that can be in a key name but
> not in a package name, just escape the string with that name -
> no need to worry about escaping the escape character. However it
> may be that the only possible choice is "/" (which I know cannot
> appear in a package name).
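
The "dunder" escaping described in the quote above could be sketched roughly
as follows. This is only an illustration, not code from the mirror: the
function names and the PREFIX constant are hypothetical, and it assumes the
only forbidden key names are those that start and end with "__".

```python
# Hypothetical sketch of the "dunder" escaping scheme; names are made up.
PREFIX = "dunder"


def escape_key_name(name):
    """Map a PyPI project name to a string that is safe as a key name."""
    reserved = name.startswith("__") and name.endswith("__")
    # Names that already start with the prefix are escaped too, so that
    # unescaping stays unambiguous.
    if reserved or name.startswith(PREFIX):
        return PREFIX + name
    return name


def unescape_key_name(key_name):
    """Recover the original project name from a stored key name."""
    if key_name.startswith(PREFIX):
        return key_name[len(PREFIX):]
    return key_name


for project in ("__past__", "dunderhead", "plone.app.referenceablebehavior"):
    key = escape_key_name(project)
    assert unescape_key_name(key) == project
```

Because every name starting with "dunder" is itself prefixed, any stored key
that starts with "dunder" is known to be escaped, so the round trip is
lossless.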

I looked through most of the PyPI code, and I think the only character you
can't have in a package name is "/"; all other characters look like they
work.
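
If "/" really cannot appear in a package name, the simpler variant Martin
mentions (escape with that one character, no need to escape the escape
character) might look something like this. It is a sketch under two
assumptions that would need checking: that "/" is accepted inside a GAE
key_name, and that no PyPI project name contains "/". The function names
are hypothetical.

```python
# Hypothetical sketch: escape reserved names with a character that cannot
# occur in a package name. Assumes "/" is legal in a key name.
ESCAPE = "/"


def to_key_name(name):
    """Return a key name for a project name, escaping the __*__ form."""
    if ESCAPE in name:
        raise ValueError("unexpected %r in project name %r" % (ESCAPE, name))
    if name.startswith("__") and name.endswith("__"):
        return ESCAPE + name
    return name


def from_key_name(key_name):
    """Return the project name that a key name was derived from."""
    if key_name.startswith(ESCAPE):
        return key_name[len(ESCAPE):]
    return key_name
```

Since no real project name can start with "/", nothing else ever needs
escaping, which is why this variant is shorter than a prefix like "dunder".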

So I know what is causing it; we just need to fix the issue, test it, and
roll out the fix. I was planning on doing it this past weekend but, thanks
to AWS, I didn't have any time to work on it. If anyone has free time,
feel free to take over or help. Just let others know so there isn't any
duplicate effort.

Let me know if you have any questions.

Ken Cochrane


[0] http://pypi.python.org/pypi/__past__/0.0.1.dev


[2] Information about model key_names

The key name for the entity. The name becomes part of the primary key. If
None, a system-generated numeric ID is used for the key.

The value for key_name must not be of the form __*__.
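
A quick way to see which project names fall under that rule is a small
check like the one below. The regular expression is only my reading of
"of the form __*__"; the exact pattern the datastore applies is an
assumption, and the function name is hypothetical.

```python
import re

# Assumed reading of "of the form __*__": starts and ends with two
# underscores. The datastore's real check may differ in detail.
RESERVED_KEY_NAME = re.compile(r"^__.*__$")


def is_reserved_key_name(name):
    """Return True if name looks like a reserved __*__ key name."""
    return RESERVED_KEY_NAME.match(name) is not None


for candidate in ("__past__", "__past", "my_package", "past"):
    print(candidate, is_reserved_key_name(candidate))
```

Under this reading only '__past__' is rejected; names with single or
one-sided underscores are fine.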

[3] Log snippet.

   1. 2012-06-28 06:45:18.222

   step package '__past__'

   2. E 2012-06-28 06:45:18.778

   illegal name in key path element: __past__
   Traceback (most recent call last):
     File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/_webapp25.py", line 701, in __call__
     File "/base/data/home/apps/pypi/3.358089379617981219/handlers.py", line 171, in get
     File "/base/data/home/apps/pypi/3.358089379617981219/fetch.py", line 293, in cron
       return step()
     File "/base/data/home/apps/pypi/3.358089379617981219/fetch.py", line 259, in step
       actions[action](m, todo, param)
     File "/base/data/home/apps/pypi/3.358089379617981219/fetch.py", line 91, in package
       data = simple_page(m, name)
     File "/base/data/home/apps/pypi/3.358089379617981219/fetch.py", line 70, in simple_page
     File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/db/__init__.py", line 1074, in put
       return datastore.Put(self._entity, **kwargs)
     File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore.py", line 579, in Put
       return PutAsync(entities, **kwargs).get_result()
     File "/base/python_runtime/python_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 604, in get_result
       return self.__get_result_hook(self)
     File "/base/python_runtime/python_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1579, in __put_hook
     File "/base/python_runtime/python_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1216, in check_rpc_success
       raise _ToDatastoreError(err)
   BadRequestError: illegal name in key path element: __past__

On Mon, Jul 2, 2012 at 8:11 AM, Jannis Leidel <jannis at leidel.info> wrote:

> On 02.05.2011, at 22:10, Martin v. Löwis <martin at v.loewis.de> wrote:
> > Am 02.05.2011 19:24, schrieb Jannis Leidel:
> >>
> >> On 02.05.2011, at 18:12, Maurits van Rees wrote:
> >>
> >>> Hi,
> >>>
> >>> I noticed that some distributions are not on all mirrors.  For example
> >>> http://a.pypi.python.org/simple/plone.app.referenceablebehavior/
> >>> has 0.1 and 0.2 (last one released 30 April)
> >>> but 0.2 is missing from
> >>> http://b.pypi.python.org/simple/plone.app.referenceablebehavior/
> >>>
> >>> Same for c and d.  Ah, no: those two have it now.  I know for sure
> >>> that at least d did not have it five minutes ago.  And this version was
> >>> released two days ago, so it should have been slightly faster. :-)
> >>
> >> Hm, d doesn't seem to have the file on disk even though it's on the
> >> simple page, see
> >> http://d.pypi.python.org/packages/source/p/plone.app.referenceablebehavior/
> >>
> >> Martin: Anything I can do to make sure this doesn't happen again?
> >
> > As the starting point, we should figure out why it happened in
> > the first place - it shouldn't have, of course. Most likely,
> > it's a bug :-)
>
> Looks like http://b.pypi.python.org is out of date again:
> http://www.pypi-mirrors.org
> Can we do something about that?
> Jannis
> _______________________________________________
> Catalog-SIG mailing list
> Catalog-SIG at python.org
> http://mail.python.org/mailman/listinfo/catalog-sig