[Catalog-sig] disabling the serving of links from description_html?

M.-A. Lemburg mal at egenix.com
Tue Dec 18 19:36:14 CET 2012


On 18.12.2012 18:54, Holger Krekel wrote:
> On Tue, Dec 18, 2012 at 5:46 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> 
>> On 18.12.2012 15:54, Holger Krekel wrote:
>>> Hi Richard, hi all,
>>>
>>> While reading the pypi main and other sources I wondered how we could
>>> switch off serving links from description_html, at least on a per-project
>>> basis.  It's really annoying that once you add some links to a
>>> long_description, installation of your package slows down around the
>>> world, even if you remove the links from the next release.
>>>
>>> How could we arrange for a maintainer to communicate to the pypi-server
>>> that a particular project should not ever serve links from
>>> description_html
>>> (and maybe not even from the homepage while we are at it)?
>>>
>>> Preferably it should be something that can be done from existing setup.py
>>> files, like adding a special trove-classifier or keyword.  But a little
>>> custom tool or a web page form would be ok as well.
>>>
>>> If maintainers could easily switch off these extra links, then this means
>>> less stress for the pypi server and a considerable global speedup of
>>> installing python packages as often most of the pip/easy_install time is
>>> spent with checking out these URLs.
>>
>> Are you sure about this?
>>
>> AFAIK, setuptools/distribute only looks at links with rel="homepage"
>> or rel="download" attributes, not all links on the PyPI project page.
>> The links from the description don't receive such attributes.
>>
>> See e.g. http://pypi.python.org/simple/pytest/
>>
>>
> You are right, Marc.  Only the download and home page links (from all
> versions ever published) are considered by pip/easy_install.  I just
> examined it more closely via urlsnarf.  There were so many in some projects,
> mixed in with the other links, that I didn't see it clearly before (although
> I did notice the rel classification).
> 
> So to avoid the overhead one could retroactively remove all download links
> and maybe also all homepage links except the one for the latest version or
> so.  But that can be done without changes to pypi itself, I guess.

It may be useful to add rel="description" to the links from the
descriptions. That way, a download tool could more easily detect
the origin of the links.

And perhaps rel="distribution_file" to links of the distribution
files.

Given that the simple index lists links for all releases, it may
also be useful to add a new version="x.y.z" attribute to the
links, so that a download tool can more easily determine which links
belong to which release. (More correct would be to add the version to
the rel attribute, but doing so would break setuptools, since it
does a substring search rather than parse the HTML.)
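
To illustrate, here is a rough sketch of how a download tool might use
such attributes if the simple index carried them. It is only an
illustration under assumptions: the rel values "description" and
"distribution_file" and the version attribute are the proposals above,
not something PyPI serves today, and the class name, the helper
links_for() and the sample page are made up for this example.

```python
from html.parser import HTMLParser


class SimpleIndexLinks(HTMLParser):
    """Collect (href, rel tokens, version) for every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return
        # rel is a space-separated token list; a missing rel gives [].
        rels = (attr.get("rel") or "").split()
        # "version" is the hypothetical attribute proposed in this mail.
        self.links.append((href, rels, attr.get("version")))


def links_for(page, rel, version=None):
    """Return hrefs whose rel contains `rel`, optionally for one version."""
    parser = SimpleIndexLinks()
    parser.feed(page)
    return [
        href
        for href, rels, ver in parser.links
        if rel in rels and (version is None or ver == version)
    ]


# Made-up sample page, not real PyPI output.
SAMPLE = """
<a href="pkg-1.0.tar.gz" rel="distribution_file" version="1.0">pkg-1.0.tar.gz</a>
<a href="http://example.com/" rel="homepage" version="1.0">1.0 home_page</a>
<a href="http://example.com/dl" rel="download" version="1.1">1.1 download_url</a>
<a href="http://example.org/blog" rel="description">blog</a>
"""

if __name__ == "__main__":
    print(links_for(SAMPLE, "download", version="1.1"))
    print(links_for(SAMPLE, "description"))
```

Because the sketch parses the tags instead of searching for substrings,
it can filter on rel and version independently, and links that came
from the description are easy to skip.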

-- 
Marc-Andre Lemburg
eGenix.com


