[Distutils] Python people want CPAN and how the latter came about

Fri Dec 25 10:09:29 CET 2009

On Fri, Dec 25, 2009 at 09:00, Sridhar Ratnakumar
<sridharr at activestate.com> wrote:
> Greetings Lennart,
>
> On 12/24/2009 10:27 PM, Lennart Regebro wrote:
>>
>> On Fri, Dec 25, 2009 at 05:39, Sridhar Ratnakumar
>> <sridharr at activestate.com>  wrote:
>>>
>>> Is it because of this benefit to package authors that we are withholding
>>> the implementation of a simple archive that would: 1) simplify the tools
>>> to not rely on ad hoc web scraping
>>
>> There are better ways to do that.
>
> May I ask, what would they be?

Have links in the metadata to the file locations. That means you don't
have to scrape the websites to find the links, the links would be in
the metadata for the packages, or accessible in some other easy way.
Scraping would no longer be needed, without requiring uploads to PyPI.
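
A minimal sketch of that idea, assuming each release's metadata carried
a list of file locations. The record layout and the "download_urls"
field name are hypothetical, chosen only for illustration; they are not
an existing PyPI format.

```python
# Hypothetical metadata records. The "download_urls" field is an
# assumption for illustration, not an existing PyPI metadata field.
def find_download_links(metadata, name, version):
    """Return the file locations listed in the metadata for one release."""
    for record in metadata:
        if record["name"] == name and record["version"] == version:
            # The links come straight from the metadata: no HTML scraping.
            return list(record.get("download_urls", []))
    return []


example_metadata = [
    {
        "name": "example-pkg",
        "version": "1.0",
        "download_urls": ["http://example.org/dist/example-pkg-1.0.tar.gz"],
    },
]

links = find_download_links(example_metadata, "example-pkg", "1.0")
```

The files themselves stay wherever the author hosts them; a tool only
needs the metadata to find them.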

>> That's *their* problem. If they don't want to upload, then they don't
>> want to upload.
>
> As the original proposal is to retain the existing behavior for already
> registered/uploaded package releases (such as Twisted) so existing systems
> will continue to work, but implement the suggested upload rules only for new
> requests (creation/register)- so as to gradually improve the quality of PyPI
> like that of other packaging systems - by encouraging authors to generate a
> reasonably good sdist (setup.py + PKG-INFO) and uploading them

No, that's not encouraging, that's requiring and forcing. That is NOT
the same thing.
Again: If you tell people they have to upload to register, the effect
of that is that they will NOT register. It will not make anybody
upload, it will make them NOT register.

It has already been explained in this discussion why the Twisted folks
don't upload to PyPI: because, for various reasons, it doesn't work
for them. Your solution to get more packages to PyPI is to tell people
to upload or bugger off. My solution is to fix the reason they don't
upload.

It really is that easy: If you tell people to upload or bugger off,
they will bugger off.

> If I want to use a web service, I obviously have to adhere to their rules
> and policies. Nobody is forcing me to do so.

Exactly. Nobody is forcing anybody to use PyPI. By making it HARDER to
use and giving it MORE requirements, FEWER people will use it. Is it
really that hard to understand?

> I assume in good faith that package authors will be happy to adapt to the
> new system

You are wrong.

> .. for the benefit of everyone.

No, it's not for the benefit of everyone. It's for the benefit of
adherence to random rules with no purpose. If we want more packages on
PyPI, we should fix the reasons that not everyone uploads their
packages. And yes, I *am* going to repeat this in different ways and
wordings until your ears fall off. ;-)

> Why not? Do you conceive of any reason apart from CPAN-like archives that
> would help in proliferation of mirror sites and third-party sites?

The point is that we *have* a CPAN-like archive.

> because I personally went through significant hurdles to set up a daily PyPI
> mirror-like area. I just don't see how someone merely interested in writing
> a third-party service, or setting up a mirror of PyPI would be *most likely
> inclined* to face similar hurdles before giving up.

You are so stuck on the idea that before you can do anything else you
have to mirror PyPI completely, using only rsync. I don't see why
that would have to be so. Rsync is not the be-all and end-all of
mirroring, and most third-party services do not need to mirror. They
need to get data, and that's possible quite easily.

>> Yes, but it's not particularly unreliable to compare the filename to
>> see if it had been handled before. You don't even need to parse the
>> version number for most services that work on the tarballs.
>
> It is indeed unreliable to rely on filenames to get package versions

Yes, but it's not particularly unreliable to compare the filename to
see if it had been handled before. You don't even need to parse the
version number for most services that work on the tarballs.
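
A rough sketch of that filename check, assuming a service simply
remembers which tarball filenames it has already processed. The function
name and the in-memory set are hypothetical; a real service would
persist this state somewhere.

```python
def new_tarballs(filenames, seen):
    """Return filenames not handled before and record them as handled.

    The filename is compared as an opaque string; no version number
    is parsed out of it.
    """
    fresh = []
    for filename in filenames:
        if filename not in seen:
            seen.add(filename)
            fresh.append(filename)
    return fresh


# Hypothetical state: filenames handled on a previous run.
handled = {"example-pkg-1.0.tar.gz"}
todo = new_tarballs(
    ["example-pkg-1.0.tar.gz", "example-pkg-1.1.tar.gz"], handled
)
```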

> I am not speculating as I've actually experimented with the PyPI index,
> mirroring it .. handling the metadata in packages, and building it.

Yes, but again, most third-party services would not mirror it. A mirror
would. But a mirror is only one type of third-party service out of
many.

>> Yes, but since they have the source package, and will have to unpack
>> it and build the packages anyway, they also have the metadata.
>
> It is not that simple. PyPM backend, for instance, is not monolithic as in
> doing only a sequential build of packages. It first loads the dependency
> graph (for which metadata - PKG-INFO/requires.txt - is required) from our
> internal mirror over the network. It is expensive to go extract each and
> every tarball .. from each build machine. After loading the dependency
> graph, and then comparing it with existing repository .. every day, new
> builds happen.

You mean you build every package that also *depends* on a package that
has changed? Yeah, that does require the metadata. But as I said, an
easy way to mirror the metadata would definitely be an improvement.
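
To illustrate why that kind of rebuild needs the metadata up front, here
is a small sketch. It assumes the dependency information has already
been loaded into a plain dict mapping each package to the packages it
requires; the function name and data layout are hypothetical and are not
how the PyPM backend is actually written.

```python
from collections import deque


def packages_to_rebuild(requires, changed):
    """Return the changed packages plus everything that depends on them,
    directly or indirectly.

    `requires` maps a package name to the names it depends on.
    """
    # Invert the graph: for each package, who depends on it?
    dependents = {}
    for package, deps in requires.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(package)

    to_rebuild = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in to_rebuild:
                to_rebuild.add(dependent)
                queue.append(dependent)
    return to_rebuild


# Hypothetical dependency data, as it might be read from metadata.
example_requires = {
    "app": ["lib"],
    "lib": ["core"],
    "core": [],
    "other": [],
}
rebuild = packages_to_rebuild(example_requires, ["core"])
```

Nothing here touches a tarball: the whole computation runs on the
dependency metadata alone.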

> Further, I can imagine search.cpan.org (which is not hosted by cpan.org
> folks) using only the metadata without touching the source distributions.

Right. Hence, they would *not* need both. Which was my point.

-- 
Lennart Regebro: http://regebro.wordpress.com/
Python 3 Porting: http://python-incompatibility.googlecode.com/
+33 661 58 14 64