[Distutils] PyPI v2 (Was: PyPI pull request #7)

Donald Stufft donald at stufft.io
Sat Nov 16 06:00:59 CET 2013


On Nov 14, 2013, at 5:00 PM, anatoly techtonik <techtonik at gmail.com> wrote:

> First things first, thanks for the detailed response.
> 
> On Fri, Nov 8, 2013 at 3:41 PM, Donald Stufft <donald at stufft.io> wrote:
>> 
>>> First some background questions:
>>> 1. Everything for core development is in HG. Why Git now? Why
>>> Mercurial suxx (three major personal annoyances will do)? Why
>>> BitBucket suxx?
>> 
>> I’m most familiar with git, so I used git when I started it.
> 
> Seems fair.
> 
>>> 2. Why Apache 2.0? Is it because everybody is using that? Why not try
>>> CC0/CC-BY/MIT/ISC - code under these licenses is easier to copy/paste,
>>> and is a better investment for your time as a coder studying the stuff
>>> and filling up the space of your brain.
>> 
>> Apache 2.0 is a good license, it is similar to MIT/ISC except it has a
>> patent clause.
> 
> Now everybody is also afraid of patents. Are there any other *short* licenses
> with patent grant? Why did patents start to work by default? I thought you
> should explicitly mention if your work is patented or pending for them to work.
> 
> Regarding licenses, I'd still prefer to learn codebase that I can freely
> copy/paste to my own projects without additional obligations like those
> imposed by Apache 2.0. I can understand crediting, but everything else just
> keeps people away.
> 
> I'm not sure if CC-BY applies to software projects, but I'd choose something
> as simple as this not only for code, but also for pydotorg content.

If people don’t like the requirements of the Apache License 2.0, that’s fine;
they don’t need to contribute. Perhaps a contributor or two might be lost
to that, but I’m not too worried about it.

The license is a bike shed issue, and as it’s Richard’s and my bike shed,
it’ll be painted our color.

> 
>>> 3. Why the raw SQL? It cripples the ability to scale, to host package
>>> indexes on GAE or OpenStack. And it is right at the core. Why not to
>>> "port NDB" to SQL or use HyperDB from Roundup?
>> 
>> It was just recently switched to raw sql from using the SQL Alchemy
>> SQL expression layer. The reason for the switch was we were not
>> using the portability features and it was more complicated to understand
>> the expression layer than just plain SQL.
> 
> Ok. SQL Alchemy is too big and complicated. Why NoSQL didn't fit?
> MongoDB is all over the net nowadays.

NoSQL is a buzzword that doesn’t mean a whole lot other than “Not a
relational database”. The various “NoSQL” solutions each fit well in
different scenarios; however, this is not one of them. The database
is well within the limits of PostgreSQL, and referential integrity, indexes,
and the like are useful constructs.

> 
>>> 4. Why not Django/TG/... framework? (+pinax, +openid, +other_stuff)
>> 
>> I’ll be adding a section to the documentation on this eventually, but
>> basically.
>> 
>> Django’s ORM wasn’t powerful enough to represent the existing Schema. When
>> I had to throw out the ORM I found it difficult to integrate other pieces without
>> introducing more global state. Ultimately I felt that a Franken-Django that used Django
>> on the surface but with pieces replaced was going to require as much or more
>> of a learning curve, even for Django developers, so it didn’t make sense to
>> do it.
> 
> Maybe a good partitioning of the application with a big diagram could bring the best
> of both worlds? Bare SQL and getting to the basics seems too radical to me.

I’m not real keen on needing to do big diagrams. I don’t see any issue with writing
SQL here, it’s a perfectly fine language for querying a database.
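
To make "just plain SQL" concrete, here is a minimal sketch of a parameterized
query issued directly from Python. It is not Warehouse code: the `releases`
table, its columns, and the `versions_for` helper are hypothetical, and it uses
the standard library's sqlite3 module only so that it runs without a server
(Warehouse itself targets PostgreSQL).

```python
import sqlite3

# In-memory database so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE releases ("
    " name TEXT NOT NULL,"
    " version TEXT NOT NULL,"
    " PRIMARY KEY (name, version))"
)
conn.executemany(
    "INSERT INTO releases (name, version) VALUES (?, ?)",
    [("warehouse", "0.1"), ("warehouse", "0.2"), ("pip", "1.4.1")],
)


def versions_for(conn, project):
    """Return the versions recorded for a project, sorted as text."""
    rows = conn.execute(
        # The query is ordinary SQL; the value is passed as a bound
        # parameter rather than formatted into the string.
        "SELECT version FROM releases WHERE name = ? ORDER BY version",
        (project,),
    )
    return [row[0] for row in rows]
```

The query reads the same as it would in a database shell, which is the appeal
of skipping an expression layer when its portability is not being used.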

> 
>> At that point I was a bit frustrated and fed up and turned to using just Werkzeug
>> as a library. This enabled a rapid increase in pace (I got more done over the
>> initial weekend than I had in a month previously).
> 
> Isn't it just a WSGI library? What about reusable components that are already
> written? I wouldn't risk rewriting OAuth support from scratch.

Yes, Werkzeug is just a WSGI library. This means that any reusable pieces we use
cannot be tied to a single framework. This might mean we need to make our own
reusable pieces and release them. I consider this a benefit to the greater ecosystem
as there is very little reason an OAuth implementation (for instance) needs to be
tied to a single framework.
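
As a rough illustration of a piece that is not tied to a framework, the sketch
below wraps an arbitrary WSGI application in a small middleware. Everything in
it is hypothetical (`require_token`, `hello_app`, the `call` helper, and the
token value), and a static bearer-token check stands in for real OAuth, which
is far more involved. It uses only the standard library's wsgiref.

```python
from wsgiref.util import setup_testing_defaults


def require_token(app, expected_token):
    """Hypothetical WSGI middleware: reject requests without the expected token."""

    def middleware(environ, start_response):
        header = environ.get("HTTP_AUTHORIZATION", "")
        if header != "Bearer " + expected_token:
            start_response("401 Unauthorized", [("Content-Type", "text/plain")])
            return [b"unauthorized\n"]
        # Authorized: hand the request to the wrapped application untouched.
        return app(environ, start_response)

    return middleware


def hello_app(environ, start_response):
    """Stand-in for any WSGI application, Werkzeug-based or otherwise."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


def call(app, headers=None):
    """Invoke a WSGI app in-process and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ.update(headers or {})
    captured = {}

    def start_response(status, response_headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


protected = require_token(hello_app, "s3cret")
```

Because the middleware only relies on the WSGI calling convention, the same
wrapper would work in front of any WSGI-compliant application.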

> 
>> Feedback is good, but it’ll be tempered to prevent http://theoatmeal.com/comics/design_hell
> 
> That's a nice argument. =)
> 
>>>>> You also need a backlog for
>>>>> collaboration. My ETA for new PyPI is no earlier than PyCon 2014 if
>>>>> Donald and Richard will be working on it full time.
>>>> 
>>>> Possibly! I’m unsure of how long it will take. It’s primarily Richard and me, but we have a
>>>> few domain experts in particular pieces who have offered to help out as well when
>>>> we’re ready for their pieces.
>>> 
>>> PostgreSQL is killing all the fun. It takes a few hours to get
>>> acquainted with its way of doing things - groups as users, per table
>>> grants and insufficient DB permissions. There should be an option to
>>> run on SQLite for development, like in Trac, Roundup etc.
>> 
>> Warehouse only runs on PostgreSQL. It’s not that hard to have a local copy of
>> PostgreSQL and it’ll enable using the advanced feature set of PostgreSQL
>> instead of being limited to the lowest common denominator.
> 
> What feature sets do you need? Are they really worth killing the fun of working,
> hackathons, and local development?

As I said, running a local copy of PostgreSQL is very easy. There are installers for
Windows and OS X, and pretty much every *nix distribution will have it packaged.

Off the top of my head I’m planning on using Array support, HStore, and functional/partial
indexes.
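
For readers unfamiliar with those features, here is a small sketch of what
they look like as PostgreSQL DDL, kept as plain strings in Python. This is not
Warehouse's schema: the table, columns, and index names are all made up for
illustration, and `apply_schema` simply assumes a DB-API cursor connected to a
PostgreSQL database with the hstore extension available.

```python
# PostgreSQL DDL held as plain strings; all table, column, and index names
# below are hypothetical and are not taken from Warehouse.
SCHEMA_STATEMENTS = [
    "CREATE EXTENSION IF NOT EXISTS hstore",
    """
    CREATE TABLE releases (
        name        text    NOT NULL,
        version     text    NOT NULL,
        classifiers text[]  NOT NULL DEFAULT '{}',   -- array column
        metadata    hstore  NOT NULL DEFAULT '',     -- key/value column
        hidden      boolean NOT NULL DEFAULT false,
        PRIMARY KEY (name, version)
    )
    """,
    # Functional index: supports case-insensitive lookups by name.
    "CREATE INDEX releases_lower_name ON releases (lower(name))",
    # Partial index: only covers rows that are not hidden.
    "CREATE INDEX releases_visible ON releases (name) WHERE NOT hidden",
]


def apply_schema(cursor):
    """Run each statement on a DB-API cursor connected to PostgreSQL."""
    for statement in SCHEMA_STATEMENTS:
        cursor.execute(statement)
```

Array and hstore columns in particular have no direct equivalent in a
lowest-common-denominator schema, which is the trade-off being discussed.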

> 
>>>>> So, instead of
>>>>> all-or-nothing scenario I can try to find some help with incremental
>>>>> approach.
>>>> 
>>>> Mostly the problem with improving the current base is every change is particularly
>>>> dangerous. The code base is large, it’s untested, and it’s not very nice. It’s extremely
>>>> easy to break things by accident with a seemingly unrelated change. Richard and I
>>>> have both done this multiple times.
>>> 
>>> This smells like a bad spaghetti. Do you think it will be maintainable
>>> if it is already so fragile on the early stages of development?
>> 
>> This is talking about the current code base, not about Warehouse. Warehouse has
>> and requires 100% unit test coverage and will be gaining a functional test suite
>> on top of that to test high level user stories.
> 
> At least somebody who is not getting mad using "user story" buzzwords.
> A pleasure. =)
> Sorry, I missed the point that you broke the old PyPI code, not the new
> one. Still, 100% test coverage makes it hard to make changes, sometimes
> harder than necessary. Perhaps I should take a look at this.

It raises the bar for adding new code, but lowers the bar for maintaining the existing
code. Given the fact that the code will be maintained more than new features will
be written, I consider this a positive.

> 
> BTW, why didn't you just copy crate.io?

Crate’s code base suffers from a lot of bad assumptions that were baked into it when
I was still learning how packaging really worked. On top of that, most of the pieces
that we need to replace PyPI don’t exist in Crate, since Crate just mirrored PyPI. As it
is, Warehouse is almost at feature parity with Crate (but not PyPI).

> 
>> Warehouse
>> is (hopefully) easy to iterate on, has what I believe is a well laid out
>> code base, and has an extensive unit test suite.
> 
> Ok. I should take a closer look.
> 
>>> Do you have a diagram of the system that you are trying to build, or
>>> is it just an experiment? It may be worth putting the IDE aside and trying to
>>> de-couple some things first on the whiteboard. I am afraid that I will
>>> never be ready to help with reinventing Django. At least not without
>>> some research and art coverage.
>> 
>> It’s not a reinvention of Django. The “framework” parts of Warehouse
>> are very small and are mostly glue that holds together pieces like
>> Werkzeug and SQLAlchemy.
> 
> Still, is there a diagram of Warehouse architecture? Blueprint,
> CAD model of a building?

I have a todo list item to write up architecture documentation that explains
the layout and where the different pieces are at.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
