[Distutils] PyPI v2 (Was: PyPI pull request #7)

Sat Nov 23 17:57:45 CET 2013

On Sat, Nov 16, 2013 at 8:00 AM, Donald Stufft <donald at stufft.io> wrote:
>
> If people don’t like the requirements of the Apache License 2.0, that’s fine;
> they don’t need to contribute. Perhaps there might be a contributor or two lost
> to that, but I’m not too worried about it.
>
> License is a bike shed issue and as it’s mine and Richard’s bike shed
> it’ll be painted our color.

It's just sad to see lawyers winning the battle against simplicity, which makes
more work for them and less progress for us in the same time span.

>>>> 3. Why raw SQL? It cripples the ability to scale and to host package
>>>> indexes on GAE or OpenStack, and it is right at the core. Why not
>>>> "port NDB" to SQL, or use HyperDB from Roundup?
>>>
>>> It was only recently switched to raw SQL from the SQLAlchemy
>>> SQL expression layer. The reason for the switch was that we were not
>>> using the portability features, and the expression layer was more
>>> complicated to understand than plain SQL.
>>
>> OK. SQLAlchemy is too big and complicated. Why didn't NoSQL fit?
>> MongoDB is all over the net nowadays.
>
> NoSQL is a buzzword that doesn’t mean a whole lot other than “Not a
> relational database”. The various “NoSQL” solutions each fit well in
> different scenarios; however, this is not one of them. The database
> is well within the limits of PostgreSQL and referential integrity, indexes,
> and the like are useful constructs.

Right. That's why it is extremely interesting to learn by example from people
experienced in both. The answer to "why" I am looking for is not "because we
know how to work with it", but "because the {{this}} and {{that}} use cases are
not solved at the MongoDB level, or are more complicated, slower or less reliable
there".
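Donald's answers above rest on two ideas: queries written as plain SQL strings instead of an expression layer, and the database itself enforcing referential integrity. A minimal sketch of both, assuming a hypothetical two-table schema (not Warehouse's real one) and using the stdlib `sqlite3` module only so it runs without a server. Warehouse itself targets PostgreSQL, where a driver such as psycopg2 uses `%s` placeholders rather than `?`. The helper names are made up for illustration.

```python
import sqlite3

# Hypothetical schema for illustration only; not Warehouse's real tables.
SCHEMA = """
CREATE TABLE packages (name TEXT PRIMARY KEY);
CREATE TABLE releases (
    name    TEXT NOT NULL REFERENCES packages (name),
    version TEXT NOT NULL,
    PRIMARY KEY (name, version)
);
"""


def connect():
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is on;
    # PostgreSQL enforces declared foreign keys by default.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_release(conn, name, version):
    """Insert a release with a plain parameterized query.

    Returns False if the database rejects the row, e.g. because the
    package does not exist (foreign key) or the release already exists
    (primary key).
    """
    try:
        conn.execute(
            "INSERT INTO releases (name, version) VALUES (?, ?)",
            (name, version),
        )
        return True
    except sqlite3.IntegrityError:
        return False


def release_versions(conn, name):
    """Return versions for `name`; ORDER BY here is lexical text order."""
    rows = conn.execute(
        "SELECT version FROM releases WHERE name = ? ORDER BY version",
        (name,),
    ).fetchall()
    return [version for (version,) in rows]


conn = connect()
conn.execute("INSERT INTO packages (name) VALUES (?)", ("example",))
print(add_release(conn, "example", "1.0"))          # True
print(add_release(conn, "no-such-package", "1.0"))  # False: unknown package
print(release_versions(conn, "example"))            # ['1.0']
```

The point of the sketch is that the orphaned release is refused by the database, not by application code that every caller has to remember to run.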

>>>> 4. Why not Django/TG/... framework? (+pinax, +openid, +other_stuff)
>>>
>>> I’ll be adding a section to the documentation on this eventually, but
>>> basically:
>>>
>>> Django’s ORM wasn’t powerful enough to represent the existing schema. When
>>> I had to throw out the ORM I found it difficult to integrate other pieces without
>>> introducing more global state. Ultimately I felt that a Franken-Django that used Django
>>> on the surface but with pieces replaced was going to require as much or more
>>> of a learning curve, even for Django developers, so it didn’t make sense to
>>> do it.
>>
>> Maybe a good partitioning of the application, with a big diagram, could bring the best
>> of both worlds? Bare SQL and going back to basics seems too radical to me.
>
> I’m not real keen on needing to do big diagrams. I don’t see any issue with writing
> SQL here, it’s a perfectly fine language for querying a database.

You're writing SQL, but nobody except you will understand why you wrote SQL
here and why it is the best choice here, because there is no high-level
overview that describes the problem (the queries that need to be run) in
this particular part of the application. Partitioning and blueprints are needed to
understand what the code does. They don't specify whether it is SQL or NoSQL;
those are implementation details. But a blueprint helps to understand what exactly
the SQL does and whether it could be replaced.

>>> At that point I was a bit frustrated and fed up and turned to using just Werkzeug
>>> as a library. This enabled a rapid increase in pace (I got more done over the
>>> initial weekend than I had in a month previously).
>>
>> Isn't it just a WSGI library? What about reusable components that are already
>> written? I wouldn't risk rewriting OAuth support from scratch.
>
> Werkzeug is just a WSGI library, yes. This means that any reusable pieces we use
> cannot be tied to a single framework. This might mean we need to make our own
> reusable pieces and release them. I consider this a benefit to the greater ecosystem
> as there is very little reason an OAuth implementation (for instance) needs to be
> tied to a single framework.

"Reusable pieces" at the WSGI level, or at the application level? In a WSGI
context I see "reusable pieces" strictly as middleware. In that case you create
an implicit API which is highly dependent on middleware order, which makes the
code more convoluted and less intuitive. Well, it may be the best of the
available solutions, but I am not aware of many problems with isolating the
architecture inside the application itself, without going out to the level of
global WSGI variables.

If the application structure becomes dependent on the middleware stack, in
addition to API versions, the URL routing scheme and component-specific API
calls, it needs a diagram of that whole zoo for every affected state, or at
least a good description of this API.
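The ordering concern can be made concrete. A minimal sketch with two hypothetical middlewares (the function names and the `example.*` environ keys are made up for illustration): the only contract between them is an environ key plus the order in which they are wrapped, so swapping the order silently changes the output without any error.

```python
from wsgiref.util import setup_testing_defaults


def auth_middleware(app):
    """Hypothetical middleware that stores the current user in environ."""
    def wrapper(environ, start_response):
        environ["example.user"] = "alice"  # stand-in for real authentication
        return app(environ, start_response)
    return wrapper


def greeting_middleware(app):
    """Hypothetical middleware that silently depends on auth_middleware."""
    def wrapper(environ, start_response):
        user = environ.get("example.user", "<nobody>")
        environ["example.greeting"] = "hello " + user
        return app(environ, start_response)
    return wrapper


def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [environ["example.greeting"].encode("utf-8")]


def call(wsgi_app):
    """Call a WSGI app directly with a test environ and return the body text."""
    environ = {}
    setup_testing_defaults(environ)

    def start_response(status, headers, exc_info=None):
        pass  # status and headers are ignored in this sketch

    return b"".join(wsgi_app(environ, start_response)).decode("utf-8")


# The outermost wrapper runs first on each request.
right = auth_middleware(greeting_middleware(app))
wrong = greeting_middleware(auth_middleware(app))
print(call(right))  # hello alice
print(call(wrong))  # hello <nobody>
```

Nothing in either middleware's signature records the dependency, which is why a written description or diagram of the stack becomes necessary once several such pieces are combined.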

>>>> PostgreSQL is killing all the fun. It takes a few hours to get
>>>> acquainted with its way of doing things: groups as users, per-table
>>>> grants and insufficient DB permissions. There should be an option to
>>>> run on SQLite for development, like in Trac, Roundup etc.
>>>
>>> Warehouse only runs on PostgreSQL. It’s not that hard to have a local copy of
> PostgreSQL and it’ll enable using the advanced feature set of PostgreSQL
>>> instead of being limited to the lowest common denominator.
>>
>> What features do you need? Are they really worth killing the fun of working,
>> hackathons and local development?
>
> As I said, running a local copy of PostgreSQL is very easy. There are installers for
> Windows and OSX, and pretty much every *nix distribution will have it packaged.

That's not "easy" as in "clone and run", as it is now with Roundup, Spyder and
many other projects that are _accessible_ for hacking and fixing typos and
minor bugs.

> Off the top of my head I’m planning on using Array support, HStore, and functional/partial
> indexes.

Do you have examples, for people not familiar with these details, of what they
will be good for? Is that complication really necessary?
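Two of the features Donald lists, functional (expression) indexes and partial indexes, can be shown concretely. A minimal sketch, assuming a hypothetical `packages` table with a `hidden` flag; it uses `sqlite3` only so it runs without a server (SQLite 3.9.0 and later also support both index types). Array and HStore columns have no SQLite counterpart and are not shown, and nothing here describes how Warehouse actually uses these features.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE packages (
    name   TEXT NOT NULL,
    hidden INTEGER NOT NULL DEFAULT 0
);

-- Functional (expression) index: uniqueness is checked on lower(name),
-- so 'Django' and 'django' cannot both be registered.
CREATE UNIQUE INDEX packages_name_ci ON packages (lower(name));

-- Partial index: only visible rows are indexed, keeping the index small
-- when most queries filter on hidden = 0.
CREATE INDEX packages_visible ON packages (name) WHERE hidden = 0;
""")


def register(conn, name):
    """Return True if `name` was accepted, False if a case-variant exists."""
    try:
        conn.execute("INSERT INTO packages (name) VALUES (?)", (name,))
        return True
    except sqlite3.IntegrityError:
        return False


print(register(conn, "Django"))  # True
print(register(conn, "django"))  # False: same name, different case
```

The case-insensitive rule lives in one index definition rather than in every code path that registers a name.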

>> 100% test coverage makes it hard to make changes, sometimes harder than
>> necessary. Perhaps I should take a look at this.
>
> It raises the bar for adding new code, but lowers the bar for maintaining the existing
> code. Given the fact that the code will be maintained more than new features will
> be written I consider this a positive.

Ok, I accept it.

>> BTW, why didn't you just copy crate.io?
>
> Crate’s code base suffers from a lot of bad assumptions that were baked into it when
> I was still learning how packaging really worked. On top of that, most of the pieces
> that we need to replace PyPI don’t exist in Crate, since Crate just mirrored PyPI. As it
> is, Warehouse is almost at feature parity with Crate (but not PyPI).

I may be asking too much, but I am really interested in learning from these bad
assumptions. There is a good chance that I and other people who are willing
to contribute still hold them.

>>> It’s not a reinvention of Django. The “framework” parts of Warehouse
>>> are very small and are mostly glue that holds together pieces like
>>> Werkzeug and SQLAlchemy.
>>
>> Still, is there a diagram of the Warehouse architecture? A blueprint, a
>> CAD model of the building?
>
> I have a todo list item to write up architecture documentation that explains
> the layout and where the different pieces are at.

Will be much appreciated. Thank you.
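To make "glue" concrete for the architecture question: a minimal sketch of the kind of thin layer that maps a URL path to a view function and hands it a database connection. Everything here (the `Application` class, the `package_count` view, the `get` helper) is hypothetical and stdlib-only; in Warehouse these roles would be filled by libraries such as Werkzeug and SQLAlchemy, and its real structure is what the promised architecture documentation would describe.

```python
import sqlite3
from wsgiref.util import setup_testing_defaults


class Application:
    """Hypothetical glue: route PATH_INFO to a view and pass it a DB handle."""

    def __init__(self, db, routes):
        self.db = db
        self.routes = routes  # path -> view(app, environ) -> (status, text)

    def __call__(self, environ, start_response):
        view = self.routes.get(environ.get("PATH_INFO", "/"))
        if view is None:
            status, text = "404 Not Found", "not found"
        else:
            status, text = view(self, environ)
        start_response(status, [("Content-Type", "text/plain; charset=utf-8")])
        return [text.encode("utf-8")]


def package_count(app, environ):
    """Hypothetical view: report how many packages are stored."""
    (count,) = app.db.execute("SELECT count(*) FROM packages").fetchone()
    return "200 OK", str(count)


def get(wsgi_app, path):
    """Issue a fake GET request and return (status, body text)."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the remaining required keys
    result = {}

    def start_response(status, headers, exc_info=None):
        result["status"] = status

    body = b"".join(wsgi_app(environ, start_response)).decode("utf-8")
    return result["status"], body


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE packages (name TEXT PRIMARY KEY)")
db.execute("INSERT INTO packages (name) VALUES ('example')")
application = Application(db, {"/count": package_count})
print(get(application, "/count"))    # ('200 OK', '1')
print(get(application, "/missing"))  # ('404 Not Found', 'not found')
```

Keeping the glue this small means a new contributor only has to learn the routing table and the view signature; the heavy lifting stays in the libraries behind it.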