It's very quiet here, so I thought I'd toss in some notes I've been making on Python's module system, the current Distutils 1.x, and some of the proposals I've seen for Distutils 2. These notes are very rough, so I dunno how much sense they'll make to anyone else in their current state, but I figure it's better to pitch them in now to find out whether there's any interest in discussing them further than to spend time polishing them if there isn't.
Let us know what you think, and we can take it from there if folk are interested.
"The ultimate goal: Must be backwards-compatible with existing setup.py scripts."
This is both a red herring and a likely recipe for DU2 becoming a big ball of mud before it's even out the door...
- Compatibility for existing setup.py scripts can easily be ensured by retaining DU1. DU1 should be declared at the end of its development life and its API deprecated. The DU1 API may eventually be re-implemented on top of DU2, allowing the DU1 core to be ditched to reduce maintenance cost.
- DU1 doesn't scale down as well as it could/should, nor does it scale up as well as it could/should. Current DU2 proposals don't seem to address these points; they seek only to add new material on top rather than to re-examine/re-evaluate the existing architecture. Some current DU2 proposals smack of rampant architecture astronomy, lacking sufficient evaluation of their potential cost or of whether the same goals could be achieved by other, simpler means.
- DU2 provides an opportunity to review everything learnt over the course of DU1 development and do it better. DU1 development has stagnated under its own weight, and the DU1 architecture is a rat's nest: not a good base to build DU2 on. Better to design afresh: assemble a list covering a representative range of use cases and their relative frequencies in real-world use, determine the "ideal" solution, then determine the "practical" solution. "Practical" solution = "ideal" solution minus anything that would prove too disruptive to Python or too expensive for the benefits it'd provide, with existing DU1 material reused wherever that costs less than reimplementing it from scratch.
- Before adding new features/complexity, refactor the current _design_ to simplify it as much as possible. The philosophy here is much more hands-off than DU1's: less is more; power and flexibility through simplicity; make others (the filesystem, generic tools, etc.) do as much of the work as possible; don't create dependencies.
-- e.g. compare the typical OS X application installation procedure (mount a disk image and copy a single application package to the Applications folder; no special tools/actions required) with the typical Windows installation procedure (run InstallShield to put lots of bits into various locations, update the Registry, etc.) or the typical Unix installation procedure (build everything from source, then move it into location). Avoiding over-reliance on rigid, semi-complex procedures will allow DU2 to scale down very well and give more flexibility in how it scales up.
- Eliminate DU1's "Swiss Army knife" tendencies. Separate the build, install and register procedures for higher cohesion and lower coupling. This will make it much easier to refactor the design of each in turn.
- Every Python module should be distributed, managed and used as a single folder containing ALL resources relating to that module: sub-modules, extensions, documentation (bundled, generated, etc.), tests, examples, etc. (Note: this can be done without affecting backwards-compatibility, which is important.) The idea is similar to OS X's package scheme, where all resources for [e.g.] an application are bundled in a single folder, but less formal (no need to hide package contents from the user).
- Question: is there any reason why modules should not be installable via simple drag-n-drop (GUI) or mv (CLI)? A standard policy of "the package IS the module" (see above) would allow a good chunk of both existing and proposed DU "features" to be gotten rid of completely without any loss of "functionality", greatly simplifying both build and install procedures.
-- Replace the current system, where the user must explicitly state what they want included, with one where the user need only state what they want excluded. Simpler and less error-prone; fits better with user expectations (meeting the most common requirement should require the least amount of work, ideally none). The manifest system would no longer be needed (good riddance). Most distributions could be created simply by zipping/tar.gzipping the module folder and all its contents, minus any .pyc files and (for source-only extension distributions) .so files.
-- In particular, removing most DU involvement from build procedures would allow developers to use their own development/build systems much more easily.
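A minimal sketch of the exclude-only approach, assuming a hypothetical `make_distribution()` helper and a default exclusion list of `*.pyc` and `*.so`: everything in the module folder goes into the archive unless it matches an exclusion pattern, so there is no manifest to maintain. Patterns are matched against file names only, a simplification chosen for the sketch.

```python
import fnmatch
import os
import zipfile

# Hypothetical default exclusions: compiled bytecode, plus built extensions
# (appropriate for a source-only extension distribution).
DEFAULT_EXCLUDES = ("*.pyc", "*.so")


def make_distribution(module_dir, archive_path, excludes=DEFAULT_EXCLUDES):
    """Zip module_dir and all its contents, minus files matching `excludes`.

    Returns the archive member names that were written.
    """
    module_dir = os.path.abspath(module_dir)
    archive_path = os.path.abspath(archive_path)
    parent = os.path.dirname(module_dir)
    written = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(module_dir):
            dirs.sort()  # walk subfolders in a stable order
            for name in sorted(files):
                full = os.path.join(root, name)
                if full == archive_path:
                    continue  # never zip the archive into itself
                if any(fnmatch.fnmatch(name, pat) for pat in excludes):
                    continue  # the user only lists what to leave out
                # Store paths relative to the parent, so every member sits
                # under the module folder's own name.
                arcname = os.path.relpath(full, parent).replace(os.sep, "/")
                zf.write(full, arcname)
                written.append(arcname)
    return written
```

Calling `make_distribution("roundup", "roundup.zip")` would produce an archive whose single top-level entry is the `roundup` folder, ready to be unpacked and moved into place.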
- Installation and compilation should be separate procedures. Python already compiles .py files to .pyc on demand; is there any reason why .c files couldn't be compiled to .so on demand in the same way? Have a standard 'src' folder containing source files, and have Python's module mechanism look in/for that folder as part of its search when a module is missing; cf. Python's automatic rebuilding of .pyc files from .py files when the former is missing or out of date. (Q. How would this folder's contents need to be represented to Python?)
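The staleness test that on-demand compilation would need can be sketched as a timestamp comparison, assuming a hypothetical layout where sources live in the package's `src` folder and the built extension sits alongside it. This covers only the "is a rebuild needed?" half; how the import system would trigger the build is the open question above. CPython's own .pyc check differs in detail: it records the source's modification time and size (or a hash) in the .pyc header rather than comparing timestamps this way.

```python
import os


def extension_is_stale(src_dir, built_path, suffixes=(".c", ".h")):
    """Return True if built_path is missing or older than any source file.

    Hypothetical helper: the same "rebuild if absent or out of date" idea
    that Python applies to .py -> .pyc, applied to src/ -> built extension.
    """
    if not os.path.exists(built_path):
        return True  # never built: compile on first import
    built_mtime = os.path.getmtime(built_path)
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            if name.endswith(suffixes):
                # Any source edited after the last build invalidates it.
                if os.path.getmtime(os.path.join(root, name)) > built_mtime:
                    return True
    return False
```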
- What else might setup.py scripts do apart from installing modules (2) and building extensions (3)?
-- Most packages should not require a setup.py script to install. Users can, of course, employ their own generic shell script/executable to [e.g.] unzip downloaded packages and mv them to their site-packages folder.
-- Extensions distributed as source will presumably require some kind of setup script in 'src' folder. Would this need to be a dedicated Python script or would something like a standard makefile be sufficient?
-- Build operations should be handled by separate, dedicated scripts when necessary. Most packages should only require a generic shell script/executable to zip up the package folder and its entire contents (minus .pyc and, optionally, .so files).
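For the common case the "generic script" can be very small. A sketch, assuming a hypothetical `install_package()` helper and an archive containing exactly one package folder: unpack into a staging directory, check the layout, then move the folder into site-packages (`sysconfig.get_paths()["purelib"]` by default). It refuses to overwrite an existing install rather than guessing what the user wants.

```python
import os
import shutil
import sysconfig
import tempfile


def install_package(archive_path, target_dir=None):
    """Unpack a zipped/tarred package folder and move it into target_dir.

    target_dir defaults to this interpreter's pure-Python site-packages.
    Returns the installed folder's path.
    """
    if target_dir is None:
        target_dir = sysconfig.get_paths()["purelib"]
    with tempfile.TemporaryDirectory() as staging:
        shutil.unpack_archive(archive_path, staging)
        entries = os.listdir(staging)
        # "The package IS the module": expect exactly one top-level folder.
        if len(entries) != 1 or not os.path.isdir(os.path.join(staging, entries[0])):
            raise ValueError("expected archive to contain a single package folder")
        dest = os.path.join(target_dir, entries[0])
        if os.path.exists(dest):
            raise FileExistsError(dest)
        # Equivalent of `mv`: no registry, no database, just the folder.
        shutil.move(os.path.join(staging, entries[0]), dest)
    return dest
```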
- Remove metadata from setup.py and modules. All metadata should appear in a single location: a meta.txt file included in every package folder. Use a single metadata scheme in a simple, structured, nested, machine-readable plaintext format (modified Trove); example:
------------------------------------------------------------------
Name
    roundup
Intended Audience
    End Users/Desktop
    Developers
    System Administrators
License
    OSI Approved
        Python Software Foundation License
Topic
    Communications
        Email
    Office/Business
    Software Development
        Bug Tracking
Dependencies
    etc...
------------------------------------------------------------------
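A parser for this kind of nested format is short. A minimal sketch, assuming each nesting level is indented by four spaces and that the dashed delimiter lines are not part of the file: it returns every root-to-leaf path, so `("Topic", "Communications", "Email")` corresponds directly to the Trove classifier `Topic :: Communications :: Email`. The function name `parse_meta()` is hypothetical.

```python
def parse_meta(text, indent=4):
    """Parse indentation-nested plaintext into root-to-leaf paths.

    Each returned tuple is one leaf together with its ancestors, e.g.
    ("Topic", "Communications", "Email").
    """
    paths = []
    stack = []  # labels from the top level down to the current line
    prev_depth = -1
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.expandtabs(indent)
        if not line.strip():
            continue  # blank lines carry no structure
        spaces = len(line) - len(line.lstrip(" "))
        if spaces % indent:
            raise ValueError(f"line {lineno}: indentation is not a multiple of {indent}")
        depth = spaces // indent
        if depth > len(stack):
            raise ValueError(f"line {lineno}: indented more than one level")
        # Returning to the same or a shallower level means the previous
        # line had no children, i.e. it was a leaf.
        if depth <= prev_depth:
            paths.append(tuple(stack))
        del stack[depth:]
        stack.append(line.strip())
        prev_depth = depth
    if stack:
        paths.append(tuple(stack))  # the final line is always a leaf
    return paths
```

Joining each path with `" :: "` then yields classifier strings in the familiar Trove form.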
- Improve version handling. Junk the current "operators" scheme (=, <, >, >=, <=) as both unnecessarily complex and inadequate: stating that module X requires module Y (>= 1.0) is useless in practice, as it's impossible to predict _future_ compatibility. Metadata should support an optional 'Backwards Compatibility' value indicating the earliest version of the module that the current version is backwards-compatible with. The dependencies list should declare the name and version of each required package (specifically, the version used when the package was developed and released). The version-checking system can then use both values to determine compatibility. Example: if module X is at v1.0 and is backwards-compatible to v0.5, then a module Y that lists X v0.8 as a dependency will accept X 1.0, whereas a module Z that lists X 0.4.5 as a dependency will not, and the system should start looking for an older version of X.
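The proposed rule reduces to a range check: the version a dependant was developed against must lie between the installed module's 'Backwards Compatibility' floor and its current version. A sketch, assuming purely numeric dotted version strings (no pre-release tags) and hypothetical helper names; trailing zeros are dropped so that "1.0" and "1.0.0" compare equal.

```python
def parse_version(text):
    """Turn '0.4.5' into (0, 4, 5); trailing zeros are dropped ('1.0' -> (1,))."""
    parts = [int(part) for part in text.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def is_compatible(installed, backwards_compatible_to, required):
    """Decide whether an installed module satisfies a dependency.

    installed:               current version of the installed module, e.g. '1.0'
    backwards_compatible_to: its 'Backwards Compatibility' value, or None if
                             it makes no such claim
    required:                version the dependant was developed against
    """
    inst = parse_version(installed)
    req = parse_version(required)
    # With no declared floor, only the exact current version is known-good.
    oldest = parse_version(backwards_compatible_to) if backwards_compatible_to else inst
    return oldest <= req <= inst
```

With X at 1.0 and compatible back to 0.5, Y's dependency on X 0.8 is accepted and Z's dependency on X 0.4.5 is rejected, as in the example above. A requirement newer than the installed version is also rejected, which signals that an upgrade is needed.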
- Make it easier to have multiple versions of a module installed. Ideally this would be done by including both name and version in each module's folder name, so that multiple versions may coexist in the same site-packages folder. Note that this naming scheme would require alterations to Python's module import mechanism and would not be directly compatible with older Python versions (users could still use such modules with older Pythons, but would need to strip the version from the module name when installing).
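Modern Python's `importlib` allows this naming scheme to be prototyped without modifying the interpreter, by adding a finder to `sys.meta_path`. A sketch, assuming a hypothetical `<name>-<version>` folder convention and a hypothetical `VersionedFolderFinder` class: `import roundup` resolves to the highest-numbered `roundup-X.Y` folder containing an `__init__.py`. It simply picks the newest version; a real resolver would apply the compatibility rule from the previous point.

```python
import importlib.abc
import importlib.util
import os
import re
import sys

# Hypothetical naming scheme: "<module>-<version>", e.g. "roundup-1.2".
_VERSIONED = re.compile(r"^(?P<name>[A-Za-z_]\w*)-(?P<version>\d+(?:\.\d+)*)$")


class VersionedFolderFinder(importlib.abc.MetaPathFinder):
    """Resolve 'import name' to the newest 'name-X.Y' folder in search_dir."""

    def __init__(self, search_dir):
        self.search_dir = search_dir

    def find_spec(self, fullname, path=None, target=None):
        if "." in fullname:
            return None  # submodules are found via the package's __path__
        candidates = []
        for entry in os.listdir(self.search_dir):
            m = _VERSIONED.match(entry)
            if m and m.group("name") == fullname:
                version = tuple(int(p) for p in m.group("version").split("."))
                candidates.append((version, entry))
        if not candidates:
            return None  # let the normal finders handle unversioned modules
        _, folder = max(candidates)  # numeric comparison: 1.10 beats 1.9
        pkg_dir = os.path.join(self.search_dir, folder)
        init = os.path.join(pkg_dir, "__init__.py")
        if not os.path.isfile(init):
            return None
        # The folder becomes the package's __path__, so its submodules
        # are then found by the standard path-based machinery.
        return importlib.util.spec_from_file_location(
            fullname, init, submodule_search_locations=[pkg_dir])
```

Installing it is a matter of `sys.meta_path.insert(0, VersionedFolderFinder(site_dir))`, where `site_dir` is whichever folder holds the versioned packages.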
- Reject PEP 262 (installed packages database). It is complex and fragile, duplicates information, and creates a single point of failure reminiscent of the Windows Registry. Exploit the filesystem instead: any info a separate database would provide should already be available from each module's metadata.
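Answering "what is installed?" then becomes a directory scan. A sketch, assuming the indented meta.txt layout proposed above and hypothetical helper names: each folder in site-packages that contains a meta.txt counts as an installed package, and the value nested under its top-level `Name` line is read directly from that file. `read_name()` is a deliberately simplified reader, not a full metadata parser.

```python
import os


def read_name(meta_path):
    """Return the value nested under a top-level 'Name' line, or None."""
    with open(meta_path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    for i, line in enumerate(lines):
        # A top-level label starts in column 0; its value is the next,
        # indented line.
        if line.strip() == "Name" and not line[:1].isspace():
            if i + 1 < len(lines) and lines[i + 1][:1].isspace():
                return lines[i + 1].strip()
    return None


def installed_packages(site_dir, meta_name="meta.txt"):
    """Map each package folder in site_dir to the Name in its meta.txt.

    No separate database: the folders themselves are the record of what
    is installed, and each one describes itself.
    """
    found = {}
    for entry in sorted(os.listdir(site_dir)):
        meta_path = os.path.join(site_dir, entry, meta_name)
        if os.path.isfile(meta_path):
            found[entry] = read_name(meta_path)
    return found
```

Removing a package is just deleting its folder; there is no second copy of the information to fall out of sync.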