On Fri, May 14, 2004 at 03:16:31PM +0100, has wrote:
- Before adding new features/complexity, refactor current _design_ to
simplify it as much as possible. Philosophy here is much more hands-off than DU1; less is more; power and flexibility through simplicity: make others (filesystem, generic tools, etc.) do as much of the work as possible; don't create dependencies.
I'd translate this to "an implementation that effectively manages package metadata". If the metadata is consistently represented, translating to differing build, install and packaging modules becomes simpler and more straightforward. The idea of different setup.cfg sections for different bdist commands needs to die. Every bdist command should be able to work effectively from the same set of metadata.
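As a rough sketch of the "same set of metadata" idea: one record is rendered into two native formats. The METADATA dict, its field names and both function names are hypothetical (not Distutils API), and the output covers only a few header fields, not a complete spec or control file.

```python
# Hypothetical shared metadata record; field names are illustrative only.
METADATA = {
    "name": "example-pkg",
    "version": "1.0",
    "summary": "An example package",
    "license": "MIT",
    "requires": ["otherpkg"],
}


def to_rpm_spec_header(meta):
    """Render the shared metadata as a few RPM spec preamble lines."""
    lines = [
        "Name: %s" % meta["name"],
        "Version: %s" % meta["version"],
        "Summary: %s" % meta["summary"],
        "License: %s" % meta["license"],
    ]
    # One Requires line per dependency.
    lines.extend("Requires: %s" % dep for dep in meta["requires"])
    return "\n".join(lines)


def to_deb_control(meta):
    """Render the same metadata as a few Debian control fields."""
    lines = [
        "Package: %s" % meta["name"],
        "Version: %s" % meta["version"],
        "Description: %s" % meta["summary"],
    ]
    if meta["requires"]:
        lines.append("Depends: %s" % ", ".join(meta["requires"]))
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_rpm_spec_header(METADATA))
    print()
    print(to_deb_control(METADATA))
```

Neither renderer needs its own configuration section; each one only reads the common record.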
-- e.g. c.f. Typical OS X application installation procedure (mount disk image and copy single application package to Applications folder; no special tools/actions required) versus typical Windows installation procedure (run InstallShield to put lots of bits into various locations, update Registry, etc.) or typical Unix installation procedure (build everything from source, then move into location). Avoiding overreliance on rigid semi-complex procedures will allow DU2 to scale down very well and provide more flexibility in how it scales up.
Distutils cannot dictate to any platform how packages are to be installed. It needs to go the other way. Distutils needs to meet the platforms' requirements for packaging.
I know perl-mongers (and some pythoneers as well) hold up CPAN as a shining example of how things should work, but I shoot any of my admins that use plain CPAN for installation. It doesn't register with the platform's native software manager and, therefore, completely destroys any attempt to manage system inventory information. No big deal on a system or two, maybe, but at 3:00 in the morning, with an hour outage to move an application from one system to another, suddenly discovering that you didn't prepare the destination host with 37 required CPAN (or Distutils) modules because there was no record of them ever having been installed really, really sucks. IMHO, if Distutils leverages native packager bdist commands for as many platforms as possible, it will kick CPAN's behind in the enterprise environment.
I, for one, do not want to visit hundreds of machines to "python setup.py install", even if it downloads it for me. I don't need or want a full development environment on production machines. I _need_ to package once (per platform) and install on hundreds of boxes, AND I need those hundreds of boxes to be able to tell me exactly what is installed on them for disaster recovery and application portability.
- Eliminate DU1's "Swiss Army" tendencies. Separate the build,
install and register procedures for higher cohesion and lower coupling. This will make it much easier to refactor design of each in turn.
I think DU1 should be viewed in a "lessons learned" context. What needs to be done is to document the internal APIs, extrapolate what those APIs should really be now that there's some real-life experience available, then refactor the APIs to be more representative of actual use.
- Every Python module should be distributed, managed and used as a
single folder containing ALL resources relating to that module: sub-modules, extensions, documentation (bundled, generated, etc.), tests, examples, etc. (Note: this can be done without affecting backwards-compatibility, which is important.) Similar idea to OS X's package scheme, where all resources for [e.g.] an application are bundled in a single folder, but less formal (no need to hide package contents from user).
Maybe for modules, but this is patently false for applications. Applications have configuration files and data that may be different for different instances or users of the application on the same box. Dumping all that into site-packages is impractical on both usability grounds (separation of programs and data) and space grounds (how do I size /usr if /usr/lib/pythonx.x/site-packages/package is going to contain variable user data?). This raises the questions "Should Distutils handle modules and applications the same?" and "Does it make more sense to package modules one way and applications another?" I don't know.
- Question: is there any reason why modules should not be installable
via simple drag-n-drop (GUI) or mv (CLI)? A standard policy of "the package IS the module" (see above) would allow a good chunk of both existing and proposed DU "features" to be gotten rid of completely without any loss of "functionality", greatly simplifying both build and install procedures.
Yes, because that's not how all platforms manage their software inventory. Servers may not (most, in fact, _should_ not) have GUI-capable resources installed on them. "mv" (or "cp" or any other command) is not a software management tool. I use Distutils because it helps me manage my software inventory. If all I wanted was to be able to make something run on a machine, I'd just build a tarball and drop it in where I needed it. The problem is that once I've done that, I don't have any record of the fact that I needed it on any particular machine and I have lost the ability to easily replicate a machine's environment on another machine, be it for failover or upgrades or for a test/QA environment.
--Replace current system where user must explicitly state what they want included with one where user need only state what they want excluded. Simpler and less error-prone; fits better with user expectations (meeting the most common requirement should require least amount of work, ideally none). Manifest system would no longer be needed (good riddance). Most distributions could be created simply by zipping/tar.gzipping the module folder and all its contents, minus any .pyc and [for source-only extension distributions] .so files.
This I agree with. I recently did a quick setup.py on something (can't remember what) and key .xml files were ignored. If it's in the source tree, it's probably needed for something. If it's not, then I should be able to exclude it, but I'd rather be sure everything necessary "gets there" by default. There are still a lot of python modules available that don't have setup.py included. It would be far easier for a 3rd party to submit Distutils-enabling setup.py "patches" if they didn't have to do a complete code analysis to determine what's necessary and what's not.
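A minimal sketch of the include-everything, exclude-by-pattern approach, using only the standard library. The collect_files name and the DEFAULT_EXCLUDES list are assumptions for illustration; nothing here is an existing Distutils interface.

```python
import fnmatch
import os

# Hypothetical default exclusions: compiled byte code and built extensions.
DEFAULT_EXCLUDES = ("*.pyc", "*.pyo", "*.so")


def collect_files(root, excludes=DEFAULT_EXCLUDES):
    """Return sorted paths (relative to root) of every file under root,
    skipping files whose base name matches one of the exclude patterns."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if any(fnmatch.fnmatch(name, pattern) for pattern in excludes):
                continue
            full_path = os.path.join(dirpath, name)
            found.append(os.path.relpath(full_path, root))
    return sorted(found)


if __name__ == "__main__":
    for path in collect_files("."):
        print(path)
```

With this default, a stray .xml data file is shipped unless someone explicitly excludes it, which is the behaviour argued for above.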
-- In particular, removing most DU involvement from build procedures would allow developers to use their own development/build systems much more easily.
I may be on the fence on this one. Since mastering simple setup.py formats, I'm not sure I remember how to manually build a C extension anymore ;) From my perspective, DU1 has got what it takes to support a newbie's foray into building extensions and I'd hate to see that get lost. A cleaner API and documentation, however, would support better integration of Distutils _into_ alternative build systems.
- Installation and compilation should be separate procedures. Python
already compiles .py files to .pyc on demand; is there any reason why .c/.so files couldn't be treated the same? Have a standard 'src' folder containing source files, and have Python's module mechanism look in/for that as part of its search operation when looking for a missing module; c.f. Python's automatic rebuilding of .pyc files from .py files when former isn't found. (Q. How would this folder's contents need to be represented to Python?)
They already are. "python setup.py build" and "python setup.py install" can be done on separate machines (of the same architecture). My argument continues to be that "python setup.py install" is not a software management tool. "python setup.py bdist_whatever" produces the installation packages I need to support the "whatever" architecture with a build-once, install-anywhere approach and -- most importantly -- to effectively manage the software configuration of N hosts of "whatever" architecture.
bdist_* is the true value of Distutils. Without the bdist commands, Distutils does nothing more effectively than `./configure && make && make install`.
- What else may setup.py scripts do apart from install modules (2)
and build extensions (3)?
Install and optionally upgrade configuration files. Build native packages including preinstall, preremove, postinstall, postremove and reconfiguration scripts. Dynamically relocate packages to the target host's python installation directory. These are all features of the bdist_* command capability that I think you're missing.
-- Most packages should not require a setup.py script to install. Users can, of course, employ their own generic shell script/executable to [e.g.] unzip downloaded packages and mv them to their site-packages folder.
They can do that today. Where's the "value added?" Maybe setup.py could be replaced by something that scans a directory for __init__.py files and makes some deductions, and maybe that's a good thing for simple packages. Personally, I'd like to see DU2 go further in supporting the simple production of multiple binary packages from a single source tree. Marc's egenix packages come to mind as a good example of the type of thing that it would be really nice to NOT have to sub-class a bunch of Distutils classes in a mega-setup.py script in order to produce a set of related, but not necessarily interdependent, packages.
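The "scan for __init__.py and deduce" idea might look roughly like the following. This is only a sketch: find_packages is a hypothetical helper name, and the rule that a directory without __init__.py hides everything beneath it is an assumption of this example.

```python
import os


def find_packages(root):
    """Return dotted names of the package directories found under root.

    A directory counts as a package if it contains __init__.py. The root
    itself is treated as the source tree, not as a package.
    """
    packages = []
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath == root:
            continue
        if "__init__.py" not in filenames:
            # Not a package, so do not descend into its subdirectories.
            dirnames[:] = []
            continue
        relative = os.path.relpath(dirpath, root)
        packages.append(relative.replace(os.sep, "."))
    return sorted(packages)


if __name__ == "__main__":
    print(find_packages("."))
```

This handles the simple case only; it says nothing about splitting one source tree into several binary packages.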
-- Extensions distributed as source will presumably require some kind of setup script in 'src' folder. Would this need to be a dedicated Python script or would something like a standard makefile be sufficient?
My belief is that Distutils' role is to reduce the need to distribute extensions as source. One good Distutils configuration written by a Python-for-Windows developer should repackage (absent win32 dependencies) for any supported platform simply by changing "python setup.py bdist_wininst" to "python setup.py bdist_myplatform". This drives back to my focus on getting the meta-data right and in a consistent format regardless of the original development platform or initially perceived target.
-- Build operations should be handled by separate dedicated scripts when necessary. Most packages should only require a generic shell script/executable to zip up package folder and its entire contents (minus .pyc and, optionally, .so files).
Again, this provides no software configuration management. What, then, do I gain by using Distutils at all?
- Remove metadata from setup.py and modules. All metadata should
appear in a single location: meta.txt file included in every package folder. Use a single metadata scheme in simple structured nested machine-readable plaintext format (modified Trove); example:
Isn't that what setup.cfg is? I agree that there shouldn't be redundancy between what can go in setup.py and what can go in setup.cfg.
Native packagers require a whole slew of meta-data for software configuration management. Most fields (iirc) are addressed in various PEPs, although it would be good to revisit the fields and establish a matrix between Distutils meta-data fields and various native package manager fields to make sure all possibilities are covered.
- Improve version control. Junk current "operators" scheme (=,
<, >, >=, <=) as both unnecessarily complex and inadequate (i.e. stating module X requires module Y (>= 1.0) is useless in practice as it's impossible to predict _future_ compatibility). Metadata should support 'Backwards Compatibility' (optional) value indicating earliest version of the module that current version is backwards-compatible with. Dependencies list should declare name and version of each required package (specifically, the version used as package was developed and released). Version control system can then use both values to determine compatibility. Example: if module X is at v1.0 and is backwards-compatible to v0.5, then if module Y lists module X v0.8 as a dependency then X 1.0 will be deemed acceptable, whereas if module Z lists X 0.4.5 as a dependency then X 1.0 will be deemed unacceptable and system should start looking for an older version of X.
This is more appropriately addressed in the context of what/how native package managers support version control. A point implicit in this discussion, however, is that a package name registry is required. Unless package names are registered with some authority, you can have multiple packages of the same name, which shoves a huge bone down the throat of dependency resolution.
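For reference, the 'Backwards Compatibility' rule from the quoted proposal can be sketched as follows. The function names are hypothetical, versions are assumed to be purely numeric and dot-separated, and the check that the installed version must not be older than the required one is an assumption the proposal only implies.

```python
def parse_version(text):
    """Turn a string such as '0.4.5' into (0, 4, 5) for numeric comparison."""
    return tuple(int(part) for part in text.split("."))


def is_acceptable(installed, compatible_back_to, required):
    """Decide whether an installed module satisfies a dependency.

    installed          -- version of the module that is installed
    compatible_back_to -- oldest version that `installed` declares itself
                          backwards-compatible with
    required           -- version the dependent package was developed against
    """
    have = parse_version(installed)
    oldest = parse_version(compatible_back_to)
    want = parse_version(required)
    # Acceptable when the required version lies between the declared
    # compatibility floor and the installed version (assumed upper bound).
    return oldest <= want <= have


if __name__ == "__main__":
    # Module X is at 1.0 and backwards-compatible to 0.5.
    print(is_acceptable("1.0", "0.5", "0.8"))    # module Y depends on X 0.8
    print(is_acceptable("1.0", "0.5", "0.4.5"))  # module Z depends on X 0.4.5
```

The two calls mirror the module Y and module Z cases in the quoted example.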
- Make it easier to have multiple installed versions of a module.
Ideally this would require including both name and version in each module name so that multiple modules may coexist in same site-packages folder. Note that this naming scheme would require alterations to Python's module import mechanism and would not be directly compatible with older Python versions (users could still use modules with older Pythons, but would need to strip version from module name when installing).
This is conditional on support of the platform's native package manager. Some support multiple installations and some do not. Most can be dealt with in any case with package name fudging and intelligent install scripts.
I don't see a need for multiple instances in the same site-packages, however. Futzing with the import mechanism would be fixing something that ain't broke. Installing to an alternate path and optionally having postinstall scripts update site.py or requiring the user to modify PYTHONPATH is adequate. I do this on HP-UX, which supports multiple installs, in different locations, of the same binary package, which allows users to install into their own target python library. When an alternate path is selected, the installer spits out all the necessary steps required to make use of the alternate path.
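A small sketch of what "making use of the alternate path" can amount to on the Python side, using the standard site module. The helper names are hypothetical, the example path is made up, and the printed shell line assumes a Bourne-style shell; it is not what any particular installer emits.

```python
import os
import site
import sys


def use_alternate_path(path):
    """Make modules installed under `path` importable in this process.

    site.addsitedir() appends the directory to sys.path and also processes
    any .pth files found in it.
    """
    site.addsitedir(path)
    return os.path.abspath(path) in sys.path


def setup_instructions(path):
    """Return a one-line hint a postinstall script could print for users
    (assumes a Bourne-style shell)."""
    return 'export PYTHONPATH="%s:$PYTHONPATH"' % os.path.abspath(path)


if __name__ == "__main__":
    target = os.path.join(os.path.expanduser("~"), "python-lib")  # example only
    print(use_alternate_path(target))
    print(setup_instructions(target))
```

Either route leaves the import mechanism itself untouched.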
- Reject PEP 262 (installed packages database). Complex, fragile,
duplication of information, single point of failure reminiscent of Windows Registry. Exploit the filesystem instead - any info a separate db system would provide should already be available from each module's metadata.
I agree with rejecting 262 as well, but not in favor of the filesystem but in favor of the native platform tools via bdist support. Solaris people use pkgtools for everything. RH and friends use RPM. HP people use SD-UX. Debianites use dpkg. etc. etc. etc.... God help those of us supporting multiple platforms.
In each case, absent [expletive deleted] commercial package installs, all software and configuration management is consistent and, more importantly, effective. Put anything on top of that (CPAN, Distutils, PEP 262, rogue admins with tarballs, _anything_ at all) and people who have to deal with more than a handful of machines WILL eventually get caught with their pants down.
If Distutils does not support simple, native package manager integration, then it ceases to be a solution and becomes just one more problem. A successful implementation that creates native packages gets immediate support from apt, yum, yast, urpmi, pkg-get, swinstall and anything else, now and in the future.
/me steps off the soapbox