[Deep breath, everyone; it's gonna get longer before it gets shorter...;]
Stefan Seefeld wrote:
Example: compilation of extension modules.
Scons is aiming at providing an abstraction layer for portable compilation. DU2 should at least allow compilation of extension modules to be delegated to scons. (And as I said previously, I think anything that doesn't allow wrapping traditional build systems based on 'make' and possibly the autotools is 'not good enough' as a general solution.)
I'm far from an expert on build systems (interpreted-language weenie), but I do think it makes sense for DU to hand off compilation duties to a third party as quickly as it can. That third party might be a separate Python-based system, a makefile, or anything else; DU shouldn't need to know which.
Thoughts on how one would separate extension compilation from the rest of the installation procedure...
Let's say we had a standard 'src' folder containing everything needed to produce a module's .so file(s), and we treat that folder as a self-contained entity. How would we instruct it to compile and deliver those .so files? The party requesting the compile operation should not need to know anything about the compilation system used. Presumably the easiest way to decouple the two is to have a standard 'compile.py' within the 'src' folder that is executed whenever somebody wants .so files created. Whatever code that compile.py file then executes is its own business, and if it needs any information from the OS/Python installation then it's up to it to request that information itself; ideally through existing Python APIs if possible, or through a specific DU API if not.
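As a sketch of that 'request it itself' idea: the standard library already exposes the relevant build configuration (via sysconfig in modern Python; distutils.sysconfig at the time this thread was written), so a compile.py can pull everything it needs without DU pushing anything in. The variable names here are mine:

```python
import sysconfig

# A src/compile.py needn't be handed any configuration by its caller:
# anything it needs to know about the host Python it can pull for itself
# through the standard library, then feed to whatever build system it
# wraps (make, scons, jam, ...).
include_dir = sysconfig.get_path("include")          # where Python.h lives
ext_suffix = sysconfig.get_config_var("EXT_SUFFIX")  # e.g. '.so' or '.pyd'
platform = sysconfig.get_platform()                  # e.g. 'linux-x86_64'

print(include_dir, ext_suffix, platform)
```

From here the script would hand those values to its build system of choice; DU never sees or cares how.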
Once that's done, it should be easy for developers to select their own build system from all those available to them. The Python-based build system that's currently incorporated into DU could, of course, be spun off as a peer to make, etc. - giving developers one more option to choose from without forcing it upon them.
BTW, once .so compilation is decoupled from installation, it should be possible/practical/easy? to defer .so compilation to import time (as is currently done for .pyc files).
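A sketch of the staleness test such deferred compilation would need, mirroring the timestamp rule Python applies to .pyc files (the function name is mine, not an existing API):

```python
import os

def needs_rebuild(so_path, source_paths):
    """Decide whether a compiled extension is stale, using the same rule
    Python uses for .pyc files: rebuild if the .so is missing or older
    than any of its sources."""
    if not os.path.exists(so_path):
        return True
    so_mtime = os.path.getmtime(so_path)
    return any(os.path.getmtime(src) > so_mtime for src in source_paths)
```

At import time, a True result would trigger a run of the package's own src/compile.py before loading.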
- Every Python module should be distributed, managed and used as a single folder containing ALL resources relating to that module: sub-modules, extensions, documentation (bundled, generated, etc.), tests, examples, etc. (Note: this can be done without affecting backwards-compatibility, which is important.) Similar idea to OS X's package scheme, where all resources for [e.g.] an application are bundled in a single folder, but less formal (no need to hide package contents from the user).
Are you really talking about 'package' here when you say 'module'? I don't think that mandating modules to be self-contained is a good idea. Often modules only 'make sense' in the context of the package that contains them. Also, are you talking about how to distribute packages, or about the layout of the installed files? I don't think DU2 should mandate any particular layout for the target installation. It may well suggest the layout of the files inside the (not-yet-installed) package.
(Like I say, rough notes; please keep pointing out where I'm not making sense.:)
Basically, what I'm proposing is that module developers stop distributing 'naked' Python modules and use package format only (even when there's only a single .py file involved). We then take all the other stuff that's traditionally been bundled alongside the module/package - documentation, unit tests, examples, etc. - and put those into the package folder too.
The term 'package' would basically become redundant; you could just describe everything as 'modules' and 'sub-modules'.
It's largely a philosophical shift from treating .py and .so files as separate from source files, documentation, unit tests, examples, etc. to treating _all_ of them equally: each being an integral component of the module/package as a whole.
It won't require any modifications to Python itself, since Python's import mechanism already supports the package format. For module developers, it's really just a logistical shift from being able to distribute 'bare' modules to always using package format. Module developers should be happy with this, given that it's much more accommodating towards documentation, unit tests, examples, etc. Stuff that they already need to put somewhere, and where better than as part of the module/package itself? And users should benefit too, as they'll always know where to look for documentation, etc.
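For illustration, such a self-contained package folder might look like this (the folder and script names are just the ones floated in this thread, not an established convention):

```
FooLib/                 # 'the package IS the module' -- and the distribution
    __init__.py
    submodule.py
    foo.so              # built extension(s), possibly deferred to import time
    src/
        compile.py      # optional: builds the .so files, by whatever means
    docs/
        format.py       # optional: generates the documentation
    tests/
        test.py         # optional: runs the unit tests
    examples/
```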
DU will benefit too in that the distributions will become much simpler to create: in most cases the only thing the developer will have to do is zip the package folder before uploading it, something that won't even require DU to do. (That's what I'm hoping for, anyway. In practice there might be some reason that I'm unaware of why certain platforms would require all that extra shuffle that DU currently does when installing packages - creating folders, copying files, etc. I'm not a cross-platform expert. But I'd be kinda surprised if that were the case.)
[Sidenote: in an ideal world, a Python end-user should _never_ need to know whether FooLib exists in bare-module or package form; the transition from operating in a file-based namespace to a class-/object-based namespace would be seamless. Python's import statement is a bit flawed here; e.g. import foo.bar can be used when bar is a module/package within package foo, but not when it's an attribute in module foo.]
- Question: is there any reason why modules should not be installable via simple drag-n-drop (GUI) or mv (CLI)? A standard policy of "the package IS the module" (see above) would allow a good chunk of both existing and proposed DU "features" to be gotten rid of completely without any loss of "functionality", greatly simplifying both build and install procedures.
Again, I don't think it is DU2's role to impose anything concerning the target layout. This is often platform dependent anyways.
Not quite sure if we're talking on same wavelength here. Let me try to clarify my previous point first, then maybe you can explain yours to me (feel free to phrase it in terms even an idiot like me can understand; I won't be offended;).
I'm talking about how a module/package gets put into a suitable Python directory (e.g. site-packages), which I'm assuming (unless proven otherwise) requires only knowing which directory to put it in and moving the module/package there. I'm also assuming that DU should not need to rearrange the contents of that package folder when installing it (except perhaps in special cases where it must install one of several platform-specific versions of a file, say; but that'll be the exception rather than the rule, and packages that don't require such special handling shouldn't have to go through the same in-depth installation procedures).
I can't immediately see anything that DU adds to this process of duplicating a package folder from A to B, apart from filling my Terminal window with lots of technical-looking stuff about how it's creating new directories in site-packages and copying files over to them. Which looks impressive, but I'm not convinced is really necessary given a single 'mv' command can shift the package directory and all its contents over just fine from what I can tell. And if 95-100% of modules can be installed with just a simple mv, then let's make that the default procedure for installing modules and squeeze DU out of that part of the process too.
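To make that claim concrete, here's a minimal sketch of what 'install = mv' amounts to in code. The function name and the replace-on-reinstall behaviour are my assumptions, not anything DU provides:

```python
import shutil
from pathlib import Path

def install_package(package_dir, site_packages):
    """Install a self-contained package folder by copying it wholesale
    into site-packages -- the programmatic equivalent of a single 'mv'.
    No per-file directory creation, no shuffling of contents."""
    package_dir = Path(package_dir)
    dest = Path(site_packages) / package_dir.name
    if dest.exists():
        shutil.rmtree(dest)          # replace any previously installed copy
    shutil.copytree(package_dir, dest)
    return dest
```

If 95-100% of modules really can be installed this way, everything beyond these few lines is special-case handling.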
--Replace current system where user must explicitly state what they want included with one where user need only state what they want excluded.
That depends on how much control users want over the process. I believe both are equally valid, and should be supported (similar in spirit to the MANIFEST.in 'include' and 'exclude' syntax).
<ASIDE> Quick bit of background info so you know where I'm coming from...
I'm also big on the "There should be [preferably] only one way to do it" philosophy (one of the things that attracts me to Python). This is as much out of necessity as anything, mind: I'm absolutely awful at absorbing and retaining technical information, especially compared to 'real' programmers who seem to soak up knowledge like a sponge. e.g. I admire Perl for its "hell, let's totally go for broke and put in _everything_ we can possibly think of" approach and am glad there's somebody out there doing it cos then other languages can look at Perl to see what's worked and what hasn't and steal the best stuff for themselves. But it's not a language I can really use; my brain capacity is far too limited to accommodate more than a fraction of Perl's vast featureset and rules, so I much prefer to stick to tighter languages like Python where I can work at a decent clip without having to look up some 1000-page reference book at every other line.
Thus I tend to set the bar for feature inclusion pretty high; probably much higher than most other programmers who can happily cope with a bit of API flab without any problem. Don't take my feature-flaying tendencies as a religious thing. It's more a matter of simple survival: I can't keep up with y'all otherwise. ;) </ASIDE>
The problem I see is that manifests seem to be involved whether you need/want them or not. If [as I'm assuming] the majority of distributions are trivial to assemble, then manifests should be the exception, not the rule. I dunno how other folks work, but in my Home folder I have a PythonDev folder containing folders for each of my module projects - FooDev, BarDev, etc. Within each of these I have a folder named Distro, which contains all the files and folders that'll go into my distribution.
For me, manifests are nothing but a menace: this folder setup already makes clear what I want put into the distribution, and I can't see why I should have to explain it twice to the stupid machine. There have been several occasions where an error or omission in a manifest file went unnoticed until I received an email from a user saying that the package they downloaded was missing some parts (embarrassing). Right now I manually unzip and check distributions before uploading, but this is kinda crazy; I shouldn't have to worry that DU might have screwed up a build, seeing as one of the reasons for automating the process is to avoid making such mistakes.
Thus my conclusion: explicit inclusion is inherently unsafe; a single mistake or forgetting to update the manifest file to keep it in sync with changes to the package can easily result in a broken distribution.
A much more sensible default is to include everything by default, and leave it to the developer to exclude anything they don't want included. The worst accident likely to occur here with any regularity is that you forget to strip out a few .pyc files resulting in a distribution that's a few KB bigger than it really needs to be. Plus it adheres to the philosophy that the most common case should require the least amount of work: in this case, the majority of modules won't ever require a manifest file and can safely skip it.
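As a sketch of that exclude-by-default approach (the pattern list and function are hypothetical, not an existing DU feature):

```python
import fnmatch
import os

# Patterns excluded by default; a developer would only ever need to
# extend this list, never enumerate what to *include*.
DEFAULT_EXCLUDES = ["*.pyc", "*.pyo", "*.so", ".*"]

def collect_files(package_dir, excludes=DEFAULT_EXCLUDES):
    """Return every file under package_dir except those matching an
    exclude pattern -- include-everything-by-default packaging."""
    collected = []
    for dirpath, dirnames, filenames in os.walk(package_dir):
        # Prune excluded directories (e.g. hidden VCS folders) in place.
        dirnames[:] = [d for d in dirnames
                       if not any(fnmatch.fnmatch(d, p) for p in excludes)]
        for name in filenames:
            if not any(fnmatch.fnmatch(name, p) for p in excludes):
                collected.append(os.path.join(dirpath, name))
    return collected
```

The worst failure mode is including a few files too many; the explicit-manifest failure mode is shipping a broken distribution.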
[BTW, will check out the include/exclude feature which I wasn't previously aware of. Though my argument would be that I shouldn't need to know about such 'advanced' features just to produce a simple, reliable distribution: the process should be as simple as falling off a log to begin with.]
We can take this manifest issue quite a bit further, btw. Another big frustration with the things is they're quite brain-dead. All I want to say is "Package everything in Folder X for distribution except for .pyc and .so files". Thus a more pragmatic approach might be to do away with dumb manifest files completely, and leave the developer to optionally supply a 'build.py' script that will be automatically executed as part of the build process.
-- In particular, removing most DU involvement from build procedures would allow developers to use their own development/build systems much more easily.
Yes!! Though that's more easily said than done: a minimum of collaboration between the two is required, at least adherence to some conventions.
Of course (see earlier comments). Just how many... no, _few_ conventions would be needed?
- Installation and compilation should be separate procedures.
As a starting point, the whole 'build_ext' mechanism should be re-evaluated. The current 'Extension' mechanism is nowhere near abstract enough. Either the build_ext or the Extension class should be made polymorphic, able to wrap any external build system that might be used (make, scons, jam, ...).
Or invert and decouple the process to put the [e.g.] 'src/compile.py' script in control. In this case, I think we could greatly simplify the extension-building process if DU can say to the 'src' folder: "Build me some .so files", then stand back and let it get on with it (while being happy to lend support if/when asked).
[i.e. It's a mental trick I often try when trying to resolve an API design: seeing if I can switch from a complex 'push' process to a simpler 'pull' process (or from a complex 'pull' process to a simpler 'push' process). It can make quite a difference.]
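A minimal sketch of that pull-style handshake, assuming the hypothetical 'src/compile.py' convention described earlier. DU asks for .so files and otherwise stays out of the way:

```python
import subprocess
import sys
from pathlib import Path

def build_extensions(package_dir):
    """Run the package's own src/compile.py, if present, leaving the
    choice of build system entirely to that script.

    Returns True if a compile script was found and ran successfully,
    False if the package has nothing to build."""
    compile_script = Path(package_dir) / "src" / "compile.py"
    if not compile_script.exists():
        return False
    # The script runs in its own folder; any information it needs about
    # the Python installation it can fetch for itself (e.g. sysconfig).
    subprocess.run([sys.executable, compile_script.name],
                   cwd=compile_script.parent, check=True)
    return True
```

The caller knows nothing about make, scons, or anything else; pure-Python packages simply have no 'src' folder and skip the step entirely.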
- What else may setup.py scripts do apart from install modules (2) and build extensions (3)?
- building documentation (that, too, is highly domain-specific: from LaTeX through DocBook to doxygen...)
Yup. So let's say we have a standard 'docs' folder within a package that may optionally contain a 'format.py' script that will be called as necessary.
- running unit tests
Have a standard 'tests' folder containing an optional 'test.py' script. (Hey, think I see a pattern evolving here...)
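That pattern could be captured in one small dispatch table. A sketch only, with folder and script names taken from this thread rather than any real DU API:

```python
import subprocess
import sys
from pathlib import Path

# One convention, applied uniformly: each optional task lives in its own
# folder and is driven by a single well-known script.
HOOKS = {
    "compile": ("src", "compile.py"),
    "docs":    ("docs", "format.py"),
    "test":    ("tests", "test.py"),
}

def run_hook(package_dir, task):
    """Run the package's own script for 'compile', 'docs' or 'test'.
    Returns False when the package doesn't provide that hook."""
    folder, script = HOOKS[task]
    script_path = Path(package_dir) / folder / script
    if not script_path.exists():
        return False
    subprocess.run([sys.executable, script], cwd=script_path.parent,
                   check=True)
    return True
```

Adding a new task means adding one table entry; the contract with package authors never changes.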
- Remove metadata from setup.py and modules.
I don't quite agree in general. What metadata are we talking about anyway? There's metadata that is to be provided to the packager backends, i.e. a package description of some sort. Some of it can be generated automatically (such as MANIFEST.in -> MANIFEST, build/host platform, etc.); the rest has to be explicitly provided (maintainer address, package description).
I mean user-defined metadata (I'll assume those that generate metadata automatically for their own consumption can be left to handle that as best suits themselves):
1. A module may contain various bits of user-defined metadata, e.g. __version__, __author__. This info is almost certainly recorded elsewhere, so [afaik] shouldn't need to be duplicated here.
2. The setup.py script also contains module name, version, author, etc... potentially quite a lot of metadata, in fact. All mooshed together with code for building and installing packages. We should move this data out of there into a separate, dedicated metadata file that's included in each package.
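For example, such a dedicated metadata file could be as simple as one 'Key: value' field per line. Here's a hypothetical parser for a format of that kind (the format itself is my assumption, not a DU spec):

```python
def parse_metadata(text):
    """Parse a minimal 'Key: value' metadata file (one field per line,
    blank lines and '#' comments ignored) into a dict keyed by
    lower-cased field name."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key.strip().lower()] = value.strip()
    return fields
```

Because the file contains no code, any client -- not just Python or DU -- can read it with a few lines like these.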
Having 'all metadata' lumped together brings us back to the 'swiss army knife' syndrome.
Well, swiss armyness is always a concern. If it's really a problem here, we'd just need to have more than one metadata file. But I don't think it'll come to that.
Also, one great advantage of pulling metadata out of module and setup.py files is that it'll make it much easier for other clients to access it. Right now it's kinda locked away: the only folks who know how to access and use it are Python (module metadata) and DU (setup.py metadata).
- Improve version control. Junk the current "operators" scheme (=, <, >, >=, <=) as both unnecessarily complex and inadequate (i.e. stating that module X requires module Y (>= 1.0) is useless in practice, as it's impossible to predict _future_ compatibility). Metadata should support an optional 'Backwards Compatibility' value indicating the earliest version of the module that the current version is backwards-compatible with. The dependencies list should declare the name and version of each required package (specifically, the version the package was developed and released against).
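A sketch of how that check would work (function names are hypothetical; assumes simple dotted version numbers):

```python
def parse_version(s):
    """Turn a dotted version string into a comparable tuple of ints."""
    return tuple(int(part) for part in s.split("."))

def satisfies(installed, backcompat, required):
    """Does an installed module satisfy a dependency?

    'required' is the version the dependent package was developed
    against; the installed module declares the earliest version it is
    still backwards-compatible with ('backcompat').  The dependency is
    met when the required version falls inside that compatibility span:
    backcompat <= required <= installed."""
    return (parse_version(backcompat) <= parse_version(required)
            <= parse_version(installed))
```

No future-gazing operators needed: both values in the comparison are facts the respective developers actually know at release time.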
Good idea, though this issue highly depends on the packager backend used.
Could you cite some examples to help me understand the issues involved?
- Make it easier to have multiple installed versions of a module.
That, too, isn't really a DU2 issue, or is it?
Not really; more a general packaging and Python import issue. But I included it here as I think packaging issues have a big impact on DU policy.
- Reject PEP 262 (installed packages database). Complex, fragile, duplication of information, single point of failure reminiscent of Windows Registry. Exploit the filesystem instead - any info a separate db system would provide should already be available from each module's metadata.
I don't quite agree. I couldn't live without rpm these days.
Well, it's not to say that users can't build their own databases listing all their installed gunk if they want to. Ensuring user freedom in such areas is crucial. Perhaps it would be clearer to say that the intention is sound (make information on installed modules easy to retrieve; something I'm all for), but the way 262 proposes to do it is not.
In fact, one of my main objections to 262 is that it could well restrict user freedom: by creating lots of dependencies and synchronisation issues, users could find themselves locked into using a single 'official' Package Manager because it's the only one smart enough to deal with all these complexities. Users who venture into their site-packages folder by any other means will quickly find themselves being punished by the PackMan Police for unlawful infractions.
This should be one of the benefits that comes from decoupling module metadata from implementation as I've suggested above. There'll be no need for a central authority (262's DB) to maintain metadata, because each module already contains and looks after its own. And because there's only one metadata instance in existence for each module, there's no dependency/synchronisation issues to worry about. You can still provide users with exactly the same API that the 262 DB would have done for accessing this info, of course, so you still get all the functionality 262 would have provided, but without any of the headaches.
Funnily enough though, one of the possible DB implementations floated for 262 is to put the metadata for each module into a separate file on disk. So perhaps I should say that 262's idea of maintaining a _separate_ database simply isn't necessary: all the info it would have provided can already be retrieved from files in the filesystem; the only difference is that each file is bundled inside its package. The module/file system _is_ the database, if you like. (After all, what's a filesystem but a big ol' object database by any other name?;)
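A sketch of the 'filesystem is the database' query, assuming a hypothetical per-package 'metadata.txt' file (the file name and layout are my invention for illustration):

```python
import os

def installed_modules(site_packages):
    """Enumerate installed packages by reading each package's own
    bundled metadata file -- the filesystem itself serving as the
    database that PEP 262 proposed to maintain separately."""
    registry = {}
    for entry in os.listdir(site_packages):
        meta_path = os.path.join(site_packages, entry, "metadata.txt")
        if os.path.isfile(meta_path):
            fields = {}
            with open(meta_path) as f:
                for line in f:
                    if ":" in line:
                        key, _, value = line.partition(":")
                        fields[key.strip().lower()] = value.strip()
            registry[entry] = fields
    return registry
```

There is nothing to keep in sync: delete a package folder and its 'database entry' vanishes with it.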
p.s. If you're interested, you can see a module system I designed a couple years back at applemods.sourceforge.net. It actually uses a version of the "module = package = distribution with all batteries included" concept I'm floating here. (Which I think was itself influenced by Python's package system.)
p.p.s. Anything folk can do to help me understand the issues involved in cross-platform and extension compilation lest I spout off too much about things I know not will be much appreciated, ta. :)