Re: [Distutils] Questions about distutils strategy

[Great analysis, Tim!]
I'm not sure what stuff by which Gordon you're referring to. I am only familiar with his installer, which I thought is win32 only (but I may be mistaken) and is an installer for a whole application, not just a bunch of modules. Please correct me if I'm wrong. But this reminds me of a different issue, which Jim Ahlstrom has been hammering about before: there's a completely separate set of cases where what you are distributing is a stand-alone application, and the target consists of end users who are entirely uninterested in whether it's written in Python, C or Elvish. (And then there's still the distinction between Win32, Unix or both.) The current distutil tools don't deal with this at all. I think it should though, and I think its framework is powerful enough to be able to add this, e.g. as a new "appdist" command. --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido wrote:
It needed a name. I hate the word "Installer", but it expresses in one word the most common use of my stuff. I'll be releasing a beta for Linux real soon. Only some of the tricks are Windows only (such as self-extracting executables, which is only culturally appropriate on Windows, anyway). But more importantly it's not just for installing. The Python I use (interactively) on my wife's machine is 1 directory with about 6 files in it. On my Linux box I've been using the std lib in a .pyz for about a month now. Someone distributing a pure Python package could instead ship 3 files (imputil.py, archive.py and <package>.pyz) with the "install" consisting of adding one line to site.py in the user's perfectly normal Python installation. And yeah, I solved the "manifest" problem, too. Mine predates Distutils, so don't accuse me of duplicate effort, (I pointed them to it a couple times). It uses ConfigParser and a config file, so it allows finer control. While .pyz's are completely cross-platform, I have yet to work out endianness issues in the other archive I use (which should probably be zip format - it can hold anything). And at the "Installer" end, I have yet to work out how things should work on non-ELF/COFF platforms (where I can't append the archive to the executable). But there aren't any technical issues involved; just lack of time. So no, it's not just for Windows; and no, it's not just for creating standalones (though that's what almost everyone uses it for). - Gordon

Gordon, I'm sorry, but from this description I still have no idea what your stuff is (and I forgot the URL so I can't look it up). For example, if it's not (just) for installing, what *is* it for? What is the ``"manifest" problem'' and how did you solve it? Also, note that editing site.py is a no-no! You can create/edit sitecustomize.py, but you should leave site.py alone! --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido,
Gordon, I'm sorry, but from this description I still have no idea what your stuff is (and I forgot the URL so I can't look it up).
http://starship.python.org/crew/gmcm/installer.html The Linux stuff has a couple alpha testers and will probably get announced in a week or two.
For example, if it's not (just) for installing, what *is* it for?
At the bottom level, it's a bunch of tools using freeze's modulefinder, imputil.py and 2 kinds of archives. There's at least 2 layers above that, with "Installer" being the top. There's a clean separation between the layers, so you can break in wherever you like.
What is the ``"manifest" problem'' and how did you solve it?
The problem is specifying a set of resources, hopefully without having to list them explicitly. I solve this with a config file that lets you specify packages, directories, directory trees.. with filters that can work from paths, names, extensions, regular expressions...
Also, note that editing site.py is a no-no! You can create/edit sitecustomize.py, but you should leave site.py alone!
That would work fine. One of the standalone configurations will write a site.py, but that's for a completely self-contained installation (ie, one which will have no conflicts with another Python installation). I'd also note that, for Windows at least, the path-expanding mechanism created by site.py has not caught on. I've got lots installed, and no site-python, site-packages or sitecustomize. - Gordon
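
The config-driven manifest Gordon describes above (directory trees plus filters on extensions and regular expressions) can be sketched roughly as follows. This is only an illustration in present-day Python: the section name, the keys `trees`, `extensions` and `exclude`, and the function `resolve_manifest` are all hypothetical and are not the Installer's actual config syntax.

```python
import configparser
import os
import re

# Hypothetical manifest; the key names are invented for this sketch and are
# NOT the real Installer config format.
EXAMPLE_MANIFEST = """
[myapp]
trees = src
extensions = .py, .txt
exclude = (^|/)test_
"""


def _split(value):
    """Split a comma-separated config value into stripped, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def resolve_manifest(text, section, root):
    """Return sorted '/'-separated relative paths selected by the manifest."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    cfg = parser[section]
    trees = _split(cfg.get("trees", ""))
    extensions = tuple(_split(cfg.get("extensions", "")))
    pattern = cfg.get("exclude", "")
    exclude = re.compile(pattern) if pattern else None

    selected = []
    for tree in trees:
        # Walk each listed directory tree instead of naming files one by one.
        for dirpath, _dirnames, filenames in os.walk(os.path.join(root, tree)):
            for name in filenames:
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, root).replace(os.sep, "/")
                if extensions and not rel.endswith(extensions):
                    continue  # filtered out by extension
                if exclude and exclude.search(rel):
                    continue  # filtered out by regular expression
                selected.append(rel)
    return sorted(selected)
```

The point of the design is that the packager names trees and filters, and the explicit file list is derived rather than maintained by hand.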

[me]
Also, note that editing site.py is a no-no! You can create/edit sitecustomize.py, but you should leave site.py alone!
[Gordon]
You shouldn't see site-python or site-packages, they only exist on Unix. On Windows, everything is installed in the top Python directory. However you should see .pth files there, which is what site.py looks for. I believe NumPy and PIL use those. --Guido van Rossum (home page: http://www.python.org/~guido/)

You mean "they only exist _for_ Unix", (site.py looks for them on Windows). I don't like that. For one thing, modulo a few platform differences, the same mechanism should work for multi-user Unix and Windows LAN installations. And single-user Windows (I know, redundant, even on NT) should be a degenerate case of the above.
No NumPy, no PIL, no .pth files. 99% of everything out there just says "unzip this somewhere on your Python path". In this case, Jim Ahlstrom may be right - there are too many options, or at least an insufficiently emphasized "proper" method. Until I worked out my own way of installing stuff, I used to lose a large number of packages whenever I upgraded my Windows Python. Much as I love Mark's stuff (and hesitate to criticize crazy Aussies), I wish there weren't so much special casing here for Windows. And no, I don't have any solutions to this, I'm just griping... - Gordon

[Gordon]
You mean "they only exist _for_ Unix", (site.py looks for them on Windows).
No it doesn't. The code in site.py only adds site-packages and site-python when os.sep is '/'. RTSL.
What do you mean by "the same mechanism should work"? The same mechanism for what? Are you talking about sharing the installed files somehow?
Fair enough. Of course I know about .pth files so I unzipped them elsewhere and added a .pth file pointing there...
The .pth files are designed for this. Maybe they haven't been explained as well as they should.
It's not Mark's fault, it's Microsoft's fault. If you don't do things the way MS wants you to, experienced Windows users will gripe, misunderstand what you do, etc.
And no, I don't have any solutions to this, I'm just griping...
Ditto. Understanding the problems is half of the solution though. The problems seem pretty complex! --Guido van Rossum (home page: http://www.python.org/~guido/)
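
For readers who have not met the .pth mechanism Guido mentions, here is a minimal sketch of it using the present-day `site` module, not the 1.5-era code under discussion. The file name `extras.pth` and the helper names are made up for the example; `site.addsitedir` is the standard-library call that scans a directory for .pth files.

```python
import os
import site
import sys
import tempfile


def add_packages_via_pth(site_dir, package_dir, pth_name="extras.pth"):
    """Write a .pth file naming package_dir, then let site process site_dir."""
    with open(os.path.join(site_dir, pth_name), "w") as fh:
        # Each line of a .pth file names one directory to append to sys.path.
        fh.write(package_dir + "\n")
    site.addsitedir(site_dir)


def demo():
    """Return (package_dir, entries that the .pth processing added)."""
    with tempfile.TemporaryDirectory() as site_dir, \
            tempfile.TemporaryDirectory() as pkg_dir:
        before = list(sys.path)
        add_packages_via_pth(site_dir, pkg_dir)
        added = [p for p in sys.path if p not in before]
        sys.path[:] = before  # undo the change so the demo has no side effects
        return pkg_dir, added
```

This matches what Gordon says he did: unzip a package somewhere else and drop a one-line .pth file into a directory that site.py already scans.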

[Guido]
No it doesn't. The code in site.py only adds site-packages and site-python when os.sep is '/'. RTSL.
Oops. Missed that.
In the above, "mechanism" basically meant that which creates sys.path. Basically, this came up for me because in standalone configurations (my Installer again), I have to take complete control of sys.path. After doing so differently on Windows and Linux, I finally realized that I can do it the same way on both. Which makes me question why they are so different.
The .pth files are designed for this. Maybe they haven't been explained as well as they should.
I'd say "badgered" or "browbeaten" instead of "explained" ;-).
Even MS doesn't do things the way MS says they want you to. I find MS users equally divided between those who scream bloody murder if you touch the registry, and those who scream if you don't. It's not like *nixen suffer from an excessive degree of conformity in preferred installation procedures, but somehow Python survives there...
Grumpily agreed ;-). - Gordon

I finally got around to reading the current Linux Journal (which just keeps getting better and better) and lo! there was a picture of a familiar face I just couldn't quite.... Oh no! Could it be true? I heard rumors but I refused to believe them until now. The glasses are gone! Guido now looks like an investment banker! The sky is falling! Next will probably be a Python 1.6 as a 27 Meg DLL, and a Python IPO. Well, maybe not. Now that I look more closely, he is wearing a black and white and mustard (??MUSTARD) T-shirt which says "You Need Python". At least we ought to make him wear a name tag at IPC8. JimA

James C. Ahlstrom writes:
I'm afraid this non-distinctive look was introduced at IPC7... it's too bad we can't tell people Python was invented by the guy with the glasses anymore.
It's really the blue & white & orange IPC7 shirt. -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives

"JCA" == James C Ahlstrom <jim@interet.com> writes:
JCA> Oh no! Could it be true? I heard rumors but I refused to believe them until now. The glasses are gone! Guido now looks like an investment banker! The sky is falling!
He's not the only one who's, like, "gone corporate", but I won't mention any names, so as to protect the guilty.

"Barry A. Warsaw" wrote:
He's not the only one who's, like, "gone corporate", but I won't mention any names, so as to protect the guilty.
OK, Buzz. Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats.

[Gordon]
[Guido]
Something just occurred to me: MS's guidelines aren't arbitrary, they actually have very good reasons. In the case of putting all an app's crucial info in the Registry, it's the only way to allow a site administrator to set policy and site options remotely (an admin can fiddle other machines' registries remotely). This works very well indeed when there's only "one copy" of an app on a machine (or at most one copy "per user"). What just occurred to me is that JimA is concerned with *not* letting any info from a previously-installed Python affect the app he's installing. Similarly, Gordon's Win32 "standalone installer" modifies python.exe and pythonw.exe to use a PYTHONPATH he forces, leaving the registry out of it. Similarly, the woes I've had in trying to sell Python as a general Win32 scripting tool at work mostly boil down to that there's no effortless way to do it that doesn't risk picking up info from-- or forcing info onto --pre-existing or future distinct Python installations (in contrast, Perl "just works" in this respect). IOW, the three of us find getting path info out of the registry intolerable because we are in fact trying to do the opposite of what the registry mechanism was *designed* for: we want perfect isolation, not perfect sharing. This has come up on Python-Help a few times too, in the guise of someone installing a product that in turn installs an older version of Python, which in turn confuses another product that relies on features in a newer version of Python. So while the traditional Windows .ini file (like Unix this-or-that.rc file) model was replaced by the registry for excellent reasons, those reasons don't apply to the way we're using Python! The .ini file model was exactly right for what most of us seem to want to do, and the registry model is exactly wrong. just-thought-i'd-cheer-you-up<wink>-ly y'rs - tim

Tim> So while the traditional Windows .ini file (like Unix this-or-that.rc file) model was replaced by the registry for excellent reasons, those reasons don't apply to the way we're using Python! The .ini file model was exactly right for what most of us seem to want to do, and the registry model is exactly wrong.
Alright! Now I understand what all the hubbub is about! My eyes have mostly been glazing over trying to follow all this Windows registry/path/ini stuff. MS believes that Python is the application. Those of us writing Python programs view those programs as the applications, not the Python interpreter per se. Is there some way that people writing applications in Python can set up registry entries that are specific to their application (e.g. tabnanny.py) instead of only specific to the Python interpreter?
Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ 847-971-7098 | Python: Programming the way Guido indented...

Skip Montanaro wrote:
I think this is a good point. Windows app programmers (mostly) view Python as part of their app and try to install it in their app directory. Unix installs Python as a system app in multiple versions and users use PATH to pick a version. Unix users view the Python interpreter as a system service which is needed for running their app. I think this is because a Windows app is a visual program, and the Python release compiles to a console app (not really a visual program). So all (?most) Windows Python apps are custom mains with Python as a component, but the stock python.exe is not the main. This makes it difficult to document a way to install Python in the Unix fashion, since all apps need their own binary main and python15.dll is the only thing in common. IMHO archive files can solve this a lot more simply. JimA

[Skip Montanaro]
Eww -- that's a helpful and insightful way to put it, Skip! Now maybe *I* can understand what the hubbub is about <wink>.
Yes, but they can't get Python to look at those before it's too late. I spent a whole evening a month or two ago just trying to figure out where all the cruft in my Windows sys.path *came* from. This is out-of-the-box; I haven't added anything myself: ['', 'D:\\Python\\win32', 'D:\\Python\\win32\\lib', 'D:\\Python', 'D:\\Python\\Pythonwin', 'D:\\Python\\Lib\\plat-win', 'D:\\Python\\Lib', 'D:\\Python\\DLLs', 'D:\\Python\\Lib\\lib-tk', 'D:\\PYTHON\\DLLs', 'D:\\PYTHON\\lib', 'D:\\PYTHON\\lib\\plat-win', 'D:\\PYTHON\\lib\\lib-tk', 'D:\\PYTHON'] That's bizarre on the face of it, and tracking it all down was draining. I've forgotten the details. I do remember concluding that it was impossible to do what I wanted to do without changing the implementation, though, and nobody on Python-Dev disputed that at the time. In a pragmatic crunch, I wrote the little app I needed to distribute at the time in Perl instead, meaning to come back to this. I haven't had time. IIRC, the ultimate problem wasn't really that Python looked at the registry to get *some* path info, it was a combination of A) It looked at the registry so early that it was impossible to stop it from executing whatever site.py the registry pointed at (well, I could with the -S option -- but then there was no way to get it to do the site.py that was *wanted* instead). B) No way to override what was in the registry; e.g., I was greatly surprised to discover that setting a PYTHONPATH envar didn't override anything, it simply plunked the PYTHONPATH entries into sys.path along with everything else -- and too late to stop anything anyway. In a long msg I haven't yet read all the way thru, Guido at least suggested associating different registry path info with different Python versions. That would address a number of otherwise currently intractable problems. I suspect it still wouldn't help with the problem I was facing, though. 
That is, I wanted to be able to tell people to run \\dragres01\mrec\reduce\python \\dragres01\mrec\reduce\reduce.py which is just a Windows way of saying "run a Python executable from a shared network location". When they tried that, though, the network Python looked in *their* individual registries for its Python path info, and some of the hackers with mondo customized Python setups on their own machines watched things go down in flames. This certainly can't be a common problem, but it speaks to an unforgiving rigidity in the current approach. There seemed to be nothing I could do to guarantee this would work, short of telling users to edit their registries before running this tool (that's a non-starter on Windows -- editing the registry is dangerous) or putting a customized Python on the network pointing to a bogus registry key (it was faster to write the app in Perl! Perl doesn't *try* to be so infernally helpful <wink>, so doesn't get in the way either). I'm left wondering what purpose putting Python library path info into the Windows registry serves. Is there anyone on Windows who *doesn't* have their Python Lib/ etc as direct subdirectories of the directory containing python.exe? Not that I've seen. Python puts *those* in sys.path too -- but only after it (in the normal case; see my sys.path above) pulls identically redundant paths out of the registry first, or (in the cases we're griping about) pulls irrelevant or downright harmful paths out of the registry first (paths appropriate to the last Python you *installed*, not to the Python that's *running*!). Perhaps all this cruft is needed to support embedded Python, though (something I've never done). 
Regardless, I expect it would have been enough for me if PYTHONPATH simply worked the way I mistakenly assumed it would (that is, this is sys.path, and that's *it*; feel free to prepend the current directory when initialization is complete, but before then looking at any file not reached from PYTHONPATH is verboten). the-cleverer-the-code-the-more-vital-that-there-be-a-way-to-short-circuit-it-ly y'rs - tim
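
To make the difference concrete, here is a small sketch of the two behaviours Tim contrasts: the one he assumed (PYTHONPATH is the whole search path) and the one he observed (PYTHONPATH entries merged in with registry-derived ones). Both functions are hypothetical models written for this note; neither is how any real interpreter is implemented, and the registry paths are passed in as a plain list.

```python
import os


def strict_sys_path(environ, pathsep=os.pathsep):
    """Model of the behaviour Tim wanted: PYTHONPATH entries and nothing else."""
    raw = environ.get("PYTHONPATH", "")
    return [entry for entry in raw.split(pathsep) if entry]


def merged_sys_path(environ, registry_paths, pathsep=os.pathsep):
    """Model of the behaviour Tim observed: PYTHONPATH is added, not obeyed."""
    merged = strict_sys_path(environ, pathsep)
    for entry in registry_paths:
        if entry not in merged:
            merged.append(entry)  # registry entries still leak into the path
    return merged


# Usage with the network share from the message (';' is the Windows separator).
env = {"PYTHONPATH": r"\\dragres01\mrec\reduce;\\dragres01\mrec\reduce\lib"}
isolated = strict_sys_path(env, pathsep=";")
leaky = merged_sys_path(env, [r"D:\Python\Lib"], pathsep=";")
```

In the strict model a user's customized local setup cannot affect the network app; in the merged model it can, which is the failure described above.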

Tim Peters wrote:
Excellent discussion Tim!
I think a sensible way to run little apps is to put everything in an archive file including the main.py. On Windows you concatenate that to python.exe, and it Just Works.
Point on the curve. We don't. We freeze everything except the main.py. JimA
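
The "everything in one archive, including the main" idea can be tried out in a present-day Python, which will run a zip archive directly if it contains a `__main__.py`. The sketch below assumes that modern behaviour; it is not Jim's archive format or tooling, and the helper names and the `app.pyz` file name are invented for the example. It does not attempt the step of concatenating the archive onto python.exe.

```python
import os
import subprocess
import sys
import tempfile
import zipfile


def build_app_archive(archive_path, main_source, modules):
    """Write a zip archive whose __main__.py is the application entry point."""
    with zipfile.ZipFile(archive_path, "w") as zf:
        zf.writestr("__main__.py", main_source)
        for name, source in modules.items():
            zf.writestr(name, source)


def run_app_archive(archive_path):
    """Run the archive with the current interpreter and return its stdout."""
    result = subprocess.run(
        [sys.executable, archive_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def demo():
    """Build a two-module app archive in a temp dir and run it."""
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "app.pyz")
        build_app_archive(
            archive,
            "import greeting\nprint(greeting.message())\n",
            {"greeting.py": "def message():\n    return 'hello from the archive'\n"},
        )
        return run_app_archive(archive)
```

The app's own modules are imported from inside the archive, so nothing has to be unpacked onto the target machine.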

[Guido]
And actually, the business about separate subtrees for the machine's configuration and the user's configuration is pretty clever. MS doesn't explain it well, and it gets misused, but when done right, it's a lot simpler than the maze of .xxxrc files you sometimes find in other OSes.
In my Linux version, I went to the heart of the matter - getpath.c. It occurs to me that getpath.c might do better to follow a normal bootstrap process - ie, create the absolute minimal sys.path required to go to the next step. Then the rest of what goes on in getpath.c could be written in Python. Maybe that Python code needs to get frozen in (to prevent bozos from destroying an installation by stepping on getpath.py), but it would make it a lot easier to create independent installations, and also reduce the variations between platforms at the C level. (Then again, I've never heard of anyone stepping on exceptions.py.) If some registry manipulation primitives were exposed (say, through ntpath) that would mean that Windows developers could (if they wanted) play by the MS rules with at least the option of not stepping on each other. - Gordon
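
Gordon's bootstrap suggestion (compute an absolutely minimal sys.path first, then do the rest in Python) might look something like the sketch below. The directory names `lib` and the two-stage split are assumptions made for illustration; this is not what getpath.c does, and `minimal_sys_path` and `extend_sys_path` are hypothetical names.

```python
from pathlib import PurePosixPath, PureWindowsPath


def minimal_sys_path(executable, flavour=PurePosixPath):
    """Stage 1: derive a bootstrap path from the interpreter's location only."""
    home = flavour(executable).parent
    # Just enough to find the Python code that runs stage 2.
    return [str(home / "lib"), str(home)]


def extend_sys_path(base, extra_dirs):
    """Stage 2 (pure Python): add further directories without duplicates."""
    path = list(base)
    for entry in extra_dirs:
        if entry not in path:
            path.append(entry)
    return path


# Usage: the same logic serves both platforms; only the path flavour differs.
win = minimal_sys_path(r"D:\Python\python.exe", PureWindowsPath)
nix = minimal_sys_path("/opt/python/python", PurePosixPath)
```

Because stage 1 looks only at the executable's own directory, an independent installation never consults state left behind by another one.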

I agree. And I am guilty of not even trying to find MS' explanation -- I just looked in the registry at what other apps did and tried to mimic that (plus what Mark had already done), without really knowing what I was doing. I now know a little better -- see the end of this message.
Yes, this is exactly what was proposed in the thread on the Big Import Rewrite.
That's a good idea. These functions are already available through Mark's win32api extension -- much of which will eventually (I hope before 1.6 is out!) become part of the core distribution. In the mean time, I've been thinking a bit more about how Python should be using the Windows registry. (It's clear to me that Python should use the registry -- those who disagree can go build their own Python distribution.) The basic ideas of Python's current registry usage are sound: there's a resource built into the DLL which is part of the key into the registry used for all information. The problem lies in which key is used. All versions of Python 1.5.x (1.5, 1.5.1, 1.5.2) use the same key! This is a main cause of trouble, because it means that different versions cannot peacefully live together even if the user installs them into different directories -- they will all use the registry keys of the last version installed. This, in turn, means that someone who writes a Python application that has a dependency on a particular Python version (and which application worth distributing doesn't :-) cannot trust that if a Python installation is present, it is the right one. But they also cannot simply bundle the standard installer for the correct Python version with their program, because its installation would overwrite an existing Python application, thus breaking some *other* Python apps that the user might already have installed. (There's a solution for app builders who are willing to do a lot of work -- you can change the registry key resource in the DLL. For example, Alice comes with its own version of Python 1.5.1 and it uses "1.5.1-alice" as its registry key. The Alice installer installs Python in a subdirectory of the Alice installation directory and points the 1.5.1-alice registry entries there. The problem is that this is a lot of work for the average app builder.) I thought a bit about how VB solves this. 
I think that when you wrap up a VB app in an installer, all the support code (mostly a big DLL) is wrapped with it. When the user runs the installer, the DLL is installed (probably in the WINDOWS directory). If a user installs several VB apps built with the same VB version, they all attempt to install the exact same DLL; of course the installers notice this and optimize it away, keeping a reference count. (Ignoring for now the fact that those reference counts don't always work!) If an app built with a different VB version is installed, it has a DLL with a different name, and that is installed separately. Other support files, I presume, are dealt with in much the same way. Voila, there's the theory. How can we do something similar for Python? An app written in Python should need to install only three or four files:
- a driver EXE to start the app
- a copy of the Python DLL
- the Python library in an archive
- the app code in an archive
The latter two could be combined into a single archive, but I propose that we use two archives so that the DLL and the Python library archive can be shared between installations of independent Python apps as long as they use the exact same Python version and don't need additional 3rd party packages. (I believe that Jim A's proposal combines the archives with the EXE and the DLL, reducing the number of files to two. That's fine too.) Is there a use for the registry here at all? Maybe not. (I notice that VB seems to have a single registry entry, pointing to a DLL; all other VB files also seem to live there.) Complications:
- Some apps may need a custom extension module, which has to be installed as a PYD file. So it seems that there needs to be a directory per app, and perhaps per version of the app (if the app distributor cares).
- Some apps need other, non-pyc files (e.g. data tables or help files); it would be handy if these could be stored in the archives as well.
- Some standard extension modules are in their own PYD files; these also need to be installed. They aren't typically marked with a version, so perhaps a path directory per version of Python (if not per installed app) is wise.
- How to distribute an app that needs 3rd party stuff, e.g. Tcl/Tk, or PIL, or NumPy? Their Python code can easily be wrapped up in another archive with a standard name incorporating a version number; but the required PYD and DLL files are a separate story. (E.g. for Tkinter, you need _tkinter.pyd which links against tcl80.dll.) Basically the same solution as for standard PYD files can work; the needed DLL files can be installed either systemwide (if they have a reliable version number in their name, like tcl80.dll) or in the per-app or per-package directory (like NumPy).
- Presumably, the archives will contain PYC files only. This means that tracebacks will not show source code, only line numbers. For Jim A, this is probably exactly what he wants (if the user gets a traceback, his "robust app" has miserably failed, and he takes it in pride that this doesn't happen). But for some others, access to the sources could be essential. For example, I might want to distribute IDLE using this mechanism; users of IDLE who are curious about the standard library (or about IDLE itself) should be able to open the source for an arbitrary module (and maybe even edit it, although that's not a priority and perhaps should even be discouraged). Library source access is an important feature of the IDLE debugger as well. A way out for IDLE is to install a classic distribution of the Python library sources, into the filesystem at an IDLE specific location. Other apps, with only the need for source code in tracebacks, might choose to have the PY files in the archives sitting next to the PYC files, and somehow the traceback mechanism should be accessing the archive to get a hold of the source.
And yes, I realize that Jim A's latest offering solves most of these problems to a large extent -- well done. (Jim, would you care to comment on the issues that you don't address? Will you address them in a future version?) Final notes: There are two different problems here. One is how to distribute Python apps robustly to end users who don't particularly care about Python. This is Jim A's problem (and he has a solution that works for him). In general the solutions here try to isolate the installed app from other Python installations. I'm proposing that at least the DLL and the Python library archive can probably be shared between apps without reducing robustness if we keep track more carefully of version numbers. The other problem is how to distribute packages of Python and extension modules for use by Python users. These typically need to drop into some existing Python installation. This is Paul Dubois' problem with NumPy (amongst others) and is the current focus of the distutil SIG. However I believe that there could be a lot of common infrastructure that would help us create better solutions for both problems. For package distribution, common infrastructure (a.k.a. standards) is essential. For app distribution, common infrastructure isn't so important (since the solutions strive for total isolation, there's no problem if different apps use different solutions). However, this changes when app creators want to distribute robust self-sufficient apps that use 3rd party packages -- then the 3rd party packages must allow being packaged up using the app distribution creator of choice. Solving this compound problem (creating package distributions that can be redistributed easily as part of robust Python app distributions) should be an important goal for the infrastructure we're building here. The Big Import Rewrite ought to add this to its list of objectives if it isn't already on it.
My guess is that the solution for this compound problem will increase the dependency of app distribution tools on the package distribution infrastructure; which to me seems like a Good Thing because it would lead to more code sharing. --Guido van Rossum (home page: http://www.python.org/~guido/)
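
The registry-key collision Guido describes earlier in this message (1.5, 1.5.1 and 1.5.2 all sharing one key, versus Alice's private "1.5.1-alice" key) can be shown with a few lines of string handling. The key layout `Software\Python\PythonCore\<version>\PythonPath` is an assumption used here for illustration, and `registry_key` is a hypothetical helper; nothing below touches a real registry.

```python
def registry_key(version, per_patch_level=False, suffix=""):
    """Build the (assumed) registry key under which path info is stored."""
    parts = version.split(".")
    # Shared scheme: only major.minor is used, so all 1.5.x releases collide.
    key_version = version if per_patch_level else ".".join(parts[:2])
    return "Software\\Python\\PythonCore\\" + key_version + suffix + "\\PythonPath"


releases = ["1.5", "1.5.1", "1.5.2"]

# Every 1.5.x release maps to one key, so the last installer run wins.
shared_keys = {registry_key(v) for v in releases}

# Keying on the full version keeps side-by-side installations apart.
distinct_keys = {registry_key(v, per_patch_level=True) for v in releases}

# The Alice approach: a private suffix gives the bundled Python its own key.
alice_key = registry_key("1.5.1", per_patch_level=True, suffix="-alice")
```

The trade-off is the one raised above: distinct keys buy isolation, at the cost of each app builder having to arrange for a key of its own.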

Briefly backtracking to an old thread: [Guido]
Right, that's one class of intractable problem under Windows. *Inside* my workplace, another kind of problem is caused when people try to make a Python app available over the Windows network. They stick the Python they want and its libraries out on the network, with python.exe in the same directory as the app. Now some people have highly customized Python setups, and the network Python picks up "the wrong" site.py etc. That sucks, and there appears no sane way to stop it. Telling internal app distributors they need to invent a unique registry key and fiddle their python.exe's resources is a non-starter. Ditto telling people with highly customized Pythons "don't do that". Ditto telling anyone they have to run any sort of installation script just to use a network app (sometimes they don't even know they're running it! e.g., when it's a subsystem invoked by another app). So while everyone is thinking about the hardest possible scenarios, please give a thought to the dirt simple one too <0.5 wink>: an app distributor who knows exactly what they're doing, and for whom *any* magical inference is simply a barrier to overcome. The latter can be satisfied by any number of means, from an envar that says "please don't try to be helpful, *this* is the directory you look in, and if you don't find stuff there give up" to a cmdline switch that says the same. Nothing Windows-specific there -- any OS with an envar or a cmdline will play along <wink>.
This is the way most *MS* DLLs work; stuff like the C runtime libraries and MS database drivers work exactly the same way. It's rare for pkgs other than MS's to attempt to use this mechanism, though (the reason is given below).
(Ignoring for now the fact that those reference counts don't always work!)
? They work very well, in my experience. Where they fail is when installers & uninstallers break the rules. MS publishes the list of MS DLLs that are to be treated this way: an installer "must" use refcounting on the DLLs in the list. Alas, some (especially older) installation pkgs don't. Then the refcounts get screwed up. That's what makes the mechanism brittle: "the system" doesn't enforce it, it relies on universal & intelligent cooperation. It's very likely that someone distributing a Python app will neglect (out of ignorance) to bump the refcount on their Python components, so the refcount will be artificially low, and a later uninstall of some unrelated pkg that *did* follow the rules will merrily delete Python. Gordon and I will repeat this until it sinks in <wink>: almost everyone with a successful Windows product ships the non-MS DLLs they rely on and copies them into their own app directory. It's simple and it works; alternatives are complicated and don't work. Many even ship & copy MS DLLs (e.g., Scriptics copies its own msvcrt.dll (the MS C runtime) into Tcl's directories). Worrying about space consumed by redundant Python components is a bad case of premature optimization <0.3 wink>.
... How can we do something similar for Python?
Seriously, short of getting MS to distribute Python and put the Python DLLs on The List of refcounted resources, we should pursue this line reluctantly if at all. MS may have a better scheme in the future, but for now better safe than sorry. a-couple-mb-on-a-modern-pc-isn't-worth-the-time-it-took-to-read-this<wink>-ly y'rs - tim
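
The failure mode Tim describes, where one installer forgets to bump the reference count and a later, well-behaved uninstall deletes the shared DLL, is easy to simulate. The class below is a toy model written for this note, with in-memory dictionaries standing in for the system's refcount store and the disk; it is not any real installer API.

```python
class SharedComponents:
    """Toy model of cooperative reference counting for shared DLLs."""

    def __init__(self):
        self.refcounts = {}
        self.files_on_disk = set()

    def install(self, component, bump_refcount=True):
        """Copy the component; a well-behaved installer also bumps the count."""
        self.files_on_disk.add(component)
        if bump_refcount:
            self.refcounts[component] = self.refcounts.get(component, 0) + 1

    def uninstall(self, component):
        """Drop one reference and delete the file when the count reaches zero."""
        count = self.refcounts.get(component, 0) - 1
        if count <= 0:
            self.refcounts.pop(component, None)
            self.files_on_disk.discard(component)
        else:
            self.refcounts[component] = count


def dll_survives(second_installer_bumps):
    """App A installs properly, app B installs, then A is uninstalled."""
    system = SharedComponents()
    system.install("python15.dll")                                  # app A
    system.install("python15.dll", bump_refcount=second_installer_bumps)  # app B
    system.uninstall("python15.dll")                                # remove app A
    return "python15.dll" in system.files_on_disk
```

With `second_installer_bumps=False` the DLL is gone while app B still needs it, because nothing in the system enforces the protocol. Copying the DLL into each app's own directory sidesteps the problem entirely.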

Tim Peters wrote:
The registry is still a bad idea because it lumps critical and app data into single files and brings up the ugly problem of protecting individual registry entries instead of just files. Microsoft should have put all app config into the app directory and provided for remote admin of that. But that is not really your point (just ranting about the registry again).
Or, in other words, no isolation is possible if critical info depends on global data like PYTHONPATH or a _common_ registry entry. We could have different registry entries, but this is confusing and not documented. I think we can solve this with archive files in a way compatible with Unix without going off on a Windows-only wavelength. If the archive file contains everything, and it is in the dir of the app, and the app looks there and finds it, then it Just Works. See also my reply to Skip. JimA

Eh? Doesn't work for me. This does: http://starship.python.net/crew/gmcm/distribute.html

[Guido]
[Great analysis, Tim!]
I beg to differ: it's internally inconsistent and should have identified at least 3 axes and hence at least 8 cases. Still, you got more than you paid for <wink>.
I'm not sure what stuff by which Gordon you're referring to.
You guessed right!
If it can install a whole app, what makes you suspect it couldn't install just a bunch of modules <0.5 wink>? It started life as Windows-only, and I believe it's been virtually ignored by non-Windows folk because of that. Bad blind spot. It supplies already-working approaches to many of the issues that are still being *talked* about on Distutils (at least archive formats, code to manipulate same, manifest files (how do you tell the tool which files to package?), and transparently bundling a Python interpreter when needed).
I include part of that in my case #4 above, where the app happens to be written in Pure Python -- but the user doesn't have to know that. Gordon is addressing at least that part of it. AFAIK he can't deal with transparently compiling C or exorcising Elvish on the target platform, but if you're just distributing the binaries I expect his work is directly usable already.
(And then there's still the distinction between Win32, Unix or both.)
I vote "both". The world really doesn't need another Win32-only (or Unix-only) installer, archive format, compression format, or distribution model. Jim seems mostly interested in Win32-only to me, and his concerns haven't been about the mechanics of distribution but about how-- regardless of tool --to create a bulletproof Python installation by hook or by crook. Last time we went thru this, it was concluded that one couldn't without patching the Python Windows binary with a resource editor (to point to its own infernal <0.5 wink> registry entries). Distutils hasn't talked about that at all (that I've seen, anyway); if there were a less radical approach to that, I suspect Jim would be delighted to use one of the commercial Win32 installation pkgs (and if that's what his customers expect, delighted or not that's what he'll do).
The current distutils tools don't deal with this at all.
That's why I said I thought what Gordon is doing seems more appropriate to case #4 than what Distutils has been doing.
I think it should though,
Ditto.
and I think its framework is powerful enough to be able to add this, e.g. as a new "appdist" command.
I cordially invite (since Gordon will uncordially browbeat <wink>) people to look seriously at what he's done. Best I can tell, for apps that don't need compilation "on the other end", it's mostly "there" already! give-the-man-a-hand-ly y'rs - tim

Tim Peters wrote:
Not exactly. I am interested in how to create a bullet-proof installation. But I am equally interested in Unix (especially Linux) and dislike the current dichotomy in the code base. Lately I have been more active in distribution via archive files. Part of the solution is an archive file format which is identical on Unix and Windows, and which can hold the Python library and packages as single files. For my own efforts on this see: ftp://ftp.interet.com/pub/pylib.html This is an archive file format similar to Gordon's format, although Gordon's work goes well beyond just file formats. I currently have fifth generation code for this format, and am adding features as suggested by Fredrik Lundh. I hope it gets considered as a candidate for a Python standard format.
Distutils hasn't talked about that at all (that I've seen, anyway);
Gordon, Greg Stein and I have discussed file formats before. I think it was on distutils. Anyway that was months ago. JimA

"James C. Ahlstrom" wrote: [...]
ftp://ftp.interet.com/pub/pylib.html
Ouch - what's wrong with zip archives? There are utilities to convert to/from zip, to re-pack, to mount zip transparently so its entries look like regular files, FTP servers, etc. Both Java (jar) and Tcl (Jan Nijtman's "Wrap") have adopted this format. Zips would seem natural with JPython. And suppose that scripting ever starts to consolidate to a common scripting kernel (yah, well), do you really want a system which is closing all doors to cross-fertilization? Zip has an advantage over .tar.gz in that its table of contents is available without having to decompress the whole kaboodle. Your format has no checksum, which for deployment and long-term storage can be important. If you want a marshalled TOC, then why not add a manifest entry for it, sort of like what ranlib does with ar? You designed the format so archives can be concatenated without any tool (other than "cat"), but this works just as well with zip files, as the Tcl Wrap approach demonstrates. Allow me to very, very loosely paraphrase Guido here: sure, everyone can design an archive format, but they are likely to make the same mistakes all over again - so why not adopt a format which is tried and tested? With all due respect - I sincerely hope you will reconsider and alter your code to work with zip files. It's probably a small adjustment? Unless your *intent* is to create a diverging standard, of course... -- Jean-Claude
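
To illustrate the table-of-contents point, here is a minimal sketch using the zipfile module from today's standard library, which postdates this thread. The member names and contents are made up for the example.

```python
import io
import zipfile

# Build a small archive in memory (names and contents are invented).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("string.py", "x = 1\n" * 1000)
    zf.writestr("test/__init__.py", "")

# Reopening parses only the central directory at the end of the file;
# no member data is decompressed until read() or open() is called.
with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
    sizes = {info.filename: info.file_size for info in zf.infolist()}
```

A .tar.gz, by contrast, has to be decompressed from the front to discover the same listing.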

Jean-Claude Wippler replied:
Exactly my sentiments. We have rough Python code to deal with zip files; it's very rough because we got kind of carried away adding features and ended up with spaghetti code :-( But it's working code nevertheless and we're offering it up for anyone in this group to clean up (we could do that ourselves but it's not high on our current priority list). I don't know anything about Tcl Wrap. I do know a great deal about the ZIP format, but apparently I missed the concatenation feature. How does this work? Does that work for all zip tools, or just for the ZIP reader in Wrap? (I looked up how Jim A does it -- his central directory at the end of the file contains the total size of the data covered by that directory, so he seeks back to the beginning of it and sees if another magic number precedes it; and so on. Very simple.) I quickly looked at the Wrap page; it shows how to access data files stored in the archive. Question: does the wrap::open code go out to the regular filesystem if it finds there's no wrap archive? That would be handy so you can test the code in its unwrapped form without change. Python needs this too. --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
[... my not-really-meant-as-rant about adopting zip as format ...]
[zip concatenation feature]
Same for Wrap. Standard tools would not see the preceding ZIP groups. In terms of maintenance, I'd avoid this trick. I merely wanted to point out that zip archives can be stacked, if the reader is set up for it.
IIRC, Wrap overrides "open" for embedded entries as "file.zip/abc.py". There's more being developed in this area: a "virtual file system" which lets you mount archives and such (VFS by Matt Newman, mentioned with his permission), so that the file-system model can be extended to navigate into a lot more things than real file systems. Andrew Kuchling's post hints at another tangent: opendir/readdir is of course simply an enumeration. There's a lot of "genericity" lurking in scanning across file systems, trees, networks, and resources in general. <minirant> The filesystem <-> OO dichotomy needs a review. </minirant>
Python needs this too.
<voice location=in-the-desert level=timid> Concepts like these have a lot to offer - and would make even more sense if they were done in a way which benefits multiple scripting languages. Feel free to reply by email if you ever want to further discuss this. </voice> -- Jean-Claude

"JW" == Jean-Claude Wippler <jcw@equi4.com> writes:
JW> Same for Wrap. Standard tools would not see the preceding ZIP
JW> groups.
JW> In terms of maintenance, I'd avoid this trick. I merely
JW> wanted to point out that zip archives can be stacked, if the
JW> reader is set up for it.
I agree. I can't recall the details now, but I had a lot of problems with zip concatenation in JPython. I think at least some of the older Java tools for grokking zips don't work with concatenation. -Barry

The Java "jar" tool mostly ignores the central directory -- it seems to read the archive from the front, using the local header records, and ignoring the central directory (of course it writes one when it creates an archive). --Guido van Rossum (home page: http://www.python.org/~guido/)

Jean-Claude Wippler:
I agree. We have experimented with this a bunch in the Knowbot software, where we have some code that wants to look at a "filesystem" but could be talking to some kind of filesystem emulation across an RPC connection or alternatively could be accessing a zip file. Our conclusion is that a convenient interface is modeled after (a subset of) the os and os.path functionality. In fact, the only thing you would need to add to the os module would be a function to open a file object; I've proposed to add os.fopen() as an alias for the built-in open(). The idea that you could mount one VFS inside another is nice, although I'm not sure how practical it is. For one thing, in our fs code, os.path.sep and friends (e.g. os.path.normcase behavior) were set per filesystem; what would happen if you mounted a Unix filesystem in an NT tree? Doing the translations is hard too; e.g. on a Mac fs, the separator is ':' and a '/' can be part of a filename -- do you simply swap them? What if a Mac file has both '/' and '\' and you mount it on a Windows FS? I'd rather stay away from this. On the other hand the VFS concept could be used as a totally different solution to the sys.importers vs. sys.path
I'd still rather see listdir() (which our sample virtual FS API supported). I don't think it necessarily makes sense to do this on a more generic basis -- other trees and graphs have sufficiently different semantics that using a FS like API doesn't necessarily cut it. Take for example the Windows registry -- looks a lot like a filesystem, doesn't it? Yet it has one fundamental property that a typical FS doesn't: directory nodes can have data *and* children... I've written a tree widget and found that it's remarkably hard to come up with a workable API to talk to trees *in general*. Trees are a universal concept, but code sharing is still elusive... Perhaps because the concept is so simple?
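
A rough sketch of what an os-like interface over a zip archive could look like. ZipFS and its methods are hypothetical names invented for this illustration, not the Knowbot code, and the sketch leans on today's standard zipfile module.

```python
import io
import zipfile


class ZipFS:
    """Hypothetical read-only filesystem view of a zip archive."""

    def __init__(self, file):
        self._zip = zipfile.ZipFile(file)

    def listdir(self, path=""):
        """Like os.listdir(): immediate children of `path`, sorted."""
        prefix = path.strip("/")
        prefix = prefix + "/" if prefix else ""
        children = set()
        for name in self._zip.namelist():
            if name.startswith(prefix) and name != prefix:
                children.add(name[len(prefix):].split("/", 1)[0])
        return sorted(children)

    def open(self, path):
        """Like the built-in open() in binary read mode."""
        return self._zip.open(path.strip("/"))


# Invented sample archive to exercise the interface.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", "hello\n")
    zf.writestr("pkg/__init__.py", "")
    zf.writestr("pkg/mod.py", "x = 1\n")

fs = ZipFS(buf)
top_level = fs.listdir()
package = fs.listdir("pkg")
```

The separator is fixed at '/' here, which sidesteps the per-filesystem os.path.sep question rather than answering it.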
<minirant> The filesystem <-> OO dichotomy needs a review. </minirant>
I think that my proposal above should cover this. (We looked briefly at doing a similar thing for Java, and found that it's actually harder there -- they have all these nice objects representing paths, but it's not easily subclassable to represent paths in some virtual filesystem.)
I see only very little hope for this point of view, but I will refrain from commenting further. --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum writes:
os.path.sep and friends (e.g. os.path.normcase behavior) were set per
Hah! Caught you in public! "sep" & friends are defined in the os module; this is where the separation breaks down. I think these should be located in os.path, and os can just pick them up from there to be backward compatible. os.pathsep is a problem, somewhat; it is related to os.sep, but is very different in many ways. I don't think there's a good way to deal with it.
And this is tightly related to the sep/pathsep problem as well. I agree, we should stay away from it.
But it was easy to create a set of interfaces with a reasonable API; getting back to the "typical" Java classes was what really changed the most. For those of us not working on the KOE: I set up Filesystem and FSFile interfaces; the Filesystem represented the entire filesystem and the FSFile was very similar to the java.io.File class, but had additional methods to get input and output stream objects (of the standard Java flavor); all the buffering and such could be wrapped on top of that just like any other Java I/O. The specific application was to provide access to an isolated directory structure which untrusted code "owned", but ensured that parent directories were unreachable. Additional security checks can be worked into such a structure as applicable. -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives

Guido van Rossum wrote:
[... horrors of cross-OS mounts and ":\/" separators ...] I agree, this has some very hairy sides to it. But VFS is really more about mounting non-FS things in a "root" FS (presumably the real one).
On the other hand the VFS concept could be used as a totally different solution to the sys.importers vs. sys.path
Heck, I'll be the "enfant terrible" once more: yes, and this stuff could well be implemented generically across scripting languages. Of course the act of "importing" is a very Pythonic issue - but FS/VFS traversal and the actual shared library load need not be. Anyway, enough of that.
What you're saying is that dir = set-of-subdirs + set-of-files, and that this is a more general requirement than plain FS's. Doesn't that simply mean that the more general model is needed as basis to handle both?
Trees are a universal concept, but code sharing is still elusive...
Ah, but think of the implications: archives, networks, XML, the world! -- Jean-Claude

Jean-Claude Wippler wrote:
Ouch - what's wrong with zip archives?
Thanks very much for looking over the format. In general Zip archives store whole branches of a file system. A Python ./Lib zip archive would contain:
N:/python/Python-1.5.2/Lib/string.pyc
N:/python/Python-1.5.2/Lib/os.pyc
N:/python/Python-1.5.2/Lib/copy.pyc
N:/python/Python-1.5.2/Lib/test/testall.pyc
Zip archives are isomorphic to branches of a file system. That means there must be a sys.path for each zip archive file. How would this be specified? The archive format stores modules as dotted names, just as they appear in the import statement. The search path is "." in every archive file by definition. The import statement "import foo" just results in a dictionary lookup for key "foo", not a search through a zip directory along a local search path for "foo.something" where "something" can be pyc, pyo, py, etc. The intent was to link the archives to the import statement, not re-create a directory tree. It borrowed this feature from the archive formats of Greg and Gordon.
There are utilities to convert to/from zip, to re-pack, to mount zip transparently so its entries look like regular files, FTP servers, etc.
Basic operations (to, from, repack) are easy in Python.
Both Java (jar) and Tcl (Jan Nijtman's "Wrap") have adopted this format.
Hmmm....
Your format has no checksum, which for deployment and long-term storage can be important.
Actually the pylib.py "dir()" method reads all *.pyc with marshal, and I am depending on marshal to object to bad data and also out-of-date magic numbers. But this is a good point.
If you want a marshalled TOC, then why not add a manifest entry for it, sort of like what ranlib does with ar?
Sorry, I don't understand. Please explain.
Are you saying that
cat zip1.zip zip2.zip > myzip.zip
works? An important feature is the ability to concatenate to a binary:
cat python.exe zip1.zip > myapp.exe
Searching for this isn't fast unless magic numbers are at the end. Are zip files recognizable from the end (I don't know)?
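
On the "recognizable from the end" question: the ZIP format closes with an end-of-central-directory record whose signature is "PK\005\006", with a 22-byte fixed part followed by an optional archive comment. The sketch below locates that record after prepending stand-in bytes to an archive. find_eocd is a hypothetical helper written for this illustration, and the "executable" prefix is fake.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"   # end-of-central-directory marker
EOCD_SIZE = 22                   # fixed part of the record, without comment


def find_eocd(data):
    """Return the offset of the end-of-central-directory record, or -1."""
    # The record sits at the very end unless an archive comment
    # (at most 65535 bytes) follows it, so only the tail is searched.
    tail_start = max(0, len(data) - EOCD_SIZE - 0xFFFF)
    return data.rfind(EOCD_SIGNATURE, tail_start)


# Make a tiny archive, then prepend stand-in "executable" bytes, as in
# `cat python.exe zip1.zip > myapp.exe`.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("foo.py", "print('hi')\n")
archive = buf.getvalue()
combined = b"\x7fELF-not-a-real-binary" + archive

offset = find_eocd(combined)
# Bytes 10-11 of the record hold the total number of entries.
(entry_count,) = struct.unpack("<H", combined[offset + 10:offset + 12])
```

With no archive comment the record is exactly the last 22 bytes, so a reader can check one fixed position before falling back to a search. A comment that itself contains the signature could fool this simple search.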
The intent is to create a standard but not a diverging standard. Are there any zip experts out there? Can zip files satisfy all the design requirements I listed in pylib.html? Is there zip code available? All my code is in Python. JimA

James C. Ahlstrom wrote:
Jean-Claude Wippler wrote:
Ouch - what's wrong with zip archives?
In general Zip archives store whole branches of a file system.
As I've stated before, I have 2 archive formats. This may seem a needless complication, but my suspicion is that sooner or later, people will want 2 different kinds. One is a .pyz format, which corresponds closely to Jim's .pyl format (with a number of minor differences: it's compressed, the archive as a whole has the Python magic number, instead of each entry, and it's not designed for concatenation). The other is like a zip, and probably should be zip format. It's designed to hold _anything_, and can be manipulated from C and from Python. It can be concatenated and / or embedded (and the inner one opened without extraction). Its table of contents is more file-system like. Importing from one is slower, but that's not really what it's for. It's for packaging up arbitrary resources. Like .pyz's, or Tcl/Tk for Tkinter apps, or configuration files. Jim is correct that a good importer (which can say "No, it's not mine" as quickly as possible) is better satisfied by a simple dictionary lookup than fooling with file extensions and directories (virtual or real).
The table of contents is just another entry.
Where do you think we got this idea?
Hmm. My bookmark appears to be dead (I was there not long ago): http://www.cubic.org/source/archive/fileform/packers/appnote.txt There have been several references on this list to Guido et al having some Python / zip code. - Gordon

Not true. It's easy (using the proper Zip tools) to create an archive containing this instead:
string.pyc
os.pyc
copy.pyc
testall.pyc
Thus the entire archive is considered the directory. The Java "jar" tool uses this approach. It's also easy to have packages in there (again this is what Java does):
test/
test/__init__.pyc
test/pystone.pyc
test_support.pyc
(etc.)
Maybe you've gone overboard. The time it takes to translate the dots into slashes really isn't the big deal.
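
A small sketch of the dots-to-slashes translation, checked against a set of archive names so that each lookup is still a hash probe. archive_candidates and find_module_path are hypothetical helpers, and the order in which suffixes are tried is an assumption for the example, not a rule taken from this thread.

```python
import io
import zipfile


def archive_candidates(module_name):
    """Map a dotted module name to the paths it could occupy in an archive."""
    base = module_name.replace(".", "/")
    return [base + ".pyc", base + ".py",
            base + "/__init__.pyc", base + "/__init__.py"]


def find_module_path(names, module_name):
    """Return the first candidate present in `names` (a set), else None."""
    for candidate in archive_candidates(module_name):
        if candidate in names:
            return candidate
    return None


# Invented archive laid out relative to its own root, jar-style.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for name in ("string.pyc", "test/__init__.pyc", "test/pystone.pyc"):
        zf.writestr(name, b"")

with zipfile.ZipFile(buf) as zf:
    names = set(zf.namelist())

found = find_module_path(names, "test.pystone")
```

A miss costs four set lookups, so the importer can still say "not mine" quickly.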
Yes (all of us here at CNRI), yes, yes (we have the spaghetti code). While zip files support compression, they support uncompressed files as well and we could go either way. Their most popular compression format is gzip compatible and can be read and written with the zlib module, which is in the standard Python distribution (even on Windows) -- though to build it you need the zlib C library which is of course external (but solid open source). --Guido van Rossum (home page: http://www.python.org/~guido/)

Jean-Claude Wippler wrote:
Ouch - what's wrong with zip archives?
With all due respect - I sincerely hope you will reconsider and alter your code to work with zip files. It's probably a small adjustment?
OK, you talked me into it. Ya, small adjustment, no problem ;-) JimA

OK, I now have a new module "zipfile" which reads and writes ZIP files. It is written in Python and has been tested on Windows and Linux. I tested it with WinZip and found that the files it creates are read OK with WinZip, and WinZip files are read OK with zipfile. So I am withdrawing my Python archive file format, and re-writing all my stuff using zipfile. It should all be done in a week. Basically everything works fine. But there are some problems. Python seems to lack a CRC-32 function, so I wrote one in Python. It is slow. We need to add a CRC-32 function to some Python built-in module that is always present, like md5 or binascii. The zlib module is not necessarily present. I can't seem to get WinZip to record a partial path. That is, I want the ./Lib/test package to have these ZIP paths:
test/__init__.pyc
test/testall.pyc
...
but WinZip creates files with either no path at all or the fully specified path. Am I missing something? Do all other ZIP tools do this too? JimA

On Mon, 13 Dec 1999, Guido van Rossum replied:
Ah, good! (This saves me the trouble of cleaning up our own zip code :-)
Unclick the "Save Extra Folder Info" and then drag the *parent* folder into the archive. --Guido van Rossum (home page: http://www.python.org/~guido/)

On Mon, 13 Dec 1999, James C. Ahlstrom wrote:
Can you post zipfile.py so that people can starting reviewing that?
See zlib.crc32() This is interesting, of course, because we have previously stated that zlib (and its compression) is optional. But if we need the CRC-32 function... hehe... Cheers, -g -- Greg Stein, http://www.lyra.org/
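
For reference, a table-driven CRC-32 in pure Python that matches zlib.crc32(). This is a sketch of the standard algorithm used by ZIP (reflected polynomial 0xEDB88320), not Jim's actual code, and the function names are made up. Today's binascii module also provides crc32().

```python
import zlib


def make_crc_table():
    """Build the 256-entry lookup table for the reflected CRC-32 polynomial."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xEDB88320 if crc & 1 else crc >> 1
        table.append(crc)
    return table


CRC_TABLE = make_crc_table()


def crc32_pure(data, crc=0):
    """Table-driven CRC-32 over a bytes object; `crc` allows chaining."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc = CRC_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


sample = b"The quick brown fox jumps over the lazy dog"
matches_zlib = crc32_pure(sample) == zlib.crc32(sample)
```

The per-byte loop in Python is what makes this slow compared with the C implementation.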

On Tue, 14 Dec 1999, James C. Ahlstrom wrote:
My point was that people could possibly use it *before* then. Not everybody needs it to be pretty, needs doc, or needs it fully working. Maybe people would like to provide feedback on the API. Maybe they'd like to start their own modules that use your library. This goes back to my years-old statement: release it now rather than later -- people can always use it now, and there might not be a later. Release early. Release often. :-) People are too hesitant to release code. Why? Just send it out there. When you update it, send out another. It doesn't hurt anybody to have more than one release. Cheers, -g -- Greg Stein, http://www.lyra.org/

Greg Stein wrote:
Release early. Release often. :-)
You are right of course. OK, the zipfile.py code and docs are at: ftp://ftp.interet.com/pub/pylib.html Despite the ftp URL, clicking on it should display the html. Please don't panic if it seems to be slow. It uses a Python CRC-32 which is slow. You may want to hack it to use zlib.crc32() if you have it. I am testing with WinZip. If you have another zip tool, it would be interesting to see how compatible it is. JimA

Did anyone look at this yet? ftp://ftp.interet.com/pub/pylib.html ftp://ftp.interet.com/pub/zipfile.py JimA

JA> Did anyone look at this yet?
JA> ftp://ftp.interet.com/pub/pylib.html
JA> ftp://ftp.interet.com/pub/zipfile.py
I thought it wasn't supposed to be out until Monday? You're looking for, perhaps, a time machine? ;-) (More seriously, it won't have any effect on my "gotta have this done yesterday" list, so I will let others comment...) Skip

"James C. Ahlstrom" wrote:
ftp://ftp.interet.com/pub/pylib.html
I just changed zipfile.py so that regular zip compression works. And if zlib is available, its crc32() is used instead of the Python version. I should mention that the current code rejects zip files which have an archive comment added to the end. Accepting them would require a search, and I am not sure it is worth it. JimA

"James C. Ahlstrom" wrote:
I don't think it is needed for our purposes, but maybe a subclass could provide it ? FYI, I've tested the module against mxStack-0.3.0.zip which you can find on my Python Pages. It was created using Info-ZIP's zip 2.2 on Linux. Unfortunately, I always get the following traceback when trying to print the directory:
Some notes on the API:
----------------------
* I would find it more convenient if the filename and mode would be constructor parameters, e.g. zfile = zipfile('myfile.zip','rb') with compression defaulting to 8 rather than 0 (most zip files will be deflated since this is the ZIP default).
* Also, I would like a method much like the os.listdir() which returns a list of filenames rather than print it to stdout.
* .is_zipfile() should probably be a separate function: it doesn't use any of the class' features.
More wishes to come ;-) So far: Great Work !
Aside: I found that you are using undocumented arguments to zlib.compressobj() ... are these extra arguments left out of the documentation on purpose or by simple oversight ? I couldn't find them in the HTML docs and neither in the docstrings.
-- Marc-Andre Lemburg
______________________________________________________________________
Y2000: 15 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
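
For comparison, the three API notes above map onto the zipfile module in today's standard library roughly as follows. This is a sketch against the current module, not the draft under review here, and the member names are invented.

```python
import io
import zipfile

buf = io.BytesIO()
# The file (or file object) and the mode are constructor arguments.
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("README", "hello")
    zf.writestr("src/mod.py", "x = 1\n")

with zipfile.ZipFile(buf, "r") as zf:
    listing = zf.namelist()   # returns a list instead of printing

# is_zipfile() is a module-level function, not a method.
looks_like_zip = zipfile.is_zipfile(buf)
not_a_zip = zipfile.is_zipfile(io.BytesIO(b"plain text"))
```

The current module defaults to storing entries uncompressed (ZIP_STORED), so deflate has to be requested explicitly.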

On Thu, 16 Dec 1999, M.-A. Lemburg wrote:
The above two items were in my ramble, just not as clear as MAL :-)
* .is_zipfile() should probably be a separate function: it doesn't use any of the class' features.
Ah! Good call. It is even more important to shift it out if the constructor now opens a file. Cheers, -g -- Greg Stein, http://www.lyra.org/

M.-A. Lemburg writes:
The documentation is way out of date and Jeremy Hylton and Andrew Kuchling haven't updated it. I'm not sure which of them changed the signatures for that module, but I've pestered Jeremy about it a few times. If anyone would like to update the documentation, I'd certainly appreciate it. I don't know the details of those interfaces, and this is somewhere where the details are pretty critical. -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives

"M.-A. Lemburg" wrote:
"James C. Ahlstrom" wrote:
ftp://ftp.interet.com/pub/pylib.html
Unfortunately, I always get the following traceback when trying to print the directory:
Yes, compression isn't there yet. I am looking into it.
OK, done.
with compression defaulting to 8 rather than 0 (most zip files will be deflated since this is the ZIP default).
Until compression works, and zlib ships with Python I would rather default to no compression (method 0). Otherwise this is not useful as a Python import archive.
OK, done.
* .is_zipfile() should probably be a separate function: it doesn't use any of the class' features.
OK, done.
I am following the CNRI code blindly here. I don't have docs either. JimA

"James C. Ahlstrom" wrote:
Great :-)
Point taken. Perhaps it would be even better to not have a default at all: that way people will have to think about the issue *before* implementing it, rather than debug code that produces tracebacks.
Thanks, -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 13 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

"JCA" == James C Ahlstrom <jim@interet.com> writes:
JCA> I am following the CNRI code blindly here. I don't have docs
JCA> either.
The docs for the zlib module are quite out of date, although I think the docstrings may be better (not necessarily completely up-to-date though :-). The specific parameters to pass to zlib don't seem to be documented anywhere either; IIRC I dug them out of some example C code somewhere that used zlib to read Zip files. Jeremy
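
The parameters in question are presumably the ones that select a raw deflate stream; that is an assumption, since the thread does not spell them out. With today's zlib module a negative wbits value does this, as sketched below with a made-up payload.

```python
import zlib

payload = b"spam and eggs " * 50

# A negative wbits value asks zlib for a raw deflate stream with no zlib
# header or trailing checksum, which is what ZIP method 8 stores.
compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
raw = compressor.compress(payload) + compressor.flush()

# The reader must be told the same thing, since there is no header to sniff.
decompressor = zlib.decompressobj(-15)
restored = decompressor.decompress(raw) + decompressor.flush()
```

The ZIP entry carries its own CRC-32 and sizes in its headers, which is why the zlib wrapper is left off.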

"M.-A. Lemburg" wrote:
Unfortunately, I always get the following traceback when trying to print the directory:
OK, I changed the decompress code (10:23 AM), please re-try.
with compression defaulting to 8 rather than 0 (most zip files will be deflated since this is the ZIP default).
The compress mode only applies to writing. On read, the method recorded in the file controls. JimA
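
A short sketch of that write/read asymmetry, using today's standard zipfile module and its symbolic constants ZIP_STORED (0) and ZIP_DEFLATED (8). The entry names are invented.

```python
import io
import zipfile

text = b"abc" * 2000
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    # Uses the archive-wide default chosen above: method 0.
    zf.writestr("stored.txt", text)
    # Overrides the default for this one entry: method 8.
    zf.writestr("deflated.txt", text, compress_type=zipfile.ZIP_DEFLATED)

# No compression argument on read: each entry records how it was stored.
with zipfile.ZipFile(buf) as zf:
    methods = {info.filename: info.compress_type for info in zf.infolist()}
    contents = [zf.read(name) for name in zf.namelist()]
```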

"James C. Ahlstrom" wrote:
Everything is fine now... it's really impressive how easily you can manipulate ZIP files with it. One thing I'd suggest is to include some way to delete and update contents, e.g. the write() method should overwrite any existing entry in the archive (if it does not already -- I haven't tested it, just read the code and it seems to raise an exception), plus maybe a .remove() method which deletes an entry.
True. How about making the compression argument mandatory for files opened in 'wb' mode only ? -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 13 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

"M.-A. Lemburg" wrote:
Currently, adding a file requires the "a" append mode, while the "w" mode re-writes the file. Adding a duplicate file name produces an error message. I can change this, but removing a file would either waste space, or else the file contents must be copied over the old file and all the offsets updated. I don't like this because it is complicated, and I think it is fast enough to just re-write the archive. But it could be added if people want.
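
The "just re-write the archive" approach to removing an entry can be sketched like this with today's standard zipfile module. rewrite_without is a hypothetical helper and the file names are invented.

```python
import io
import zipfile


def rewrite_without(source, unwanted):
    """Copy every entry except `unwanted` into a fresh in-memory archive."""
    target = io.BytesIO()
    with zipfile.ZipFile(source) as old, zipfile.ZipFile(target, "w") as new:
        for info in old.infolist():
            if info.filename != unwanted:
                # Passing the ZipInfo keeps the name, date and method.
                new.writestr(info, old.read(info.filename))
    target.seek(0)
    return target


original = io.BytesIO()
with zipfile.ZipFile(original, "w") as zf:
    zf.writestr("keep.py", "a = 1\n")
    zf.writestr("drop.py", "b = 2\n")

trimmed = rewrite_without(original, "drop.py")
with zipfile.ZipFile(trimmed) as zf:
    remaining = zf.namelist()
```

No offsets need patching because the new central directory is built from scratch; the price is copying every surviving entry.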
True. How about making the compression argument mandatory for files opened in 'wb' mode only ?
The default of zero provides a little guidance that you should use zero. I added a warning message if 8 is used which should discourage people from using 8. Or I could disallow 8. Is that OK? JimA

"James C. Ahlstrom" wrote:
I guess it would be ok to waste space. You could provide a .cleanup() or .rewrite() method that takes care of reorganizing the file to fill up the gaps.
Well the module seems to work just fine with compression on, so disallowing it or issuing a warning would reduce its value, IMHO. How about making compression a boolean value and then converting any true value to 8 ? -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 11 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

"M.-A. Lemburg" wrote:
OK, adding a duplicate name replaces the old file.
Yes compression works, but 90% of Python installations don't have zlib, so it is an ERROR to create archives with compression when these archives are distributed to other sites.
How about making compression a boolean value and then converting any true value to 8 ?
It would close the door to future or other compression methods. Currently the method must be 0 or 8 or a traceback will result. JimA

On Mon, 20 Dec 1999, James C. Ahlstrom wrote:
But it shouldn't print a warning(!). If an application wants to replace a file, then stuff shouldn't appear on stdout as a result.
While it may be a problem to distribute them to other sites, that is not up to the library. If I want compression, then I should get compression. A library module should not determine application-level policy. The warning that __init__ prints shouldn't be there. Really: there should not be a single "print" in the library (well, printdir() is fine... that's what it is supposed to do; printing in the test code would be fine). In normal, or even exceptional(!), operation there should never be a print.
I definitely agree with JimA here. For example, maybe we want bzip compression in there. Sure, non-portable, but that's my problem :-) Cheers, -g -- Greg Stein, http://www.lyra.org/

"James C. Ahlstrom" wrote:
Cool.
Sure, for the sake of creating Python code archives, but your module is much more versatile: e.g. I could automatically create ZIP archives of log files or sets of other files and then have Python email them to someone who uses these archives through standard tools such as WinZip -- the target doesn't always have to be a Python process :-)
Ok. -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 11 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

On Thu, 16 Dec 1999, James C. Ahlstrom wrote:
I went to look for it, but I think that was before you put zipfile up. Looking at it now... The writepy() as a method is questionable, I think. I think it should open the file at instantiation time. I don't see a reason to allow that to be deferred. Especially given that some of the methods fail if open() hasn't been called. It would be good to have symbolic names for the 0 and 8 compression constants, and to fail if 8 is passed and zlib is not available (otherwise, it doesn't fail until read/write time, and with a NameError). There should probably be a __del__ that calls close(). Oh, and a "closed" attribute that can be checked and an error raised if an operation is done after the file has been closed. I think dir() should return the contents, rather than print them. read() and write() ought to fail if the mode is incorrect. Oh, some symbolic constants for things like "PK\005\006" would be nice. Do you have a ZipImporter written? Cheers, -g -- Greg Stein, http://www.lyra.org/

James C. Ahlstrom wrote:
just a few comments (from reading the docs):
-- it would be great if "open" could take an open file object as well as a file name. (in this case, you also need to document what you expect from the underlying file object: read, write, seek, tell should be enough, right? haven't looked at the code -- assuming it works, I'm only interested in the interface)
-- or you could nuke "open" and pass those arguments to the constructor instead.
-- I assume "open" adds "b" to the given mode argument.
-- "dir" looks a bit strange. and hey, there's no "listdir" in there. I'd prefer a recursive "listdir" method, which takes an optional "depth" argument (e.g. 0=this dir, 1=this dir and first subdir, None=infinity, i.e. the full tree).
that's all for now. </F>
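The "depth" argument proposed for listdir could behave roughly as follows. This is a sketch over a plain list of member names; the function and the sample names are hypothetical, and it assumes "/" as the separator inside the archive.

```python
def listdir(names, depth=None):
    """Return the member names that sit at most `depth` directories deep.

    depth=0 keeps only top-level entries, depth=1 adds the first level of
    subdirectories, and depth=None returns the full tree.
    """
    if depth is None:
        return list(names)
    return [n for n in names if n.rstrip("/").count("/") <= depth]


members = ["README", "test/__init__.pyc", "test/data/a.txt"]
top_level = listdir(members, 0)
one_level = listdir(members, 1)
everything = listdir(members)
```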

Fredrik Lundh wrote:
James C. Ahlstrom wrote:
ftp://ftp.interet.com/pub/pylib.html
-- it would be great if "open" could take an open file object as well as a file name.
I put these arguments into the constructor now.
OK, docs updated.
-- I assume "open" adds "b" to the given mode argument.
Correct. The mode can be either "w" or "wb" etc., and it works.
I added a plain listdir() and changed dir() to printdir(). I also documented self.TOC which gets you the values too. JimA

Greg Stein wrote:
I eliminated open and added its args to the constructor.
All done.
I think dir() should return the contents, rather than print them.
I added listdir() and documented self.TOC. I kept printdir() as example code.
read() and write() ought to fail if the mode is incorrect. Oh, some symbolic constants for things like "PK\005\006" would be nice.
All done. JimA

James C. Ahlstrom wrote:
ftp://ftp.interet.com/pub/pylib.html
I feel that it smells a bit too much like a tool and too little like a general programming api.
- It can only add disk files. The ability to write data to a zip entry through a file-like object or from a string would make it more like an API, IMHO
- Some kind of access to the TOC entry fields (date, size, compressed size etc) also seems like a nice feature.
- The data for an entry must be available in memory. Could be a problem for huge files, but most likely not in practical use.
I admit that I am fond of the api from java.util.zip.ZipFile and java.util.zip.ZipOutputStream. Regards, Finn Bock

Finn Bock wrote:
It was meant to be an API except for writepy(), which is clearly a tool.
- It can only add disk files. The ability to write data to a zip entry through a file-like object or from a string would make it more like an API, IMHO
I could add a method writestr(self, string, year, month, day, hour, minute, second, ...) There are a lot of fields required which usually come from the file.
- Some kind of access to the TOC entry fields (date, size, compressed size etc) also seems like a nice feature.
This access is provided directly by self.TOC, and the fields are documented.
- The data for an entry must be available in memory. Could be a problem for huge files, but most likely not in practical use.
I agree, but adding loops will make it slower. What do others think?
I admit that I am fond of the api from java.util.zip.ZipFile and java.util.zip.ZipOutputStream.
I don't know this API. If writestr() is not sufficient, what API would you like? JimA
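For comparison, here is how the ideas in this exchange look with the zipfile module in today's standard library. That is an assumption about a later API, not the 1999 module under discussion: the constructor accepts a file object, writestr() takes a string plus an optional ZipInfo carrying the date fields, and getinfo() exposes the per-entry fields.

```python
import io
import zipfile

# Any object with read/write/seek/tell can stand in for a file name.
buf = io.BytesIO()

with zipfile.ZipFile(buf, "w") as zf:
    # The timestamp that would normally come from the disk file is
    # supplied explicitly through a ZipInfo.
    info = zipfile.ZipInfo("logs/app.log", date_time=(1999, 12, 20, 12, 0, 0))
    zf.writestr(info, "line one\nline two\n")

with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
    entry = zf.getinfo("logs/app.log")
    text = zf.read("logs/app.log").decode("ascii")
```

The entry name and timestamp above are made up for the example; the entry is stored uncompressed so the sketch does not depend on zlib.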

[I wrote]
- It can only add disk files. The ability to write data to a zip entry through a file-like object or from a string would make it more like an API, IMHO
[JimA wrote]
Something like that seems fine to me. [I wrote]
- Some kind of access to the TOC entry fields (date, size, compressed size etc) also seems like a nice feature.
[JimA answers]
This access is provided directly by self.TOC, and the fields are documented.
Good enough. My bad, I was looking for getter methods. (me being a java dude) [I wrote]
I admit that I am fond of the api from java.util.zip.ZipFile and java.util.zip.ZipOutputStream.
[JimA asks]
I don't know this API. If writestr() is not sufficient, what API would you like?
This is only meant as a source for inspiration, certainly not as a request for change. writestr would answer my complaint nicely. Below, only one ZipEntry can be actively read or written to at a time. All the small details of performance and implementation complexity are ignored.

    class ZipFile:
        def getEntry(self, name):
            ...
            self.activeentry = ZipEntry(name)
            return self.activeentry

    class ZipEntry:
        # enough methods and fields to fake file-ness to casual users like me.
        def __init__(self, name): self.name = name
        def write(self, data): ...
        def writelines(self, lines): ...
        def read(self, size=None): ...
        def readlines(self, sizehint=-1): ...
        def seek(self, offset): ...
        def flush(self): ...
        def close(self): ...
        def getSize(self): ...
        def getCompressedSize(self): ...
        def getFlags(self): ...

regards, finn
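The file-like entry idea sketched above resembles what the present-day standard-library zipfile offers through ZipFile.open(). The following is an illustration with that later API (write mode needs Python 3.6 or newer), not something the module in this thread provided; the entry name is invented.

```python
import io
import zipfile

buf = io.BytesIO()

with zipfile.ZipFile(buf, "w") as zf:
    # Only one entry may be open for writing at a time.
    with zf.open("notes.txt", "w") as entry:
        entry.write(b"first chunk\n")
        entry.write(b"second chunk\n")

with zipfile.ZipFile(buf) as zf:
    with zf.open("notes.txt") as entry:
        first_line = entry.readline()  # the entry behaves like a binary file
        rest = entry.read()
    info = zf.getinfo("notes.txt")     # size fields live on the ZipInfo
```

Writing in chunks this way also means the data for an entry no longer has to be held in memory all at once.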

"GS" == Greg Stein <gstein@lyra.org> writes:
GS> On Tue, 14 Dec 1999, James C. Ahlstrom wrote:
GS> My point was that people could possibly use it *before* then. Not everybody needs it to be pretty, needs doc, or needs it fully working. Maybe people would like to provide feedback on the API. Maybe they'd like to start their own modules that use your library.
GS> This goes back to my years-old statement: release it now rather than later -- people can always use it now, and there might not be a later.
Ok. I think we need some kind of zip file support in the core so that it can be used as a standard distribution format. I'd be happy if Jim's zipfile module ended up being it. We've got some zip code that we developed at CNRI; it's a bit of a mess, but it might be helpful to see what we did. Our code is at ftp://www.python.org/pub/tmp/zip.zip Jeremy

[James C. Ahlstrom]
Unfortunately, there are many different CRC functions in common use. None belong in md5; if the intent is to support just zip's version, adding a (say) zipcrc32 function to binascii would be ok; if we expect to support others as well, a new parameterized crc module would be in order.
No, it's a clumsiness unique to WinZip (damn GUIs <0.9 wink>). In the Add dialog box, you need to cd to the *Lib* directory, check the "Save extra folder info" box, and then, e.g.,
1. Put test\*.pyc in the Add Files line, and click Add With Wildcards. Then all test\*.pyc files will be added, with paths test/__init__.pyc etc.
or
2. Put "test\__init__.pyc" "test\testall.pyc" (including the quotes!) in the Add Files line, and click Add.
Since #2 can be unbearable, other useful strategies include:
3. Use #1 (e.g. with dir\*.*) then delete the files you didn't really want.
4. Use #1 repeatedly, cleverly using a number of wildcard patterns that cover the files of interest.
5. Mixtures of #3 and #4.
6. Use a command-line zip tool instead (e.g., pkzip; I think WinZip has an "experimental" cmdline add-on too, but haven't tried it).

Tim Peters wrote:
OK, a CRC-32 in binascii it is. The CRC-32 I have comes with these comments which seem to indicate it is a more "official standard" CRC-32 than average:
# * Crc - 32 BIT ANSI X3.66 CRC checksum files
#*********************************************************************\
#* *|
#* Demonstration program to compute the 32-bit CRC used as the frame *|
#* check sequence in ADCCP (ANSI X3.66, also known as FIPS PUB 71 *|
#* and FED-STD-1003, the U.S. versions of CCITT's X.25 link-level *|
#* protocol). The 32-bit FCS was added via the Federal Register, *|
#* 1 June 1982, p.23798. I presume but don't know for certain that *|
#* this polynomial is or will be included in CCITT V.41, which *|
#* defines the 16-bit CRC (often called CRC-CCITT) polynomial. FIPS *|
#* PUB 78 says that the 32-bit FCS reduces otherwise undetected *|
#* errors by a factor of 10^-5 over 16-bit FCS. *|
#* *|
#*********************************************************************
#* Copyright (C) 1986 Gary S. Brown. You may use this program, or
#* code or tables extracted from it, as desired without restriction.
I can submit this as a patch to binascii, or if the Copyright bothers anyone, maybe it is better for Guido to use his CRC-32 from his ZIP code. Preference?
Thanks. I knew there had to be some magic incantation to do it.
6. Use a command-line zip tool instead (e.g., pkzip; I think WinZip has an "experimental" cmdline add-on too, but haven't tried it).
Actually pkzip 2.04g doesn't work because it writes names in upper case and is limited to 8.3 names (I think). My zipfile.py can be used as a basis for a command line tool. Actually I use makefiles with embedded Python programs and find this easier than command line tools. JimA

I looked, but "my" crc32 in the zlib module (which was actually contributed by Andrew Kuchling) is just a wrapper around the crc32 function in zlib, which is copyrighted by Mark Adler and follows the zlib rules. I propose to use Gary Brown's code. I'll defend this to CNRI's lawyers if need be. Jim, have you checked that this is the right CRC to use for zip's CRC? (This in the light of Tim's assertion that there are many CRCs around.) --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
The CRC it calculates agrees with the CRC of WinZip for all files I have tried. The original Gary Brown code was much longer and included file reading. Here is the shortened version: JimA # * Crc - 32 BIT ANSI X3.66 CRC checksum files #*********************************************************************\ #* *| #* Demonstration program to compute the 32-bit CRC used as the frame *| #* check sequence in ADCCP (ANSI X3.66, also known as FIPS PUB 71 *| #* and FED-STD-1003, the U.S. versions of CCITT's X.25 link-level *| #* protocol). The 32-bit FCS was added via the Federal Register, *| #* 1 June 1982, p.23798. I presume but don't know for certain that *| #* this polynomial is or will be included in CCITT V.41, which *| #* defines the 16-bit CRC (often called CRC-CCITT) polynomial. FIPS *| #* PUB 78 says that the 32-bit FCS reduces otherwise undetected *| #* errors by a factor of 10^-5 over 16-bit FCS. *| #* *| #********************************************************************* # #* Copyright (C) 1986 Gary S. Brown. You may use this program, or #* code or tables extracted from it, as desired without restriction. # First, the polynomial itself and its table of feedback terms. The # polynomial is # X^32+X^26+X^23+X^22+X^16+X^12+X^11+X^10+X^8+X^7+X^5+X^4+X^2+X^1+X^0 # Note that we take it "backwards" and put the highest-order term in # the lowest-order bit. The X^32 term is "implied"; the LSB is the # X^31 term, etc. The X^0 term (usually shown as "+1") results in # the MSB being 1. # Note that the usual hardware shift register implementation, which # is what we're using (we're merely optimizing it by doing eight-bit # chunks at a time) shifts bits into the lowest-order term. In our # implementation, that means shifting towards the right. Why do we # do it this way? Because the calculated CRC must be transmitted in # order from highest-order term to lowest-order term. UARTs transmit # characters in order from LSB to MSB. 
By storing the CRC this way, # we hand it to the UART in the order low-byte to high-byte; the UART # sends each low-bit to high-bit; and the result is transmission bit # by bit from highest- to lowest-order term without requiring any bit # shuffling on our part. Reception works similarly. # The feedback terms table consists of 256, 32-bit entries. Notes: # # 1. The table can be generated at runtime if desired; code to do so # is shown later. It might not be obvious, but the feedback # terms simply represent the results of eight shift/xor operations # for all combinations of data and CRC register values. # # 2. The CRC accumulation logic is the same for all CRC polynomials, # be they sixteen or thirty-two bits wide. You simply choose the # appropriate table. Alternatively, because the table can be # generated at runtime, you can start by generating the table for # the polynomial in question and use exactly the same "updcrc", # if your application needn't simultaneously handle two CRC # polynomials. (Note, however, that XMODEM is strange.) # # 3. For 16-bit CRCs, the table entries need be only 16 bits wide; # of course, 32-bit entries work OK if the high 16 bits are zero. # # 4. The values must be right-shifted by eight bits by the "updcrc" # logic; the shift must be unsigned (bring in zeroes). On some # hardware you could probably optimize the shift in assembler by # using byte-swap instructions. # Converted to Python by James C. 
Ahlstrom crc_32_tab = [ # CRC polynomial 0xedb88320 0x00000000, 0x77073096, 0xee0e612c, 0x990951ba, 0x076dc419, 0x706af48f, 0xe963a535, 0x9e6495a3, 0x0edb8832, 0x79dcb8a4, 0xe0d5e91e, 0x97d2d988, 0x09b64c2b, 0x7eb17cbd, 0xe7b82d07, 0x90bf1d91, 0x1db71064, 0x6ab020f2, 0xf3b97148, 0x84be41de, 0x1adad47d, 0x6ddde4eb, 0xf4d4b551, 0x83d385c7, 0x136c9856, 0x646ba8c0, 0xfd62f97a, 0x8a65c9ec, 0x14015c4f, 0x63066cd9, 0xfa0f3d63, 0x8d080df5, 0x3b6e20c8, 0x4c69105e, 0xd56041e4, 0xa2677172, 0x3c03e4d1, 0x4b04d447, 0xd20d85fd, 0xa50ab56b, 0x35b5a8fa, 0x42b2986c, 0xdbbbc9d6, 0xacbcf940, 0x32d86ce3, 0x45df5c75, 0xdcd60dcf, 0xabd13d59, 0x26d930ac, 0x51de003a, 0xc8d75180, 0xbfd06116, 0x21b4f4b5, 0x56b3c423, 0xcfba9599, 0xb8bda50f, 0x2802b89e, 0x5f058808, 0xc60cd9b2, 0xb10be924, 0x2f6f7c87, 0x58684c11, 0xc1611dab, 0xb6662d3d, 0x76dc4190, 0x01db7106, 0x98d220bc, 0xefd5102a, 0x71b18589, 0x06b6b51f, 0x9fbfe4a5, 0xe8b8d433, 0x7807c9a2, 0x0f00f934, 0x9609a88e, 0xe10e9818, 0x7f6a0dbb, 0x086d3d2d, 0x91646c97, 0xe6635c01, 0x6b6b51f4, 0x1c6c6162, 0x856530d8, 0xf262004e, 0x6c0695ed, 0x1b01a57b, 0x8208f4c1, 0xf50fc457, 0x65b0d9c6, 0x12b7e950, 0x8bbeb8ea, 0xfcb9887c, 0x62dd1ddf, 0x15da2d49, 0x8cd37cf3, 0xfbd44c65, 0x4db26158, 0x3ab551ce, 0xa3bc0074, 0xd4bb30e2, 0x4adfa541, 0x3dd895d7, 0xa4d1c46d, 0xd3d6f4fb, 0x4369e96a, 0x346ed9fc, 0xad678846, 0xda60b8d0, 0x44042d73, 0x33031de5, 0xaa0a4c5f, 0xdd0d7cc9, 0x5005713c, 0x270241aa, 0xbe0b1010, 0xc90c2086, 0x5768b525, 0x206f85b3, 0xb966d409, 0xce61e49f, 0x5edef90e, 0x29d9c998, 0xb0d09822, 0xc7d7a8b4, 0x59b33d17, 0x2eb40d81, 0xb7bd5c3b, 0xc0ba6cad, 0xedb88320, 0x9abfb3b6, 0x03b6e20c, 0x74b1d29a, 0xead54739, 0x9dd277af, 0x04db2615, 0x73dc1683, 0xe3630b12, 0x94643b84, 0x0d6d6a3e, 0x7a6a5aa8, 0xe40ecf0b, 0x9309ff9d, 0x0a00ae27, 0x7d079eb1, 0xf00f9344, 0x8708a3d2, 0x1e01f268, 0x6906c2fe, 0xf762575d, 0x806567cb, 0x196c3671, 0x6e6b06e7, 0xfed41b76, 0x89d32be0, 0x10da7a5a, 0x67dd4acc, 0xf9b9df6f, 0x8ebeeff9, 0x17b7be43, 0x60b08ed5, 0xd6d6a3e8, 0xa1d1937e, 
0x38d8c2c4, 0x4fdff252, 0xd1bb67f1, 0xa6bc5767, 0x3fb506dd, 0x48b2364b, 0xd80d2bda, 0xaf0a1b4c, 0x36034af6, 0x41047a60, 0xdf60efc3, 0xa867df55, 0x316e8eef, 0x4669be79, 0xcb61b38c, 0xbc66831a, 0x256fd2a0, 0x5268e236, 0xcc0c7795, 0xbb0b4703, 0x220216b9, 0x5505262f, 0xc5ba3bbe, 0xb2bd0b28, 0x2bb45a92, 0x5cb36a04, 0xc2d7ffa7, 0xb5d0cf31, 0x2cd99e8b, 0x5bdeae1d, 0x9b64c2b0, 0xec63f226, 0x756aa39c, 0x026d930a, 0x9c0906a9, 0xeb0e363f, 0x72076785, 0x05005713, 0x95bf4a82, 0xe2b87a14, 0x7bb12bae, 0x0cb61b38, 0x92d28e9b, 0xe5d5be0d, 0x7cdcefb7, 0x0bdbdf21, 0x86d3d2d4, 0xf1d4e242, 0x68ddb3f8, 0x1fda836e, 0x81be16cd, 0xf6b9265b, 0x6fb077e1, 0x18b74777, 0x88085ae6, 0xff0f6a70, 0x66063bca, 0x11010b5c, 0x8f659eff, 0xf862ae69, 0x616bffd3, 0x166ccf45, 0xa00ae278, 0xd70dd2ee, 0x4e048354, 0x3903b3c2, 0xa7672661, 0xd06016f7, 0x4969474d, 0x3e6e77db, 0xaed16a4a, 0xd9d65adc, 0x40df0b66, 0x37d83bf0, 0xa9bcae53, 0xdebb9ec5, 0x47b2cf7f, 0x30b5ffe9, 0xbdbdf21c, 0xcabac28a, 0x53b39330, 0x24b4a3a6, 0xbad03605, 0xcdd70693, 0x54de5729, 0x23d967bf, 0xb3667a2e, 0xc4614ab8, 0x5d681b02, 0x2a6f2b94, 0xb40bbe37, 0xc30c8ea1, 0x5a05df1b, 0x2d02ef8d ] def crc32(string): crc = 0xFFFFFFFF for ch in string: crc = crc_32_tab[((crc) ^ ord(ch)) & 0xff] ^ (((crc) >> 8) & 0xFFFFFF) return ~crc

[JimA posts his Python rendering of Gary Brown's code] Yup! That's the zip algorithm, right down to the absurdly bit-reversed polynomial.
Note that the last line is better (whether in Python or C!) as return crc ^ 0xffffffff Else you'll get a surprising result in a 64-bit Python, and in some 64-bit C implementations. it's-a-32-bit-algorithm-not-an-"int"-or-"long"-one-ly y'rs - tim
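A compact sketch of the same algorithm, using the final XOR with 0xffffffff recommended above and building the 256-entry table at runtime as the original comments say is possible. It is written for Python 3, where iterating over bytes yields integers, and the names make_table and CRC_TABLE are chosen here for illustration. The result is compared against binascii.crc32, which present-day Python provides.

```python
import binascii


def make_table(poly=0xEDB88320):
    """Build the 256-entry feedback table for a bit-reversed polynomial."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            # Shift right; fold the polynomial in whenever a 1 bit falls off.
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


CRC_TABLE = make_table()


def crc32(data):
    """Zip-style CRC-32 of a bytes object, always in the range 0..2**32-1."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = CRC_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    # XOR with an explicit 32-bit mask instead of "~crc", which would give
    # a negative number with Python's unbounded integers.
    return crc ^ 0xFFFFFFFF


check = crc32(b"123456789")        # 0xCBF43926, the usual CRC-32 check value
same = check == binascii.crc32(b"123456789")
```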

[Guido]
I propose to use Gary Brown's code. I'll defend this to CNRI's lawyers if need be.
If there's a hassle, I can do a clean-room implementation easily enough -- although I'd rather not.
Jim, have you checked that this is the right CRC to use for zip's CRC?
If WinZip unzips Jim's files without griping, the odds that he's got the wrong CRC are about 1 in 2**36 <wink>.
(This in the light of Tim's assertion that there are many CRCs around.)
There are, and several others are hiding in assorted communications stds (e.g., Ethernet uses a different 32-bit CRC); but the zip CRC is the one you'll find most commonly described on the Web. All the same, once Jim releases his code, I'll do an anal verification that it's the right one.

[Tim]
If WinZip unzips Jim's files without griping, the odds that he's got the wrong CRC are about 1 in 2**36 <wink>.
[JimA]
You mean 2**32, right?
Nope! For each of the 2**32 polynomials you may have pulled out of thin air, there are about a dozen common variations in the details of CRC algorithms. For example, a CRC used for hashing usually initializes "the register" to 0, but a CRC used to protect against transmission errors usually initializes to a block of 1 bits (since leading zeroes don't affect the result, and a common transmission error is dropping a prefix of the msg). Similarly, algorithms vary in the order they scan the data; in whether they use the raw data or its complement; and in whether they return the actual remainder, the complement of the remainder, or a checksum cleverly computed so that "the other end" always sees a fixed remainder other than 0 (or ~0).
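To make the variations concrete, here is a sketch of a parameterized CRC in which the polynomial, the initial register value and the final XOR are all arguments. The function name crc_variant is hypothetical. It covers only the right-shifting (bit-reversed) family and leaves out the other variations mentioned above, such as scan order and complemented data.

```python
import binascii


def crc_variant(data, poly=0xEDB88320, init=0xFFFFFFFF, xorout=0xFFFFFFFF):
    """Right-shifting 32-bit CRC with a configurable start value and final XOR."""
    table = []
    for byte in range(256):
        reg = byte
        for _ in range(8):
            reg = (reg >> 1) ^ poly if reg & 1 else reg >> 1
        table.append(reg)
    reg = init
    for byte in data:
        reg = table[(reg ^ byte) & 0xFF] ^ (reg >> 8)
    return reg ^ xorout


msg = b"hello"
padded = b"\x00\x00" + msg

# The defaults reproduce the zip CRC.
zip_style = crc_variant(msg)

# Starting from 0, a dropped or added prefix of zero bytes goes unnoticed...
hash_style_equal = crc_variant(msg, init=0, xorout=0) == crc_variant(padded, init=0, xorout=0)

# ...while starting from all 1 bits makes the leading zeroes count.
zip_style_equal = crc_variant(msg) == crc_variant(padded)
```

Here hash_style_equal comes out True and zip_style_equal comes out False, which is the reason given above for initializing the register to a block of 1 bits when guarding against transmission errors.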
Oh, sorry, you must be using a DEC-10 <wink again>.
I used a Univac 1108 in college, back when ASCII was in its infancy. They couldn't decide on the natural size for a character, so the 36-bit 1108 could be configured to treat each word as either 6 6-bit bytes or 4 9-bit ones. If they had been thinking ahead, they would have defined it as two Unicode characters plus a 4-bit tag field for the Python implementation to play with <wink>. now-they-make-their-living-suing-.gif-bandits-ly y'rs - tim

[Jim A]
Apart from agreeing with Jean-Claude's rant about inventing a new archive format, I think this is a good proposal because it is very clear about the problem it tries to solve and doesn't get distracted by other issues. I also commend Jim for building upon Greg Stein's imputil (like Gordon did). I wish I could present a solution this simple as The Standard Way, but (as explained in my long post earlier today) there just are so many wrinkles that I'd rather hold out for the Right Solution... But I've taken good notice of Jim's solution. --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido wrote:
It needed a name. I hate the word "Installer", but it expresses in one word the most common use of my stuff. I'll be releasing a beta for Linux real soon. Only some of the tricks are Windows only (such as self-extracting executables, which is only culturally appropriate on Windows, anyway). But more importantly it's not just for installing. The Python I use (interactively) on my wife's machine is 1 directory with about 6 files in it. On my Linux box I've been using the std lib in a .pyz for about a month now. Someone distributing a pure Python package could instead ship 3 files (imputil.py, archive.py and <package>.pyz) with the "install" consisting of adding one line to site.py in the user's perfectly normal Python installation. And yeah, I solved the "manifest" problem, too. Mine predates Distutils, so don't accuse me of duplicate effort, (I pointed them to it a couple times). It uses ConfigParser and a config file, so it allows finer control. While .pyz's are completely cross-platform, I have yet to work out endianness issues in the other archive I use (which should probably be zip format - it can hold anything). And at the "Installer" end, I have yet to work out how things should work on non-ELF/COFF platforms (where I can't append the archive to the executable). But there aren't any technical issues involved; just lack of time. So no, it's not just for Windows; and no, it's not just for creating standalones (though that's what almost everyone uses it for). - Gordon

Gordon, I'm sorry, but from this description I still have no idea what your stuff is (and I forgot the URL so I can't look it up). For example, if it's not (just) for installing, what *is* it for? What is the ``"manifest" problem'' and how did you solve it? Also, note that editing site.py is a no-no! You can create/edit sitecustomize.py, but you should leave site.py alone! --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido,
Gordon, I'm sorry, but from this description I still have no idea what your stuff is (and I forgot the URL so I can't look it up).
http://starship.python.org/crew/gmcm/installer.html The Linux stuff has a couple alpha testers and will probably get announced in a week or two.
For example, if it's not (just) for installing, what *is* it for?
At the bottom level, it's a bunch of tools using freeze's modulefinder, imputil.py and 2 kinds of archives. There's at least 2 layers above that, with "Installer" being the top. There's a clean separation between the layers, so you can break in wherever you like.
What is the ``"manifest" problem'' and how did you solve it?
The problem is specifying a set of resources, hopefully without having to list them explicitly. I solve this with a config file that lets you specify packages, directories, directory trees.. with filters that can work from paths, names, extensions, regular expressions...
Also, note that editing site.py is a no-no! You can create/edit sitecustomize.py, but you should leave site.py alone!
That would work fine. One of the standalone configurations will write a site.py, but that's for a completely self-contained installation (ie, one which will have no conflicts with another Python installation). I'd also note that, for Windows at least, the path-expanding mechanism created by site.py has not caught on. I've got lots installed, and no site-python, site-packages or sitecustomize. - Gordon
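As an illustration of the "manifest" idea described above (a config file naming what to include, with filters on names and regular expressions), here is a sketch built on configparser. The section name, the option names and the file list are all invented for this example; the actual format of the Installer's config files is not shown in this thread.

```python
import configparser
import fnmatch
import re

# Hypothetical manifest: wildcard patterns to include, one regex to exclude.
CONFIG = r"""
[manifest]
include = pkg/*.py, pkg/data/*
exclude_regex = .*_test\.py$
"""


def select(paths, config_text):
    """Return the paths matched by an include pattern and not excluded."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = parser["manifest"]
    patterns = [p.strip() for p in section.get("include", "").split(",") if p.strip()]
    exclude = section.get("exclude_regex", "")
    exclude_re = re.compile(exclude) if exclude else None
    chosen = []
    for path in paths:
        if not any(fnmatch.fnmatchcase(path, pat) for pat in patterns):
            continue
        if exclude_re is not None and exclude_re.search(path):
            continue
        chosen.append(path)
    return chosen


candidates = [
    "pkg/__init__.py",
    "pkg/core.py",
    "pkg/core_test.py",
    "pkg/data/table.csv",
    "README",
]
selected = select(candidates, CONFIG)
```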

[me]
Also, note that editing site.py is a no-no! You can create/edit sitecustomize.py, but you should leave site.py alone!
[Gordon]
You shouldn't see site-python or site-packages, they only exist on Unix. On Windows, everything is installed in the top Python directory. However you should see .pth files there, which is what site.py looks for. I believe NumPy and PIL use those. --Guido van Rossum (home page: http://www.python.org/~guido/)

You mean "they only exist _for_ Unix", (site.py looks for them on Windows). I don't like that. For one thing, modulo a few platform differences, the same mechanism should work for multi-user Unix and Windows LAN installations. And single-user Windows (I know, redundant, even on NT) should be a degenerate case of the above.
No NumPy, no PIL, no .pth files. 99% of everything out there just says "unzip this somewhere on your Python path". In this case, Jim Ahlstrom may be right - there are too many options, or at least an insufficiently emphasized "proper" method. Until I worked out my own way of installing stuff, I used to lose a large number of packages whenever I upgraded my Windows Python. Much as I love Mark's stuff (and hesitate to criticize crazy Aussies), I wish there weren't so much special casing here for Windows. And no, I don't have any solutions to this, I'm just griping... - Gordon

[Gordon]
You mean "they only exist _for_ Unix", (site.py looks for them on Windows).
No it doesn't. The code in site.py only adds site-packages and site-python when os.sep is '/'. RTSL.
What do you mean by "the same mechanism should work"? The same mechanism for what? Are you talking about sharing the installed files somehow?
Fair enough. Of course I know about .pth files so I unzipped them elsewhere and added a .pth file pointing there...
The .pth files are designed for this. Maybe they haven't been explained as well as they should.
It's not Mark's fault, it's Microsoft's fault. If you don't do things the way MS wants you to, experienced Windows users will gripe, misunderstand what you do, etc.
And no, I don't have any solutions to this, I'm just griping...
Ditto. Understanding the problems is half of the solution though. The problems seem pretty complex! --Guido van Rossum (home page: http://www.python.org/~guido/)
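Since the .pth mechanism is said above to be under-explained, here is a small sketch of it using site.addsitedir() from a present-day Python. Each line of a .pth file that names an existing directory is appended to sys.path. The directory and file names are made up, and everything happens in a temporary directory so no real installation is touched.

```python
import os
import site
import sys
import tempfile

with tempfile.TemporaryDirectory() as root:
    site_dir = os.path.join(root, "site")               # where .pth files live
    extra_dir = os.path.join(root, "unzipped_package")  # "unzip this somewhere"
    os.makedirs(site_dir)
    os.makedirs(extra_dir)

    # A .pth file is plain text: one directory per line.
    with open(os.path.join(site_dir, "extra.pth"), "w") as fh:
        fh.write(extra_dir + "\n")

    before = list(sys.path)
    site.addsitedir(site_dir)   # adds site_dir and processes its .pth files
    added = [p for p in sys.path if p not in before]
    sys.path[:] = before        # undo the change; this is only a demonstration
```

After the call, the list named added holds both the site directory and the directory listed in extra.pth.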

[Guido]
No it doesn't. The code in site.py only adds site-packages and site-python when os.sep is '/'. RTSL.
Oops. Missed that.
In the above, "mechanism" basically meant that which creates sys.path. Basically, this came up for me because in standalone configurations (my Installer again), I have to take complete control of sys.path. After doing so differently on Windows and Linux, I finally realized that I can do it the same way on both. Which makes me question why they are so different.
The .pth files are designed for this. Maybe they haven't been explained as well as they should.
I'd say "badgered" or "browbeaten" instead of "explained" ;-).
Even MS doesn't do things the way MS says they want you to. I find MS users equally divided between those who scream bloody murder if you touch the registry, and those who scream if you don't. It's not like *nixen suffer from an excessive degree of conformity in preferred installation procedures, but somehow Python survives there...
Grumpily agreed ;-). - Gordon

I finally got around to reading the current Linux Journal (which just keeps getting better and better) and lo! there was a picture of a familiar face I just couldn't quite.... Oh no! Could it be true? I heard rumors but I refused to believe them until now. The glasses are gone! Guido now looks like an investment banker! The sky is falling! Next will probably be a Python 1.6 as a 27 Meg DLL, and a Python IPO. Well, maybe not. Now that I look more closely, he is wearing a black and white and mustard (??MUSTARD) T-shirt which says "You Need Python". At least we ought to make him wear a name tag at IPC8. JimA

James C. Ahlstrom writes:
I'm afraid this non-distinctive look was introduced at IPC7... it's too bad we can't tell people Python was invented by the guy with the glasses anymore.
It's really the blue & white & orange IPC7 shirt. -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives

"JCA" == James C Ahlstrom <jim@interet.com> writes:
JCA> Oh no! Could it be true? I heard rumors but I refused to JCA> believe them until now. The glasses are gone! Guido now JCA> looks like an investment banker! The sky is falling! He's not the only one who's, like, "gone corporate", but I won't mention any names, so as to protect the guilty.

"Barry A. Warsaw" wrote:
He's not the only one who's, like, "gone corporate", but I won't mention any names, so as to protect the guilty.
OK, Buzz. Jim -- Jim Fulton mailto:jim@digicool.com Python Powered! Technical Director (888) 344-4332 http://www.python.org Digital Creations http://www.digicool.com http://www.zope.org Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats.

[Gordon]
[Guido]
Something just occurred to me: MS's guidelines aren't arbitrary, they actually have very good reasons. In the case of putting all an app's crucial info in the Registry, it's the only way to allow a site administrator to set policy and site options remotely (an admin can fiddle other machines' registries remotely). This works very well indeed when there's only "one copy" of an app on a machine (or at most one copy "per user"). What just occurred to me is that JimA is concerned with *not* letting any info from a previously-installed Python affect the app he's installing. Similarly, Gordon's Win32 "standalone installer" modifies python.exe and pythonw.exe to use a PYTHONPATH he forces, leaving the registry out of it. Similarly, the woes I've had in trying to sell Python as a general Win32 scripting tool at work mostly boil down to that there's no effortless way to do it that doesn't risk picking up info from-- or forcing info onto --pre-existing or future distinct Python installations (in contrast, Perl "just works" in this respect). IOW, the three of us find getting path info out of the registry intolerable because we are in fact trying to do the opposite of what the registry mechanism was *designed* for: we want perfect isolation, not perfect sharing. This has come up on Python-Help a few times too, in the guise of someone installing a product that in turn installs an older version of Python, which in turn confuses another product that relies on features in a newer version of Python. So while the traditional Windows .ini file (like Unix this-or-that.rc file) model was replaced by the registry for excellent reasons, those reasons don't apply to the way we're using Python! The .ini file model was exactly right for what most of us seem to want to do, and the registry model is exactly wrong. just-thought-i'd-cheer-you-up<wink>-ly y'rs - tim

Tim> So while the traditional Windows .ini file (like Unix Tim> this-or-that.rc file) model was replaced by the registry for Tim> excellent reasons, those reasons don't apply to the way we're using Tim> Python! The .ini file model was exactly right for what most of us Tim> seem to want to do, and the registry model is exactly wrong. Alright! Now I understand what all the hubbub is about! My eyes have mostly been glazing over trying to follow all this Windows registry/path/ini stuff. MS believes that Python is the application. Those of us writing Python programs view those programs as the applications, not the Python interpreter per se. Is there some way that people writing applications in Python can set up registry entries that are specific to their application (e.g. tabnanny.py) instead of only specific to the Python interpreter? Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ 847-971-7098 | Python: Programming the way Guido indented...

Skip Montanaro wrote:
I think this is a good point. Windows app programmers (mostly) view Python as part of their app and try to install it in their app directory. Unix installs Python as a system app in multiple versions and users use PATH to pick a version. Unix users view the Python interpreter as a system service which is needed for running their app. I think this is because a Windows app is a visual program, and the Python release compiles to a console app (not really a visual program). So all (?most) Windows Python apps are custom mains with Python as a component, but the stock python.exe is not the main. This makes it difficult to document a way to install Python in the Unix fashion, since all apps need their own binary main and python15.dll is the only thing in common. IMHO archive files can solve this a lot more simply. JimA

[Skip Montanaro]
Eww -- that's a helpful and insightful way to put it, Skip! Now maybe *I* can understand what the hubbub is about <wink>.
Yes, but they can't get Python to look at those before it's too late. I spent a whole evening a month or two ago just trying to figure out where all the cruft in my Windows sys.path *came* from. This is out-of-the-box; I haven't added anything myself: ['', 'D:\\Python\\win32', 'D:\\Python\\win32\\lib', 'D:\\Python', 'D:\\Python\\Pythonwin', 'D:\\Python\\Lib\\plat-win', 'D:\\Python\\Lib', 'D:\\Python\\DLLs', 'D:\\Python\\Lib\\lib-tk', 'D:\\PYTHON\\DLLs', 'D:\\PYTHON\\lib', 'D:\\PYTHON\\lib\\plat-win', 'D:\\PYTHON\\lib\\lib-tk', 'D:\\PYTHON'] That's bizarre on the face of it, and tracking it all down was draining. I've forgotten the details. I do remember concluding that it was impossible to do what I wanted to do without changing the implementation, though, and nobody on Python-Dev disputed that at the time. In a pragmatic crunch, I wrote the little app I needed to distribute at the time in Perl instead, meaning to come back to this. I haven't had time. IIRC, the ultimate problem wasn't really that Python looked at the registry to get *some* path info, it was a combination of A) It looked at the registry so early that it was impossible to stop it from executing whatever site.py the registry pointed at (well, I could with the -S option -- but then there was no way to get it to do the site.py that was *wanted* instead). B) No way to override what was in the registry; e.g., I was greatly surprised to discover that setting a PYTHONPATH envar didn't override anything, it simply plunked the PYTHONPATH entries into sys.path along with everything else -- and too late to stop anything anyway. In a long msg I haven't yet read all the way thru, Guido at least suggested associating different registry path info with different Python versions. That would address a number of otherwise currently intractable problems. I suspect it still wouldn't help with the problem I was facing, though. 
That is, I wanted to be able to tell people to run \\dragres01\mrec\reduce\python \\dragres01\mrec\reduce\reduce.py which is just a Windows way of saying "run a Python executable from a shared network location". When they tried that, though, the network Python looked in *their* individual registries for its Python path info, and some of the hackers with mondo customized Python setups on their own machines watched things go down in flames. This certainly can't be a common problem, but it speaks to an unforgiving rigidity in the current approach. There seemed to be nothing I could do to guarantee this would work, short of telling users to edit their registries before running this tool (that's a non-starter on Windows -- editing the registry is dangerous) or putting a customized Python on the network pointing to a bogus registry key (it was faster to write the app in Perl! Perl doesn't *try* to be so infernally helpful <wink>, so doesn't get in the way either). I'm left wondering what purpose putting Python library path info into the Windows registry serves. Is there anyone on Windows who *doesn't* have their Python Lib/ etc as direct subdirectories of the directory containing python.exe? Not that I've seen. Python puts *those* in sys.path too -- but only after it (in the normal case; see my sys.path above) pulls identically redundant paths out of the registry first, or (in the cases we're griping about) pulls irrelevant or downright harmful paths out of the registry first (paths appropriate to the last Python you *installed*, not to the Python that's *running*!). Perhaps all this cruft is needed to support embedded Python, though (something I've never done). 
Regardless, I expect it would have been enough for me if PYTHONPATH simply worked the way I mistakenly assumed it would (that is, this is sys.path, and that's *it*; feel free to prepend the current directory when initialization is complete, but before then looking at any file not reached from PYTHONPATH is verboten). the-cleverer-the-code-the-more-vital-that-there-be-a-way-to- short-circuit-it-ly y'rs - tim
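A minimal sketch of the PYTHONPATH behaviour Tim says he expected: the variable's entries become the whole search path, with nothing merged in from the registry or from install defaults. The helper name strict_path_from_env and the example share names are hypothetical, and this is present-day Python, not something the 1.5.x interpreter offered.

```python
import os


def strict_path_from_env(value, sep=os.pathsep):
    """Return exactly the entries named in a PYTHONPATH-style string.

    Hypothetical helper: nothing is added from the registry or from
    install-time defaults, and empty entries are dropped.
    """
    return [entry for entry in value.split(sep) if entry]


# Example: two network directories and nothing else (made-up share names).
path = strict_path_from_env(r"\\server\share\lib;\\server\share\app", sep=";")
print(path)
```

A launcher built on this idea would assign the result to sys.path before importing anything else; where such a hook could live in the real start-up sequence is exactly what the thread leaves open.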

Tim Peters wrote:
Excellent discussion Tim!
I think a sensible way to run little apps is to put everything in an archive file including the main.py. On Windows you concatenate that to python.exe, and it Just Works.
Point on the curve. We don't. We freeze everything except the main.py. JimA

[Guido]
And actually, the business about separate subtrees for the machine's configuration and the user's configuration is pretty clever. MS doesn't explain it well, and it gets misused, but when done right, it's a lot simpler than the maze of .xxxrc files you sometimes find in other OSes.
In my Linux version, I went to the heart of the matter - getpath.c. It occurs to me that getpath.c might do better to follow a normal bootstrap process - ie, create the absolute minimal sys.path required to go to the next step. Then the rest of what goes on in getpath.c could be written in Python. Maybe that Python code needs to get frozen in (to prevent bozos from destroying an installation by stepping on getpath.py), but it would make it a lot easier to create independent installations, and also reduce the variations between platforms at the C level. (Then again, I've never heard of anyone stepping on exceptions.py.) If some registry manipulation primitives were exposed (say, through ntpath) that would mean that Windows developers could (if they wanted) play by the MS rules with at least the option of not stepping on each other. - Gordon
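Gordon's two-stage bootstrap could be sketched roughly as follows. The function names, the "Lib" layout and the extra subdirectories are assumptions made for illustration; the real getpath.c logic is considerably more involved.

```python
import os


def minimal_sys_path(executable):
    """Stage 1 (the part that would stay in C): just enough to find stage 2."""
    home = os.path.dirname(os.path.abspath(executable))
    return [os.path.join(home, "Lib")]


def full_sys_path(minimal, extra_subdirs=("plat-win", "lib-tk")):
    """Stage 2 (a hypothetical getpath.py): derive the rest in Python."""
    lib = minimal[0]
    return list(minimal) + [os.path.join(lib, sub) for sub in extra_subdirs]


stage1 = minimal_sys_path(os.path.join("apps", "demo", "python.exe"))
print(full_sys_path(stage1))
```

Because stage 2 is ordinary Python, an independent installation could swap in its own policy without touching the C code.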

I agree. And I am guilty of not even trying to find MS' explanation -- I just looked in the registry at what other apps did and tried to mimic that (plus what Mark had already done), without really knowing what I was doing. I now know a little better -- see the end of this message.
Yes, this is exactly what was proposed in the thread on the Big Import Rewrite.
That's a good idea. These functions are already available through Mark's win32api extension -- much of which will eventually (I hope before 1.6 is out!) become part of the core distribution. In the mean time, I've been thinking a bit more about how Python should be using the Windows registry. (It's clear to me that Python should use the registry -- those who disagree can go build their own Python distribution.) The basic ideas of Python's current registry usage are sound: there's a resource built into the DLL which is part of the key into the registry used for all information. The problem lies in which key is used. All versions of Python 1.5.x (1.5, 1.5.1, 1.5.2) use the same key! This is a main cause of trouble, because it means that different versions cannot peacefully live together even if the user installs them into different directories -- they will all use the registry keys of the last version installed. This, in turn, means that someone who writes a Python application that has a dependency on a particular Python version (and which application worth distributing doesn't :-) cannot trust that if a Python installation is present, it is the right one. But they also cannot simply bundle the standard installer for the correct Python version with their program, because its installation would overwrite an existing Python application, thus breaking some *other* Python apps that the user might already have installed. (There's a solution for app builders who are willing to do a lot of work -- you can change the registry key resource in the DLL. For example, Alice comes with its own version of Python 1.5.1 and it uses "1.5.1-alice" as its registry key. The Alice installer installs Python in a subdirectory of the Alice installation directory and points the 1.5.1-alice registry entries there. The problem is that this is a lot of work for the average app builder.) I thought a bit about how VB solves this. 
I think that when you wrap up a VB app, all the support code (mostly a big DLL) is wrapped with it. When the user runs the installer, the DLL is installed (probably in the WINDOWS directory). If a user installs several VB apps built with the same VB version, they all attempt to install the exact same DLL; of course the installers notice this and optimize it away, keeping a reference count. (Ignoring for now the fact that those reference counts don't always work!) If an app built with a different VB version is installed, it has a DLL with a different name, and that is installed separately. Other support files, I presume, are dealt with in much the same way. Voila, there's the theory. How can we do something similar for Python? An app written in Python should need to install only three or four files:

- a driver EXE to start the app
- a copy of the Python DLL
- the Python library in an archive
- the app code in an archive

The latter two could be combined into a single archive, but I propose that we use two archives so that the DLL and the Python library archive can be shared between installations of independent Python apps as long as they use the exact same Python version and don't need additional 3rd party packages. (I believe that Jim A's proposal combines the archives with the EXE and the DLL, reducing the number of files to two. That's fine too.) Is there a use for the registry here at all? Maybe not. (I notice that VB seems to have a single registry entry, pointing to a DLL; all other VB files also seem to live there.) Complications:

- Some apps may need a custom extension module, which has to be installed as a PYD file. So it seems that there needs to be a directory per app, and perhaps per version of the app (if the app distributor cares).
- Some apps need other, non-pyc files (e.g. data tables or help files); it would be handy if these could be stored in the archives as well.
- Some standard extension modules are in their own PYD files; these also need to be installed. They aren't typically marked with a version, so perhaps a path directory per version of Python (if not per installed app) is wise.
- How to distribute an app that needs 3rd party stuff, e.g. Tcl/Tk, or PIL, or NumPy? Their Python code can easily be wrapped up in another archive with a standard name incorporating a version number; but the required PYD and DLL files are a separate story. (E.g. for Tkinter, you need _tkinter.pyd which links against tcl80.dll.) Basically the same solution as for standard PYD files can work; the needed DLL files can be installed either systemwide (if they have a reliable version number in their name, like tcl80.dll) or in the per-app or per-package directory (like NumPy).
- Presumably, the archives will contain PYC files only. This means that tracebacks will not show source code, only line numbers. For Jim A, this is probably exactly what he wants (if the user gets a traceback, his "robust app" has miserably failed, and he takes it in pride that this doesn't happen). But for some others, access to the sources could be essential. For example, I might want to distribute IDLE using this mechanism; users of IDLE who are curious about the standard library (or about IDLE itself) should be able to open the source for an arbitrary module (and maybe even edit it, although that's not a priority and perhaps should even be discouraged). Library source access is an important feature of the IDLE debugger as well. A way out for IDLE is to install a classic distribution of the Python library sources, into the filesystem at an IDLE specific location. Other apps, with only the need for source code in tracebacks, might choose to have the PY files in the archives sitting next to the PYC files, and somehow the traceback mechanism should be accessing the archive to get a hold of the source.
And yes, I realize that Jim A's latest offering solves most of these problems to a large extent -- well done. (Jim, would you care to comment on the issues that you don't address? Will you address them in a future version?)

Final notes: There are two different problems here. One is how to distribute Python apps robustly to end users who don't particularly care about Python. This is Jim A's problem (and he has a solution that works for him). In general the solutions here try to isolate the installed app from other Python installations. I'm proposing that at least the DLL and the Python library archive can probably be shared between apps without reducing robustness if we keep track more carefully of version numbers. The other problem is how to distribute packages of Python and extension modules for use by Python users. These typically need to drop into some existing Python installation. This is Paul Dubois' problem with NumPy (amongst others) and is the current focus of the distutil SIG. However I believe that there could be a lot of common infrastructure that would help us create better solutions for both problems. For package distribution, common infrastructure (a.k.a. standards) is essential. For app distribution, common infrastructure isn't so important (since the solutions strive for total isolation, there's no problem if different apps use different solutions). However, this changes when app creators want to distribute robust self-sufficient apps that use 3rd party packages -- then the 3rd party packages must allow being packaged up using the app distribution creator of choice. Solving this compound problem (creating package distributions that can be redistributed easily as part of robust Python app distributions) should be an important goal for the infrastructure we're building here. The Big Import Rewrite ought to add this to its list of objectives if it isn't already on it.
My guess is that the solution for this compound problem will increase the dependency of app distribution tools on the package distribution infrastructure; which to me seems like a Good Thing because it would lead to more code sharing. --Guido van Rossum (home page: http://www.python.org/~guido/)

Briefly backtracking to an old thread: [Guido]
Right, that's one class of intractable problem under Windows. *Inside* my workplace, another kind of problem is caused when people try to make a Python app available over the Windows network. They stick the Python they want and its libraries out on the network, with python.exe in the same directory as the app. Now some people have highly customized Python setups, and the network Python picks up "the wrong" site.py etc. That sucks, and there appears no sane way to stop it. Telling internal app distributors they need to invent a unique registry key and fiddle their python.exe's resources is a non-starter. Ditto telling people with highly customized Pythons "don't do that". Ditto telling anyone they have to run any sort of installation script just to use a network app (sometimes they don't even know they're running it! e.g., when it's a subsystem invoked by another app). So while everyone is thinking about the hardest possible scenarios, please give a thought to the dirt simple one too <0.5 wink>: an app distributor who knows exactly what they're doing, and for whom *any* magical inference is simply a barrier to overcome. The latter can be satisfied by any number of means, from an envar that says "please don't try to be helpful, *this* is the directory you look in, and if you don't find stuff there give up" to a cmdline switch that says the same. Nothing Windows-specific there -- any OS with an envar or a cmdline will play along <wink>.
This is the way most *MS* DLLs work; stuff like the C runtime libraries and MS database drivers work exactly the same way. It's rare for pkgs other than MS's to attempt to use this mechanism, though (the reason is given below).
(Ignoring for now the fact that those reference counts don't always work!)
? They work very well, in my experience. Where they fail is when installers & uninstallers break the rules. MS publishes the list of MS DLLs that are to be treated this way: an installer "must" use refcounting on the DLLs in the list. Alas, some (especially older) installation pkgs don't. Then the refcounts get screwed up. That's what makes the mechanism brittle: "the system" doesn't enforce it, it relies on universal & intelligent cooperation. It's very likely that someone distributing a Python app will neglect (out of ignorance) to bump the refcount on their Python components, so the refcount will be artificially low, and a later uninstall of some unrelated pkg that *did* follow the rules will merrily delete Python. Gordon and I will repeat this until it sinks in <wink>: almost everyone with a successful Windows product ships the non-MS DLLs they rely on and copies them into their own app directory. It's simple and it works; alternatives are complicated and don't work. Many even ship & copy MS DLLs (e.g., Scriptics copies its own msvcrt.dll (the MS C runtime) into Tcl's directories). Worrying about space consumed by redundant Python components is a bad case of premature optimization <0.3 wink>.
... How can we do something similar for Python?
Seriously, short of getting MS to distribute Python and put the Python DLLs on The List of refcounted resources, we should pursue this line reluctantly if at all. MS may have a better scheme in the future, but for now better safe than sorry. a-couple-mb-on-a-modern-pc-isn't-worth-the-time-it-took- to-read-this<wink>-ly y'rs - tim
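The failure mode Tim describes can be shown with a toy model of installer-maintained reference counts. Everything here (the class, its methods, the DLL name) is made up for illustration; it is not how Windows or any real installer stores these counts.

```python
class SharedDllRegistry:
    """Toy model of reference counts kept by cooperating installers."""

    def __init__(self):
        self.refcounts = {}

    def install(self, dll, cooperative=True):
        # A rule-following installer bumps the count; a careless one only
        # copies the file and leaves the count as it found it.
        if cooperative:
            self.refcounts[dll] = self.refcounts.get(dll, 0) + 1
        else:
            self.refcounts.setdefault(dll, 0)

    def uninstall(self, dll):
        """Decrement the count; return True if the DLL would now be deleted."""
        self.refcounts[dll] = max(self.refcounts.get(dll, 0) - 1, 0)
        return self.refcounts[dll] == 0


registry = SharedDllRegistry()
registry.install("python15.dll")                      # well-behaved package
registry.install("python15.dll", cooperative=False)   # app that forgot to count
deleted = registry.uninstall("python15.dll")          # well-behaved uninstall
print("DLL deleted while still in use:", deleted)
```

The count reaches zero although one app still depends on the DLL, which is the brittleness that makes copying private DLLs into the app directory the safer habit.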

Tim Peters wrote:
The registry is still a bad idea because it lumps critical and app data into single files and brings up the ugly problem of protecting individual registry entries instead of just files. Microsoft should have put all app config into the app directory and provided for remote admin of that. But that is not really your point (just ranting about the registry again).
Or, in other words, no isolation is possible if critical info depends on global data like PYTHONPATH or a _common_ registry entry. We could have different registry entries, but this is confusing and not documented. I think we can solve this with archive files in a way compatible with Unix without going off on a Windows-only wavelength. If the archive file contains everything, and it is in the dir of the app, and the app looks there and finds it, then it Just Works. See also my reply to Skip. JimA
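As a present-day illustration of "the archive sits in the app's directory and it Just Works": current Python can import directly from a zip archive placed on sys.path. This facility did not exist when the thread was written, and the archive and module names below are made up.

```python
import os
import sys
import tempfile
import zipfile


def build_app_archive(directory, name="applib.zip"):
    """Write a one-module archive into the (pretend) app directory."""
    archive = os.path.join(directory, name)
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr(
            "archived_greeting.py",
            "def hello():\n    return 'hello from the archive'\n",
        )
    return archive


app_dir = tempfile.mkdtemp()          # stands in for the app's own directory
archive = build_app_archive(app_dir)
sys.path.insert(0, archive)           # no registry, no PYTHONPATH involved

import archived_greeting

print(archived_greeting.hello())
```

Nothing global is consulted: the only configuration is the location of the archive relative to the app.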

Eh? Doesn't work for me. This does: http://starship.python.net/crew/gmcm/distribute.html

[Guido]
[Great analysis, Tim!]
I beg to differ: it's internally inconsistent and should have identified at least 3 axes and hence at least 8 cases. Still, you got more than you paid for <wink>.
I'm not sure what stuff by which Gordon you're referring to.
You guessed right!
If it can install a whole app, what makes you suspect it couldn't install just a bunch of modules <0.5 wink>? It started life as Windows-only, and I believe it's been virtually ignored by non-Windows folk because of that. Bad blind spot. It supplies already-working approaches to many of the issues that are still being *talked* about on Distutils (at least archive formats, code to manipulate same, manifest files (how do you tell the tool which files to package?), and transparently bundling a Python interpreter when needed).
I include part of that in my case #4 above, where the app happens to be written in Pure Python -- but the user doesn't have to know that. Gordon is addressing at least that part of it. AFAIK he can't deal with transparently compiling C or exorcising Elvish on the target platform, but if you're just distributing the binaries I expect his work is directly usable already.
(And then there's still the distinction between Win32, Unix or both.)
I vote "both". The world really doesn't need another Win32-only (or Unix-only) installer, archive format, compression format, or distribution model. Jim seems mostly interested in Win32-only to me, and his concerns haven't been about the mechanics of distribution but about how-- regardless of tool --to create a bulletproof Python installation by hook or by crook. Last time we went thru this, it was concluded that one couldn't without patching the Python Windows binary with a resource editor (to point to its own infernal <0.5 wink> registry entries). Distutils hasn't talked about that at all (that I've seen, anyway); if there were a less radical approach to that, I suspect Jim would be delighted to use one of the commercial Win32 installation pkgs (and if that's what his customers expect, delighted or not that's what he'll do).
The current distutil tools don't deal with this at all.
That's why I said I thought what Gordon is doing seems more appropriate to case #4 than what Distutils has been doing.
I think it should though,
Ditto.
and I think its framework is powerful enough to be able to add this, e.g. as a new "appdist" command.
I cordially invite (since Gordon will uncordially browbeat <wink>) people to look seriously at what he's done. Best I can tell, for apps that don't need compilation "on the other end", it's mostly "there" already! give-the-man-a-hand-ly y'rs - tim

Tim Peters wrote:
Not exactly. I am interested in how to create a bullet-proof installation. But I am equally interested in Unix (especially Linux) and dislike the current dichotomy in the code base. Lately I have been more active in distribution via archive files. Part of the solution is an archive file format which is identical on Unix and Windows, and which can hold the Python library and packages as single files. For my own efforts on this see: ftp://ftp.interet.com/pub/pylib.html This is an archive file format similar to Gordon's format, although Gordon's work goes well beyond just file formats. I currently have fifth generation code for this format, and am adding features as suggested by Fredrik Lundh. I hope it gets considered as a candidate for a Python standard format.
Distutils hasn't talked about that at all (that I've seen, anyway);
Gordon, Greg Stein and I have discussed file formats before. I think it was on distutils. Anyway that was months ago. JimA

"James C. Ahlstrom" wrote: [...]
ftp://ftp.interet.com/pub/pylib.html
Ouch - what's wrong with zip archives? There are utilities to convert to/from zip, to re-pack, to mount zip transparently so its entries look like regular files, FTP servers, etc. Both Java (jar) and Tcl (Jan Nijtman's "Wrap") have adopted this format. Zips would seem natural with JPython. And suppose that scripting ever starts to consolidate to a common scripting kernel (yah, well), do you really want a system which is closing all doors to cross-fertilization? Zip has an advantage over .tar.gz in that its table of contents is available without having to decompress the whole kaboodle. Your format has no checksum, which for deployment and long-term storage can be important. If you want a marshalled TOC, then why not add a manifest entry for it, sort of like what ranlib does with ar? You designed the format so archives can be concatenated without any tool (other than "cat"), but this works just as well with zip files, as the Tcl Wrap approach demonstrates. Allow me to very, very loosely paraphrase Guido here: sure, everyone can design an archive format, but they are likely to make the same mistakes all over again - so why not adopt a format which is tried and tested? With all due respect - I sincerely hope you will reconsider and alter your code to work with zip files. It's probably a small adjustment? Unless your *intent* is to create a diverging standard, of course... -- Jean-Claude

Jean-Claude Wippler replied:
Exactly my sentiments. We have rough Python code to deal with zip files; it's very rough because we got kind of carried away adding features and ended up with spaghetti code :-( But it's working code nevertheless and we're offering it up for anyone in this group to clean up (we could do that ourselves but it's not high on our current priority list). I don't know anything about Tcl Wrap. I do know a great deal about the ZIP format, but apparently I missed the concatenation feature. How does this work? Does that work for all zip tools, or just for the ZIP reader in Wrap? (I looked up how Jim A does it -- his central directory at the end of the file contains the total size of the data covered by that directory, so he seeks back to the beginning of it and sees if another magic number precedes it; and so on. Very simple.) I quickly looked at the Wrap page; it shows how to access data files stored in the archive. Question: does the wrap::open code go out to the regular filesystem if it finds there's no wrap archive? That would be handy so you can test the code in its unwrapped form without change. Python needs this too. --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
[... my not-really-meant-as-rant about adopting zip as format ...]
[zip concatenation feature]
Same for Wrap. Standard tools would not see the preceding ZIP groups. In terms of maintenance, I'd avoid this trick. I merely wanted to point out that zip archives can be stacked, if the reader is set up to it.
IIRC, Wrap overrides "open" for embedded entries as "file.zip/abc.py". There's more being developed in this area: a "virtual file system" which lets you mount archives and such (VFS by Matt Newman, mentioned with his permission), so that the file-system model can be extended to navigate into a lot more things than real file systems. Andrew Kuchling's post hints at another tangent: opendir/readdir is of course simply an enumeration. There's a lot of "genericity" lurking in scanning across file systems, trees, networks, and resources in general. <minirant> The filesystem <-> OO dichotomy needs a review. </minirant>
Python needs this too.
<voice location=in-the-desert level=timid> Concepts like these have a lot to offer - and would make even more sense if they were done in a way which benefits multiple scripting languages. Feel free to reply by email if you ever want to further discuss this. </voice> -- Jean-Claude

"JW" == Jean-Claude Wippler <jcw@equi4.com> writes:
JW> Same for Wrap. Standard tools would not see the preceding ZIP
JW> groups.

JW> In terms of maintenance, I'd avoid this trick. I merely
JW> wanted to point out that zip archives can be stacked, if the
JW> reader is set up to it.

I agree. I can't recall the details now, but I had a lot of problems with zip concatenation in JPython. I think at least some of the older Java tools for grokking zips don't work with concatenation. -Barry

The Java "jar" tool mostly ignores the central directory -- it seems to read the archive from the front, using the local header records, and ignoring the central directory (of course it writes one when it creates an archive). --Guido van Rossum (home page: http://www.python.org/~guido/)

Jean-Claude Wippler:
I agree. We have experimented with this a bunch in the Knowbot software, where we have some code that wants to look at a "filesystem" but could be talking to some kind of filesystem emulation across an RPC connection or alternatively could be accessing a zip file. Our conclusion is that a convenient interface is modeled after (a subset of) the os and os.path functionality. In fact, the only thing you would need to add to the os module would be a function to open a file object; I've proposed to add os.fopen() as an alias for the built-in open(). The idea that you could mount one VFS inside another is nice, although I'm not sure how practical it is. For one thing, in our fs code, os.path.sep and friends (e.g. os.path.normcase behavior) were set per filesystem; what would happen if you mounted a Unix filesystem in an NT tree? Doing the translations is hard too; e.g. on a Mac fs, the separator is ':' and a '/' can be part of a filename -- do you simply swap them? What if a Mac file has both '/' and '\' and you mount it on a Windows FS? I'd rather stay away from this. On the other hand the VFS concept could be used as a totally different solution to the sys.importers vs. sys.path
I'd still rather see listdir() (which our sample virtual FS API supported). I don't think it necessarily makes sense to do this on a more generic basis -- other trees and graphs have sufficiently different semantics that using a FS like API doesn't necessarily cut it. Take for example the Windows registry -- looks a lot like a filesystem, doesn't it? Yet it has one fundamental property that a typical FS doesn't: directory nodes can have data *and* children... I've written a tree widget and found that it's remarkably hard to come up with a workable API to talk to trees *in general*. Trees are a universal concept, but code sharing is still elusive... Perhaps because the concept is so simple?
<minirant> The filesystem <-> OO dichotomy needs a review. </minirant>
I think that my proposal above should cover this. (We looked briefly at doing a similar thing for Java, and found that it's actually harder there -- they have all these nice objects representing paths, but it's not easily subclassable to represent paths in some virtual filesystem.)
I see only very little hope for this point of view, but I will refrain from commenting more. --Guido van Rossum (home page: http://www.python.org/~guido/)
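The interface Guido describes, a small subset of os and os.path plus a way to open a file object, might look something like this sketch. The class and method names are assumptions and are not taken from the Knowbot code; each filesystem carries its own separator, as in the message above.

```python
import io
import os
import zipfile


class RealFS:
    """The ordinary filesystem, rooted at one directory."""

    sep = os.sep

    def __init__(self, root):
        self.root = root

    def _full(self, path):
        return os.path.join(self.root, path)

    def listdir(self, path=""):
        return sorted(os.listdir(self._full(path)))

    def exists(self, path):
        return os.path.exists(self._full(path))

    def open(self, path):
        return open(self._full(path), "rb")


class ZipFS:
    """The same three calls, answered from a zip archive."""

    sep = "/"

    def __init__(self, archive):
        self.zf = zipfile.ZipFile(archive)

    def listdir(self, path=""):
        prefix = path.strip("/")
        prefix = prefix + "/" if prefix else ""
        names = set()
        for name in self.zf.namelist():
            if name.startswith(prefix) and name != prefix:
                # Keep only the first component below the requested directory.
                names.add(name[len(prefix):].split("/")[0])
        return sorted(names)

    def exists(self, path):
        p = path.strip("/")
        return any(n == p or n.startswith(p + "/") for n in self.zf.namelist())

    def open(self, path):
        return io.BytesIO(self.zf.read(path))


# Build a small in-memory archive and browse it through the common API.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.py", "x = 1\n")
    zf.writestr("pkg/__init__.py", "")
    zf.writestr("pkg/b.py", "y = 2\n")

fs = ZipFS(buf)
print(fs.listdir(), fs.listdir("pkg"))
```

Code written against listdir/exists/open runs unchanged on RealFS, which also covers the wish to test an app in unwrapped form.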

Guido van Rossum writes:
os.path.sep and friends (e.g. os.path.normcase behavior) were set per
Hah! Caught you in public! "sep" & friends are defined in the os module; this is where the separation breaks down. I think these should be located in os.path, and os can just pick them up from there to be backward compatible. os.pathsep is a problem, somewhat; it is related to os.sep, but is very different in many ways. I don't think there's a good way to deal with it.
And this is tightly related to the sep/pathsep problem as well. I agree, we should stay away from it.
But it was easy to create a set of interfaces with a reasonable API; getting back to the "typical" Java classes was what really changed the most. For those of us not working on the KOE: I set up Filesystem and FSFile interfaces; the Filesystem represented the entire filesystem and the FSFile was very similar to the java.io.File class, but had additional methods to get input and output stream objects (of the standard Java flavor); all the buffering and such could be wrapped on top of that just like any other Java I/O. The specific application was to provide access to an isolated directory structure which untrusted code "owned", but ensured that parent directories were unreachable. Additional security checks can be worked into such a structure as applicable. -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives

Guido van Rossum wrote:
[... horrors of cross-OS mounts and ":\/" separators ...] I agree, this has some very hairy sides to it. But VFS is really more about mounting non-FS things in a "root" FS (presumably the real one).
On the other hand the VFS concept could be used as a totally different solution to the sys.importers vs. sys.path
Heck, I'll be the "enfant terrible" once more: yes, and this stuff could well be implemented generically across scripting languages. Of course the act of "importing" is a very Pythonic issue - but FS/VFS traversal and the actual shared library load need not be. Anyway, enough of that.
What you're saying is that dir = set-of-subdirs + set-of-files, and that this is a more general requirement than plain FS's. Doesn't that simply mean that the more general model is needed as basis to handle both?
Trees are a universal concept, but code sharing is still elusive...
Ah, but think of the implications: archives, networks, XML, the world! -- Jean-Claude

Jean-Claude Wippler wrote:
Ouch - what's wrong with zip archives?
Thanks very much for looking over the format. In general Zip archives store whole branches of a file system. A Python ./Lib zip archive would contain:

  N:/python/Python-1.5.2/Lib/string.pyc
  N:/python/Python-1.5.2/Lib/os.pyc
  N:/python/Python-1.5.2/Lib/copy.pyc
  N:/python/Python-1.5.2/Lib/test/testall.pyc

Zip archives are isomorphic to branches of a file system. That means there must be a sys.path for each zip archive file. How would this be specified? The archive format stores modules as dotted names, just as they appear in the import statement. The search path is "." in every archive file by definition. The import statement "import foo" just results in a dictionary lookup for key "foo", not a search through a zip directory along a local search path for "foo.something" where "something" can be pyc, pyo, py, etc. The intent was to link the archives to the import statement, not re-create a directory tree. It borrowed this feature from the archive formats of Greg and Gordon.
There are utilities to convert to/from zip, to re-pack, to mount zip transparently so its entries look like regular files, FTP servers, etc.
Basic operations (to, from, repack) are easy in Python.
Both Java (jar) and Tcl (Jan Nijtman's "Wrap") have adopted this format.
Hmmm....
Your format has no checksum, which for deployment and long-term storage can be important.
Actually the pylib.py "dir()" method reads all *.pyc with marshal, and I am depending on marshal to object to bad data and also out-of-date magic numbers. But this is a good point.
If you want a marshalled TOC, then why not add a manifest entry for it, sort of like what ranlib does with ar?
Sorry, I don't understand. Please explain.
Are you saying that

  cat zip1.zip zip2.zip > myzip.zip

works? An important feature is the ability to concatenate to a binary:

  cat python.exe zip1.zip > myapp.exe

Searching for this isn't fast unless magic numbers are at the end. Are zip files recognizable from the end (I don't know)?
The intent is to create a standard but not a diverging standard. Are there any zip experts out there? Can zip files satisfy all the design requirements I listed in pylib.html? Is there zip code available? All my code is in Python. JimA
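On Jim's question about appending an archive to a binary: a zip file's end-of-central-directory record sits at the tail of the file, so a reader can locate the archive from the end even when other bytes precede it. The sketch below uses today's standard zipfile module, which tolerates prepended data; the "stub" bytes merely stand in for python.exe.

```python
import io
import zipfile


def make_zip_bytes(members):
    """Return the bytes of a zip archive holding the given name -> text map."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()


stub = b"fake executable stub\n" * 4          # placeholder for python.exe
payload = make_zip_bytes({"main.py": "print('hi')\n"})
combined = stub + payload                     # like: cat python.exe zip1.zip

# The reader finds the directory from the end and adjusts member offsets.
with zipfile.ZipFile(io.BytesIO(combined)) as zf:
    names = zf.namelist()
    source = zf.read("main.py")

print(names, source)
```

Whether a given third-party zip tool copes with leading bytes is a separate matter, as the Wrap and JPython remarks elsewhere in the thread point out.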

James C. Ahlstrom wrote:
Jean-Claude Wippler wrote:
Ouch - what's wrong with zip archives?
In general Zip archives store whole branches of a file system.
As I've stated before, I have 2 archive formats. This may seem a needless complication, but my suspicion is that sooner or later, people will want 2 different kinds. One is a .pyz format, which corresponds closely to Jim's .pyl format (with a number of minor differences: it's compressed, the archive as a whole has the Python magic number, instead of each entry, and it's not designed for concatenation). The other is like a zip, and probably should be zip format. It's designed to hold _anything_, and can be manipulated from C and from Python. It can be concatenated and / or embedded (and the inner one opened without extraction). Its table of contents is more file-system like. Importing from one is slower, but that's not really what it's for. It's for packaging up arbitrary resources. Like .pyz's, or Tcl/Tk for Tkinter apps, or configuration files. Jim is correct that a good importer (which can say "No, it's not mine" as quickly as possible) is better satisfied by a simple dictionary lookup than fooling with file extensions and directories (virtual or real).
The table of contents is just another entry.
Where do you think we got this idea?
Hmm. My bookmark appears to be dead (I was there not long ago): http://www.cubic.org/source/archive/fileform/packers/appnote.txt There have been several references on this list to Guido et al having some Python / zip code. - Gordon
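The idea that the table of contents is "just another entry", and that an importer should answer "not mine" with a single dictionary lookup, can be sketched roughly as follows. The layout here (entries, then a marshalled dictionary, then a 4-byte offset trailer) and the names build_archive and Archive are hypothetical and are not the actual .pyz or .pyl format.

```python
import marshal
import struct

def build_archive(modules):
    """Pack {name: bytes} into one blob; the TOC is stored as the last entry."""
    body = b""
    toc = {}
    for name, data in modules.items():
        toc[name] = (len(body), len(data))   # (offset, length) of this entry
        body += data
    toc_offset = len(body)
    body += marshal.dumps(toc)               # the TOC is just another entry
    return body + struct.pack("<I", toc_offset)

class Archive:
    def __init__(self, blob):
        # The trailer is a 4-byte little-endian offset pointing at the TOC.
        (toc_offset,) = struct.unpack("<I", blob[-4:])
        self.blob = blob
        self.toc = marshal.loads(blob[toc_offset:-4])

    def get(self, name):
        entry = self.toc.get(name)            # one dictionary lookup
        if entry is None:
            return None                       # "No, it's not mine"
        offset, length = entry
        return self.blob[offset:offset + length]

arch = Archive(build_archive({"os": b"OS-CODE", "pkg.mod": b"MOD-CODE"}))
```

A miss costs one failed dictionary lookup and touches no file names or directories, which is the property being argued for.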

Not true. It's easy (using the proper Zip tools) to create an archive containing this instead: string.pyc os.pyc copy.pyc testall.pyc Thus the entire archive is considered the directory. The Java "jar" tool uses this approach. It's also easy to have packages in there (again this is what Java does): test/ test/__init__.pyc test/pystone.pyc test_support.pyc (etc.)
Maybe you've gone overboard. The time it takes to translate the dots into slashes really isn't the big deal.
Yes (all of us here at CNRI), yes, yes (we have the spaghetti code). While zip files support compression, they support uncompressed files as well and we could go either way. Their most popular compression format is gzip compatible and can be read and written with the zlib module, which is in the standard Python distribution (even on Windows) -- though to build it you need the zlib C library which is of course external (but solid open source). --Guido van Rossum (home page: http://www.python.org/~guido/)
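Translating a dotted module name into archive paths of the kind listed above takes only a few lines. This is an illustrative sketch using today's standard-library zipfile module; candidate_paths and find_in_zip are hypothetical helper names, and the entries hold placeholder bytes rather than real compiled code.

```python
import io
import zipfile

def candidate_paths(modname):
    """Archive paths that could hold the dotted module name."""
    base = modname.replace(".", "/")          # dots become slashes
    return [base + ".pyc", base + "/__init__.pyc"]

def find_in_zip(zf, modname):
    """Return the stored path for modname, or None if the archive lacks it."""
    names = set(zf.namelist())
    for path in candidate_paths(modname):
        if path in names:
            return path
    return None

# An archive laid out with the whole archive acting as the top-level directory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    for path in ("string.pyc", "test/__init__.pyc", "test/pystone.pyc"):
        zf.writestr(path, b"placeholder")
archive = zipfile.ZipFile(buf)
```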

Jean-Claude Wippler wrote:
Ouch - what's wrong with zip archives?
With all due respect - I sincerely hope you will reconsider and alter your code to work with zip files. It's probably a small adjustment?
OK, you talked me into it. Ya, small adjustment, no problem ;-) JimA

OK, I now have a new module "zipfile" which reads and writes ZIP files. It is written in Python and has been tested on Windows and Linux. I tested it with WinZip and found that the files it creates are read OK with WinZip, and WinZip files are read OK with zipfile. So I am withdrawing my Python archive file format, and re-writing all my stuff using zipfile. It should all be done in a week. Basically everything works fine. But there are some problems. Python seems to lack a CRC-32 function, so I wrote one in Python. It is slow. We need to add a CRC-32 function to some Python built-in module that is always present, like md5 or binascii. The zlib module is not necessarily present. I can't seem to get WinZip to record a partial path. That is, I want the ./Lib/test package to have these ZIP paths: test/__init__.pyc test/testall.pyc ... but WinZip creates files with either no path at all or the fully specified path. Am I missing something? Do all other ZIP tools do this too? JimA

On Mon, 13 Dec 1999, James C. Ahlstrom wrote:
Ah, good! (This saves me the trouble of cleaning up our own zip code :-)
Unclick the "Save Extra Folder Info" and then drag the *parent* folder into the archive. --Guido van Rossum (home page: http://www.python.org/~guido/)
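When the archive is written from Python rather than through WinZip, the partial path can be chosen explicitly. A minimal sketch, assuming today's standard-library zipfile module (whose write() method accepts an arcname argument); zip_package is a hypothetical helper, and the files are empty placeholders created in a temporary directory.

```python
import os
import tempfile
import zipfile

def zip_package(lib_dir, package, zip_path):
    """Store lib_dir/package/* under paths relative to lib_dir."""
    pkg_dir = os.path.join(lib_dir, package)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_STORED) as zf:
        for root, _dirs, files in os.walk(pkg_dir):
            for fname in sorted(files):
                full = os.path.join(root, fname)
                # arcname controls the path recorded inside the archive.
                zf.write(full, arcname=os.path.relpath(full, lib_dir))

with tempfile.TemporaryDirectory() as tmp:
    lib = os.path.join(tmp, "Lib")
    os.makedirs(os.path.join(lib, "test"))
    for fname in ("__init__.py", "testall.py"):
        with open(os.path.join(lib, "test", fname), "w") as f:
            f.write("# placeholder\n")
    out = os.path.join(tmp, "test.zip")
    zip_package(lib, "test", out)
    with zipfile.ZipFile(out) as zf:
        stored = sorted(zf.namelist())
```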

On Mon, 13 Dec 1999, James C. Ahlstrom wrote:
Can you post zipfile.py so that people can start reviewing that?
See zlib.crc32(). This is interesting, of course, because we have previously stated that zlib (and its compression) is optional. But if we need the CRC-32 function... hehe... Cheers, -g -- Greg Stein, http://www.lyra.org/
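The tension between "zlib is optional" and "we need CRC-32" can be handled with a fallback. The sketch below is not the code from zipfile.py; it is a standard table-driven CRC-32 (reflected polynomial 0xEDB88320, the one ZIP uses) with hypothetical names py_crc32 and crc32, preferring zlib.crc32() when the module can be imported.

```python
try:
    from zlib import crc32 as _fast_crc32
except ImportError:          # zlib may not be built into every interpreter
    _fast_crc32 = None

def _make_table():
    """Precompute the 256-entry table for the reflected CRC-32 polynomial."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table

_TABLE = _make_table()

def py_crc32(data, crc=0):
    """Pure-Python CRC-32 over a bytes object, one table lookup per byte."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF

def crc32(data):
    """Use zlib's C implementation when present, else the Python one."""
    if _fast_crc32 is not None:
        return _fast_crc32(data) & 0xFFFFFFFF
    return py_crc32(data)
```

Current Python releases also provide binascii.crc32(), so the fallback mainly illustrates the trade-off discussed here.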

On Tue, 14 Dec 1999, James C. Ahlstrom wrote:
My point was that people could possibly use it *before* then. Not everybody needs it to be pretty, needs doc, or needs it fully working. Maybe people would like to provide feedback on the API. Maybe they'd like to start their own modules that use your library. This goes back to my years-old statement: release it now rather than later -- people can always use it now, and there might not be a later. Release early. Release often. :-) People are too hesitant to release code. Why? Just send it out there. When you update it, send out another. It doesn't hurt anybody to have more than one release. Cheers, -g -- Greg Stein, http://www.lyra.org/

Greg Stein wrote:
Release early. Release often. :-)
You are right of course. OK, the zipfile.py code and docs are at: ftp://ftp.interet.com/pub/pylib.html Despite the ftp URL, clicking on it should display the html. Please don't panic if it seems to be slow. It uses a Python CRC-32 which is slow. You may want to hack it to use zlib.crc32() if you have it. I am testing with WinZip. If you have another zip tool, it would be interesting to see how compatible it is. JimA

Did anyone look at this yet? ftp://ftp.interet.com/pub/pylib.html ftp://ftp.interet.com/pub/zipfile.py JimA

JA> Did anyone look at this yet? JA> ftp://ftp.interet.com/pub/pylib.html JA> ftp://ftp.interet.com/pub/zipfile.py I thought it wasn't supposed to be out until Monday? You're looking for, perhaps, a time machine? ;-) (More seriously, it won't have any effect on my "gotta have this done yesterday" list, so I will let others comment...) Skip

"James C. Ahlstrom" wrote:
ftp://ftp.interet.com/pub/pylib.html
I just changed zipfile.py so that regular zip compression works. And if zlib is available, its crc32() is used instead of the Python version. I should mention that the current code rejects zip files which have an archive comment added to the end. Accepting them would require a search, and I am not sure it is worth it. JimA
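The search that an archive comment would require is a bounded backward scan. A rough sketch, assuming the usual ZIP layout in which the end-of-central-directory record is 22 bytes plus a comment of at most 65535 bytes whose length is stored in the record's last two bytes; find_eocd is a hypothetical name, and the sample archive is built with today's standard-library zipfile module.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"
EOCD_SIZE = 22            # fixed part of the end-of-central-directory record
MAX_COMMENT = 0xFFFF      # the comment length field is 16 bits

def find_eocd(data):
    """Return the offset of the end-of-central-directory record, or -1."""
    start = max(0, len(data) - EOCD_SIZE - MAX_COMMENT)
    pos = data.rfind(EOCD_SIG, start)
    while pos >= 0:
        if pos + EOCD_SIZE <= len(data):
            (comment_len,) = struct.unpack("<H", data[pos + 20:pos + 22])
            # Accept only if the recorded comment runs exactly to end of file.
            if pos + EOCD_SIZE + comment_len == len(data):
                return pos
        # Otherwise look for an earlier occurrence of the signature.
        pos = data.rfind(EOCD_SIG, start, pos + 3)
    return -1

# Build an archive that carries a trailing comment.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("string.py", "# placeholder\n")
    zf.comment = b"built by hand"
data = buf.getvalue()
eocd = find_eocd(data)
```

At most about 64 KB at the end of the file is ever examined, so the cost of accepting comments stays bounded.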

"James C. Ahlstrom" wrote:
I don't think it is needed for our purposes, but maybe a subclass could provide it ? FYI, I've tested the module against mxStack-0.3.0.zip which you can find on my Python Pages. It was created using Info-ZIP's zip 2.2 on Linux. Unfortunately, I always get the following traceback when trying to print the directory:
Some notes on the API:
----------------------
* I would find it more convenient if the filename and mode would be constructor parameters, e.g. zfile = zipfile('myfile.zip','rb') with compression defaulting to 8 rather than 0 (most zip files will be deflated since this is the ZIP default).
* Also, I would like a method much like the os.listdir() which returns a list of filenames rather than print it to stdout.
* .is_zipfile() should probably be a separate function: it doesn't use any of the class' features.
More wishes to come ;-) So far: Great Work !
Aside: I found that you are using undocumented arguments to zlib.compressobj() ... are these extra arguments left out of the documentation on purpose or by simple oversight ? I couldn't find them in the HTML docs and neither in the docstrings.
-- Marc-Andre Lemburg
______________________________________________________________________
Y2000: 15 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/
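For comparison, the three API wishes above look like this in the zipfile module found in today's standard library. This is an illustration only and not necessarily the interface of the zipfile.py under review in this thread; the file names are placeholders written to a temporary directory.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "myfile.zip")

    # File name and mode are constructor parameters.
    with zipfile.ZipFile(path, "w") as zf:
        zf.writestr("string.py", "# placeholder\n")
        zf.writestr("test/__init__.py", "")

    # is_zipfile() is a module-level function, usable before opening anything.
    looks_like_zip = zipfile.is_zipfile(path)

    # namelist() returns the member names as a list instead of printing them.
    with zipfile.ZipFile(path, "r") as zf:
        names = zf.namelist()

    plain = os.path.join(tmp, "plain.txt")
    with open(plain, "w") as f:
        f.write("just text")
    plain_is_zip = zipfile.is_zipfile(plain)
```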

On Thu, 16 Dec 1999, M.-A. Lemburg wrote:
The above two items were in my ramble, just not as clear as MAL :-)
* .is_zipfile() should probably be a separate function: it doesn't use any of the class' features.
Ah! Good call. It is even more important to shift it out if the constructor now opens a file. Cheers, -g -- Greg Stein, http://www.lyra.org/

M.-A. Lemburg writes:
The documentation is way out of date and Jeremy Hylton and Andrew Kuchling haven't updated it. I'm not sure which of them changed the signatures for that module, but I've pestered Jeremy about it a few times. If anyone would like to update the documentation, I'd certainly appreciate it. I don't know the details of those interfaces, and this is somewhere where the details are pretty critical. -Fred -- Fred L. Drake, Jr. <fdrake@acm.org> Corporation for National Research Initiatives

"M.-A. Lemburg" wrote:
"James C. Ahlstrom" wrote:
ftp://ftp.interet.com/pub/pylib.html
Unfortunately, I always get the following traceback when trying to print the directory:
Yes, compression isn't there yet. I am looking into it.
OK, done.
with compression defaulting to 8 rather than 0 (most zip files will be deflated since this is the ZIP default).
Until compression works, and zlib ships with Python I would rather default to no compression (method 0). Otherwise this is not useful as a Python import archive.
OK, done.
* .is_zipfile() should probably be a separate function: it doesn't use any of the class' features.
OK, done.
I am following the CNRI code blindly here. I don't have docs either. JimA
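The compression default being debated maps onto two method numbers: 0 (stored) and 8 (deflated). A small sketch of choosing between them explicitly, again using today's standard-library zipfile module rather than the code under review; write_archive is a hypothetical helper and the entry is filler data.

```python
import io
import zipfile

try:
    import zlib  # noqa: F401  (only checking that deflate is available)
    HAVE_ZLIB = True
except ImportError:
    HAVE_ZLIB = False

def write_archive(entries, compression):
    """Write {name: bytes} to an in-memory zip using the given method."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression) as zf:
        for name, data in entries.items():
            zf.writestr(name, data)
    return buf.getvalue()

entries = {"string.pyc": b"x" * 1000}

# Method 0: always available, and readable without zlib.
stored = write_archive(entries, zipfile.ZIP_STORED)

# Method 8: only when zlib can be imported; otherwise fall back to method 0.
method = zipfile.ZIP_DEFLATED if HAVE_ZLIB else zipfile.ZIP_STORED
packed = write_archive(entries, method)

info = zipfile.ZipFile(io.BytesIO(stored)).getinfo("string.pyc")
```

Passing the method explicitly at every call site is one way to make the caller think about the choice up front.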

"James C. Ahlstrom" wrote:
Great :-)
Point taken. Perhaps it would be even better to not have a default at all: that way people will have to think about the issue *before* implementing it, rather than debug code that produces tracebacks.
Thanks, -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: 13 days left Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/
participants (14)
- Barry A. Warsaw
- bckfnn@pipmail.dknet.dk
- Fred L. Drake, Jr.
- Fredrik Lundh
- Gordon McMillan
- Greg Stein
- Guido van Rossum
- James C. Ahlstrom
- Jean-Claude Wippler
- Jeremy Hylton
- Jim Fulton
- M.-A. Lemburg
- Skip Montanaro
- Tim Peters