Planning to drop gzip compression for future releases.
At this point, I'm planning to drop the gzip-compressed archives for all future Python releases. The bzip2 archives are much smaller (saving bandwidth, disk space, and download time), and supporting software seems to have become widely available in both free and commercial tools. I'm still planning to make ZIP archives available. If anyone would like to argue that I should drop that as well, feel free. ;-) -Fred -- Fred L. Drake, Jr. <fdrake at acm.org>
At this point, I'm planning to drop the gzip-compressed archives for all future Python releases. The bzip2 archives are much smaller (saving bandwidth, disk space, and download time), and supporting software seems to have become widely available in both free and commercial tools.
Sounds good. When are we going to start offering a bzip2 library in Python?
I'm still planning to make ZIP archives available. If anyone would like to argue that I should drop that as well, feel free. ;-)
Zip has been the de-facto standard for compression in the windows world for around 10 years. While other formats are making inroads (rar, ace, bzip2, etc.), they are not supported by the most popular windows archiver, WinZip: http://www.download.com/sort/3150-2250-0-1-4.html? When the most popular compression tool for Windows starts offering bzip2 compression, then it seems like a good idea to toss the zip file format. - Josiah
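For readers who want to check the size claim from the original post against their own files, here is a minimal sketch using Python's standard gzip and bz2 modules. The sample text and the helper name compressed_sizes are made up for illustration, and the resulting ratios depend heavily on the input, so treat the output as indicative only.

```python
import bz2
import gzip


def compressed_sizes(data: bytes) -> dict:
    """Return the size in bytes of data raw, gzip-compressed, and bzip2-compressed."""
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data, compresslevel=9)),
        "bzip2": len(bz2.compress(data, compresslevel=9)),
    }


# Hypothetical sample text standing in for a documentation tree.
sample = "".join(
    "Section %d: the quick brown fox jumps over the lazy dog.\n" % i
    for i in range(5000)
).encode("ascii")

sizes = compressed_sizes(sample)
for name, size in sizes.items():
    print("%-5s %8d bytes" % (name, size))
```

To compare real archives, read the uncompressed tar file as bytes and pass it to compressed_sizes instead of the sample.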
On Fri, Sep 17, 2004 at 01:09:47PM -0400, Fred L. Drake, Jr. wrote:
I'm still planning to make ZIP archives available. If anyone would like to argue that I should drop that as well, feel free. ;-)
1. the main archive software packages for all OSes support tar.bz2 in their current releases. (This includes WinZip, WinRAR and whatnot.) 2. if you can't be bothered to know what is a tar.bz2 and how to open it, you won't be getting the ZIP, but rather the EXE installer. []s, |alo +---- -- Those who trade freedom for security lose both and deserve neither. -- http://www.laranja.org/ mailto:lalo@laranja.org pgp key: http://garfield.laranja.org/~lalo/gpgkey-signed.asc GNU: never give up freedom http://www.gnu.org/
[Lalo Martins]
1. the main archive software packages for all OSes support tar.bz2 in their current releases. (This includes WinZip, WinRAR and whatnot.)
WinZip 9.0 SR-1 (which is the current release) does not support bz2.
Lalo Martins wrote:
On Fri, Sep 17, 2004 at 01:09:47PM -0400, Fred L. Drake, Jr. wrote:
I'm still planning to make ZIP archives available. If anyone would like to argue that I should drop that as well, feel free. ;-)
1. the main archive software packages for all OSes support tar.bz2 in their current releases. (This includes WinZip, WinRAR and whatnot.)
2. if you can't be bothered to know what is a tar.bz2 and how to open it, you won't be getting the ZIP, but rather the EXE installer.
.zip is the only one of these 3 formats that allows you to decompress a few files without expanding the entire archive. This feature is useful to me at least (and makes up for the larger size). -- -- Scott David Daniels Scott.Daniels@Acm.Org
Scott David Daniels wrote:
.zip is the only one of these 3 formats that allows you to decompress a few files without expanding the entire archive. This feature is useful to me at least (and makes up for the larger size).
tar supports that as well, and with better compression when paired with bzip2.
Hint: tar xzf archive [file] [...]
Charles
--
-----------------------------------------------------------------------
Charles Cazabon
Charles Cazabon wrote:
Scott David Daniels wrote:
.zip is the only one of these 3 formats that allows you to decompress a few files without expanding the entire archive. This feature is useful to me at least (and makes up for the larger size).
tar supports that as well, and with better compression when paired with bzip2. Hint: tar xzf archive [file] [...]
Right, but the only way it can extract the last file of the tar archive is to expand the entire archive (in order to determine the bytes at the end of the archive). .zip looks in the directory for the file, reads the bytes representing the compressed file (and only that file), and uses them to expand the file to its original version.
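The difference Scott describes can be sketched with the standard zipfile and tarfile modules. The file names, contents, and helper functions below are hypothetical and only illustrate the idea: a ZIP lookup goes through the archive's central directory, while a tar.bz2 has to be read as a single compressed stream.

```python
import io
import tarfile
import zipfile

# Hypothetical file names and contents, used only for illustration.
FILES = {
    "doc/intro.txt": b"introduction\n",
    "doc/api.txt": b"api reference\n",
    "doc/last.txt": b"the last member\n",
}


def build_zip(files):
    """Build an in-memory ZIP archive from a name -> bytes mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def build_tar_bz2(files):
    """Build an in-memory tar.bz2 archive from a name -> bytes mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:bz2") as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_from_zip(blob, name):
    # The central directory at the end of the ZIP maps names to offsets,
    # so only the requested member's compressed bytes are read.
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return zf.read(name)


def read_from_tar_bz2(blob, name):
    # A tar.bz2 is one compressed stream with no index: reaching a member
    # means decompressing everything stored before it.
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:bz2") as tf:
        return tf.extractfile(name).read()


print(read_from_zip(build_zip(FILES), "doc/last.txt"))
print(read_from_tar_bz2(build_tar_bz2(FILES), "doc/last.txt"))
```

Both calls return the same bytes; what differs is how much of the archive has to be decompressed to get there, which matters most for the last member of a large tarball.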
Charles
-- -- Scott David Daniels Scott.Daniels@Acm.Org
On Sep 17, 2004, at 2:14 PM, Lalo Martins wrote:
On Fri, Sep 17, 2004 at 01:09:47PM -0400, Fred L. Drake, Jr. wrote:
I'm still planning to make ZIP archives available. If anyone would like to argue that I should drop that as well, feel free. ;-)
1. the main archive software packages for all OSes support tar.bz2 in their current releases. (This includes WinZip, WinRAR and whatnot.)
If we're only talking binary releases, then I don't really care, but please don't make this change for the source releases. There are several platforms on which Python is supported which do not support bzip2 out of the box (Solaris, as a prime example). It adds just that much more heartache to get python installed on such a system. -- Nick
Nick Bastin wrote:
If we're only talking binary releases, then I don't really care, but please don't make this change for the source releases. There are several platforms on which Python is supported which do not support bzip2 out of the box (Solaris, as a prime example). It adds just that much more heartache to get python installed on such a system.
agreed. it may come as a surprise to some people, but Linux is not the only Unix system out there. Python works extremely well on non-Linux systems too... </F>
Fredrik Lundh wrote:
agreed. it may come as a surprise to some people, but Linux is not the only Unix system out there. Python works extremely well on non-Linux systems too...
But then, a Unix system does not have gzip, either. So we probably should use compress(1), or, better yet, distribute uncompressed tar files. Perhaps we should use cpio instead, or pax, because we need to avoid GNU tar extensions. Maybe IP isn't available, either, so we should ship QIC tapes. On Solaris, bzip2 is in the SUNWbzipS package, and installs into /usr/bin. Regards, Martin P.S. Just found this on compress(1) of Solaris 9: NOTES Although compressed files are compatible between machines with large memory, -b 12 should be used for file transfer to architectures with a small process data space (64KB or less). Solaris 9 requires a 512MB swap partition for installation, and the installer makes heavy use of Java...
Martin v. Löwis wrote:
agreed. it may come as a surprise to some people, but Linux is not the only Unix system out there. Python works extremely well on non-Linux systems too...
But then, a Unix system does not have gzip, either.
Of the build systems I checked, all had gunzip, most had unzip, but only the Linux systems had bunzip2. The bzip2 homepage contains 1.0.2 binaries for exactly three platforms, compared to over 20 systems for gzip and 30 systems for unzip. I suppose older bzip2 versions (0.9.5) are compatible, but someone should verify that they work before you pull the gzip archives.
Maybe IP isn't available, either, so we should ship QIC tapes.
That's a really helpful comment. </F>
Fredrik Lundh wrote:
Of the build systems I checked, all had gunzip, most had unzip, but only the Linux systems had bunzip2.
Sure, there are systems that don't have bunzip2 installed. However, what is the problem of installing it? All you need is a C compiler, and I'm sure you have one - how else are you going to install Python? And if building bzip2 yourself is a problem for some reason I cannot imagine, then what is the problem with using a prebuilt binary? As I said, Solaris (at least Solaris 9) comes with bzip2. If you have an older Solaris release, you can get a binary from sunfreeware.com. For HP-UX, you can get it from the HP porting center, e.g. http://hpux.asknet.de/hppd/hpux/Misc/bzip2-1.0.2/ (both PA-RISC and Itanium binaries, for 10.20, 11.00, 11.20, and 11.22). For AIX, you can get it from http://www.bullfreeware.com/. What other systems have you been looking at? Regards, Martin
Martin v. Löwis wrote:
Fredrik Lundh wrote:
Of the build systems I checked, all had gunzip, most had unzip, but only the Linux systems had bunzip2.
Sure, there are systems that don't have bunzip2 installed. However, what is the problem of installing it? All you need is a C compiler, and I'm sure you have one - how else are you going to install Python?
Yes, those with older, bzip2less systems can probably figure out how to get it and build it, but why force them when it's practically no work keeping it? It's one (sic) extra command for the release manager and ~9M extra disk space per release on www.python.org. And besides that... only GNU tar supports the j flag. <wink> Erik
Erik Heneryd wrote:
It's one (sic) extra command for the release manager and ~9M extra disk space per release on www.python.org.
but at 50 cents a gigabyte, and an endless stream of alphas and release candidates, that might turn out to be rather expensive. oh wait, you wrote megabytes, not gigabytes. </F>
Erik Heneryd wrote:
Yes, those with older, bzip2less systems can probably figure out how to get it and build it, but why force them when it's practically no work keeping it? It's one (sic) extra command for the release manager and ~9M extra disk space per release on www.python.org.
Fred wouldn't have asked if it was no effort in keeping it. There is certainly more than one command to it - you have to md5sum the file, and copy the md5sum into the release notes. You have to upload the file from your workstation to python.org. I don't know how you do that, but I need to use my DSL link for uploading the MSI files; it takes roughly 30min to upload. Fortunately, I have a DSL flatrate. Regards, Martin
Martin v. Löwis wrote:
Erik Heneryd wrote:
Yes, those with older, bzip2less systems can probably figure out how to get it and build it, but why force them when it's practically no work keeping it? It's one (sic) extra command for the release manager and ~9M extra disk space per release on www.python.org.
Fred wouldn't have asked if it was no effort in keeping it. There is certainly more than one command to it - you have to md5sum the file, and copy the md5sum into the release notes. You have to upload the file from your workstation to python.org. I don't know how you do that, but I need to use my DSL link for uploading the MSI files; it takes roughly 30min to upload. Fortunately, I have a DSL flatrate.
Regards, Martin
Yeah, I was a bit hasty. Sure, it's more than one command:
* pack it
* unpack it
* diff it against the known-to-be-good bzip2 tree
* md5sum it and add that to the release notes
* add another link on the download page
* ...something else?
but I still think my point stands - it's not that much work, really, and it'd be a nice service to those with bzip2less systems. Regarding upload times I guess I'm just another spoiled swede; I've been on ethernet for so long I can barely remember what 5k/s was like... That said, I do realise that it all adds up and that doing a release takes some work, so whether you decide to keep it or not: thanks. Erik
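The checksum step mentioned above could be scripted along these lines. This is only a sketch: the function names and the output layout (modelled on what the md5sum tool prints) are assumptions, not the release manager's actual tooling.

```python
import hashlib
import sys


def md5_hex(path, chunk_size=64 * 1024):
    """Return the hex MD5 digest of the file at path, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def release_note_line(path):
    """Format one checksum line; the layout is a hypothetical release-notes style."""
    return "%s  %s" % (md5_hex(path), path)


if __name__ == "__main__":
    # Print one checksum line per archive named on the command line.
    for archive in sys.argv[1:]:
        print(release_note_line(archive))
```

Run with both archive names as arguments, it prints one line per file, so the extra format costs one more argument rather than one more command.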
Martin v. Löwis wrote:
Fred wouldn't have asked if it was no effort in keeping it. There is certainly more than one command to it - you have to md5sum the file, and copy the md5sum into the release notes. You have to upload the file from your workstation to python.org. I don't know how you do that, but I need to use my DSL link for uploading the MSI files; it takes roughly 30min to upload. Fortunately, I have a DSL flatrate.
Again, we're only talking about the documentation tarballs. I'm still
going to be making both tar.gz and tar.bz2 format source releases -
yes, it's a bit more work (gotta gpg sign both, upload both) but I'm
completely unconvinced that forcing people to install bzip2 everywhere
is a useful approach.
Anthony
--
Anthony Baxter
On Sat, Sep 18, 2004, Fredrik Lundh wrote:
Of the build systems I checked, all had gunzip, most had unzip, but only the Linux systems had bunzip2.
The bzip2 homepage contains 1.0.2 binaries for exactly three platforms, compared to over 20 systems for gzip and 30 systems for unzip. I suppose older bzip2 versions (0.9.5) are compatible, but someone should verify that they work before you pull the gzip archives.
Granted that bz2-only isn't a viable option, what does gz give us over bz2/zip that makes it worthwhile to keep? -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines." --Ralph Waldo Emerson
damien morton wrote:
Umm, gzip compression is also one of the possible HTTP compression algorithms. bz2 isn't.
What does HTTP compression have to do with whether we have a gzipped release of Python? My personal take on all of this is that we make the release manager's job as simple as possible. That means either ditch gzip files or ditch bzip2 files. If we stick with gzip we basically eat the bandwidth cost. If we go with bzip2 we need to link to where to get the source to compile, if not host a copy of the bzip2 source ourselves. But either way I completely sympathize with the release managers and I am all for making people's lives easier at release time. So I say we should go with bzip2. While we might get our bandwidth for free thanks to the good graces of XS4ALL and Thomas, I don't think we should view it as infinite since they are still footing the bill. If we can do something easily that would reduce their cost enough to buy Thomas a soda I think we should do it. If that means some people need to go download some free software, then so be it. Considering Python has practically no required tools beyond a C compiler we have rather low dependency requirements for UNIX in my eyes. Hell, bzip2's source is less than the difference between 2.4's bzip2 source package compared to the gzip one. We could have a copy of the latest bzip2 on our server for people to download and we would still save on bandwidth even when people need both Python and bzip2. Plus, without starting a flame war, bzip2 is under a BSD license so it gets a gold star from me. =) -Brett
Brett C. wrote:
But either way I completely sympathize with the release managers and I am all for making people's lives easier at release time.
Yep. I suppose that's what this is all about. Should we add 5 minutes of work for:
1) the release manager
2) the n (small integer) people with bzip2less systems
Think 1) is the way to go, at least for finals. Oh, whatever, I don't even really care. I'll shut up now. Erik
On Sat, Sep 18, 2004 at 10:46:10PM +0200, Erik Heneryd wrote:
Yep. I suppose that's what this is all about. Should we add 5 minutes of work for:
1) the release manager
Add 5 minutes for EVERY release.
2) the n (small integer) people with bzip2less systems
Add 5 minutes to install bzip2 ONCE and forever. Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd@phd.pp.ru Programmers don't die, they just GOSUB without RETURN.
On Sep 18, 2004, at 4:57 PM, Oleg Broytmann wrote:
On Sat, Sep 18, 2004 at 10:46:10PM +0200, Erik Heneryd wrote:
Yep. I suppose that's what this is all about. Should we add 5 minutes of work for:
1) the release manager
Add 5 minutes for EVERY release.
2) the n (small integer) people with bzip2less systems
Add 5 minutes to install bzip2 ONCE and forever.
Sure, on every machine that you need to install python on (and it isn't 5 minutes either - most solaris machines aren't that fast). That's assuming that it's acceptable to your corporation to just be adding software to your unix machines that hasn't gone through a qualification process. -- Nick
Are there site statistics that show the current relative demand for .gz versus .bz2? Neil
"When in doubt, don't pass." If there was all around agreement to drop gzip, I'd say go for it. But since there isn't, let's keep supporting it and test the waters again in a year or two. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
Neil Hodgson wrote:
Are there site statistics that show the current relative demand for .gz versus .bz2?
Within the last few days (since the logs rotated on Sep 13), there have been 1095 accesses to Python-2.3.4.tar.bz2, and 5168 to Python-2.3.4.tgz. Regards, Martin
Martin v. Löwis wrote:
Within the last few days (since the logs rotated on Sep 13), there have been 1095 accesses to Python-2.3.4.tar.bz2, and 5168 to Python-2.3.4.tgz.
so given the "we'll save 5 minutes for each release, and users stuck with gzip only loses 5 minutes each" rationale, I assume this means that someone's planning to make 314400 Python releases over the next year? </F>
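Fredrik's figure appears to follow from Martin's log numbers if the log window is taken as about six days. That window length is an assumption made here, not something stated in the thread. A small sketch of the arithmetic, with the five-minute costs taken from the earlier messages:

```python
TGZ_ACCESSES = 5168        # Python-2.3.4.tgz, from the log figures quoted above
BZ2_ACCESSES = 1095        # Python-2.3.4.tar.bz2, same source
LOG_DAYS = 6               # assumption: logs rotated Sep 13, counted around Sep 18-19
MINUTES_PER_USER = 5       # cost to each gzip-only user, per the thread's rationale
MINUTES_PER_RELEASE = 5    # saving for the release manager, per release


def yearly_downloads(accesses, days):
    """Extrapolate a short log window to a full year of downloads."""
    return accesses / days * 365


def breakeven_releases(accesses, days):
    """Releases per year needed before the manager's saving matches the users' cost."""
    user_minutes = yearly_downloads(accesses, days) * MINUTES_PER_USER
    return user_minutes / MINUTES_PER_RELEASE


releases = breakeven_releases(TGZ_ACCESSES, LOG_DAYS)
bz2_share = BZ2_ACCESSES / TGZ_ACCESSES

print("break-even releases per year: about %d" % round(releases, -2))
print("bz2 downloads as a fraction of tgz: %.2f" % bz2_share)
```

Because both costs are five minutes, the break-even count is simply the extrapolated number of .tgz downloads per year. The second figure is the ratio behind the "less than 1/4" remark later in the thread.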
Fredrik Lundh wrote:
so given the "we'll save 5 minutes for each release, and users stuck with gzip only loses 5 minutes each" rationale, I assume this means that someone's planning to make 314400 Python releases over the next year?
Talking about helpful comments... Regards, Martin
On Sat, 2004-09-18 at 15:57, Brett C. wrote:
My personal take on all of this is that we make the release manager's job as simple as possible.
Although if someone from the community wanted to volunteer to build tgz files, that might go a long way toward keeping this option available. Disk space on python.org isn't (or shouldn't be) an issue. -Barry
Barry Warsaw wrote:
On Sat, 2004-09-18 at 15:57, Brett C. wrote:
My personal take on all of this is that we make the release manager's job as simple as possible.
Although if someone from the community wanted to volunteer to build tgz files, that might go a long way toward keeping this option available. Disk space on python.org isn't (or shouldn't be) an issue.
Sure. I could build the tar.gz if given a tar.bz2/cvs pointer, though I personally think even the coordination overhead wouldn't make it worthwhile. If nothing else, just to end this IMHO silly thread. Erik
Brett C wrote:
My personal take on all of this is that we make the release manager's job as simple as possible.
first the "no abbreviations in the standard library" and now "who cares about users; releases are for the release manager". have you even seen a Python user in real life? </F>
Brett C. wrote:
My personal take on all of this is that we make the release manager's job as simple as possible. That means either ditch gzip files or ditch bzip2 files.
I disagree, almost 100%. The job of release management is to make
it as easy as possible for people to get and use Python. The language
isn't being organised for _my_ benefit.
Last I looked, tar.bz2 was less than 1/4 of tar.gz in terms of number
of downloads (see http://www.python.org/wwwstats/usage_200409.html)
That's hardly a case for switching.
--
Anthony Baxter
Anthony Baxter wrote:
Last I looked, tar.bz2 was less than 1/4 of tar.gz in terms of number of downloads (see http://www.python.org/wwwstats/usage_200409.html) That's hardly a case for switching.
Although that may partly result from http://www.python.org/download/ referring to the .tgz only. Regards, Martin
Nick Bastin wrote:
If we're only talking binary releases, then I don't really care, but please don't make this change for the source releases. There are several platforms on which Python is supported which do not support bzip2 out of the box (Solaris, as a prime example). It adds just that much more heartache to get python installed on such a system.
I have no intention of dropping tar.gz source releases. I think Fred
was talking about the documentation tarballs. Even then, I think there's
some advantages to keeping both, and I don't really see the advantage
to dropping the tar.gz format. But hey, that's up to Fred - he's the
one who makes the doc releases.
Anthony
--
Anthony Baxter
[Responding to an absolutely enormous deluge of emails on python-dev...] On Sunday 19 September 2004 10:06 pm, Anthony Baxter wrote:
I have no intention of dropping tar.gz source releases. I think Fred was talking about the documentation tarballs. Even then, I think there's some advantages to keeping both, and I don't really see the advantage to dropping the tar.gz format. But hey, that's up to Fred - he's the one who makes the doc releases.
Dang, it doesn't pay to be away from email for three days, does it?
Yes, I was only talking about documentation releases. It never occurred to me anyone would think I was talking about Python source releases. Maybe I shouldn't have added python-dev to the recipients list for my original email, but too often objections get heard quite late if I don't include python-dev.
For the documentation, there's a much longer history of providing the bz2 versions of the archives. There are also many more archives we can drop per release.
While in theory disk space isn't supposed to be an issue, it seems to be something our sysadmin group is dealing with on a regular basis (mostly cleaning up old webserver logs). So while the space itself may not be an issue, it certainly generates tedious work for volunteers.
My motivation in dropping the bz2 archives is two-fold:
1. Reduce disk space consumed per release, mostly to ease the burden on the sysadmin group.
2. Reduce the number of files posted for the documentation per release, so that choices for end-users are easier to pick through.
-Fred
-- Fred L. Drake, Jr. <fdrake at acm.org>
I'm not sure how easy or difficult it would be---but it would be very
convenient for me if the documentation was also downloadable in
windows help (CHM) format. Currently CHM files are only available in
windows installers, but I use them on Linux (easier to search, etc).
-param
On Tue, 21 Sep 2004 10:36:15 -0400, Fred L. Drake, Jr. wrote:
[Responding to an absolutely enormous deluge of emails on python-dev...]
On Sunday 19 September 2004 10:06 pm, Anthony Baxter wrote:
I have no intention of dropping tar.gz source releases. I think Fred was talking about the documentation tarballs. Even then, I think there's some advantages to keeping both, and I don't really see the advantage to dropping the tar.gz format. But hey, that's up to Fred - he's the one who makes the doc releases.
Dang, it doesn't pay to be away from email for three days, does it?
Yes, I was only talking about documentation releases. It never occurred to me anyone would think I was talking about Python source releases. Maybe I shouldn't have added python-dev to the recipients list for my original email, but too often objections get heard quite late if I don't include python-dev.
For the documentation, there's a much longer history of providing the bz2 versions of the archives. There are also many more archives we can drop per release.
While in theory disk space isn't supposed to be an issue, it seems to be something our sysadmin group is dealing with on a regular basis (mostly cleaning up old webserver logs). So while the space itself may not be an issue, it certainly generates tedious work for volunteers.
My motivation in dropping the bz2 archives is two-fold:
1. Reduce disk space consumed per release, mostly to ease the burden on the sysadmin group.
2. Reduce the number of files posted for the documentation per release, so that choices for end-users are easier to pick through.
-Fred
-- Fred L. Drake, Jr. <fdrake at acm.org>
On Tuesday 21 September 2004 11:31 am, Paramjit Oberoi wrote:
I'm not sure how easy or difficult it would be---but it would be very convenient for me if the documentation was also downloadable in windows help (CHM) format. Currently CHM files are only available in windows installers, but I use them on Linux (easier to search, etc).
You do? What software supports them? It would be cool to have a decent single-file documentation browser on Linux. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org>
Fred wrote:
You do? What software supports them? It would be cool to have a decent single-file documentation browser on Linux.
/F pointed out xchm - http://xchm.sourceforge.net/ - that's what I use. There is also GnoCHM - http://gnochm.sourceforge.net/ - which wasn't completely stable when I last tried it. But it's written in Python/PyGTK. Both these readers use CHMLIB - http://66.93.236.84/~jedwin/projects/chmlib/ - to read CHM files. Python bindings for this library are available from the GnoCHM project: http://gnochm.sourceforge.net/pychm.html -param
Paramjit Oberoi wrote:
I'm not sure how easy or difficult it would be---but it would be very convenient for me if the documentation was also downloadable in windows help (CHM) format. Currently CHM files are only available in windows installers, but I use them on Linux (easier to search, etc).
I could do this along with the Windows installer releases. However, I don't think I can do this whenever Fred makes a snapshot release; I also doubt that Fred can easily do this on his own, since the documentation is built on Unix, and the CHM file on Windows. Regards, Martin
I could do this along with the Windows installer releases. However, I don't think I can do this whenever Fred makes a snapshot release;
That would be perfectly adequate. The documentation doesn't change that much between snapshots, and anyway, I usually don't use snapshot releases... As far as I am concerned, just having the CHM files corresponding to the official releases would be fine. Thanks, -param
Fred Drake wrote:
While in theory disk space isn't supposed to be an issue, it seems to be something our sysadmin group is dealing with on a regular basis (mostly cleaning up old webserver logs). So while the space itself may not be an issue, it certainly generates tedious work for volunteers.
I thought everyone knew that logs always fill up until the disk is almost full, no matter how much disk you have. </F>
Fred, from what I understand, the algorithmic behavior of bz2 and gz are completely different -- while gzip is incremental, bz2 requires memory proportional to the size of the source information. Furthermore, most browsers now support gzip compression for their web pages; it will be quite some time before bz2 support is ubiquitous. Unless these two issues are different than I understand them, I'd prefer if gzip remain in the standard Python distribution. Best, Clark
*blush* I read the post wrong, please disregard my comment.
participants (20)
- Martin v. Löwis
- Aahz
- Anthony Baxter
- Barry Warsaw
- Brett C.
- Charles Cazabon
- Clark C. Evans
- damien morton
- Erik Heneryd
- Fred L. Drake, Jr.
- Fredrik Lundh
- Guido van Rossum
- Josiah Carlson
- Lalo Martins
- Neil Hodgson
- Nick Bastin
- Oleg Broytmann
- Paramjit Oberoi
- Scott David Daniels
- Tim Peters