Problem with 2.3b2 tarfile?

Was the 2.3b2 tarfile generated in such a way that it requires GNU tar? I'm unable to extract it on a Solaris 8 system using /usr/bin/tar. It gunzips fine and I can extract it on my Mac OS X system, but on Solaris I get "tar: directory checksum error" and find a truncated directory tree. I've also verified that the files downloaded to the Mac and the Solaris box have the same checksum.

Skip

On Mon, 2003-06-30 at 13:45, Skip Montanaro wrote:
I generated the tarball using the PEP 101 suggestion, which is the same way I generated the 2.2.3 tarball:

    tar cf - Python-2.3b2 | gzip -9 > Python-2.3b2.tgz

so it's possible there are GNU tar-isms in the file (I don't have access to a non-GNU tar). Did you have a problem with the 2.2.3 tarball? I haven't seen any other reports of problems with either.

-Barry

>> Was the 2.3b2 tarfile generated in such a way that it requires GNU
>> tar?

Barry> I generated the tarball using the PEP 101 suggestion, which is
Barry> the same way I generated the 2.2.3 tarball:
Barry> tar cf - Python-2.3b2 | gzip -9 > Python-2.3b2.tgz
Barry> so it's possible there are GNU tar-isms in the file (I don't have
Barry> access to a non-GNU tar).

I just tried creating a tarfile on my Mac using "tar cfoO". The -o and -O flags are described as

    -O  Write old-style (non-POSIX) archives.
    -o  Don't write directory information that the older (V7) style tar
        is unable to decode. This implies the -O flag.

I still got a directory checksum error. Next step was to try extracting the tarfile (the one generated with -o and -O) on another Solaris 8 system. That worked fine, so I will assume the problem is on the first machine I tried.

Skip

Skip> Next step was to try extracting the tarfile (that generated with
Skip> -o and -O) on another Solaris 8 system. That worked fine, so I
Skip> will assume the problem is on the first machine I tried.

Belay that. The other machine had GNU tar in /usr/local/bin... With /usr/bin/tar I get the same problem.

Off-list John Abel suggested perhaps it might be a path length problem. I was more concerned about filenames containing spaces or non-ASCII characters but didn't find any smoking guns. Looking at the characters in the filenames, I didn't see any files whose name wasn't matched by this regular expression:

    [-:A-Za-z.0-9 _()]+

In addition to several files containing spaces, there is a directory component named "(vise)" which seems a bit odd, but which should be okay for tar.

Next check was on the length of things as John had suggested. The longest path is only 110 characters:

    ./Mac/OSXResources/app/Resources/English.lproj/Documentation/macpython_ide_tutorial/entering_in_new_window.gif

but is in the directory where tar craps out. I then generated another tarfile which excluded just the .../macpython_ide_tutorial directory. That extracts fine:

    bash-2.03$ /usr/bin/tar xf ../tmp/Py2.3.tar
    bash-2.03$

Just or Jack, is there any chance we can rearrange the Mac subtree to shorten up that long path? I don't know if it's the number of directory components or the total length of the path that's causing the problem.

In the meantime, perhaps it's worth noting on the website that extracting the tarfile on some platforms may fail.

Skip
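
The length check described above can be sketched in Python. This is a minimal sketch and not the procedure actually used in the thread: the helper names `overlong_names` and `member_names` are hypothetical, and the 100-byte limit is the classic tar name-field size that comes up later in the discussion.

```python
import os

CLASSIC_NAME_LIMIT = 100  # bytes available for a member name in a classic tar header


def overlong_names(names, limit=CLASSIC_NAME_LIMIT):
    """Return (length, name) pairs for names longer than limit, longest first."""
    hits = [(len(n.encode("utf-8")), n) for n in names]
    return sorted((h for h in hits if h[0] > limit), reverse=True)


def member_names(top):
    """Yield paths under top roughly as tar would store them (hypothetical helper)."""
    for dirpath, dirnames, filenames in os.walk(top):
        for entry in dirnames + filenames:
            yield os.path.join(dirpath, entry).replace(os.sep, "/")


if __name__ == "__main__":
    # "Python-2.3b2" is the directory named in the thread; adjust as needed.
    for length, name in overlong_names(member_names("Python-2.3b2")):
        print(length, name)
```

Run from the directory that contains the source tree, this lists every path that would not fit in a classic 100-byte name field, which is one way to spot the offending entries before building a release tarball.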

Barry> I've updated the bugs page to remove the kaput bugs and add a
Barry> note about extraction problems on Solaris 8.

Thanks. I found this item on the Haskell website:

    http://www.haskell.org/pipermail/hugs-bugs/2002-November/001045.html

Looks like Solaris (<= 8) tar has a pathname limit of 80 characters. Should we try to squeeze the Python directory tree into that limit or simply tell people to use GNU tar if their tar barfs?

Skip

On Mon, Jun 30, 2003 at 03:10:05PM -0500, Skip Montanaro wrote:
... provide a "macless" tarball for problem systems, and enforce an 80 char limit on it?

Is this a traditional tar or Solaris tar limitation? It's not clear, having RTFA'd. It sounds like the 80-char limit is Solaris, and the 100-char limit is traditional?

Jeff

Jeff> ... provide a "macless" tarball for problem systems, and enforce
Jeff> an 80 char limit on it?

I think that would be troublesome. Some people would pick up the wrong tarball, and it would require extra work on the part of whoever creates a distribution tarball.

Jeff> is this a traditional tar or Solaris tar limitation? It's not
Jeff> clear, having RTFA'd. It sounds like the 80-char limit is
Jeff> Solaris, and the 100-char limit is traditional?

I'm game for just recommending people get GNU tar, but it's available on all the platforms I care about. Other people may not be so fortunate.

Skip

On Mon, Jun 30, 2003 at 03:22:11PM -0500, Skip Montanaro wrote:
I'm game for just recommending people get GNU tar, but it's available on all the platforms I care about. Other people may not be so fortunate.
Me, too. I think even on the more godforsaken platforms I use (hpux8) I could compile GNU tar.

Jeff

[Behrang Dadsetan]
As a very wild guess I would say any platform where python could compile, GNU tar will compile as well
A last message about this. :-)

The ideal is to create POSIX archives. GNU `tar' is reasonably conforming when unpacking an archive, is often conforming when creating a new one, but not necessarily. When it errs, it errs _big_.

Sun `tar' attempts to be POSIX conforming, and may be more POSIX conforming on the average of all aspects. However, whenever they goof, they usually try to stay compatible with their previous mistakes for the sake of their own users.

The best-known example is how to compute the header checksum (there are other examples and issues, I'm just taking the best-known one to illustrate how the history goes). When Sun recompiled Unix `tar' from sources, they missed that their machines were not signing chars the same way, but of course, they were able to read the archives they were creating. Later, `HP-UX' purposely made the same error "for being compatible with Sun". POSIX defined the checksum according to the original and documented design, making Sun and HP-UX officially wrong.

On reading an archive, GNU `tar' computes two checksums, one the POSIX way, the other the Sun way, and ensures (at least) one of them matches. On creation, GNU `tar' uses the POSIX checksum. The discrepancy only shows if there is some character with the high bit set. Sun might be computing two checksums as well, nowadays.

Declaring that Solaris is right, or GNU `tar' is right, is not the best way to ponder all this. The bottom line is that they are all wrong; even GNU `tar' has its good share of horrors. A reasonable way to go is that we all attempt to follow POSIX standards, whether we like them or not, however strange they may be. In this way, there is some chance that we converge, instead of fighting like cats in the backyard.

Unless you take the bet that GNU `tar' will squash them all? It will happen if Linux takes over the world! :-)

--
François Pinard http://www.iro.umontreal.ca/~pinard
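
The two checksum conventions described above can be illustrated with a short Python sketch. It assumes the standard tar header layout (a 512-byte block whose 8-byte checksum field at offset 148 is counted as spaces); the function name `header_checksums` is hypothetical, and the code is not taken from any tar implementation.

```python
BLOCK_SIZE = 512
CHKSUM_OFFSET = 148   # start of the checksum field in a tar header
CHKSUM_LENGTH = 8


def header_checksums(header):
    """Return (unsigned_sum, signed_sum) for one 512-byte tar header block."""
    if len(header) != BLOCK_SIZE:
        raise ValueError("a tar header block is exactly 512 bytes")
    # The checksum field itself is treated as if it were filled with spaces.
    block = (header[:CHKSUM_OFFSET]
             + b" " * CHKSUM_LENGTH
             + header[CHKSUM_OFFSET + CHKSUM_LENGTH:])
    unsigned_sum = sum(block)                                   # bytes as 0..255
    signed_sum = sum(b - 256 if b > 127 else b for b in block)  # bytes as -128..127
    return unsigned_sum, signed_sum


def checksum_matches(header, stored):
    """Lenient reader: accept the stored value if either convention matches."""
    return stored in header_checksums(header)
```

For a header containing only 7-bit characters the two sums are identical, so the difference only appears when a byte has the high bit set, as the message above notes.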

[Jeff Epler]
the 100-char limit is traditional?
Traditional. POSIX extends the limit somewhat, but the rules are very strange. There are two separate fields in the header, and whenever the second field is non-empty, a slash is to be _implied_ between both, so you cannot cut the path string just anywhere.

One source of incompatibility from GNU tar came from the fact that the FSF staff used that second field to hold binary information for GNU extensions, like sparse file information. To go over 100 characters, GNU tar used its own ways.

I remember that while studying these problems in deep detail, I was dismayed to see that when GNU extensions were added, the guys who did it already had a copy of the POSIX draft (it was only a draft then), in which the intent of actually using the second field was clearly stated. In any case, I devised a long and careful migration plan (accepted by RMS at the time) for better POSIX-ifying GNU tar, but I fear this plan has been abandoned with the change of maintainer.

If I remember correctly, GNU tar is more or less able to decipher POSIX archives, but you should not rely on it for generating them in some unusual conditions, like when file names are long. But I may be all wrong, as I did not follow recent changes.

--
François Pinard http://www.iro.umontreal.ca/~pinard
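
The two-field rule can be made concrete with a small sketch. It assumes the POSIX ustar field sizes (a 100-byte name field and a 155-byte prefix field); the function `split_ustar` is hypothetical and simply takes the first slash that makes both halves fit, which is one valid choice among possibly several.

```python
NAME_FIELD = 100     # bytes in the ustar name field
PREFIX_FIELD = 155   # bytes in the ustar prefix field


def split_ustar(path):
    """Split path into (prefix, name) so that prefix + "/" + name == path.

    Raises ValueError if no slash gives two halves that both fit.
    """
    raw = path.encode("utf-8")
    if len(raw) <= NAME_FIELD:
        return b"", raw          # short names need no prefix at all
    for i, byte in enumerate(raw):
        if byte != ord("/"):
            continue             # the cut may only fall on a slash
        prefix, name = raw[:i], raw[i + 1:]
        if len(prefix) <= PREFIX_FIELD and 0 < len(name) <= NAME_FIELD:
            return prefix, name
    raise ValueError("path cannot be stored in a ustar header: %r" % path)
```

Because the slash is implied rather than stored, a single path component longer than 100 bytes cannot be represented at all, however the rest of the path is arranged.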

On Monday 30 June 2003 23:19, François Pinard wrote:
On HP-UX 11.x you hit the same problem as on Solaris, with a slightly different error message:

    bash-2.05a$ /usr/bin/tar xf Python-2.3b2.tar
    tar: ././@LongLink - cannot create
    tar: ././@LongLink - cannot create
    ...

I've run into the same problem with our in-house software. My solution was to use HP-UX 11.x tar to create the archive; both GNU tar and POSIX tar will correctly unpack it.

Harri
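
The "././@LongLink" entries in that error output are how GNU tar stores names that do not fit in the header: an extra member with type flag "L" whose data is the real name. A minimal sketch for listing such entries in an uncompressed archive follows. The function name `gnu_longname_entries` is hypothetical, and the code assumes sizes are stored as plain octal text (it does not handle the base-256 encoding used for very large members).

```python
BLOCK = 512


def gnu_longname_entries(fileobj):
    """Return the long names stored as GNU 'L' members in an uncompressed tar stream."""
    found = []
    while True:
        header = fileobj.read(BLOCK)
        if len(header) < BLOCK or header == b"\0" * BLOCK:
            break                                    # end-of-archive marker or EOF
        size_field = header[124:136].strip(b"\0 ")   # octal text, NUL/space padded
        size = int(size_field.decode("ascii"), 8) if size_field else 0
        padded = (size + BLOCK - 1) // BLOCK * BLOCK  # data is padded to full blocks
        data = fileobj.read(padded)
        if header[156:157] == b"L":                  # GNU long-name member
            found.append(data[:size].rstrip(b"\0").decode("utf-8", "replace"))
    return found
```

Called on a file opened in binary mode (after gunzipping), a non-empty result means the archive relies on the GNU long-name extension, which would be consistent with the failures reported above on tar implementations that do not understand it.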

The 80 is a misreading of that Haskell message. There is only a 100 char limit. (The message just explains that the shortest name remaining is 80, which is less than 100.)

--Guido van Rossum (home page: http://www.python.org/~guido/)

[Barry Warsaw]
(unless someone else is motivated to generate Solaris-friendly tarballs).
Thinking out loud. I remember that GNU `tar' already has a few ugly hacks to inter-operate despite serious glitches (or blunt bugs) in Sun and Solaris `tar's. Moreover, the last maintainer of `tar' was very fond of Solaris.

P.S. - I often objected that there are so many `Solarisms' in GNU. I had some pity for users of old hardware from deceased companies. Before Linux existed, such users clearly did not have the monetary means to do better. But Sun? They, rather than us, should support their users... :-)

--
François Pinard http://www.iro.umontreal.ca/~pinard

This is the culprit. In a "classic" tar file, a filename can be only 100 characters. (See Lib/tarfile.py. :-)
The total length must be <= 100.

--Guido van Rossum (home page: http://www.python.org/~guido/)

Sorry for the late reply; very busy. On Monday, Jun 30, 2003, at 23:55 Europe/Amsterdam, Guido van Rossum wrote:
If this is the only culprit it is easy to fix. But we should think of a more general fix for the future.

--
Jack Jansen, <Jack.Jansen@cwi.nl>, http://www.cwi.nl/~jack
If I can't dance I don't want to be part of your revolution -- Emma Goldman

On Tue, Jul 1 2003 Jack Jansen wrote:
While you're at it, can you also fix something else that's been bugging me since I first started using CVS on Windows? Whenever I do a "cvs update" (-dP implied) I get warnings about the "Mac/IDE scripts" directory (this is using Cygwin):

    $ cd "Mac/IDE scripts"; cvs update
    ? Hold option to open a script
    ? Insert file name
    ? Insert folder name
    ? Search Python Documentation
    ? Hack/Remove .pyc files
    ? Hack/Toolbox Assistant
    cvs server: Updating .
    cvs server: Updating Hack
    cvs server: Updating Widget demos

The problem is, Windows doesn't allow trailing periods in file names, and all these file names should end in "...". Not using -d is not a reasonable option.

--
Sjoerd Mullender <sjoerd@acm.org>

participants (9)
- Barry Warsaw
- Behrang Dadsetan
- Guido van Rossum
- Harri Pasanen
- Jack Jansen
- Jeff Epler
- pinard@iro.umontreal.ca
- Sjoerd Mullender
- Skip Montanaro