[issue18321] Multivolume support in tarfile module

Lars Gustäbel report at bugs.python.org
Sun Apr 13 14:11:17 CEST 2014


Lars Gustäbel added the comment:

> [...] but remember, we split a volume only in the middle of a big file, not in any other case (AFAIK). Hopefully you don't get huge pax headers or anything strange. [...] 

Hopefully? Sorry, but have you tested this? I did. I let GNU tar create a two-volume archive that is split exactly between the two blocks of an XHDTYPE pax header.

The result is terrifying. At the beginning of the second volume, GNU tar creates an XGLTYPE header as the pax replacement for a GNUTYPE_MULTIVOL header. It is followed by an XHDTYPE header ("GNUFileParts") that somehow decorates the following REGTYPE(!) tar header, which contains the continuation of the split XHDTYPE header data from the previous volume. Only after that comes the REGTYPE file that the split XHDTYPE header was actually meant to decorate.

I attached the archive to this issue.
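
For anyone who wants to look at the header sequence of such a volume, here is a minimal sketch that walks the raw 512-byte headers and reports each type flag. It is not part of tarfile: walk_headers() is a hypothetical helper, it assumes uncompressed volume data, and it only understands plain octal size fields (no GNU base-256 sizes). The demo archive is built in memory and is not the attached file.

```python
import io
import tarfile

def walk_headers(data):
    """Yield (offset, typeflag) for each raw tar header in `data`.

    Hypothetical helper for inspection only. Assumes every header
    stores its payload size as an octal string.
    """
    offset = 0
    while offset + tarfile.BLOCKSIZE <= len(data):
        block = data[offset:offset + tarfile.BLOCKSIZE]
        if block.count(b"\0") == tarfile.BLOCKSIZE:
            break  # an all-zero block marks the end of the archive
        typeflag = block[156:157]  # type flag lives at byte offset 156
        size = int(block[124:136].strip(b"\0 ") or b"0", 8)  # size field
        yield offset, typeflag
        # Skip the header block plus the payload, rounded up to full blocks.
        payload_blocks = -(-size // tarfile.BLOCKSIZE)
        offset += tarfile.BLOCKSIZE * (1 + payload_blocks)

# Demo: a pax archive whose member name is too long for the ustar name
# field, so tarfile has to emit an XHDTYPE header in front of it.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
    info = tarfile.TarInfo("a" * 150)
    payload = b"x" * 10
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

types = [flag for _, flag in walk_headers(buf.getvalue())]
```

Comparing the flags against tarfile.XGLTYPE, tarfile.XHDTYPE and tarfile.REGTYPE makes it easy to spot an unexpected ordering at the start of a volume.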

What happens if a GNUTYPE_LONGNAME header is split in two? I don't want to know...


> write() will need to take into account blocks (BLOCKSIZE), just to be able to split the volumes correctly.

It is mandatory to do the split on a block boundary (a multiple of 512).
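
As a rough illustration of that constraint, the following sketch rounds a requested volume size down to a block boundary. split_point() is a hypothetical name, not an existing tarfile function; only tarfile.BLOCKSIZE comes from the module.

```python
import tarfile

def split_point(max_volume_size):
    """Return the largest volume length <= max_volume_size that is a
    multiple of tarfile.BLOCKSIZE (512 bytes).

    Hypothetical helper, shown only to illustrate the block-boundary rule.
    """
    if max_volume_size < tarfile.BLOCKSIZE:
        raise ValueError("a volume must hold at least one 512-byte block")
    return max_volume_size - max_volume_size % tarfile.BLOCKSIZE
```

With this, a requested limit of 1000 bytes would yield a volume of 512 bytes, because the next boundary (1024) would exceed the limit.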


> * multivolume logic in write() needs read/write access to the current tarinfo being written [...]. How do you propose this object should be accessed from write()?

I don't know, and this problem seems to be quite hard to address with my approach. That's too bad.


> > BTW, my version of GNU tar refuses to create compressed multiple-volume archives which is why I doubt the usefulness of this feature overall.
> But it has multivolume support right? Which is what I am proposing here. Also, you can gzip (or encrypt or anything) the volumes after creating the volumes..

Yeah, it has multivolume support, but a very limited one that is not only weird but also unusable together with compression. And sure, I can compress and encrypt the volumes afterwards, but I can also create a compressed archive and pipe it through split(1) to split it into parts. Both ways create tar archives that are not readable by GNU tar because they're non-standard. So what?
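
The compress-then-split route can be sketched in a few lines, assuming everything fits in memory. split_bytes() is a hypothetical stand-in for split(1), and the 64-byte part size is arbitrary. The parts are not archives on their own; they have to be concatenated again before anything can read them.

```python
import io
import tarfile

def split_bytes(data, part_size):
    """Cut `data` into consecutive chunks of at most `part_size` bytes.

    Hypothetical stand-in for split(1), for illustration only.
    """
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]

# Create a small gzip-compressed archive in memory.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    info = tarfile.TarInfo("hello.txt")
    payload = b"hello world\n"
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# Split the compressed stream into fixed-size parts.
parts = split_bytes(buf.getvalue(), 64)

# Reading requires joining the parts back into one stream first.
joined = b"".join(parts)
with tarfile.open(fileobj=io.BytesIO(joined), mode="r:gz") as tar:
    names = tar.getnames()
```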

Please tell me, what is your actual personal use-case for this feature?

----------
Added file: http://bugs.python.org/file34798/split-xhdtype.tar.gz

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue18321>
_______________________________________

