Gzip and zip extra field
On 29.05.13 16:25, Serhiy Storchaka wrote:
Gzip files can contain an extra field [1], and some applications use it to extend the gzip format. The current GzipFile implementation ignores this field on input and doesn't allow creating a new file with an extra field.
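For reference, the gzip format [1] places the extra field right after the fixed 10-byte member header: when bit 2 (FEXTRA) of the FLG byte is set, a 2-byte little-endian XLEN follows, then XLEN bytes of data. Since the stdlib gzip module gives no way to write this field, the sketch below assembles a member by hand and reads the raw field back. `make_gzip_member` and `read_gzip_extra` are hypothetical helpers, not gzip module API; the reader relies on FEXTRA being the first of the optional header fields.

```python
import gzip
import struct
import zlib

FEXTRA = 0x04  # FLG bit 2: an extra field follows the fixed header


def make_gzip_member(payload: bytes, extra: bytes) -> bytes:
    """Build one gzip member whose header carries `extra` (hypothetical helper)."""
    if len(extra) > 0xFFFF:
        raise ValueError("extra field is limited to 65535 bytes")
    # ID1 ID2 CM FLG MTIME(4) XFL OS; OS=255 means "unknown"
    header = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, FEXTRA, 0, 0, 255)
    header += struct.pack("<H", len(extra)) + extra  # XLEN, then the field itself
    # Raw deflate stream (negative wbits: no zlib/gzip wrapper)
    comp = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
    body = comp.compress(payload) + comp.flush()
    # Trailer: CRC-32 and input size modulo 2**32
    trailer = struct.pack("<II", zlib.crc32(payload) & 0xFFFFFFFF,
                          len(payload) & 0xFFFFFFFF)
    return header + body + trailer


def read_gzip_extra(data: bytes):
    """Return the raw extra field of the first member, or None if FEXTRA is unset."""
    if data[:2] != b"\x1f\x8b" or data[2] != 8:
        raise ValueError("not a deflate-compressed gzip member")
    if not data[3] & FEXTRA:
        return None
    (xlen,) = struct.unpack_from("<H", data, 10)
    return data[12:12 + xlen]


member = make_gzip_member(b"hello world\n", b"AP\x02\x00xy")
print(read_gzip_extra(member))   # b'AP\x02\x00xy'
print(gzip.decompress(member))   # b'hello world\n'
```

Note that `gzip.decompress()` accepts the member but simply skips the extra field, which is the input behaviour described above.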
ZIP file entries can also contain an extra field [2]. Currently it is just saved as bytes in the `extra` attribute of ZipInfo.
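ZIP extra fields [2] are a sequence of records, each a 2-byte little-endian header ID and a 2-byte data size followed by the data. A rough sketch of structured access built on top of today's `ZipInfo.extra` bytes; `split_zip_extra` is a hypothetical helper and 0x7A7A an arbitrary header ID picked only for the demo.

```python
import io
import struct
import zipfile


def split_zip_extra(extra: bytes):
    """Split a ZIP extra field into (header_id, data) pairs (hypothetical helper)."""
    records = []
    pos = 0
    # Trailing bytes shorter than a record header (e.g. padding) are ignored.
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        pos += 4
        if pos + size > len(extra):
            raise ValueError("truncated extra field record")
        records.append((header_id, extra[pos:pos + size]))
        pos += size
    return records


# Write one entry carrying a custom extra record, then read it back.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    info = zipfile.ZipInfo("hello.txt")
    info.extra = struct.pack("<HH", 0x7A7A, 3) + b"abc"  # arbitrary demo ID
    zf.writestr(info, b"hello")

with zipfile.ZipFile(buf) as zf:
    print(split_zip_extra(zf.getinfo("hello.txt").extra))  # [(31354, b'abc')]
```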
I propose saving the extra field of a gzip file and providing structured access to its subfields:
    f = gzip.GzipFile('somefile.gz', 'rb')
    f.extra_bytes  # A raw extra field as bytes
    # iterating over all subfields
    for xid, data in f.extra_map.items():
        ...
    # get Apollo file type information
    f.extra_map[b'AP']  # (or f.extra_map['AP']?)
    # creating gzip file with extra field
    f = gzip.GzipFile('somefile.gz', 'wb', extra=extrabytes)
    f = gzip.GzipFile('somefile.gz', 'wb', extra=[(b'AP', apollodata)])
    f = gzip.GzipFile('somefile.gz', 'wb', extra={b'AP': apollodata})
    # change Apollo file type information
    f.extra_map[b'AP'] = ...
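The proposed `extra_map` would sit on top of the subfield encoding from the gzip format [1]: each subfield is two ID bytes (SI1, SI2), a 2-byte little-endian LEN, then LEN bytes of data. A minimal sketch of that encoding, assuming IDs are kept as 2-byte `bytes` keys (the `b'AP'` spelling above); `parse_subfields` and `build_subfields` are hypothetical names, and `b'XY'` is an arbitrary demo ID.

```python
import struct


def parse_subfields(extra: bytes):
    """Decode a gzip extra field into a list of (2-byte ID, data) pairs (hypothetical)."""
    pairs = []
    pos = 0
    while pos < len(extra):
        if pos + 4 > len(extra):
            raise ValueError("truncated subfield header")
        xid = extra[pos:pos + 2]                          # SI1, SI2
        (length,) = struct.unpack_from("<H", extra, pos + 2)
        pos += 4
        if pos + length > len(extra):
            raise ValueError("subfield data runs past the end of the extra field")
        pairs.append((xid, extra[pos:pos + length]))
        pos += length
    return pairs


def build_subfields(pairs) -> bytes:
    """Encode (2-byte ID, data) pairs, or a dict of them, into an extra field."""
    if isinstance(pairs, dict):
        pairs = pairs.items()
    out = bytearray()
    for xid, data in pairs:
        if len(xid) != 2:
            raise ValueError("subfield ID must be exactly 2 bytes")
        out += xid + struct.pack("<H", len(data)) + data
    if len(out) > 0xFFFF:
        raise ValueError("extra field is limited to 65535 bytes")
    return bytes(out)


raw = build_subfields([(b"AP", b"\x01\x02"), (b"XY", b"xyz")])
print(parse_subfields(raw))        # [(b'AP', b'\x01\x02'), (b'XY', b'xyz')]
print(dict(parse_subfields(raw)))  # mapping view, as in the proposed extra_map
```

Returning a list rather than a dict keeps order and any repeated IDs (the format description does not seem to forbid them), which is one concrete point for the interface question; a mapping view is then a single `dict()` call away.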
Issue #17681 [3] has preliminary patches. There are still open questions about the interface. Isn't it over-engineered?
Currently GzipFile seamlessly reads a sequence of separately compressed gzip files. Each such chunk can have its own extra field (dictzip uses this, for example). It would be desirable to be able to read only up to the end of the current chunk, so as not to miss its extra field.
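One possible way to stop at chunk boundaries: zlib decodes exactly one gzip member when given wbits=16 + MAX_WBITS and leaves whatever follows in `unused_data`, so each member's header (and extra field) can be inspected before moving on. A sketch over an in-memory bytes object; `gzip_member` and `iter_members` are hypothetical helpers, and trailing garbage or zero padding is not handled.

```python
import gzip
import struct
import zlib


def gzip_member(payload: bytes, extra: bytes = b"") -> bytes:
    """Build one gzip member, setting FEXTRA only when `extra` is non-empty."""
    flags = 0x04 if extra else 0
    header = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, flags, 0, 0, 255)
    if extra:
        header += struct.pack("<H", len(extra)) + extra
    comp = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
    body = comp.compress(payload) + comp.flush()
    trailer = struct.pack("<II", zlib.crc32(payload), len(payload) & 0xFFFFFFFF)
    return header + body + trailer


def iter_members(data: bytes):
    """Yield (extra_or_None, decompressed_bytes) for each member, in order."""
    while data:
        if data[:2] != b"\x1f\x8b":
            raise ValueError("expected a gzip member header")
        extra = None
        if data[3] & 0x04:                                  # FEXTRA
            (xlen,) = struct.unpack_from("<H", data, 10)
            extra = data[12:12 + xlen]
        d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)   # gzip wrapper, one member
        payload = d.decompress(data)
        if not d.eof:
            raise ValueError("truncated gzip member")
        yield extra, payload
        data = d.unused_data                                # bytes after the trailer


stream = gzip_member(b"first ", b"XY\x01\x00!") + gzip_member(b"second")
for extra, payload in iter_members(stream):
    print(extra, payload)
# b'XY\x01\x00!' b'first '
# None b'second'
print(gzip.decompress(stream))  # b'first second' (members merged, extras dropped)
```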
[1] http://www.gzip.org/format.txt
[2] http://www.pkware.com/documents/casestudies/APPNOTE.TXT
[3] http://bugs.python.org/issue17681
Is anyone interested in this feature? It needs bikeshedding.
Are any of the gzip standard extra fields in common use today? I tried looking up the URLs listed in the definitions; one is a 404 page and another is just an image with links to someone's Facebook and similar personal pages.
As for the zip extra fields, at least some of them seem useful only if the zip module actually interprets them. For example, given a zip64 archive, if zipfile can read/extract a 5GB file directly, you have no need to look at the Zip64 extra info yourself; if it can't, looking at the extra info won't give you any useful benefit. Other fields might be useful for building a more powerful wrapper module around zipfile: for example, you could substitute the native NTFS or POSIX timestamps in the extra info for the possibly less accurate normal zip timestamps.
On Nov 16, 2013, at 12:58, Serhiy Storchaka <storchaka@gmail.com> wrote:
Is anyone interested in this feature? It needs bikeshedding.
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas
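Andrew's timestamp example can be made concrete with the Info-ZIP extended timestamp record (header ID 0x5455, "UT"): a flags byte, then, if bit 0 is set, the modification time as a signed 32-bit little-endian Unix time. A hedged sketch that reads only the modification time from `ZipInfo.extra`; `ut_mtime` is a hypothetical helper, and the record layout is taken from Info-ZIP's extra-field notes, not from anything zipfile does today.

```python
import datetime
import io
import struct
import zipfile

UT_HEADER_ID = 0x5455  # Info-ZIP extended timestamp ("UT")


def ut_mtime(extra: bytes):
    """Return the UTC mtime from a UT record in `extra`, or None (hypothetical helper)."""
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        data = extra[pos + 4:pos + 4 + size]
        pos += 4 + size
        # Flags bit 0 means a modification time follows the flags byte.
        if header_id == UT_HEADER_ID and len(data) >= 5 and data[0] & 0x01:
            (mtime,) = struct.unpack_from("<i", data, 1)  # signed Unix time
            return datetime.datetime.fromtimestamp(mtime, tz=datetime.timezone.utc)
    return None


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    info = zipfile.ZipInfo("a.txt", date_time=(2013, 11, 17, 2, 18, 0))
    info.extra = struct.pack("<HHBi", UT_HEADER_ID, 5, 0x01, 1384654680)
    zf.writestr(info, b"data")

with zipfile.ZipFile(buf) as zf:
    info = zf.getinfo("a.txt")
    print(info.date_time)        # DOS fields: local time, 2-second resolution
    print(ut_mtime(info.extra))  # 2013-11-17 02:18:00+00:00
```

A wrapper could prefer this value when present and fall back to `date_time` otherwise.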
On 17.11.13 02:19, Andrew Barnert wrote:
Are any of the gzip standard extra fields in common use today? I tried looking up the URLs listed in the definitions; one is a 404 page and another is just an image with links to someone's Facebook and similar personal pages.
dictzip and BGZF both use the gzip format, each with its own random-access extension stored in the extra field. Both are very popular in their domains.
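For a sense of how such formats use the field: in BGZF, as described in the SAM/BAM format specification, every block is a gzip member whose extra field has a 'BC' subfield holding BSIZE, the total block size minus one, so a reader can hop between blocks without decompressing anything. A sketch under that assumption; `bgzf_block` and `block_offsets` are hypothetical helpers, and the empty EOF block that real BGZF files end with is omitted.

```python
import struct
import zlib


def bgzf_block(payload: bytes) -> bytes:
    """Build one BGZF-style gzip member with a 'BC' subfield (hypothetical helper)."""
    comp = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
    cdata = comp.compress(payload) + comp.flush()
    total = 12 + 6 + len(cdata) + 8  # header+XLEN, BC subfield, deflate data, trailer
    if total > 0x10000:
        raise ValueError("BSIZE must fit in 16 bits")
    header = struct.pack("<BBBBIBBH", 0x1F, 0x8B, 8, 0x04, 0, 0, 255, 6)  # XLEN=6
    extra = b"BC" + struct.pack("<HH", 2, total - 1)  # SLEN=2, BSIZE = size - 1
    trailer = struct.pack("<II", zlib.crc32(payload), len(payload))
    return header + extra + cdata + trailer


def block_offsets(data: bytes):
    """Return the offset of every block by following BSIZE; nothing is decompressed."""
    offsets = []
    pos = 0
    while pos < len(data):
        if data[pos:pos + 2] != b"\x1f\x8b" or not data[pos + 3] & 0x04:
            raise ValueError("expected a gzip member with an extra field")
        (xlen,) = struct.unpack_from("<H", data, pos + 10)
        extra = data[pos + 12:pos + 12 + xlen]
        bsize = None
        i = 0
        while i + 4 <= len(extra):  # scan subfields for SI1='B', SI2='C'
            (slen,) = struct.unpack_from("<H", extra, i + 2)
            if extra[i:i + 2] == b"BC" and slen == 2 and i + 6 <= len(extra):
                (bsize,) = struct.unpack_from("<H", extra, i + 4)
            i += 4 + slen
        if bsize is None:
            raise ValueError("no BC subfield: not a BGZF block")
        offsets.append(pos)
        pos += bsize + 1
    return offsets


data = bgzf_block(b"A" * 1000) + bgzf_block(b"hello")
print(block_offsets(data))  # start offset of each block
print(zlib.decompress(data[block_offsets(data)[1]:], 16 + zlib.MAX_WBITS))  # b'hello'
```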
As for the zip extra fields, at least some of them seem useful only if the zip module actually interprets them. For example, given a zip64 archive, if zipfile can read/extract a 5GB file directly, you have no need to look at the Zip64 extra info yourself; if it can't, looking at the extra info won't give you any useful benefit. Other fields might be useful for building a more powerful wrapper module around zipfile: for example, you could substitute the native NTFS or POSIX timestamps in the extra info for the possibly less accurate normal zip timestamps.
First of all, high-level support for the extra field would simplify ZIP64 support in zipfile (and make it less buggy). It would also help with supporting UTF-8 filenames and extended file attributes.
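To illustrate the ZIP64 point: per APPNOTE [2], the ZIP64 extended information record (header ID 0x0001) carries a value only for each header field that was set to its placeholder (0xFFFFFFFF, or 0xFFFF for the disk number), in the fixed order uncompressed size, compressed size, local header offset, disk start number. That conditional layout is easy to get wrong when handled ad hoc. A sketch of decoding it from central-directory values; `decode_zip64_extra` is hypothetical, not zipfile's internal code, and APPNOTE's additional rule for local headers is ignored.

```python
import struct

ZIP64_HEADER_ID = 0x0001
# (placeholder in the regular header, struct format, width) in record order
ZIP64_LAYOUT = [
    (0xFFFFFFFF, "<Q", 8),  # uncompressed size
    (0xFFFFFFFF, "<Q", 8),  # compressed size
    (0xFFFFFFFF, "<Q", 8),  # relative offset of local header
    (0xFFFF, "<I", 4),      # disk start number
]


def decode_zip64_extra(extra: bytes, file_size: int, compress_size: int,
                       header_offset: int, disk_start: int = 0):
    """Replace placeholder header values with those stored in the ZIP64 record."""
    fields = [file_size, compress_size, header_offset, disk_start]
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        data = extra[pos + 4:pos + 4 + size]
        pos += 4 + size
        if header_id == ZIP64_HEADER_ID:
            offset = 0
            # Only overflowed fields are present, so the offset advances conditionally.
            for n, (placeholder, fmt, width) in enumerate(ZIP64_LAYOUT):
                if fields[n] == placeholder:
                    (fields[n],) = struct.unpack_from(fmt, data, offset)
                    offset += width
            break
    return tuple(fields)


# A 5 GiB entry: both sizes overflowed, the header offset did not.
extra = struct.pack("<HHQQ", ZIP64_HEADER_ID, 16, 5 * 2**30, 5 * 2**30 - 1000)
print(decode_zip64_extra(extra, 0xFFFFFFFF, 0xFFFFFFFF, 1234))
# (5368709120, 5368708120, 1234, 0)
```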
Participants (3): Andrew Barnert, Eric Snow, Serhiy Storchaka