[Python-ideas] Gzip and zip extra field

Andrew Barnert abarnert at yahoo.com
Sun Nov 17 01:19:37 CET 2013


Are any of the gzip standard extra fields in common usage today? I tried looking up the URLs listed in the definitions; one is a 404 page and another is just an image with links to someone's Facebook and similar personal pages.
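For anyone who wants to check whether their own .gz files actually carry any subfields, here is a rough sketch of reading the extra field by hand, following the header layout in the gzip format spec. The function names (read_gzip_extra, split_subfields) are made up for illustration and are not part of the gzip module or of the proposed patch; it only looks at the first member and is lenient about truncated data.

```python
import io
import struct

FEXTRA = 0x04  # FLG bit that marks the presence of an extra field

def read_gzip_extra(fileobj):
    """Return the raw extra field of the first gzip member, or b'' if none.

    Hypothetical helper, not an API of the gzip module.
    """
    header = fileobj.read(10)  # ID1 ID2 CM FLG MTIME(4) XFL OS
    if len(header) < 10 or header[:2] != b'\x1f\x8b':
        raise ValueError('not a gzip file')
    if not header[3] & FEXTRA:
        return b''
    (xlen,) = struct.unpack('<H', fileobj.read(2))
    return fileobj.read(xlen)

def split_subfields(extra):
    """Split a raw extra field into (two-byte subfield ID, data) pairs."""
    subfields = []
    pos = 0
    while pos + 4 <= len(extra):
        xid = extra[pos:pos + 2]
        (length,) = struct.unpack_from('<H', extra, pos + 2)
        subfields.append((xid, extra[pos + 4:pos + 4 + length]))
        pos += 4 + length
    return subfields

# Hand-built header for illustration: 10 fixed bytes with FEXTRA set,
# then XLEN and one made-up b'AP' subfield carrying b'xyz'.
demo = (b'\x1f\x8b\x08\x04' + b'\x00' * 4 + b'\x00\xff'
        + struct.pack('<H', 7) + b'AP\x03\x00xyz')
raw = read_gzip_extra(io.BytesIO(demo))
print(split_subfields(raw))
```

Running something like this over a pile of real files would give a quick answer to which subfield IDs, if any, show up in practice.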

As for the zip extra fields, at least some of them seem like they're only useful if the zip module actually interprets them. For example, given a zip64, if zipfile can read/extract a 5GB file directly, you have no need to look at the Zip64 extra info directly; if it can't do so, you won't get any useful benefit out of looking at the extra info. Other fields might be useful for building a more powerful wrapper module around zipfile--e.g., you could substitute the native NTFS or POSIX timestamps in the extra info for the possibly-less-accurate normal zip timestamps.
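As a rough illustration of the wrapper idea, the sketch below splits the bytes in ZipInfo.extra into (header ID, data) pairs and pulls the POSIX modification time out of the Info-ZIP extended timestamp field (header ID 0x5455), if the entry has one. The helper names parse_zip_extra and unix_mtime are hypothetical, not part of zipfile, and reading the timestamp as a signed 32-bit value is an assumption of this sketch.

```python
import struct
import zipfile

EXTENDED_TIMESTAMP = 0x5455  # Info-ZIP "UT" extended timestamp header ID

def parse_zip_extra(extra):
    """Split ZipInfo.extra bytes into (header ID, data) pairs.

    Hypothetical helper; each record is a little-endian 2-byte ID,
    a 2-byte size, then that many bytes of data.
    """
    fields = []
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from('<HH', extra, pos)
        fields.append((header_id, extra[pos + 4:pos + 4 + size]))
        pos += 4 + size
    return fields

def unix_mtime(info):
    """Return the POSIX mtime from the extended timestamp field, or None."""
    for header_id, data in parse_zip_extra(info.extra):
        # First data byte is a flags byte; bit 0 means an mtime follows.
        if header_id == EXTENDED_TIMESTAMP and len(data) >= 5 and data[0] & 1:
            return struct.unpack_from('<i', data, 1)[0]
    return None

# Illustration with a hand-built extra field rather than a real archive.
info = zipfile.ZipInfo('a.txt')
info.extra = struct.pack('<HHBi', EXTENDED_TIMESTAMP, 5, 1, 1384647577)
print(unix_mtime(info))
```

A wrapper could fall back to info.date_time whenever unix_mtime() returns None.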

Sent from a random iPhone

On Nov 16, 2013, at 12:58, Serhiy Storchaka <storchaka at gmail.com> wrote:

> 29.05.13 16:25, Serhiy Storchaka wrote:
>> Gzip files can contain an extra field [1] and some applications use
>> this for extending the gzip format. The current GzipFile implementation
>> ignores this field on input and doesn't allow creating a new file with
>> an extra field.
>> 
>> ZIP file entries can also contain an extra field [2]. Currently it is just
>> saved as bytes in the `extra` attribute of ZipInfo.
>> 
>> I propose saving the extra field for gzip files and providing structured
>> access to its subfields.
>> 
>> f = gzip.GzipFile('somefile.gz', 'rb')
>> f.extra_bytes # A raw extra field as bytes
>> # iterating over all subfields
>> for xid, data in f.extra_map.items():
>>     ...
>> # get Apollo file type information
>> f.extra_map[b'AP'] # (or f.extra_map['AP']?)
>> # creating gzip file with extra field
>> f = gzip.GzipFile('somefile.gz', 'wb', extra=extrabytes)
>> f = gzip.GzipFile('somefile.gz', 'wb', extra=[(b'AP', apollodata)])
>> f = gzip.GzipFile('somefile.gz', 'wb', extra={b'AP': apollodata})
>> # change Apollo file type information
>> f.extra_map[b'AP'] = ...
>> 
>> Issue #17681 [3] has preliminary patches. There is some open doubt about
>> the interface. Isn't it over-engineered?
>> 
>> Currently GzipFile supports seamless reading of a sequence of separately
>> compressed gzip files. Every such chunk can have its own extra field (this
>> is used in dictzip for example). It would be desirable to be able to
>> read only until the end of the current chunk in order not to miss an extra
>> field.
>> 
>> [1] http://www.gzip.org/format.txt
>> [2] http://www.pkware.com/documents/casestudies/APPNOTE.TXT
>> [3] http://bugs.python.org/issue17681
> 
> Is anyone interested in this feature? It needs bikeshedding.

