Help with python-list archives
MRAB
python at mrabarnett.plus.com
Thu Jan 5 20:27:00 EST 2012
On 06/01/2012 00:10, Ian Kelly wrote:
> On Thu, Jan 5, 2012 at 4:52 PM, random joe<pywin32 at gmail.com> wrote:
>> Sure. Take the most recent file as an example: "2012 - January.txt.gz".
>> If you use the Python doc example, this is the result. Whether I use "r"
>> or "rb", the result is the same.
>>
>> >>> import gzip
>> >>> f1 = gzip.open('C:\\2012-January.txt.gz', 'rb')
>> >>> data = f1.read()
>> >>> data[:100]
>> '\x1f\x8b\x08\x08x\n\x05O\x02\xff/srv/mailman/archives/private/python-
>> list/2012-January.txt\x00\xec\xbdy\x7f\xdb\xc6\xb50\xfcw\xf0)\xa6z|+
>> \xaa!!l\xdc\x14[\x8b-;V\xe2-\x92\x12'
>> >>> f2 = gzip.open('C:\\2012-January.txt.gz', 'r')
>> >>> data = f2.read()
>> >>> data[:100]
>> '\x1f\x8b\x08\x08x\n\x05O\x02\xff/srv/mailman/archives/private/python-
>> list/2012-January.txt\x00\xec\xbdy\x7f\xdb\xc6\xb50\xfcw\xf0)\xa6z|+
>> \xaa!!l\xdc\x14[\x8b-;V\xe2-\x92\x12'
>>
>> The docs and google provide no clear answer. I even tried 7zip and
>> ended up with nothing but gibberish characters. There must be levels
>> of compression or something. Why could they not simply use the tar
>> format? Is there anywhere else one can download the archives?
>
> Interesting. I tried this on a Linux system using both gunzip and
> your code, and both worked fine to extract that file. I also tried
> your code on a Windows system, and I get the same result that you do.
> This appears to be a bug in the gzip module under Windows.
>
> I think there may be something peculiar about the archive files that
> the module is not handling correctly. If I gunzip the file locally
> and then gzip it again before trying to open it in Python, then
> everything seems to be fine.
I've found that if I gunzip it twice (gunzip it and then gunzip the
result) using the gzip module I get the text file.
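
The output quoted above still begins with '\x1f\x8b', the gzip magic number, which fits the idea that the file is gzipped twice. Here is a rough sketch of the double gunzip, assuming Python 3.2 or later (for gzip.decompress). The helper name gunzip_fully and the max_layers limit are mine, purely for illustration, and the demo builds its own doubly gzipped data in memory instead of using the real archive:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def gunzip_fully(data, max_layers=5):
    """Decompress repeatedly while the payload still looks like gzip.

    Returns the final bytes and the number of layers removed.
    max_layers is an arbitrary safety limit (hypothetical, for illustration).
    """
    layers = 0
    while data[:2] == GZIP_MAGIC and layers < max_layers:
        data = gzip.decompress(data)
        layers += 1
    return data, layers


# Self-contained demo: gzip some text twice, then undo both layers.
original = b"From python-list archive\n"
doubly_gzipped = gzip.compress(gzip.compress(original))
text, layers = gunzip_fully(doubly_gzipped)
print(layers, text)
```

For the real file, something like gunzip_fully(open('C:\\2012-January.txt.gz', 'rb').read()) should do the same job, though I have only reasoned about the in-memory case here.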