[New-bugs-announce] [issue42096] zipfile.is_zipfile incorrectly identifying a gzipped file as a zip archive

Alex Roussel report at bugs.python.org
Tue Oct 20 04:51:47 EDT 2020


New submission from Alex Roussel <alexandredrr14 at gmail.com>:

Hello, 

I've come across an issue that looks similar to the false-positive problem described in this ticket (https://bugs.python.org/issue28494). In this case, however, a single gzipped JSON file is incorrectly identified as a .zip archive because (I suspect) is_zipfile mistakes bytes in the file's compressed data for the end-of-archive bytes of a .zip archive.

I'm afraid I'm not well versed in how is_zipfile seeks through a file's bytes to check its magic number, so I apologise if my description isn't completely accurate.
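
One way to test the suspicion above is to look for the ZIP end-of-central-directory signature (the bytes "PK\x05\x06") near the end of the file. The following is a minimal sketch, not the zipfile module's own code: the helper name find_eocd_signature is hypothetical, and the search window assumes the ZIP format's 22-byte fixed record followed by a comment of at most 65535 bytes.

```python
import os

EOCD_SIGNATURE = b"PK\x05\x06"  # ZIP end-of-central-directory marker
EOCD_MIN_SIZE = 22              # size of the fixed part of the record
MAX_COMMENT = 0xFFFF            # the comment length field is 16 bits wide


def find_eocd_signature(path):
    """Return the file offset of the last EOCD signature in the tail, or None.

    Hypothetical helper: it only reports where the marker bytes occur and
    does not validate the record that would follow them.
    """
    size = os.path.getsize(path)
    window = min(size, EOCD_MIN_SIZE + MAX_COMMENT)
    with open(path, "rb") as fh:
        fh.seek(size - window)
        tail = fh.read()
    pos = tail.rfind(EOCD_SIGNATURE)
    return None if pos < 0 else size - window + pos
```

If this returns an offset for a .gz file, the compressed stream happens to contain the marker bytes, which would be consistent with the false positive described here.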

Here's my attempt at a summary of the problem:

My .json.gz file starts with the correct magic number (1f8b) identifying it as a gzipped file; however, when zipfile.is_zipfile is called on its path, it returns True.
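
As a possible workaround on the caller's side, the gzip magic number can be checked before is_zipfile is consulted. This is only a sketch: sniff_archive_type is a hypothetical helper, and it assumes that a file beginning with 1f8b should always be treated as gzip, which would misclassify a ZIP archive that has gzip-looking data prepended to it.

```python
import zipfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def sniff_archive_type(path):
    """Classify a file as 'gzip', 'zip', or 'unknown' (hypothetical helper).

    The leading gzip magic number is checked first, so a gzip file whose
    compressed data happens to resemble a ZIP end record is not passed
    to zipfile.is_zipfile at all.
    """
    with open(path, "rb") as fh:
        head = fh.read(len(GZIP_MAGIC))
    if head == GZIP_MAGIC:
        return "gzip"
    if zipfile.is_zipfile(path):
        return "zip"
    return "unknown"
```

Checking the start of the file first is cheap and does not depend on how is_zipfile searches the end of the file.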

I'm going to ask whether I'm allowed to upload the file itself (it's work related). In the meantime, I've included the head and tail of the file's hexdump below.

I am running Python 3.6.9 on Ubuntu 18.04.

----------

~/Téléchargements » python3                                                   
Python 3.6.9 (default, Oct  8 2020, 12:12:24) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import zipfile
>>> zipfile.is_zipfile('2020-10-18-1602979256-http_get_7549.json.gz')
True

----------

~/Téléchargements » xxd 2020-10-18-1602979256-http_get_7549.json.gz | head                   
00000000: 1f8b 0800 7090 8b5f 0003 ecbd 5993 e338  ....p.._....Y..8
00000010: 922e fa7e 7fc6 bcea 588a 8b28 8ad7 ec3c  ...~....X..(...<
00000020: 883b 292e e2be bc71 91b8 5394 b8f3 d8fd  .;)....q..S.....
00000030: ef17 9422 3223 22ab 2ab3 7ba6 67a6 e774  ..."2#".*.{.g..t
00000040: 5bab 4209 6175 7738 3e77 3880 fff3 6f71  [.B.auw8>w8...oq
00000050: d005 fff6 fffe 9bc1 ea96 451d 2629 6712  ..........E.&)g.
00000060: 853e 4202 830d 3145 7221 6af7 fe11 3ae9  .>B...1Er!j...:.
00000070: 1c0b f9e6 2db1 c0bf 25ea 38a9 1479 f650  ....-...%.8..y.P

----------

~/Téléchargements » xxd 2020-10-18-1602979256-http_get_7549.json.gz | tail
01d492b0: 98a2 3d5c 25e1 c5b8 d9c5 3287 c5d8 3d7c  ..=\%.....2...=|
01d492c0: 968f 3652 6fd4 4a0c 243b 166d 5640 97b5  ..6Ro.J.$;.mV@..
01d492d0: 9308 8376 fe17 1fac 0c90 0fdb b3e3 4e4a  ...v..........NJ
01d492e0: 605c a870 5120 955b 6267 e318 406f e1e2  `\.pQ .[bg..@o..
01d492f0: 2c50 12ec 5eb0 43cc 8d97 4daf 6017 3412  ,P..^.C...M.`.4.
01d49300: 3bdb 40ce 743f 7aa8 6ff9 f30d 796f f784  ;.@.t?z.o...yo..
01d49310: cec2 c45d b012 7e07 c70c dafd e16e fee2  ...]..~......n..
01d49320: c8a6 c01c 627f e004 f9c7 4770 e5e7 6bbf  ....b.....Gp..k.
01d49330: 44d5 97bb ffdf ffe7 ff03 2263 b46f 3d5c  D........."c.o=\
01d49340: cb04                                     ..

----------
components: Library (Lib)
messages: 379105
nosy: aroussel
priority: normal
severity: normal
status: open
title: zipfile.is_zipfile incorrectly identifying a gzipped file as a zip archive
type: behavior
versions: Python 3.6

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue42096>
_______________________________________

