[Python-bugs-list] [ python-Bugs-755031 ] zipfile: inconsistent filenames with InfoZip "unzip"
SourceForge.net
noreply@sourceforge.net
Tue, 17 Jun 2003 18:08:58 -0700
Bugs item #755031, was opened at 2003-06-15 17:23
Message generated for change (Comment added) made by gward
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=755031&group_id=5470
Category: Python Library
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: Greg Ward (gward)
>Assigned to: Greg Ward (gward)
Summary: zipfile: inconsistent filenames with InfoZip "unzip"
Initial Comment:
zipfile.py gives filenames inconsistent with the
InfoZIP "unzip" utility for certain ZIP files. My
source is an email virus, so the ZIP files are almost
certainly malformed. Nevertheless, it would be nice if
"unzip -l" and ZipFile.namelist() gave consistent
filenames.
Example: the attached Demo.zip (extracted from an email
virus caught on mail.python.org) looks like this
according to InfoZip:
$ unzip -l /tmp/Demo.zip
Archive: /tmp/Demo.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
    44544  01-26-03 20:49   DOCUME~1\CHRISS~1\LOCALS~1\Temp\Demo.exe
 --------                   -------
    44544                   1 file
But according to ZipFile.namelist(), the name of that
file is:
DOCUME~1\CHRISS~1\LOCALS~1\Temp\Demo.exescr000000000000000000.txt
Getting the same result with Python 2.2.2 and a
~2-week-old build of 2.3 CVS.
----------------------------------------------------------------------
>Comment By: Greg Ward (gward)
Date: 2003-06-17 21:08
Message:
Logged In: YES
user_id=14422
Fixed with patch #755987.
----------------------------------------------------------------------
Comment By: James C. Ahlstrom (ahlstromjc)
Date: 2003-06-17 11:50
Message:
Logged In: YES
user_id=64929
I submitted a patch for this. It is 755987. See further
comments there.
----------------------------------------------------------------------
Comment By: James C. Ahlstrom (ahlstromjc)
Date: 2003-06-16 10:29
Message:
Logged In: YES
user_id=64929
The analysis by sjones is correct. Python and the zip file
format both allow null bytes in file names. But in this case,
the file is infected with the "I-Worm.Lentin.o" virus and the
file name is designed to hide this. The file name ends in ".txt"
but the file name up to the null byte ends in ".exe". The
intention is that a virus scanner would skip this file because it
ends in ".txt" (a non-executable text file), but that
the ".exe" would be seen (an executable program file) if the
file were clicked, and so the file would be executed.
Testing this on my machine, my virus scanner (Kaspersky)
nevertheless flags the ".zip" file as containing a virus, but this
depends on the particular virus scanner and its settings.
I suggest that zipfile.py should terminate file names at a null
byte as InfoZip does.
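
A minimal sketch of the suggested behaviour, written in present-day
Python rather than 2.2/2.3. The helper name truncate_at_null is
hypothetical and this is not the code from patch 755987; it only
illustrates cutting the stored name at the first null byte and how
the apparent extension changes as a result.

```python
import os.path


def truncate_at_null(filename: str) -> str:
    """Return the part of filename before the first null character.

    Hypothetical helper for illustration only.
    """
    head, _sep, _tail = filename.partition("\x00")
    return head


# The stored name reported earlier in this thread: 18 literal "0"
# characters follow the third null byte.
STORED_NAME = ("DOCUME~1\\CHRISS~1\\LOCALS~1\\Temp\\Demo.exe"
               "\x00\x00scr\x00" + "0" * 18 + ".txt")

visible_name = truncate_at_null(STORED_NAME)

# Extension a scanner would see if it trusted the full stored name.
full_ext = os.path.splitext(STORED_NAME)[1]
# Extension seen once the name is cut at the first null byte.
cut_ext = os.path.splitext(visible_name)[1]

print(repr(visible_name))
print(full_ext, cut_ext)
```

With the name cut at the null byte, the result matches what
"unzip -l" prints, and the extension is ".exe" instead of ".txt".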
----------------------------------------------------------------------
Comment By: Shannon Jones (sjones)
Date: 2003-06-15 21:23
Message:
Logged In: YES
user_id=589306
The actual filename from the zipfile is:
filename = 'DOCUME~1\CHRISS~1\LOCALS~1\Temp\Demo.exe\x00\x00scr\x00000000000000000000.txt'
Notice there is a \x00 after Demo.exe. My guess is InfoZip
stores the filename in a null terminated string and this
extra null character in the filename terminates it at this
point. Python doesn't care if you have nulls in the string,
so it prints the entire filename.
You can see the zip file format description at
ftp://ftp.info-zip.org/pub/infozip/doc/appnote-981119-iz.zip
The format does say:
2) String fields are not null terminated, since the
length is given explicitly.
But it doesn't really say if strings are allowed to have
nulls in them.
So does Python or InfoZip get this right?
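
The difference between the two readings can be reproduced with
ctypes. This is only a sketch of the general C-string effect; whether
InfoZip really handles the name this way is the guess made above, not
something this example confirms.

```python
import ctypes

# Raw bytes of the stored file name, as quoted above.
RAW_NAME = (b"DOCUME~1\\CHRISS~1\\LOCALS~1\\Temp\\Demo.exe"
            b"\x00\x00scr\x00" + b"0" * 18 + b".txt")

# create_string_buffer copies the bytes into a C char array and
# appends a terminating NUL.
buf = ctypes.create_string_buffer(RAW_NAME)

# .value reads the array as a NUL-terminated C string, so it stops
# at the first embedded null byte.
as_c_string = buf.value

# .raw returns every byte; slicing by the known length mimics a
# reader that trusts the explicit length field instead.
as_counted = buf.raw[:len(RAW_NAME)]

print(as_c_string)
print(as_counted)
```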
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum)
Date: 2003-06-15 21:19
Message:
Logged In: YES
user_id=6380
That almost sounds like an intentional inconsistency. Could
it be that the central directory has one name but the local
header has a different one? Or that there's a null byte in
the filename so that the filename length is inconsistent?
The front of the file looks like this according to od -c:
0000000   P   K 003 004  \n  \0  \0  \0  \0  \0   *   Š   :   .   c   Ì
0000020  \v   g  \0   ®  \0  \0  \0   ®  \0  \0   D  \0  \0  \0   D   O
0000040   C   U   M   E   ~   1   \   C   H   R   I   S   S   ~   1   \
0000060   L   O   C   A   L   S   ~   1   \   T   e   m   p   \   D   e
0000100   m   o   .   e   x   e  \0  \0   s   c   r  \0   0   0   0   0
0000120   0   0   0   0   0   0   0   0   0   0   0   0   0   0   .   t
0000140   x   t   M   Z 220  \0 003  \0  \0  \0 004  \0  \0  \0   ÿ   ÿ
0000160  \0  \0   ž  \0  \0  \0  \0  \0  \0  \0   @  \0  \0  \0  \0  \0
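
The dump above can be decoded with the struct module. The sketch
below assumes the standard 30-byte local file header layout and
rebuilds a header like the one shown: version 10, sizes 44544, and a
name length of 68 (the "D \0" pair in the second row). The time, date
and CRC fields are zero placeholders, not the values in the dump, and
read_local_name is a hypothetical helper.

```python
import struct

# Fixed part of a ZIP local file header, little-endian:
# signature, version needed, flags, compression method, mod time,
# mod date, CRC-32, compressed size, uncompressed size,
# file name length, extra field length.
LOCAL_HEADER = struct.Struct("<4s5H3L2H")


def read_local_name(data: bytes) -> bytes:
    """Return the file name bytes that follow the first local header."""
    fields = LOCAL_HEADER.unpack_from(data, 0)
    if fields[0] != b"PK\x03\x04":
        raise ValueError("not a local file header")
    name_length = fields[9]
    start = LOCAL_HEADER.size
    # The name is taken by its explicit length, nulls included.
    return data[start:start + name_length]


NAME = (b"DOCUME~1\\CHRISS~1\\LOCALS~1\\Temp\\Demo.exe"
        b"\x00\x00scr\x00" + b"0" * 18 + b".txt")

# Time, date and CRC-32 are placeholders (0) in this rebuilt header.
HEADER = LOCAL_HEADER.pack(b"PK\x03\x04", 10, 0, 0, 0, 0, 0,
                           44544, 44544, len(NAME), 0)

# b"MZ" stands in for the start of the file data after the name.
stored = read_local_name(HEADER + NAME + b"MZ")

print(len(stored))
print(stored.split(b"\x00", 1)[0])
```

Reading by the length field returns all 68 bytes, so the null bytes
are part of the name in the local header itself.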
----------------------------------------------------------------------