[Python-bugs-list] [ python-Bugs-755031 ] zipfile: inconsistent filenames with InfoZip "unzip"

SourceForge.net noreply@sourceforge.net
Tue, 17 Jun 2003 18:08:58 -0700


Bugs item #755031, was opened at 2003-06-15 17:23
Message generated for change (Comment added) made by gward
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=755031&group_id=5470

Category: Python Library
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: Greg Ward (gward)
>Assigned to: Greg Ward (gward)
Summary: zipfile: inconsistent filenames with InfoZip "unzip"

Initial Comment:
zipfile.py gives filenames inconsistent with the
InfoZIP "unzip" utility for certain ZIP files.  My
source is an email virus, so the ZIP files are almost
certainly malformed.  Nevertheless, it would be nice if
"unzip -l" and ZipFile.namelist() gave consistent
filenames.

Example: the attached Demo.zip (extracted from an email
virus caught on mail.python.org) looks like this
according to InfoZip:

$ unzip -l /tmp/Demo.zip 
Archive:  /tmp/Demo.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
    44544  01-26-03 20:49   DOCUME~1\CHRISS~1\LOCALS~1\Temp\Demo.exe
 --------                   -------
    44544                   1 file

But according to ZipFile.namelist(), the name of that
file is:
 
DOCUME~1\CHRISS~1\LOCALS~1\Temp\Demo.exescr000000000000000000.txt

I get the same result with Python 2.2.2 and a
~2-week-old build of 2.3 CVS.


----------------------------------------------------------------------

>Comment By: Greg Ward (gward)
Date: 2003-06-17 21:08

Message:
Logged In: YES 
user_id=14422

Fixed with patch #755987.

----------------------------------------------------------------------

Comment By: James C. Ahlstrom (ahlstromjc)
Date: 2003-06-17 11:50

Message:
Logged In: YES 
user_id=64929

I submitted a patch for this.  It is 755987.  See further 
comments there.

----------------------------------------------------------------------

Comment By: James C. Ahlstrom (ahlstromjc)
Date: 2003-06-16 10:29

Message:
Logged In: YES 
user_id=64929

The analysis by sjones is correct.  Python and the zip file 
format both allow null bytes in file names.  But in this case, 
the file is infected with the "I-Worm.Lentin.o" virus and the 
file name is designed to hide this.  The file name ends in ".txt" 
but the file name up to the null byte ends in ".exe".  The 
intention is that a virus scanner would skip this file because it 
ends in ".txt" (a non-executable text file), but that 
the ".exe" would be seen (an executable program file) if the 
file were clicked, and so the file would be executed.

Testing this on my machine, my virus scanner (Kaspersky) 
nevertheless flags the ".zip" file as containing a virus, but this 
depends on the particular virus scanner and its settings.

I suggest that zipfile.py should terminate file names at a null 
byte as InfoZip does.
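
A minimal sketch of that suggestion, for illustration only. The helper
name truncate_at_null is hypothetical and this is not the code from
patch 755987; it only shows how cutting the name at the first null
byte changes the extension that a naive check would see.

```python
import os.path


def truncate_at_null(name):
    """Return the part of name before the first null character, if any."""
    head, _sep, _tail = name.partition("\x00")
    return head


# File name as reported in this thread (the backslashes are literal).
raw = ("DOCUME~1\\CHRISS~1\\LOCALS~1\\Temp\\Demo.exe"
       "\x00\x00scr\x00" + "0" * 18 + ".txt")

shown = truncate_at_null(raw)

# Extension of the full stored name versus the truncated name.
print(os.path.splitext(raw)[1])
print(os.path.splitext(shown)[1])
```

With the name from this report, the full string ends in ".txt" while
the truncated one ends in ".exe", which matches what "unzip -l" shows.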

----------------------------------------------------------------------

Comment By: Shannon Jones (sjones)
Date: 2003-06-15 21:23

Message:
Logged In: YES 
user_id=589306

The actual filename from the zipfile is:
filename =
'DOCUME~1\CHRISS~1\LOCALS~1\Temp\Demo.exe\x00\x00scr\x00000000000000000000.txt'

Notice there is a \x00 after Demo.exe. My guess is InfoZip
stores the filename in a null terminated string and this
extra null character in the filename terminates it at this
point. Python doesn't care if you have nulls in the string,
so it prints the entire filename.
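
The guess above can be illustrated with ctypes, which exposes both
views of the same bytes. This is only a sketch of C-string semantics
in general; it assumes nothing about how InfoZip is actually written.

```python
import ctypes

# The stored file name from this report, as bytes.
raw = (b"DOCUME~1\\CHRISS~1\\LOCALS~1\\Temp\\Demo.exe"
       b"\x00\x00scr\x00" + b"0" * 18 + b".txt")

buf = ctypes.create_string_buffer(raw)  # C char array holding every byte
c_view = buf.value                      # read as a C string: stops at the first NUL
py_view = buf.raw[:len(raw)]            # all stored bytes, NULs included

print(len(py_view), len(c_view))
print(c_view.decode("ascii"))
# Dropping the NUL bytes gives the name quoted in the initial comment.
print(py_view.replace(b"\x00", b"").decode("ascii"))
```

The C-style view is the 40-byte name ending in "Demo.exe"; the Python
view keeps all 68 bytes.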

You can see the zip file format description at
ftp://ftp.info-zip.org/pub/infozip/doc/appnote-981119-iz.zip

The format does say:
      2)  String fields are not null terminated, since the
          length is given explicitly.

But it doesn't really say if strings are allowed to have
nulls in them.
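
To make "the length is given explicitly" concrete, here is a hedged
sketch that packs and re-reads a local file header with struct. The
field order follows the appnote; the time, date and CRC values are
placeholder zeros (assumptions), and the two bytes after the name
merely stand in for the file data.

```python
import struct

# Local file header: signature, version needed, flags, method,
# mod time, mod date, CRC-32, compressed size, uncompressed size,
# file name length, extra field length.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")

name = (b"DOCUME~1\\CHRISS~1\\LOCALS~1\\Temp\\Demo.exe"
        b"\x00\x00scr\x00" + b"0" * 18 + b".txt")

# Time, date and CRC are placeholders; the size is the one unzip -l reports.
header = LOCAL_HEADER.pack(b"PK\x03\x04", 10, 0, 0, 0, 0, 0,
                           44544, 44544, len(name), 0)
record = header + name + b"MZ"  # "MZ" stands in for the start of the data

fields = LOCAL_HEADER.unpack_from(record, 0)
name_len, extra_len = fields[9], fields[10]

start = LOCAL_HEADER.size
stored_name = record[start:start + name_len]  # length-delimited, NULs kept
data_start = start + name_len + extra_len

print(name_len, stored_name)
```

A reader that trusts the length field gets all 68 bytes of the name;
the embedded nulls play no part in finding where the name ends.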

So does Python or InfoZip get this right?


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2003-06-15 21:19

Message:
Logged In: YES 
user_id=6380

That almost sounds like an intentional inconsistency. Could
it be that the central directory has one name but the local
header has a different one? Or that there's a null byte in
the filename so that the filename length is inconsistent?
The front of the file looks like this according to od -c:

0000000   P   K 003 004  \n  \0  \0  \0  \0  \0   *   Š   :   .   c   Ì
0000020  \v   g  \0   ®  \0  \0  \0   ®  \0  \0   D  \0  \0  \0   D   O
0000040   C   U   M   E   ~   1   \   C   H   R   I   S   S   ~   1   \
0000060   L   O   C   A   L   S   ~   1   \   T   e   m   p   \   D   e
0000100   m   o   .   e   x   e  \0  \0   s   c   r  \0   0   0   0   0
0000120   0   0   0   0   0   0   0   0   0   0   0   0   0   0   .   t
0000140   x   t   M   Z 220  \0 003  \0  \0  \0 004  \0  \0  \0   ÿ   ÿ
0000160  \0  \0   ž  \0  \0  \0  \0  \0  \0  \0   @  \0  \0  \0  \0  \0
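
One way to check the first hypothesis (central directory and local
header disagreeing) is sketched below. The function name
name_mismatches is hypothetical; it relies on ZipInfo.header_offset
and ZipInfo.orig_filename from the zipfile module, and it assumes
names are stored as cp437 unless the UTF-8 flag bit (0x800) is set.

```python
import io
import struct
import zipfile

# Fixed 30-byte part of a local file header; index 2 is the flag word,
# index 9 the file name length.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")


def name_mismatches(fileobj):
    """Return (central_name, local_name) pairs that differ."""
    with zipfile.ZipFile(fileobj) as zf:
        infos = zf.infolist()  # names as read from the central directory

    mismatches = []
    for info in infos:
        # Re-read the name straight from the member's local header.
        fileobj.seek(info.header_offset)
        fields = LOCAL_HEADER.unpack(fileobj.read(LOCAL_HEADER.size))
        raw_local = fileobj.read(fields[9])
        encoding = "utf-8" if fields[2] & 0x800 else "cp437"
        local_name = raw_local.decode(encoding)
        if local_name != info.orig_filename:
            mismatches.append((info.orig_filename, local_name))
    return mismatches


# Usage on a small archive built in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", b"hi")
print(name_mismatches(buf))
```

An empty list means the two copies of every name agree, which would
leave the embedded null byte as the remaining explanation.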


----------------------------------------------------------------------
