[ python-Bugs-896236 ] Unicode problem in os.path.getsize ?

SourceForge.net noreply at sourceforge.net
Mon Feb 16 11:42:19 EST 2004


Bugs item #896236, was opened at 2004-02-12 21:49
Message generated for change (Comment added) made by tjreedy
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=896236&group_id=5470

Category: Python Library
Group: Python 2.3
Status: Closed
Resolution: Wont Fix
Priority: 5
Submitted By: Ronald L. Rivest (ronrivest)
Assigned to: Nobody/Anonymous (nobody)
Summary: Unicode problem in os.path.getsize ?

Initial Comment:
I am running on Windows XP 5.1 using python version 2.3.
The following simple code fails on my system.

import os

for dirpath, dirnames, filenames in os.walk("C:/"):
    for name in filenames:
        pathname = os.path.join(dirpath, name)
        size = os.path.getsize(pathname)
        print size, pathname

I get an error from getsize that the file given by 
pathname does not exist.  When it breaks, the
variable "name" contains two question marks, which
makes me think that this is a Unicode problem.

In any case, shouldn't names returned by walk be
acceptable in all cases to getsize???


----------------------------------------------------------------------

Comment By: Terry J. Reedy (tjreedy)
Date: 2004-02-16 11:42

Message:
Logged In: YES 
user_id=593130

Final comment:

dir and Explorer can display the stats of files with bad names 
because they get names and stats together, without ever passing 
the bad names back to the system.  The Command Prompt equivalent 
of listdir (or walk) followed by getsize (or stat) is 'dir /w' 
followed by 'dir badname', which should also give a 'File not 
found' error message.

I believe this 'disturbing' behavior results from having filename 
rules that are not enforced by restricting directory disk block 
writes to os functions that respect the rules.

A roundabout fix: replace 'size = ...' with something like
try:
    size = ...
except WhateverErrorYouGot:
    file = os.popenx('dir %s' % dirpath).read()
    # x = whichever of 1,2,3,4 works
    <find line with badname>
    <parse out file size>

But prefixing 'u' to the root dir looks a lot easier if it gets you 
what you need.
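
The idea of getting names and stats together, rather than asking
again by name, can also be sketched without parsing 'dir' output.
This is a minimal sketch under assumptions that go beyond the thread:
it needs Python 3.6 or later for os.scandir used as a context manager
(nothing like it existed in 2.3), and the helper names
sizes_from_listing and robust_getsize are made up for illustration.
As far as I understand the os.scandir documentation, on Windows the
size comes from the directory listing itself for ordinary files, which
is the property that matters here.

```python
import os

def sizes_from_listing(dirpath):
    """Return {name: size} using the metadata that comes with the listing."""
    sizes = {}
    with os.scandir(dirpath) as entries:
        for entry in entries:
            # Only regular files; do not follow symbolic links.
            if entry.is_file(follow_symlinks=False):
                sizes[entry.name] = entry.stat(follow_symlinks=False).st_size
    return sizes

def robust_getsize(dirpath, name):
    """Try os.path.getsize first; fall back to the directory listing."""
    try:
        return os.path.getsize(os.path.join(dirpath, name))
    except OSError:
        # None means the name is not in the listing either.
        return sizes_from_listing(dirpath).get(name)
```

Inside the walk loop, 'size = robust_getsize(dirpath, name)' would
then take the place of the plain getsize call.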


----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2004-02-16 09:55

Message:
Logged In: YES 
user_id=21627

This behaviour is standard behaviour of Win32, and,
disturbing as it may sound, is somewhat outside Python's
control.

When a file is found whose name cannot be represented in the
system code page (CP_ACP, the "ANSI" code page), then
non-representable characters are converted to question
marks. What's worse: "roughly-representable" characters are
sometimes converted to look-alike characters.

When such a file name is passed back to the Win32 API, it will
not find the file, because the name now contains question marks
in place of the original characters.
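
The lossy conversion can be imitated in Python itself. The sketch
below is only an analogy: it assumes Python 3 and uses the cp1252
codec with the "replace" error handler as a stand-in for the Win32
conversion to the "ANSI" code page. The file name and the helper
to_ansi_like are hypothetical, chosen for illustration.

```python
# Analogy only: Python's codec machinery stands in for the Win32
# Unicode-to-"ANSI" conversion. Requires Python 3.

def to_ansi_like(name, codepage="cp1252"):
    """Encode a Unicode name, replacing unencodable characters with '?'."""
    return name.encode(codepage, "replace")

# Hypothetical file name: the two Katakana characters are not in cp1252.
original = "pr\u30a8\u30a2lude.mp3"
lossy = to_ansi_like(original)
print(lossy)                      # b'pr??lude.mp3'

# Decoding the bytes does not restore the original name, so a lookup
# by the round-tripped name targets a different, nonexistent file.
round_tripped = lossy.decode("cp1252")
print(round_tripped == original)  # False
```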

With the "ANSI" API, there is really no solution. Instead,
you should use Unicode file names, i.e. write

for dirpath,dirnames,filenames in os.walk(u"C:/"):

Closing as "won't fix".
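
Put together, the Unicode variant of the original loop might look
like the following. This is a sketch, not a tested fix for the
reporter's machine: the generator name walk_sizes is made up, and the
choice to skip files that still raise OSError is an assumption about
what the caller wants.

```python
import os

def walk_sizes(root):
    """Yield (size, pathname) for every file below root.

    root should be a Unicode string, so that the names os.walk
    returns can be passed back to os.path.getsize unchanged.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            pathname = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(pathname)
            except OSError:
                # File vanished, is unreadable, or the name did not resolve.
                continue
            yield size, pathname
```

Called as walk_sizes(u"C:/"), it yields Unicode path names. In
Python 3 every str is already Unicode, so the u prefix is harmless
there and only matters on Python 2.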

----------------------------------------------------------------------

Comment By: Ronald L. Rivest (ronrivest)
Date: 2004-02-13 20:46

Message:
Logged In: YES 
user_id=863876

TJREEDY -- Thanks for the reply...

To answer your questions:
   (1) What does Windows show when I visit the directory?
        -- I have several files in this directory that have the
            same problem.  It is a hard, reproducible problem, not
            a transient glitch.  The files are mp3 files that have
            the name "prelude.mp3", except that the first "e" is
            replaced by two question marks (for Python) or by
            two "boxes" in Windows Explorer.  I would guess that
            this is some funky representation of the French "e"
            with an "accent aigu".
    (2) What does "dir" do in a Command Prompt?
        -- From a command prompt, I see two question marks
            at the problematic position.

Does Windows allow one to create filenames with characters
in the filename that are illegal for Windows?  

As I said in the original post, I find it very disturbing that
os.walk should return a filename that os.path.exists says
doesn't exist!  If you can walk the directory and find the
file, then os.path.exists (or, equivalently, os.path.getsize),
should find it!  This looks like a Python bug to me... no?

    Cheers,
    Ron Rivest



----------------------------------------------------------------------

Comment By: Terry J. Reedy (tjreedy)
Date: 2004-02-13 19:47

Message:
Logged In: YES 
user_id=593130

Though it might be, I suspect that this is not a Python bug.  
Whether it is a Windows design or coding bug is another 
matter.

>variable "name" contains two question marks, which
>makes me think that this is a Unicode problem.

Since '?' is not legal in filenames, as you seem to know, I 
rather believe that this is the substitute that the Windows 
function called by os.listdir and os.walk uses for illegal 
characters in the filename.  So of course getsize, which wraps 
os.stat(), which calls a system function, chokes on it.

Could be a disk bit glitch, or a bad program writing directly to 
a directory block.  Happened to me once - difficult to get rid of.

What does Windows Explorer show when you visit that 
directory?  Ditto for 'dir' in a CommandPrompt window
(Start/Accessories)? 


----------------------------------------------------------------------
