[ python-Bugs-924703 ] test_unicode_file fails on Win98SE

Sun Jul 25 19:34:33 CEST 2004

Bugs item #924703, was opened at 2004-03-28 03:48
Message generated for change (Comment added) made by loewis
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=924703&group_id=5470

Category: Unicode
Group: Python 2.4
>Status: Closed
>Resolution: Fixed
Priority: 7
Submitted By: Tim Peters (tim_one)
Assigned to: Martin v. Löwis (loewis)
Summary: test_unicode_file fails on Win98SE

Initial Comment:
In current CVS, test_unicode_file fails on Win98SE.  This 
has been going on for some time, actually.

ERROR: test_single_files (__main__.TestUnicodeFiles)

Traceback (most recent call last):
  File ".../lib/test/test_unicode_file.py", line 162, in test_single_files
    self._test_single(TESTFN_UNICODE)
  File ".../lib/test/test_unicode_file.py", line 136, in _test_single
    self._do_single(filename)
  File ".../lib/test/test_unicode_file.py", line 49, in _do_single
    new_base = unicodedata.normalize("NFD", new_base)
TypeError: normalized() argument 2 must be unicode, not str

At this point,

filename is TESTFN_UNICODE, which is
u'@test-\xe0\xf2'

os.path.abspath(filename) is
'C:\Code\python\PC\VC6\@test-\xe0\xf2'

new_base is
'@test-\xe0\xf2'

So abspath() removed the "Unicodeness" of filename, 
and new_base is indeed not a Unicode string at this 
point.
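
For reference, the failure mode can be reproduced in a few lines. This is only a sketch, and it is written for Python 3 rather than the 2.4 interpreter under discussion: there `bytes` plays the role of the 8-bit str and `str` the role of unicode, and the wording of the TypeError differs from the message quoted above.

```python
import unicodedata

# The test file name from the report, as a text (Unicode) string.
name = "@test-\xe0\xf2"

# NFD splits each accented letter into a base letter plus a combining accent.
decomposed = unicodedata.normalize("NFD", name)
print(ascii(decomposed))

# An 8-bit string is rejected, which is the kind of error the test ran into.
try:
    unicodedata.normalize("NFD", name.encode("latin-1"))
except TypeError as exc:
    print("TypeError:", exc)
```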

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2004-07-25 19:34

Message:
Logged In: YES 
user_id=21627

This apparently got fixed with test_unicode_file.py 1.15, by
converting the listdir result to unicode if necessary.
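
A minimal sketch of that kind of conversion follows. It is not the actual change made in revision 1.15: the helper name normalized_listing and the default encoding are assumptions made for illustration, and the code is written for Python 3, where `bytes` stands in for the 8-bit str of Python 2.

```python
import sys
import unicodedata


def normalized_listing(names, encoding=None):
    """Return names as NFD-normalized text, decoding 8-bit entries first."""
    if encoding is None:
        # Assumption: directory entries use the filesystem encoding.
        encoding = sys.getfilesystemencoding()
    result = []
    for name in names:
        if isinstance(name, bytes):
            name = name.decode(encoding)
        result.append(unicodedata.normalize("NFD", name))
    return result


# A mixed list like the one reported: an 8-bit entry next to a Unicode one.
print(ascii(normalized_listing([b"CVS", "@test-\xe0\xf2"])))
```

Decoding before normalizing means the list comprehension in the test no longer depends on what type listdir happens to return.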

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2004-06-17 18:11

Message:
Logged In: YES 
user_id=31435

Reopened, because the same test is still failing on Win98SE, 
but for a different reason.

The traceback is identical, except that it's now failing in the 
listcomp on the line following the line it used to fail on:

  File ".../lib/test/test_unicode_file.py", line 50, in _do_single
    file_list = [unicodedata.normalize("NFD", f) for f in file_list]
TypeError: normalized() argument 2 must be unicode, not str

filename is
u'@test-\xe0\xf2'

os.path.abspath(filename) is
u'C:\Code\python\PC\VC6\@test-\xe0\xf2'

new_base is
u'@test-a\u0300o\u0300'

The problem now is that the first name in file_list is
'CVS'

so

[unicodedata.normalize("NFD", f) for f in file_list]

is passing an 8-bit string to normalize().  Earlier code in the 
test *appears* to assume that if filename is Unicode, then 
os.listdir() will return a list of Unicode strings.  But file_list is a 
list of 153 8-bit strings on this box.
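
For comparison, later Python versions document the rule the test was assuming. The sketch below is for Python 3, not the 2.4 interpreter discussed here: os.listdir() returns str names for a str path and bytes names for a bytes path. The temporary directory and the single entry named CVS are only there to keep the example self-contained.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one entry so the listing is not empty.
    open(os.path.join(tmp, "CVS"), "w").close()
    text_names = os.listdir(tmp)                # str path in, str names out
    byte_names = os.listdir(os.fsencode(tmp))   # bytes path in, bytes names out

print(text_names)
print(byte_names)
```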

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2004-06-15 20:50

Message:
Logged In: YES 
user_id=21627

This should be fixed with posixmodule.c 2.321.
Unfortunately, I cannot test it, because I don't have W9X.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2004-06-12 09:11

Message:
Logged In: YES 
user_id=80475

This is still failing.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2004-03-30 07:44

Message:
Logged In: YES 
user_id=31435

Just a guess:  the os.path functions are generally just string 
manipulation, and on Windows I sometimes import 
posixpath.py directly to do Unixish path manipulations.  So it's 
conceivable that someone (not me) on a non-Windows box 
imports ntpath directly to manipulate Windows paths.

In fact, I see that Fredrik's "Python Standard Library" book 
explicitly mentions this use case for importing ntpath directly.  
So maybe he actually did it -- once <wink>.
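
A small illustration of that use case: both modules are pure string manipulation, so either can be imported on any platform. The sample paths are made up for the example.

```python
import ntpath
import posixpath

# Take a Windows path apart, regardless of the platform this runs on.
win_path = r"C:\Code\python\PC\VC6\@test"
print(ntpath.basename(win_path))
print(ntpath.splitdrive(win_path))

# Build a Unix-style path, regardless of the platform this runs on.
print(posixpath.join("/usr", "lib", "python"))
```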

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2004-03-30 07:25

Message:
Logged In: YES 
user_id=21627

I see. I'll look into changing _getfullpathname to return
Unicode output for Unicode input even if
unicode_file_names() is false.

However, I do wonder what the purpose of _abspath is, then:
on what system would it be used?

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2004-03-30 01:11

Message:
Logged In: YES 
user_id=31435

Nope, that can't help -- ntpath.py's _abspath doesn't exist 
on Win98SE (the "from nt import _getfullpathname" succeeds, 
so _abspath is never defined).  It's _getfullpathname() that's 
taking a Unicode input and returning a str output here.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2004-03-30 00:17

Message:
Logged In: YES 
user_id=21627

abspath(unicode) should return a Unicode path.

Does it help if _abspath (in ntpath.py) is changed to contain

            if not isabs(path):
                if isinstance(path, unicode):
                    cwd = os.getcwdu()
                else:
                    cwd = os.getcwd()
                path = join(cwd, path)
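
The same idea, sketched as a standalone function for Python 3, where os.getcwd() and os.getcwdb() take the places of getcwdu() and getcwd() respectively. The name abspath_keep_type is hypothetical, and this is not the code in ntpath.py.

```python
import os


def abspath_keep_type(path):
    """Make path absolute without changing its string type."""
    if not os.path.isabs(path):
        # Pick the working directory in the same type as the input,
        # so that join() never mixes text and 8-bit strings.
        cwd = os.getcwdb() if isinstance(path, bytes) else os.getcwd()
        path = os.path.join(cwd, path)
    return os.path.normpath(path)


print(type(abspath_keep_type("@test")).__name__)
print(type(abspath_keep_type(b"@test")).__name__)
```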

----------------------------------------------------------------------
