Re: [Python-Dev] os.path.normcase() in site.py
I noticed that these days __file__ attributes of modules are case normalized (ie. lowercased on case insensitive file systems), or at least the directory part. Then I noticed that this is caused by the fact that all sys.path entries are case normalized. It turns out that site.py does this, in a function called makepath(), added by Fred about 8 months ago.
I think this is wrong: we should always try to *preserve* case.
There is an added problem with the makepath() stuff that I hadn't reported here yet: it has broken MacPython on some non-western machines. Specifically I've had reports of people running a Japanese MacOS that things will break if they run Python from a pathname that has any non-7-bit-ascii characters in the name. Apparently normcase normalizes more than just ascii upper/lowercase letters.
And aside from that I fully agree with Just: seeing a stacktrace with all lowercase filenames is _very_ disconcerting.
I would disable the case-normalization for MacPython, except that I don't know whether it actually has a function. With MacPython's way of finding the initial sys.path contents we don't have the Windows-Python problem that we add the same directory 5 times (once in uppercase, once in lowercase, once in mixed case, once in mixed-case with / for \, etc:-), so if this is what it's trying to solve we can take it out easily.
--
Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm
I can't think of any function besides the attempt to avoid duplicates.
I think that even on Windows, retaining case makes sense.
I think that there's a way to avoid duplicates without case-folding everything. (E.g. use a case-folding comparison instead.)
I wonder if maybe path entries should be normpath'd though?
I'll leave it to Fred, Jack or Just to fix this.
--Guido van Rossum (home page: http://www.python.org/~guido/)
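The "case-folding comparison" idea can be sketched roughly as follows. This is only an illustration, not code from site.py: the helper name dedupe_paths and the sample entries are made up, ntpath is used so the Windows rules apply on any platform, and the syntax is modern Python rather than 2.1. The point is that normcase() serves only as the comparison key, while the entry that is kept retains its original spelling.

```python
import ntpath  # Windows path rules, importable on any platform


def dedupe_paths(paths, normcase=ntpath.normcase):
    """Drop duplicates, comparing case-insensitively but keeping the
    spelling of the first occurrence (hypothetical helper)."""
    seen = set()
    result = []
    for entry in paths:
        key = normcase(entry)  # used for comparison only, never stored
        if key not in seen:
            seen.add(key)
            result.append(entry)
    return result


# Made-up entries: the same directory spelled three ways, plus one other.
paths = ["C:\\Python21", "c:\\python21", "C:/Python21", "C:\\Python21\\Lib"]
unique = dedupe_paths(paths)
print(unique)
```

With these entries, only the first spelling of the duplicated directory survives, and it is not lowercased.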
Guido van Rossum wrote:
I can't think of any function besides the attempt to avoid duplicates.
I think that even on Windows, retaining case makes sense.
I think that there's a way to avoid duplicates without case-folding everything. (E.g. use a case-folding comparison instead.)
I wonder if maybe path entries should be normpath'd though?
They are already, they already go through abspath(), which calls normpath().
I'll leave it to Fred, Jack or Just to fix this.
If it were up to me, I'd simply remove the normcase() call from makepath().
Just
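A small sketch of the distinction being drawn here, assuming modern Python and made-up sample paths: normpath() tidies a path without touching its case, normcase() is the step that lowercases under Windows rules, and for an already-absolute path abspath() gives the same result as normpath(). The ntpath and posixpath modules are used directly so the sketch behaves the same on any platform.

```python
import ntpath      # Windows path rules, importable on any platform
import posixpath   # POSIX path rules, importable on any platform

# Made-up Windows-style path with a ".." component.
messy = "C:\\Python21\\Lib\\..\\DLLs"

tidied = ntpath.normpath(messy)    # collapses "..", keeps the case
folded = ntpath.normcase(tidied)   # lowercases (and maps "/" to "\")

print(tidied)
print(folded)

# For an absolute path, abspath() reduces to normpath().
posix_messy = "/usr/lib/python/../python//site-packages/."
same = posixpath.abspath(posix_messy) == posixpath.normpath(posix_messy)
print(same)
```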
Guido van Rossum writes:
I can't think of any function besides the attempt to avoid duplicates.
There were two reasons for adding this code:
1. Avoid duplicates (speeds imports if there are duplicates and the modules are found on an entry after the dupes).
2. Avoid breakage when a script uses os.chdir(). This is probably unusual for large applications, but fairly common for little admin helper scripts.
I think that even on Windows, retaining case makes sense.
I think that there's a way to avoid duplicates without case-folding everything. (E.g. use a case-folding comparison instead.)
I wonder if maybe path entries should be normpath'd though?
I'll leave it to Fred, Jack or Just to fix this.
I certainly agree that this can be improved; if Jack or Just would like to assign it to me on SourceForge, I'd be glad to fix it.
-Fred
--
Fred L. Drake, Jr. <fdrake at acm.org>
PythonLabs at Digital Creations
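The os.chdir() concern in reason 2 can be illustrated with a rough sketch. It assumes modern Python, a hypothetical relative entry named "lib", and a makepath() reduced to the abspath() step only; it is not the actual site.py code. A relative entry resolves against whatever the current directory happens to be, so making it absolute once, up front, keeps it pointing at the same place after a chdir().

```python
import os
import tempfile


def makepath(*paths):
    # Sketch of the abspath()-only variant: absolute, case preserved.
    return os.path.abspath(os.path.join(*paths))


start = os.getcwd()
relative_entry = "lib"              # hypothetical relative sys.path entry
frozen = makepath(relative_entry)   # resolved against the starting directory

with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)
    try:
        # The same relative entry now refers to a different directory.
        drifted = os.path.abspath(relative_entry)
    finally:
        os.chdir(start)             # restore the cwd before tmp is removed

print(frozen)
print(drifted)
```

The two printed paths differ, which is the breakage a relative entry would cause once a script changes directory.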
Guido van Rossum writes:
I can't think of any function besides the attempt to avoid duplicates.
Fred L. Drake, Jr. wrote:
There were two reasons for adding this code:
1. Avoid duplicates (speeds imports if there are duplicates and the modules are found on an entry after the dupes).
2. Avoid breakage when a script uses os.chdir(). This is probably unusual for large applications, but fairly common for little admin helper scripts.
1) normcase(). Bad.
2) abspath(). Good.
I think #2 is a legitimate problem, but I'm not so sure about #1: is it really so common for sys.path to contain duplicates that we need to worry about it at all?
I'll leave it to Fred, Jack or Just to fix this.
I certainly agree that this can be improved; if Jack or Just would like to assign it to me on SourceForge, I'd be glad to fix it.
Here's my proposed fix:
Index: site.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Lib/site.py,v
retrieving revision 1.27
diff -c -3 -r1.27 site.py
*** site.py 2001/06/12 16:48:52 1.27
--- site.py 2001/06/25 16:42:33
***************
*** 67,73 ****
  
  def makepath(*paths):
      dir = os.path.join(*paths)
!     return os.path.normcase(os.path.abspath(dir))
  
  L = sys.modules.values()
  for m in L:
--- 67,73 ----
  
  def makepath(*paths):
      dir = os.path.join(*paths)
!     return os.path.abspath(dir)
  
  L = sys.modules.values()
  for m in L:
Just
[Jack Jansen]
... With MacPython's way of finding the initial sys.path contents we don't have the Windows-Python problem that we add the same directory 5 times (once in uppercase, once in lowercase, once in mixed case, once in mixed-case with / for \, etc:-),
Happily, we don't have that problem on a stock Windows Python anymore:
C:\Python21>python
Python 2.1 (#15, Apr 16 2001, 18:25:49) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
>>> import sys, pprint
>>> pprint.pprint(sys.path)
['',
 'c:\\python21',
 'c:\\python21\\dlls',
 'c:\\python21\\lib',
 'c:\\python21\\lib\\plat-win',
 'c:\\python21\\lib\\lib-tk']
OTOH, this is still Icky, because those don't match (wrt case) the names in the filesystem (e.g., just look at the initial prompt line: I was in Python21 when I ran this, not python21).
so if this is what it's trying to solve we can take it out easily.
It's hard to believe Fred added code to solve a Windows problem <wink>; I don't know what it's trying to do.
participants (5)
- Fred L. Drake, Jr.
- Guido van Rossum
- Jack Jansen
- Just van Rossum
- Tim Peters