Hi All,

I'm curious as to why, with a file called "Foo.txt" on a case-discriminating but case-insensitive filesystem, os.path.normcase('FoO.txt') will return "foo.txt" rather than "Foo.txt"?

Yes, I know the behaviour is documented, but I'm wondering if anyone can remember the rationale for that behaviour?

cheers,

Chris

-- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk
On Sat, Sep 18, 2010 at 2:39 PM, Chris Withers
I'm curious as to why, with a file called "Foo.txt" on a case-discriminating but case-insensitive filesystem, os.path.normcase('FoO.txt') will return "foo.txt" rather than "Foo.txt"?
Yes, I know the behaviour is documented, but I'm wondering if anyone can remember the rationale for that behaviour?
Because normcase() and friends never look at the filesystem. Of course, exists() and isdir() etc. do, and so does realpath(), but the pure parsing functions don't. They can be used without a working filesystem even. (E.g. you can import ntpath on a Unix box and happily parse Windows paths.) -- --Guido van Rossum (python.org/~guido)
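A minimal illustration of Guido's point, using only the standard ntpath module: every call below is pure string manipulation, so it runs on any OS and the path does not need to exist. The path H:\Projects\FoO.txt is an arbitrary example.

```python
import ntpath

# Nothing here touches a disk: the path need not exist, and this runs on any OS.
p = r"H:\Projects\FoO.txt"
drive, rest = ntpath.splitdrive(p)
print(drive)                 # H:
print(ntpath.basename(p))    # FoO.txt
print(ntpath.normcase(p))    # h:\projects\foo.txt
```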
On 18/09/2010 23:36, Guido van Rossum wrote:
course, exists() and isdir() etc. do, and so does realpath(), but the pure parsing functions don't.
Yes, but:

H:\>echo foo > TeSt.txt
...
>>> import os.path
>>> os.path.realpath('test.txt')
'H:\\test.txt'
>>> os.path.normcase('TeSt.txt')
'test.txt'
Both feel unsatisfying to me :-S How can I get 'TeSt.txt' from 'test.txt' (which feels like the contract normcase *should* have...)
They can be used without a working filesystem even. (E.g. you can import ntpath on a Unix box and happily parse Windows paths.)
But what value does that add over just doing a .lower() on the path? Chris
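One concrete difference from a bare .lower(), sketched with the standard ntpath and posixpath modules (the paths are arbitrary examples): the Windows flavour of normcase also turns forward slashes into backslashes, and the POSIX flavour leaves case alone entirely. Code that calls os.path.normcase therefore gets whichever rule belongs to the OS it runs on, although, as the thread goes on to discuss, that rule follows the OS and not the filesystem actually holding the file.

```python
import ntpath
import posixpath

win = "C:/Temp/FoO.txt"
print(ntpath.normcase(win))       # c:\temp\foo.txt (case folded, slashes converted)
print(win.lower())                # c:/temp/foo.txt (slashes untouched)

posix = "/home/chris/FoO.txt"
print(posixpath.normcase(posix))  # /home/chris/FoO.txt (unchanged: case matters here)
print(posix.lower())              # /home/chris/foo.txt (now names a different file)
```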
On 9/24/2010 6:13 AM, Chris Withers wrote:
On 18/09/2010 23:36, Guido van Rossum wrote:
course, exists() and isdir() etc. do, and so does realpath(), but the pure parsing functions don't.
Yes, but:
H:\>echo foo > TeSt.txt
...
>>> import os.path
>>> os.path.realpath('test.txt')
'H:\\test.txt'
>>> os.path.normcase('TeSt.txt')
'test.txt'
Both feel unsatisfying to me :-S
How can I get 'TeSt.txt' from 'test.txt' (which feels like the contract normcase *should* have...)
http://stackoverflow.com/questions/3692261/in-python-how-can-i-get-the-corre...
They can be used without a working filesystem even. (E.g. you can import ntpath on a Unix box and happily parse Windows paths.)
But what value does that add over just doing a .lower() on the path?
Chris
On Fri, 24 Sep 2010 11:13:46 +0100, Chris Withers
On 18/09/2010 23:36, Guido van Rossum wrote:
course, exists() and isdir() etc. do, and so does realpath(), but the pure parsing functions don't.
Yes, but:
H:\>echo foo > TeSt.txt
...
>>> import os.path
>>> os.path.realpath('test.txt')
'H:\\test.txt'
>>> os.path.normcase('TeSt.txt')
'test.txt'
Both feel unsatisfying to me :-S
How can I get 'TeSt.txt' from 'test.txt' (which feels like the contract normcase *should* have...)
You can't, and you shouldn't be able to. "normalization" is something that happens without reference to existing objects, the whole point is to put the thing into "standard form" so that you can compare strings obtained from different sources and know that they will represent the same object on that filesystem.
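A small sketch of that use: two spellings of the same Windows path, obtained from different sources, compared after normalization without consulting any disk. same_windows_path is a hypothetical helper, not a stdlib function. Note that normpath collapses ".." lexically, which can give the wrong answer when a symlink is involved, and neither call says whether the file exists.

```python
import ntpath

def same_windows_path(a, b):
    """Hypothetical helper: compare two Windows path strings purely lexically."""
    return ntpath.normcase(ntpath.normpath(a)) == ntpath.normcase(ntpath.normpath(b))

print(same_windows_path(r"C:/Temp/../Foo.TXT", r"c:\foo.txt"))  # True
print(same_windows_path(r"C:\Foo.txt", r"C:\Bar.txt"))          # False
```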
They can be used without a working filesystem even. (E.g. you can import ntpath on a Unix box and happily parse Windows paths.)
But what value does that add over just doing a .lower() on the path?
It does what is appropriate for that....oh, yeah. For that OS, not "for that filesystem". (e.g. on Unix normcase does nothing since files with different cases but the same letters are different files.) Being OS-specific rather than filesystem-type-specific is the usability bug. But to fix it we'll need to introduce a 'filesystems' module enumerating the different file systems we support, with tools for figuring out what filesystem your program is talking to. But normcase still wouldn't (shouldn't) do what you want. -- R. David Murray www.bitdance.com
On Fri, Sep 24, 2010 at 5:17 AM, R. David Murray
On Fri, 24 Sep 2010 11:13:46 +0100, Chris Withers
wrote: On 18/09/2010 23:36, Guido van Rossum wrote:
course, exists() and isdir() etc. do, and so does realpath(), but the pure parsing functions don't.
Yes, but:
H:\>echo foo > TeSt.txt
...
>>> import os.path
>>> os.path.realpath('test.txt')
'H:\\test.txt'
>>> os.path.normcase('TeSt.txt')
'test.txt'
Both feel unsatisfying to me :-S
How can I get 'TeSt.txt' from 'test.txt' (which feels like the contract normcase *should* have...)
You can't, and you shouldn't be able to. "normalization" is something that happens without reference to existing objects, the whole point is to put the thing into "standard form" so that you can compare strings obtained from different sources and know that they will represent the same object on that filesystem.
Clearly there is another use case where people want to display the filename back to the user with the correct case. This is a reasonable request and I think it makes sense for us to add another API to os.path that does this by looking up the path on the filesystem, or making an OS-specific call.
They can be used without a working filesystem even. (E.g. you can import ntpath on a Unix box and happily parse Windows paths.)
But what value does that add over just doing a .lower() on the path?
It does what is appropriate for that....oh, yeah. For that OS, not "for that filesystem". (e.g. on Unix normcase does nothing since files with different cases but the same letters are different files.)
Yeah, which is wrong on Mac OS X -- that's Unix but the default filesystem is case-preserving (though apparently it's possible to mount case-sensitive filesystems too). I've heard that on Windows there are also case-sensitive filesystems (part of a POSIX compliance package?). And on Linux you can mount FAT32 filesystems which are case-preserving.
Being OS-specific rather than filesystem-type-specific is the usability bug.
Agreed.
But to fix it we'll need to introduce a 'filesystems' module enumerating the different file systems we support, with tools for figuring out what filesystem your program is talking to. But normcase still wouldn't (shouldn't) do what you want.
I don't think we should try to reimplement what the filesystem does. I think we should just ask the filesystem (how exactly I haven't figured out yet but I expect it will be more OS-specific than filesystem-specific). It will have to be a new API -- normcase() at least is *intended* to return a case-flattened name on OSes where case-preserving filesystems are the default, and changing it to look at the filesystem would break too much code. For a new use case we need a new API. -- --Guido van Rossum (python.org/~guido)
On Fri, 24 Sep 2010 07:29:40 -0700
Guido van Rossum
It will have to be a new API -- normcase() at least is *intended* to return a case-flattened name on OSes where case-preserving filesystems are the default, and changing it to look at the filesystem would break too much code. For a new use case we need a new API.
realpath() sounds like the proper API for that. It just needs to have a better implementation :) Regards Antoine.
On 24 September 2010 15:29, Guido van Rossum
I don't think we should try to reimplement what the filesystem does. I think we should just ask the filesystem (how exactly I haven't figured out yet but I expect it will be more OS-specific than filesystem-specific). It will have to be a new API -- normcase() at least is *intended* to return a case-flattened name on OSes where case-preserving filesystems are the default, and changing it to look at the filesystem would break too much code. For a new use case we need a new API.
I dug into this once, and as far as I could tell, it's possible to get the information on Windows, but there's no way on Linux to "ask the filesystem". From my researches, the standard interfaces a filesystem has to implement on Linux don't offer any means of asking this question. Of course, (a) I'm no Linux expert so what do I know, and (b) it may well be possible to come up with a "good enough" solution by ignoring pathologically annoying theoretical cases. I'm happy to provide Windows code if someone needs it. Paul PS There were some places I'd have been glad of this feature (and from what I recall, Mercurial could have used it too) so I'm +1 on the idea.
Paul Moore wrote:
I dug into this once, and as far as I could tell, it's possible to get the information on Windows, but there's no way on Linux to "ask the filesystem".
Maybe we could use a heuristic such as:

1) Search the directory for an exact match to the name given, return it if found.
2) Look for a match ignoring case. If one is found, test it to see if it refers to the same file as the given path, and if so return it.
3) Otherwise, raise an exception.

-- Greg
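A rough sketch of Greg's three steps for the last component of a path, assuming os.listdir and os.path.samefile as the only filesystem probes. actual_name is a hypothetical name for illustration, not a proposed stdlib API.

```python
import os
import tempfile

def actual_name(path):
    """Hypothetical helper: return `path` with its final component spelled as
    the directory listing stores it, following Greg's three steps."""
    directory, name = os.path.split(os.path.abspath(path))
    entries = os.listdir(directory)
    # 1) An exact match wins.
    if name in entries:
        return os.path.join(directory, name)
    # 2) A case-insensitive match counts only if it is the same file.
    #    On a case-sensitive filesystem the given path simply won't exist.
    if os.path.exists(path):
        for entry in entries:
            candidate = os.path.join(directory, entry)
            if (entry.lower() == name.lower()
                    and os.path.exists(candidate)
                    and os.path.samefile(candidate, path)):
                return candidate
    # 3) Otherwise give up.
    raise FileNotFoundError(path)

# Usage: create Foo.txt in a scratch directory and look it up.
scratch = tempfile.mkdtemp()
open(os.path.join(scratch, "Foo.txt"), "w").close()
print(actual_name(os.path.join(scratch, "Foo.txt")))  # .../Foo.txt
```

Asking for "fOO.TXT" in the same directory returns the Foo.txt spelling on a case-insensitive filesystem and raises FileNotFoundError on a case-sensitive one.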
On 9/24/2010 3:10 PM, Greg Ewing wrote:
Paul Moore wrote:
I dug into this once, and as far as I could tell, it's possible to get the information on Windows, but there's no way on Linux to "ask the filesystem".
Maybe we could use a heuristic such as:
1) Search the directory for an exact match to the name given, return it if found.
2) Look for a match ignoring case. If one is found, test it to see if it refers to the same file as the given path, and if so return it.
3) Otherwise, raise an exception.
Hmm. There is no need for the function on a case sensitive file system, because the name had better be spelled with matching case: that is, if it is spelled with non-matching case it is an attempt to reference a non-existent file (or at least a different file). So the API could do the "right thing" for case preserving or case ignoring file systems, but for case sensitive file systems, at most an existence check would be warranted. In other words, the API, should it be created, should be "What is the actual name of the file that matches this if it exists in the filesystem", so the first check is to see if it exists in the file system (this may raise an exception if it doesn't exist), and then if it does, then on those filesystems for which it might be different, obtain the different name.
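Glenn's point suggests first finding out whether the lookup is needed at all. One read-only way to guess, assuming the file already exists, is to flip the case of its final component and ask whether the result resolves to the same file. looks_case_insensitive is a hypothetical helper; its answer applies to that one directory only, since a mount point can change it, and it can say nothing about a name without letters.

```python
import os
import tempfile

def looks_case_insensitive(path):
    """Hypothetical helper: guess whether the filesystem holding the existing
    file `path` ignores case, without writing anything. Returns None when the
    final component has no letters to flip."""
    if not os.path.exists(path):
        raise FileNotFoundError(path)
    head, tail = os.path.split(os.path.abspath(path))
    flipped = tail.swapcase()
    if flipped == tail:
        return None
    alias = os.path.join(head, flipped)
    # Case-insensitive: the flipped name resolves to the very same file.
    # Case-sensitive: it is missing, or it is a different file.
    return os.path.exists(alias) and os.path.samefile(path, alias)

scratch = tempfile.mkdtemp()
sample = os.path.join(scratch, "Foo.txt")
open(sample, "w").close()
print(looks_case_insensitive(sample))  # typically True on NTFS/default macOS, False on ext4
```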
I think that, like os.path.realpath(), it should not fail if the file
does not exist.
Maybe the API could be called os.path.unnormpath(), since it is in a
sense the opposite of normcase() (which removes case)? But I would
want to write it so that even on Unix it scans the filesystem, in case
the filesystem is case-preserving (like the default fs on OS X).
--Guido
On Fri, Sep 24, 2010 at 3:43 PM, Glenn Linderman
On 9/24/2010 3:10 PM, Greg Ewing wrote:
Paul Moore wrote:
I dug into this once, and as far as I could tell, it's possible to get the information on Windows, but there's no way on Linux to "ask the filesystem".
Maybe we could use a heuristic such as:
1) Search the directory for an exact match to the name given, return it if found.
2) Look for a match ignoring case. If one is found, test it to see if it refers to the same file as the given path, and if so return it.
3) Otherwise, raise an exception.
Hmm. There is no need for the function on a case sensitive file system, because the name had better be spelled with matching case: that is, if it is spelled with non-matching case it is an attempt to reference a non-existent file (or at least a different file).
So the API could do the "right thing" for case preserving or case ignoring file systems, but for case sensitive file systems, at most an existence check would be warranted.
In other words, the API, should it be created, should be "What is the actual name of the file that matches this if it exists in the filesystem", so the first check is to see if it exists in the file system (this may raise an exception if it doesn't exist), and then if it does, then on those filesystems for which it might be different, obtain the different name.
-- --Guido van Rossum (python.org/~guido)
Guido van Rossum wrote:
Maybe the API could be called os.path.unnormpath(), since it is in a sense the opposite of normcase() (which removes case)?
Cute, but not very intuitive. Something like actualpath() might be better -- although that's somewhat arbitrarily different from realpath(). -- Greg
On Sat, 25 Sep 2010 09:22:47 am Guido van Rossum wrote:
I think that, like os.path.realpath(), it should not fail if the file does not exist.
Maybe the API could be called os.path.unnormpath(), since it is in a sense the opposite of normcase() (which removes case)? But I would want to write it so that even on Unix it scans the filesystem, in case the filesystem is case-preserving (like the default fs on OS X).
It is not entirely clear to me what this function is meant to actually do. Should it:

1. Return the case of a filename in some canonical form which depends on the file system?
2. Return the case of a filename as it is actually stored on disk?
3. Something else?

and just for completeness:

4. Return the case of a filename in some arbitrarily-chosen canonical form which does not depend on the file system?

These are not the same, either conceptually or in practice. If you want #4, you already have it in os.path.normcase.

I think that the OP, Chris, wants #1, but it isn't entirely clear to me. It's possible that he wants #2. Various people have posted links to recipes that solve case #2. Note though that this necessarily demands that if the file doesn't exist, it should raise an exception. In the case of #1, if the file doesn't exist, we can't predict what the canonical form should be.

The very concept of canonical form for file names is troublesome. If the file system is case-preserving, the file system doesn't define a canonical form: the case of the file name will depend on how the file is initially named. If the file system is case-destructive the behaviour will depend on the file system itself: e.g. FAT12 and ISO 9660 both uppercase file names, but other file systems may make other choices. For some arbitrary path, where we don't know what file system it is, or if the path doesn't actually exist, we have no way of telling what the file system's canonical form will be, or even whether it will have one.

Note that I've been talking about case preservation, not case sensitivity. That's because case preservation is orthogonal to sensitivity. You can see three of the four combinations, e.g.:

Preserving + insensitive: FAT32, NTFS under Win32, normally HFS+
Preserving + sensitive: ext3, NTFS under POSIX, optionally HFS+
Destructive + insensitive: FAT12, FAT16 without long file name support

To the best of my knowledge, destructive + sensitive doesn't exist. It could, in principle, but it would be silly to do so.

Note that just knowing the file system type is not enough to tell what its behaviour will be. Given an arbitrary file system, there's no obvious way to determine what it will do to file names short of trying to create a file and see what happens.

-- Steven D'Aprano
On Fri, Sep 24, 2010 at 8:25 PM, Steven D'Aprano
On Sat, 25 Sep 2010 09:22:47 am Guido van Rossum wrote:
I think that, like os.path.realpath(), it should not fail if the file does not exist.
Maybe the API could be called os.path.unnormpath(), since it is in a sense the opposite of normcase() (which removes case)? But I would want to write it so that even on Unix it scans the filesystem, in case the filesystem is case-preserving (like the default fs on OS X).
It is not entirely clear to me what this function is meant to actually do? Should it:
1. Return the case of a filename in some canonical form which depends on the file system?
2. Return the case of a filename as it is actually stored on disk?
This one. This is actually useful (on case-preserving filesystems). There is no doubt in my mind that this is the requested and needed functionality.
3. Something else?
and just for completeness:
4. Return the case of a filename in some arbitrarily-chosen canonical form which does not depend on the file system?
These are not the same, either conceptually or in practice.
If you want #4, you already have it in os.path.normcase.
I think that the OP, Chris, wants #1, but it isn't entirely clear to me.
I don't think this is where the issue lies.
It's possible that he wants #2.
Various people have posted links to recipes that solve case #2. Note though that this necessarily demands that if the file doesn't exist, it should raise an exception.
No it needn't; realpath() uses the filesystem but leaves non-existing parts alone. Also some of the path may exist (e.g. a parent directory).
In the case of #1, if the file doesn't exist, we can't predict what the canonical form should be.
The very concept of canonical form for file names is troublesome. If the file system is case-preserving, the file system doesn't define a canonical form: the case of the file name will depend on how the file is initially named. If the file system is case-destructive the behaviour will depend on the file system itself: e.g. FAT12 and ISO 9660 both uppercase file names, but other file systems may make other choices. For some arbitrary path, where we don't know what file system it is, or if the path doesn't actually exist, we have no way of telling what the file system's canonical form will be, or even whether it will have one.
Note that I've been talking about case preservation, not case sensitivity. That's because case preservation is orthogonal to sensitivity. You can see three of the four combinations, e.g.:
Preserving + insensitive: FAT32, NTFS under Win32, normally HFS+
Preserving + sensitive: ext3, NTFS under POSIX, optionally HFS+
Destructive + insensitive: FAT12, FAT16 without long file name support
To the best of my knowledge, destructive + sensitive doesn't exist. It could, in principle, but it would be silly to do so.
Note that just knowing the file system type is not enough to tell what its behaviour will be. Given an arbitrary file system, there's no obvious way to determine what it will do to file names short of trying to create a file and see what happens.
This operation should not do any writes. The solution may well be OS specific. Solutions for Windows and OS X have already been pointed out. If it can't be done for other Unix versions, I think returning the input unchanged on those platform is a fine fallback (as it is for non-existent filenames). -- --Guido van Rossum (python.org/~guido)
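A portable sketch of the behaviour Guido describes, assuming directory listings are the only source of truth (no OS-specific calls, nothing written). Each existing component is replaced by the spelling found in its parent's listing, but only if it is the same file; the first component that cannot be found, and everything after it, is returned unchanged, much as realpath() leaves missing parts alone. case_corrected is a hypothetical name. It costs one listdir() per component and does not resolve symlinks.

```python
import os
import tempfile

def _same(a, b):
    try:
        return os.path.samefile(a, b)
    except OSError:  # e.g. a dangling symlink in the listing
        return False

def case_corrected(path):
    """Hypothetical helper: respell each existing component of `path` as its
    parent directory lists it; leave missing components untouched."""
    path = os.path.abspath(path)
    drive, rest = os.path.splitdrive(path)
    parts = [p for p in rest.split(os.sep) if p]
    current = drive + os.sep
    for i, part in enumerate(parts):
        wanted = os.path.join(current, part)
        if not os.path.exists(wanted):
            # Not found: keep this component and the rest exactly as given.
            return os.path.join(current, *parts[i:])
        try:
            entries = os.listdir(current)
        except OSError:  # unreadable directory: keep the given spelling
            entries = []
        if part not in entries:
            for entry in entries:
                if entry.lower() == part.lower() and _same(os.path.join(current, entry), wanted):
                    part = entry
                    break
        current = os.path.join(current, part)
    return current

scratch = tempfile.mkdtemp()
open(os.path.join(scratch, "Foo.txt"), "w").close()
print(case_corrected(os.path.join(scratch, "fOO.TXT")))
# ends in Foo.txt where case is ignored; fOO.TXT is left alone where it is not
```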
On 25/09/2010 15:45, Guido van Rossum wrote:
The solution may well be OS specific. Solutions for Windows and OS X have already been pointed out. If it can't be done for other Unix versions, I think returning the input unchanged on those platform is a fine fallback (as it is for non-existent filenames).
Spot on, especially as the "default" of case-preserving and case-sensitive will likely cover anything where an FS-specific solution can't be found. I'd hazard a guess that the reason the FS-specific solution can't be found in some cases is because for case-preserving and case-sensitive situations, there really is no need for such an API ;-) Chris
On 25/09/2010 04:25, Steven D'Aprano wrote:
1. Return the case of a filename in some canonical form which depends on the file system?
2. Return the case of a filename as it is actually stored on disk?
How do 1 and 2 differ? FWIW, the use case that setuptools has (and for which it currently incorrectly uses normcase) is number 2.
4. Return the case of a filename in some arbitrarily-chosen canonical form which does not depend on the file system?
This is what normcase does, but only if you're on Windows ;-) I still don't really get the use case of normcase in its current form, at all...
Various people have posted links to recipes that solve case #2. Note though that this necessarily demands that if the file doesn't exist, it should raise an exception.
Fine by me, shame it seems to require iteration to find an answer though :-S
The very concept of canonical form for file names is troublesome.
I would have thought "whatever is shown when doing an ls/dir/etc" (and don't be smart and mention that you can get dir to output 8.3 names as well as the full path ;-) ) Chris
On Tue, 5 Oct 2010 07:21:15 pm Chris Withers wrote:
On 25/09/2010 04:25, Steven D'Aprano wrote:
1. Return the case of a filename in some canonical form which depends on the file system?
2. Return the case of a filename as it is actually stored on disk?
How do 1 and 2 differ?
Case #1 imposes a particular canonical form, regardless of what is actually stored on disk. It is similar to normcase, except that we could have different canonical forms depending on what the file system was. normcase merely generalises from the operating system, and never looks at the file system.

Some file systems are case-preserving, and don't have a canonical form. We might choose to arbitrarily impose one, as normcase already does. Some are case-folding, in which case it might be sensible to choose the same canonical form as the file system actually uses. However, this may be implementation dependent: e.g. under FAT12 or FAT16, the file system will take a file name like pArRoT.tXt and fold it to PARROT.TXT, or possibly parrot.txt, or Parrot.txt. Even if that's not the case for FAT12, it may be the case for other case-folding file systems. And the behaviour of FAT16 will differ according to whether or not it has been built with support for long file names.

Case #2 says to actually look at the file and see what the file system considers its name to be. Consider an NTFS file system. By default it is case-preserving and case-insensitive, although that can be changed. (Just because a file system is NTFS doesn't mean that it will be case-insensitive. NTFS can also run in a POSIX mode which is case-sensitive. But I digress.)

For simplicity, suppose you're on Windows using NTFS with the standard non-POSIX behaviour. You create a file named pArRoT.tXt. This will be stored on disk using the exact characters that you typed. The file system does no case-folding and merely uses whatever characters are fed to it, which in the case of Windows apps is likely to be whatever characters the user types. In this case, we don't try to impose a particular case on file names, but return whatever actually exists on disk.
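A quick way to see the difference between case #2 and case #4 from Python, assuming the scratch directory lives on a case-preserving filesystem (a case-folding one, such as FAT16 without long names, would list the folded name instead):

```python
import ntpath
import os
import tempfile

with tempfile.TemporaryDirectory() as scratch:
    open(os.path.join(scratch, "pArRoT.tXt"), "w").close()
    stored = os.listdir(scratch)            # case #2: what the filesystem kept
folded = ntpath.normcase("pArRoT.tXt")      # case #4: pure string folding, no disk access

print(stored)  # ['pArRoT.tXt'] on NTFS, APFS, ext4, ...
print(folded)  # parrot.txt
```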
FWIW, the use case that setuptools has (and for which it currently incorrectly uses normcase) is number 2.
4. Return the case of a filename in some arbitrarily-chosen canonical form which does not depend on the file system?
This is what normcase does, but only if you're on Windows ;-)
Not quite. macpath.normcase() also lowercases the path. So does the module for OS/2.

In any case, Windows is not a file system. It is quite possible to have virtually any combination of case-destroying, case-preserving, -sensitive and -insensitive file systems on the one Windows system. Say, a FAT12 floppy, an NTFS partition, and an ext2 USB stick. Windows doesn't ship with native support for ext2, but that doesn't mean it can't be installed with third party drivers.

normcase pays no attention to any of this, and just lowercases the path. At least that's cheap, and consistent, even if it solves the wrong problem :)

-- Steven D'Aprano
On 05/10/2010 12:04, Steven D'Aprano wrote:
On Tue, 5 Oct 2010 07:21:15 pm Chris Withers wrote:
On 25/09/2010 04:25, Steven D'Aprano wrote:
1. Return the case of a filename in some canonical form which depends on the file system?
2. Return the case of a filename as it is actually stored on disk?
How do 1 and 2 differ?
Case #1 imposes a particular canonical form, regardless of what is actually stored on disk. It is similar to normcase, except that we could have different canonical forms depending on what the file system was. normcase merely generalises from the operating system, and never looks at the file system.
Ah, okay, yeah, that's actually an anti-goal for me ;-)
Case #2 says to actually look at the file and see what the file system considers its name to be. Consider an NTFS file system. By default it is case-preserving and case-insensitive, although that can be changed. (Just because a file system is NTFS doesn't mean that it will be case-insensitive. NTFS can also run in a POSIX mode which is case-sensitive. But I digress.)
Yeah, this is definitely where I think the missing use case lies...
FWIW, the use case that setuptools has (and for which it currently incorrectly uses normcase) is number 2.
4. Return the case of a filename in some arbitrarily-chosen canonical form which does not depend on the file system?
This is what normcase does, but only if you're on Windows ;-)
Not quite. macpath.normcase() also lowercases the path. So does the module for OS/2.
Interesting, since I develop on MacOS, Linux and Windows and only experienced the problem caused by setuptools normcase'ing distribution names on Windows. The MacOS case also isn't in the docs.
In any case, Windows is not a file system. It is quite possible to have virtually any combination of case-destroying, case-preserving, -sensitive and -insensitive file systems on the one Windows system. Say, a FAT12 floppy, an NTFS partition, and an ext2 USB stick. Windows doesn't ship with native support for ext2, but that doesn't mean it can't be installed with third party drivers.
yes, exactly!
normcase pays no attention to any of this, and just lowercases the path. At least that's cheap, and consistent, even if it solves the wrong problem :)
...and creates a few more along the way ;-) Chris
On 08/10/2010 09:41, Chris Withers wrote:
On 05/10/2010 12:04, Steven D'Aprano wrote:
On Tue, 5 Oct 2010 07:21:15 pm Chris Withers wrote:
On 25/09/2010 04:25, Steven D'Aprano wrote: [snip...] FWIW, the use case that setuptools has (and for which it currently incorrectly uses normcase) is number 2.
4. Return the case of a filename in some arbitrarily-chosen canonical form which does not depend on the file system?
This is what normcase does, but only if you're on Windows ;-)
Not quite. macpath.normcase() also lowercases the path. So does the module for OS/2.
Interesting, since I develop on MacOS, Linux and Windows and only experienced the problem caused by setuptools normcase'ing distribution names on Windows. The MacOS case also isn't in the docs.
Unless you're using Mac OS 9 you will be using posixpath and not macpath though. :-) Michael -- http://www.voidspace.org.uk/
On 08 Oct, 2010,at 11:38 AM, Michael Foord
4. Return the case of a filename in some arbitrarily-chosen canonical form which does not depend on the file system?

AFAIK this is what the function is supposed to do: return a platform-dependent canonical form of the filename. And that is hopelessly naive on modern systems: on both Linux and OS X some filesystems are case-insensitive and others are not. The default for Linux is case-sensitive, but some filesystems are not (VFAT, CIFS), and the default on OS X is case-insensitive, but some filesystems are case-sensitive (NFS, case-sensitive HFS+).
Ronald
On 24 September 2010 23:43, Glenn Linderman
Hmm. There is no need for the function on a case sensitive file system, because the name had better be spelled with matching case: that is, if it is spelled with non-matching case it is an attempt to reference a non-existent file (or at least a different file).
On Linux, I don't believe there's a way to ask "is this filesystem case insensitive?" In fact, with userfs, I believe it's possible to do massively pathological things like having a filesystem which treats anagrams as the same file (foo is the same file as oof or ofo). (More realistically, MacOS does Unicode normalisation). Windows has (I believe) user definable filesystems, too, but the OS has "get me the real filename" style calls, which the filesystem should support, so no matter how nasty a filesystem implementer gets, he has to deal with his own mess :-) Paul.
Paul Moore writes:
In fact, with userfs, I believe it's possible to do massively pathological things like having a filesystem which treats anagrams as the same file (foo is the same file as oof or ofo). (More realistically, MacOS does Unicode normalisation).
Nitpick: Mac OS X doesn't do Unicode normalization. The default filesystem implementation does.
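For comparing names roughly the way such a filesystem would, one approximation in Python is to fold case and apply Unicode decomposition before comparing. names_match is a hypothetical helper; HFS+ uses its own decomposition and case-folding tables, so this is only an approximation, not a reimplementation of what the filesystem does.

```python
import unicodedata

def names_match(a, b):
    """Hypothetical helper: compare two file names ignoring case and
    composed/decomposed Unicode differences (an approximation of HFS+)."""
    def key(name):
        return unicodedata.normalize("NFD", name.casefold())
    return key(a) == key(b)

composed = "Caf\u00e9.txt"       # é as one code point
decomposed = "cafe\u0301.TXT"    # e followed by a combining acute accent
print(composed == decomposed)              # False
print(names_match(composed, decomposed))   # True
```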
Paul Moore wrote:
Windows has (I believe) user definable filesystems, too, but the OS has "get me the real filename" style calls,
Does it really, though? The suggestions I've seen for doing this involve abusing the short/long filename translation machinery, and I'm not sure they're guaranteed to return the actual case rather than something that happens to work. -- Greg
On 25 September 2010 23:57, Greg Ewing
Paul Moore wrote:
Windows has (I believe) user definable filesystems, too, but the OS has "get me the real filename" style calls,
Does it really, though? The suggestions I've seen for doing this involve abusing the short/long filename translation machinery, and I'm not sure they're guaranteed to return the actual case rather than something that happens to work.
There's another call available. I've been too lazy to go and look it up, but I'll do so sometime today. Paul.
On 26 September 2010 09:01, Paul Moore
On 25 September 2010 23:57, Greg Ewing
wrote: Paul Moore wrote:
Windows has (I believe) user definable filesystems, too, but the OS has "get me the real filename" style calls,
Does it really, though? The suggestions I've seen for doing this involve abusing the short/long filename translation machinery, and I'm not sure they're guaranteed to return the actual case rather than something that happens to work.
There's another call available. I've been too lazy to go and look it up, but I'll do so sometime today.
Hmm, I can't find the one I was thinking of. GetLongPathName correctly sets the case of all but the final part, and FindFirstFile can be used to find the last part, but that's not what I recall. GetFinalPathNameByHandle works, and is documented to do so, but (a) it works on an open file handle, so you need to open the file, and (b) it's Vista and later only... Paul.
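A ctypes sketch of the GetFinalPathNameByHandle route Paul describes, with both of his caveats visible: a handle has to be opened (FILE_FLAG_BACKUP_SEMANTICS lets directories be opened too), and the call needs Vista or later. final_path_name is a hypothetical name; the constants are the documented Win32 values, stripping the "\\?\" prefix is a simplification, and on other platforms, or when the file cannot be opened, it falls back to returning the input unchanged, as Guido suggested. The Windows branch is an untested sketch.

```python
import ctypes
import os
import sys

def final_path_name(path):
    """Hypothetical helper: on Windows, ask the OS for the on-disk spelling
    of an existing path; otherwise return `path` unchanged."""
    if sys.platform != "win32":
        return path
    from ctypes import wintypes
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    create_file = kernel32.CreateFileW
    create_file.argtypes = [wintypes.LPCWSTR, wintypes.DWORD, wintypes.DWORD,
                            wintypes.LPVOID, wintypes.DWORD, wintypes.DWORD,
                            wintypes.HANDLE]
    create_file.restype = wintypes.HANDLE
    get_final = kernel32.GetFinalPathNameByHandleW
    get_final.argtypes = [wintypes.HANDLE, wintypes.LPWSTR, wintypes.DWORD, wintypes.DWORD]
    get_final.restype = wintypes.DWORD
    kernel32.CloseHandle.argtypes = [wintypes.HANDLE]

    FILE_SHARE_ALL = 0x1 | 0x2 | 0x4          # read | write | delete
    OPEN_EXISTING = 3
    FILE_FLAG_BACKUP_SEMANTICS = 0x02000000   # required to open directories
    INVALID_HANDLE_VALUE = ctypes.c_void_p(-1).value

    # Desired access 0: we only query metadata, we never read or write.
    handle = create_file(path, 0, FILE_SHARE_ALL, None, OPEN_EXISTING,
                         FILE_FLAG_BACKUP_SEMANTICS, None)
    if handle == INVALID_HANDLE_VALUE:
        return path  # missing or inaccessible: leave it alone
    try:
        buf = ctypes.create_unicode_buffer(32768)
        n = get_final(handle, buf, len(buf), 0)  # 0: normalized name, DOS volume
        if n == 0 or n >= len(buf):
            raise ctypes.WinError(ctypes.get_last_error())
        result = buf.value
    finally:
        kernel32.CloseHandle(handle)
    # The API returns a \\?\-prefixed path; strip it for display.
    if result.startswith("\\\\?\\UNC\\"):
        return "\\\\" + result[8:]
    if result.startswith("\\\\?\\"):
        return result[4:]
    return result

print(final_path_name(os.getcwd()))  # on-disk spelling on Windows; unchanged elsewhere
```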
On Sun, Sep 26, 2010 at 13:36, Paul Moore
Hmm, I can't find the one I was thinking of. GetLongPathName correctly sets the case of all but the final part, and FindFirstFile can be used to find the last part, but that's not what I recall.
GetFinalPathNameByHandle works, and is documented to do so, but (a) it works on an open file handle, so you need to open the file, and (b) it's Vista and later only...
FWIW, here's what Mercurial uses to get the real path name on Windows: http://hg.intevation.org/mercurial/crew/file/66a07fb76ceb/mercurial/util.py#... (I don't know much about that code or this topic, but maybe someone finds it useful. It doesn't use any "special" Windows API, so if there is any, it's something the hg hackers don't know about.) Cheers, Dirkjan
On Sep 26, 2010, at 7:36 AM, Paul Moore wrote:
On 26 September 2010 09:01, Paul Moore wrote:
On 25 September 2010 23:57, Greg Ewing wrote:
Paul Moore wrote:
Windows has (I believe) user definable filesystems, too, but the OS has "get me the real filename" style calls,
Does it really, though? The suggestions I've seen for doing this involve abusing the short/long filename translation machinery, and I'm not sure they're guaranteed to return the actual case rather than something that happens to work.
There's another call available. I've been too lazy to go and look it up, but I'll do so sometime today.
Hmm, I can't find the one I was thinking of. GetLongPathName correctly sets the case of all but the final part, and FindFirstFile can be used to find the last part, but that's not what I recall.
GetFinalPathNameByHandle works, and is documented to do so, but (a) it works on an open file handle, so you need to open the file, and (b) it's Vista and later only...
Were you thinking of SHGetFileInfo? http://stackoverflow.com/questions/74451/getting-actual-file-name-with-prope... James
On 26 September 2010 13:37, James Y Knight wrote:
Were you thinking of SHGetFileInfo?
http://stackoverflow.com/questions/74451/getting-actual-file-name-with-prope...
It wasn't, but it looks possible. It only gives the last component, though, so you still have to walk up the path components :-( I suspect I was thinking of GetLongPathName, which puts everything *but* the last component into the right case. I missed the problem with the last component :-( Paul.
On Sun, Sep 26, 2010 at 06:36, Paul Moore wrote:
On 26 September 2010 09:01, Paul Moore wrote:
On 25 September 2010 23:57, Greg Ewing wrote:
Paul Moore wrote:
Windows has (I believe) user definable filesystems, too, but the OS has "get me the real filename" style calls,
Does it really, though? The suggestions I've seen for doing this involve abusing the short/long filename translation machinery, and I'm not sure they're guaranteed to return the actual case rather than something that happens to work.
There's another call available. I've been too lazy to go and look it up, but I'll do so sometime today.
GetFinalPathNameByHandle works, and is documented to do so, but (a) it works on an open file handle, so you need to open the file, and (b) it's Vista and later only...
FYI, this is currently exposed as nt._getfinalpathname, and is used for os.path.samefile on Vista and beyond.
Greg Ewing writes:
Maybe we could use a heuristic such as:
Your heuristics seem to assume there will only ever be a maximum of one match, which is false. I present the following example:
$ ls foo/
bAr.dat BaR.dat bar.DAT
1) Search the directory for an exact match to the name given, return it if found.
And what if there are also matches for a case-insensitive search? e.g. searching for ‘foo/bar.DAT’ in the above example.
2) Look for a match ignoring case. If one is found, test it to see if it refers to the same file as the given path, and if so return it.
And what if several matches are found? e.g. searching for ‘foo/BAR.DAT’ in the above example.
3) Otherwise, raise an exception.
It seems to me this whole thing should be hashed out on ‘python-ideas’.
-- 
 \       “In case you haven't noticed, [the USA] are now almost as |
  `\     feared and hated all over the world as the Nazis were.” —Kurt |
_o__)                                                 Vonnegut, 2004 |
Ben Finney
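Ben's objection can be checked without touching a real filesystem. The sketch below uses a hypothetical helper (not an os.path API) that lists every directory entry equal to a name ignoring case, with str.casefold() standing in for whatever case rules a real filesystem applies. Fed Ben's listing, it shows that both ‘foo/bar.DAT’ and ‘foo/BAR.DAT’ have three case-insensitive candidates, so "return the match" is ambiguous on a case-sensitive filesystem.

```python
from typing import List


def case_insensitive_matches(entries: List[str], name: str) -> List[str]:
    """Return every entry in `entries` equal to `name` ignoring case.

    Hypothetical helper: str.casefold() only approximates the case rules
    that a real filesystem applies.
    """
    folded = name.casefold()
    return [entry for entry in entries if entry.casefold() == folded]


# Ben's example directory listing (on a case-sensitive filesystem).
listing = ["bAr.dat", "BaR.dat", "bar.DAT"]

print(case_insensitive_matches(listing, "BAR.DAT"))  # ['bAr.dat', 'BaR.dat', 'bar.DAT']
print(case_insensitive_matches(listing, "bar.DAT"))  # ['bAr.dat', 'BaR.dat', 'bar.DAT']
```

Only an exact-match check done first lets ‘foo/bar.DAT’ resolve to itself; ‘foo/BAR.DAT’ has no exact match and three equally plausible answers.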
I think searching a case-sensitive filesystem for a case-insensitive match should not be offered as part of os.path. Apps that really want to do things like """There is no file named "README", do you want to use "Readme" instead?""" can write their own inefficient code, thank you.
--Guido
On Fri, Sep 24, 2010 at 4:15 PM, Ben Finney wrote:
Greg Ewing writes:
Maybe we could use a heuristic such as:
Your heuristics seem to assume there will only ever be a maximum of one match, which is false. I present the following example:
$ ls foo/
bAr.dat BaR.dat bar.DAT
1) Search the directory for an exact match to the name given, return it if found.
And what if there are also matches for a case-insensitive search? e.g. searching for ‘foo/bar.DAT’ in the above example.
2) Look for a match ignoring case. If one is found, test it to see if it refers to the same file as the given path, and if so return it.
And what if several matches are found? e.g. searching for ‘foo/BAR.DAT’ in the above example.
3) Otherwise, raise an exception.
It seems to me this whole thing should be hashed out on ‘python-ideas’.
-- 
 \       “In case you haven't noticed, [the USA] are now almost as |
  `\     feared and hated all over the world as the Nazis were.” —Kurt |
_o__)                                                 Vonnegut, 2004 |
Ben Finney
-- --Guido van Rossum (python.org/~guido)
Ben Finney wrote:
Your heuristics seem to assume there will only ever be a maximum of one match, which is false. I present the following example:
$ ls foo/
bAr.dat BaR.dat bar.DAT
There should perhaps be an extra step at the beginning:
0) Test whether the specified path refers to an existing file. If not, raise an exception.
If that passes, and the file system is case-sensitive, then there must be a directory entry that is an exact match, so it will be returned by step 1. If the file system is case-insensitive, then there can be at most one entry that matches except for case, and it must be the one we're looking for, so there is no need for the extra test in step 2.
So the revised algorithm is:
0) Test whether the specified path refers to an existing file. If not, raise an exception.
1) Search the directory for an exact match, return it if found.
2) Search for a match ignoring case, and return one if found.
3) Otherwise, raise an exception.
There's also some prior art that might be worth looking at: On Windows, Python checks to see whether the file name of an imported module has the same case as the name being imported, which is a similar problem in some ways.
It seems to me this whole thing should be hashed out on ‘python-ideas’.
Good point -- I've redirected the discussion there. -- Greg
On Sep 24, 2010, at 10:53 AM, Paul Moore wrote:
On 24 September 2010 15:29, Guido van Rossum wrote:
I don't think we should try to reimplement what the filesystem does. I think we should just ask the filesystem (how exactly I haven't figured out yet but I expect it will be more OS-specific than filesystem-specific). It will have to be a new API -- normcase() at least is *intended* to return a case-flattened name on OSes where case-preserving filesystems are the default, and changing it to look at the filesystem would break too much code. For a new use case we need a new API.
I dug into this once, and as far as I could tell, it's possible to get the information on Windows, but there's no way on Linux to "ask the filesystem". From my researches, the standard interfaces a filesystem has to implement on Linux don't offer any means of asking this question.
Of course, (a) I'm no Linux expert so what do I know, and (b) it may well be possible to come up with a "good enough" solution by ignoring pathologically annoying theoretical cases.
I'm happy to provide Windows code if someone needs it. Paul
An OSX code sketch is available here (summary: call FSPathMakeRef to get an FSRef from a path string, then FSRefMakePath to make it back into a path, which will then have the correct case). And note that it only works if the file actually exists. http://stackoverflow.com/questions/370186/how-do-i-find-the-correct-case-of-... It would indeed be useful to have that be available in Python. James
On Sat, Sep 25, 2010 at 1:36 AM, James Y Knight wrote:
An OSX code sketch is available here (summary: call FSPathMakeRef to get an FSRef from a path string, then FSRefMakePath to make it back into a path, which will then have the correct case). And note that it only works if the file actually exists.
http://stackoverflow.com/questions/370186/how-do-i-find-the-correct-case-of-...
It would indeed be useful to have that be available in Python.
There is a much simpler way:
>>> from Carbon import File
>>> File.FSRef('/tmp/foo').as_pathname()
'/private/tmp/Foo'
Note that this is much slower than os.path.exists.
On 3 Oct 2010, at 02:35, Nir Soffer wrote:
On Sat, Sep 25, 2010 at 1:36 AM, James Y Knight wrote:
An OSX code sketch is available here (summary: call FSPathMakeRef to get an FSRef from a path string, then FSRefMakePath to make it back into a path, which will then have the correct case). And note that it only works if the file actually exists.
http://stackoverflow.com/questions/370186/how-do-i-find-the-correct-case-of-...
It would indeed be useful to have that be available in Python.
There is a much simpler way:
>>> from Carbon import File
>>> File.FSRef('/tmp/foo').as_pathname()
'/private/tmp/Foo'
Note that this is much slower than os.path.exists.
This won't work in py3k; the Carbon modules were removed in 3.0. A simpler alternative would probably be the F_GETPATH fcntl. An example:
Python 3.1.2 (r312:79147, Jul 11 2010, 18:21:56)
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from fcntl import fcntl
>>> from os.path import basename, exists
>>> from os import remove
>>> F_GETPATH = 50
>>> if exists('/tmp/å'):
...     remove('/tmp/å')
...
>>> open('/tmp/å', 'w').close()
>>> f = open(b'/tmp/A\xcc\x8a')
>>> a = f.name
>>> b = fcntl(f, F_GETPATH, b'\0' * 1024).rstrip(b'\0')
>>> a, b
(b'/tmp/A\xcc\x8a', b'/private/tmp/\xc3\xa5')
>>> a.decode('utf-8'), b.decode('utf-8')
('/tmp/Å', '/private/tmp/å')
-- Dan Villiom Podlaski Christiansen danchr@gmail.com
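The interactive F_GETPATH session can be wrapped in a small reusable function. This is a hedged sketch: the name on_disk_path is hypothetical, it assumes the fcntl module exposes F_GETPATH (it does on macOS from Python 3.9) and otherwise falls back to the raw value 50 used in Dan's session, and it has to open the file, so it needs read permission. It also closes the descriptor and cuts the result at the first NUL rather than relying on rstrip().

```python
import os
import sys

# Assumption: 1024 matches MAXPATHLEN on macOS, the buffer size used in the
# interactive session; it is also the largest buffer fcntl.fcntl() accepts.
MAXPATHLEN = 1024


def on_disk_path(path: str) -> str:
    """Ask the macOS kernel for the stored spelling of `path` (sketch only)."""
    if sys.platform != "darwin":
        raise NotImplementedError("F_GETPATH is a macOS-only fcntl")
    import fcntl

    # fcntl.F_GETPATH exists on Python 3.9+; older versions need the raw 50.
    f_getpath = getattr(fcntl, "F_GETPATH", 50)
    fd = os.open(path, os.O_RDONLY)  # requires permission to open the file
    try:
        buf = fcntl.fcntl(fd, f_getpath, b"\0" * MAXPATHLEN)
    finally:
        os.close(fd)
    # The kernel writes a NUL-terminated C string; drop the terminator and
    # the unused tail of the buffer, then decode with the filesystem encoding.
    return os.fsdecode(buf.split(b"\0", 1)[0])
```

As in the session, the result reflects the kernel's view of the path, so it may differ from the input in more than case (for example '/tmp' becoming '/private/tmp').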
On Oct 3, 2010, at 9:18 AM, Dan Villiom Podlaski Christiansen wrote:
A simpler alternative would probably be the F_GETPATH fcntl. An example:
That requires that you have permission to open the file (and to actually open it, which might have other effects), while the File Manager's FSRef method does not. If Python adds a cross-platform function to do this canonicalization, users won't have to worry about how easy it is to invoke in pure Python... James
participants (18)
- Antoine Pitrou
- Ben Finney
- Brian Curtin
- Chris Withers
- Dan Villiom Podlaski Christiansen
- Dirkjan Ochtman
- Glenn Linderman
- Greg Ewing
- Guido van Rossum
- James Y Knight
- Michael Foord
- Ned Batchelder
- Nir Soffer
- Paul Moore
- R. David Murray
- Ronald Oussoren
- Stephen J. Turnbull
- Steven D'Aprano