filename comparison [was] Re: PEP 428 - object-oriented filesystem paths

On 10/8/12, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
I can think of several, but when I thought a bit harder, they were mostly bug attractors. If I want my program (or a dict) to know that "CONFIG" and "config" are the same, then I also want it to know that "My Documents" is the same as "MYDOCU~1".*

Ideally, I would also have a way to find out that a pathname is likely to be problematic for cross-platform use, or at least whether two specific pathnames are known to be collision-prone on existing platforms other than mine. (But I'm not sure that sort of test can be reliable enough for the stdlib. Would it just check for caseless equality, reserved Windows names, and non-alphanumeric characters in the filename?)

*(Well, assuming it is. The short name depends on the history of the directory.)

-jJ
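A minimal sketch of the kind of heuristic check asked about above, assuming Python 3.9+. The names `portability_problems`, `collision_key` and `likely_collide` are hypothetical (nothing like them exists in the stdlib); the reserved-name set covers only the classic DOS device names, the "safe" punctuation set is an arbitrary assumption, and NFC normalisation plus `str.casefold()` is just one plausible choice of comparison key. As the footnote says, no such check can predict short names like "MYDOCU~1".

```python
import unicodedata

# Assumption: only the classic DOS device names. Windows treats "NUL.txt"
# the same as "NUL", so the part before the first dot is what gets compared.
RESERVED_WINDOWS_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)

# Assumption: punctuation treated as portable besides letters and digits.
SAFE_PUNCTUATION = "._-"


def portability_problems(name: str) -> list[str]:
    """Hypothetical helper: reasons a single filename may be unportable."""
    problems = []
    if name.split(".", 1)[0].upper() in RESERVED_WINDOWS_NAMES:
        problems.append("reserved Windows device name")
    if any(not (ch.isalnum() or ch in SAFE_PUNCTUATION) for ch in name):
        problems.append("non-alphanumeric character")
    return problems


def collision_key(name: str) -> str:
    """Hypothetical comparison key: Unicode NFC form, then caseless."""
    return unicodedata.normalize("NFC", name).casefold()


def likely_collide(a: str, b: str) -> bool:
    """True if two distinct names map to the same caseless key."""
    return a != b and collision_key(a) == collision_key(b)


print(likely_collide("CONFIG", "config"))      # True
print(portability_problems("con.txt"))         # ['reserved Windows device name']
print(portability_problems("My Documents"))    # ['non-alphanumeric character']
```

This only flags names that are *likely* to clash; it says nothing about what any particular file system will actually do.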

On Tue, Oct 16, 2012 at 6:21 AM, Jim Jewett <jimjjewett@gmail.com> wrote:
I'd forgotten about it until reading this, but I think you can get into trouble with Unicode normalisation as well - so, I think we can safely dismiss this as an irrelevant tangent and just stick with Antoine's basic Windows vs Posix distinction. If need be, the strategies can be exposed at a later date (via keyword-only arguments) if we come up with a more convincing use case.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
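To illustrate both points, a short sketch using the `pathlib` module that PEP 428 eventually became (Python 3.4+). It shows that a composed and a decomposed "é" look identical but compare unequal as strings, and that the Windows flavour compares case-insensitively while the Posix flavour does not. Neither flavour applies Unicode normalisation, which is consistent with sticking to the basic Windows vs Posix distinction.

```python
import unicodedata
from pathlib import PurePosixPath, PureWindowsPath

composed = "caf\u00e9"      # 'é' as a single code point (NFC form)
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent (NFD form)

# Visually identical, but different code point sequences.
print(composed == decomposed)                                  # False
print(unicodedata.normalize("NFC", decomposed) == composed)    # True

# The Windows flavour ignores case; the Posix flavour does not.
print(PureWindowsPath("CONFIG") == PureWindowsPath("config"))  # True
print(PurePosixPath("CONFIG") == PurePosixPath("config"))      # False

# Neither flavour normalises Unicode, so this pair stays unequal in both.
print(PureWindowsPath(composed) == PureWindowsPath(decomposed))  # False
print(PurePosixPath(composed) == PurePosixPath(decomposed))      # False
```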

On Mon, Oct 15, 2012 at 04:21:50PM -0400, Jim Jewett wrote:
Well, perhaps you do, but those not using Windows are unlikely to care about DOS short names. However, they may care about some other form of short name. E.g. on iso9660 file systems (CDs) long names are just truncated; if two truncated names clash, the second and subsequent files are given a three-digit suffix:

    this-is-a-long-file
    THIS-IS-A-LONG-NAME
    My Documents

get renamed to:

    THIS_IS_
    THIS_000
    MY_DOCUM

although my Linux computer displays those names in lower case. The Rock Ridge and Joliet extensions can record the unmangled file names, but not all CDs use them.

It is not the case that all case-insensitive file systems necessarily support DOS short names. There are file systems that don't support long names at all, there are case-insensitive file systems that preserve case, and those that don't. It's not even necessarily so that Windows is always case-insensitive:

http://support.microsoft.com/kb/929110

--
Steven
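A rough sketch that reproduces the three renamings in the example above. It is a simplification built on stated assumptions, not the ISO 9660 standard or the behaviour of any particular mastering tool: it ignores extensions, directory levels and ";1" version suffixes, maps anything outside A-Z, 0-9 and _ to _, keeps 8 characters, and on a clash overwrites the last three characters with a counter starting at 000. The function name `iso9660_short_names` is hypothetical.

```python
import re


def iso9660_short_names(names: list[str]) -> list[str]:
    """Hypothetical sketch of CD-style name mangling, in directory order."""
    seen: set[str] = set()
    result = []
    for name in names:
        # Upper-case, replace disallowed characters with '_', keep 8 chars.
        short = re.sub(r"[^A-Z0-9_]", "_", name.upper())[:8]
        candidate = short
        counter = 0
        while candidate in seen:
            # Clash: overwrite the last three characters with a counter.
            candidate = short[:5] + f"{counter:03d}"
            counter += 1
        seen.add(candidate)
        result.append(candidate)
    return result


print(iso9660_short_names(
    ["this-is-a-long-file", "THIS-IS-A-LONG-NAME", "My Documents"]
))  # ['THIS_IS_', 'THIS_000', 'MY_DOCUM']
```

Note that the result for a given name depends on which other names were already present and in what order, so, as with DOS short names, it cannot be predicted from a single pathname.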

Participants (3):
- Jim Jewett
- Nick Coghlan
- Steven D'Aprano