[New-bugs-announce] [issue42658] os.path.normcase() is inconsistent with Windows file system
report at bugs.python.org
Wed Dec 16 07:44:26 EST 2020
New submission from sogom <sogom at outlook.jp>:
On Windows file system, U+03A9 (Greek capital letter Omega) and U+2126 (Ohm sign) are distinguished. In fact, two distinct files "\u03A9.txt" and "\u2126.txt" can exist side by side in the same folder. But os.path.normcase() transforms both U+03A9 and U+2126 to U+03C9 (Greek small letter omega).
MSDN reads they use CompareStringOrdinal() to compare NTFS file names: https://docs.microsoft.com/en-us/windows/win32/intl/handling-sorting-in-your-applications#sort-strings-ordinally . This document also says "the function maps case using the operating system *uppercasing* table." But I made an experiment and found that at least in the Basic Multilingual Plane, "lowercase two strings by means of LCMapStringEx() and then wcscmp the two" always gives the same result as "compare the two strings with CompareStringOrdinal()". Though this fact is not explicitly mentioned in MSDN https://docs.microsoft.com/en-us/windows/win32/api/winnls/nf-winnls-lcmapstringex , the description of LCMAP_LINGUISTIC_CASING in this page implies that casing rules conform to file system's unless LCMAP_LINGUISTIC_CASING is used.
Therefore, I believe that os.path.normcase() should probably call LCMapStringEx(), with the first argument LOCALE_NAME_INVARIANT and the second argument LCMAP_LOWERCASE.
nosy: paul.moore, sogom, steve.dower, tim.golden, zach.ware
title: os.path.normcase() is inconsistent with Windows file system
versions: Python 3.9
Python tracker <report at bugs.python.org>
More information about the New-bugs-announce