[New-bugs-announce] [issue42658] os.path.normcase() is inconsistent with Windows file system

sogom report at bugs.python.org
Wed Dec 16 07:44:26 EST 2020

New submission from sogom <sogom at outlook.jp>:

On Windows file system, U+03A9 (Greek capital letter Omega) and U+2126 (Ohm sign) are distinguished. In fact, two distinct files "\u03A9.txt" and "\u2126.txt" can exist side by side in the same folder. But os.path.normcase() transforms both U+03A9 and U+2126 to U+03C9 (Greek small letter omega).

MSDN reads they use CompareStringOrdinal() to compare NTFS file names: https://docs.microsoft.com/en-us/windows/win32/intl/handling-sorting-in-your-applications#sort-strings-ordinally . This document also says "the function maps case using the operating system *uppercasing* table." But I made an experiment and found that at least in the Basic Multilingual Plane, "lowercase two strings by means of LCMapStringEx() and then wcscmp the two" always gives the same result as "compare the two strings with CompareStringOrdinal()". Though this fact is not explicitly mentioned in MSDN https://docs.microsoft.com/en-us/windows/win32/api/winnls/nf-winnls-lcmapstringex , the description of LCMAP_LINGUISTIC_CASING in this page implies that casing rules conform to file system's unless LCMAP_LINGUISTIC_CASING is used.

Therefore, I believe that os.path.normcase() should probably call LCMapStringEx(), with the first argument LOCALE_NAME_INVARIANT and the second argument LCMAP_LOWERCASE.

components: Windows
messages: 383163
nosy: paul.moore, sogom, steve.dower, tim.golden, zach.ware
priority: normal
severity: normal
status: open
title: os.path.normcase() is inconsistent with Windows file system
type: behavior
versions: Python 3.9

Python tracker <report at bugs.python.org>

More information about the New-bugs-announce mailing list