[New-bugs-announce] [issue45120] Windows cp encodings "UNDEFINED" entries update
Rafael Belo
report at bugs.python.org
Mon Sep 6 16:30:12 EDT 2021
New submission from Rafael Belo <rafaelblsilva at gmail.com>:
There is a mismatch between specification and behavior in some Windows encodings.
Some older Windows code page specifications mark certain bytes as "UNDEFINED", whereas Windows actually decodes them to specific code points; that real behavior is documented in the corresponding "bestfit" tables.
For example, CP1252 has a corresponding bestfit1252:
CP1252: https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT
bestfit1252: https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1252.txt
In CP1252, the bytes \x81 \x8d \x8f \x90 \x9d map to "UNDEFINED", whereas in bestfit1252 they map to \u0081 \u008d \u008f \u0090 \u009d respectively.
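To make the CP1252 difference concrete, here is a minimal sketch using the stdlib charmap helpers. It assumes CPython's generated `encodings.cp1252` module exposes a 256-character `decoding_table` in which U+FFFE marks an undefined byte (the convention of gencodec-generated codecs). The names `UNDEFINED_IN_CP1252` and `build_tables` are hypothetical illustrations, not part of the proposed PR.

```python
import codecs
import encodings.cp1252

# Bytes listed as UNDEFINED in CP1252.TXT but mapped to U+0081 etc. in bestfit1252.txt.
UNDEFINED_IN_CP1252 = (0x81, 0x8D, 0x8F, 0x90, 0x9D)


def build_tables():
    """Return (strict_table, bestfit_table) as 256-character decoding tables (hypothetical helper)."""
    base = list(encodings.cp1252.decoding_table)
    strict_table = base.copy()
    bestfit_table = base.copy()
    for byte in UNDEFINED_IN_CP1252:
        strict_table[byte] = "\ufffe"  # U+FFFE tells charmap_decode the byte is undefined
        bestfit_table[byte] = chr(byte)  # bestfit: byte 0xNN -> C1 control U+00NN
    return "".join(strict_table), "".join(bestfit_table)


strict_table, bestfit_table = build_tables()
data = bytes(UNDEFINED_IN_CP1252)

# CP1252.TXT semantics: each undefined byte becomes U+FFFD under errors="replace".
print(codecs.charmap_decode(data, "replace", strict_table))
# bestfit1252 semantics: the bytes decode to the matching C1 control characters...
print(codecs.charmap_decode(data, "strict", bestfit_table))
# ...and encode back to the same bytes.
print(codecs.charmap_encode("\x81\x8d", "strict", codecs.charmap_build(bestfit_table)))
```

The strict table reproduces the CP1252.TXT behavior, while the bestfit table round-trips the five bytes as C1 controls.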
In the Windows API, the function 'MultiByteToWideChar' exhibits the bestfit1252 behavior.
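To compare against the Windows behavior directly, a Windows-only sketch can call MultiByteToWideChar through ctypes. The helper name `windows_decode` is hypothetical; the code page number 1252 and the flag MB_ERR_INVALID_CHARS (0x8) come from the Win32 headers. On other platforms the helper simply raises OSError.

```python
import ctypes
import sys

CP_1252 = 1252
MB_ERR_INVALID_CHARS = 0x00000008  # fail instead of substituting on invalid input


def windows_decode(data: bytes, codepage: int = CP_1252) -> str:
    """Hypothetical helper: decode non-empty bytes with Win32 MultiByteToWideChar."""
    if sys.platform != "win32":
        raise OSError("MultiByteToWideChar is only available on Windows")
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    func = kernel32.MultiByteToWideChar
    # UINT CodePage, DWORD dwFlags, LPCCH lpMultiByteStr, int cbMultiByte,
    # LPWSTR lpWideCharStr, int cchWideChar
    func.argtypes = [ctypes.c_uint, ctypes.c_ulong, ctypes.c_char_p,
                     ctypes.c_int, ctypes.c_wchar_p, ctypes.c_int]
    func.restype = ctypes.c_int
    # First call with a zero-sized buffer returns the required length in UTF-16 units.
    size = func(codepage, MB_ERR_INVALID_CHARS, data, len(data), None, 0)
    if size == 0:
        raise ctypes.WinError(ctypes.get_last_error())
    buf = ctypes.create_unicode_buffer(size)
    written = func(codepage, MB_ERR_INVALID_CHARS, data, len(data), buf, size)
    if written == 0:
        raise ctypes.WinError(ctypes.get_last_error())
    return buf[:written]


if sys.platform == "win32":
    # Per this report, Windows decodes the "UNDEFINED" bytes to C1 controls.
    print(ascii(windows_decode(b"\x81\x8d\x8f\x90\x9d")))
```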
This issue and the accompanying PR propose a correction for this behavior: updating the Windows code page tables so that code points defined as "UNDEFINED" use the corresponding bestfit mapping.
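The table update could be approached roughly as follows: parse the vendor mapping, parse the MBTABLE section of the bestfit file, and fill in only the UNDEFINED entries. This sketch assumes the line formats of CP1252.TXT (byte, code point, `#` comment, with the code point column blank for UNDEFINED) and bestfit1252.txt (an `MBTABLE n` header followed by n `byte codepoint` lines, `;` comments). The inline excerpts and all function names are hypothetical, and this is not necessarily how the PR does it.

```python
# Hypothetical excerpts; the real files are CP1252.TXT and bestfit1252.txt on unicode.org.
CP1252_EXCERPT = """\
0x80\t0x20AC\t#EURO SIGN
0x81\t      \t#UNDEFINED
0x82\t0x201A\t#SINGLE LOW-9 QUOTATION MARK
0x8D\t      \t#UNDEFINED
"""

BESTFIT1252_EXCERPT = """\
CODEPAGE 1252
MBTABLE 5
0x80\t0x20ac\t;Euro Sign
0x81\t0x0081
0x82\t0x201a\t;Single Low-9 Quotation Mark
0x8d\t0x008d
0x8e\t0x017d
ENDCODEPAGE
"""


def parse_vendor_mapping(text):
    """Map byte -> code point, using None for UNDEFINED entries."""
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if not fields:
            continue  # blank or comment-only line
        table[int(fields[0], 16)] = int(fields[1], 16) if len(fields) > 1 else None
    return table


def parse_bestfit_mbtable(text):
    """Read the n byte -> code point lines that follow the 'MBTABLE n' header."""
    lines = iter(text.splitlines())
    for line in lines:
        fields = line.split()
        if fields[:1] == ["MBTABLE"]:
            count = int(fields[1])
            break
    else:
        raise ValueError("no MBTABLE section found")
    table = {}
    for _ in range(count):
        fields = next(lines).split(";", 1)[0].split()
        table[int(fields[0], 16)] = int(fields[1], 16)
    return table


def fill_undefined(vendor, bestfit):
    """Keep defined entries; replace UNDEFINED ones with the bestfit value (if any)."""
    return {b: cp if cp is not None else bestfit.get(b) for b, cp in vendor.items()}


merged = fill_undefined(parse_vendor_mapping(CP1252_EXCERPT),
                        parse_bestfit_mbtable(BESTFIT1252_EXCERPT))
print({hex(b): hex(cp) for b, cp in merged.items()})
```

Only the UNDEFINED slots change; entries the vendor file already defines are left untouched even if the bestfit file disagrees.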
Related issue: https://bugs.python.org/issue28712
----------
components: Demos and Tools, Library (Lib), Unicode, Windows
messages: 401181
nosy: ezio.melotti, lemburg, paul.moore, rafaelblsilva, steve.dower, tim.golden, vstinner, zach.ware
priority: normal
severity: normal
status: open
title: Windows cp encodings "UNDEFINED" entries update
type: behavior
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue45120>
_______________________________________