[New-bugs-announce] [issue45120] Windows cp encodings "UNDEFINED" entries update

Rafael Belo report at bugs.python.org
Mon Sep 6 16:30:12 EDT 2021

New submission from Rafael Belo <rafaelblsilva at gmail.com>:

There is a mismatch between specification and behavior in some Windows encodings.

Some older Windows code page specifications mark certain bytes as "UNDEFINED", whereas in practice those bytes behave differently, following the updated mappings published in a section named "bestfit".

For example, CP1252 has a corresponding bestfit1252:
CP1252: https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT
bestfit1252: https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1252.txt

According to these tables, in CP1252 the bytes \x81 \x8d \x8f \x90 \x9d map to "UNDEFINED", whereas in bestfit1252 they map to \u0081 \u008d \u008f \u0090 \u009d respectively.
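To see which bytes a given interpreter currently treats as undefined, a small probe along these lines may help. This is only a sketch: the helper name `undefined_bytes` is made up for illustration, and the result depends on the Python version in use.

```python
def undefined_bytes(encoding):
    """Return the single-byte values that the named codec refuses to decode.

    Hypothetical helper, not part of the standard library. It only makes
    sense for single-byte encodings such as the Windows code pages.
    """
    undefined = []
    for value in range(256):
        try:
            bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            # The codec has no mapping for this byte.
            undefined.append(value)
    return undefined


if __name__ == "__main__":
    # Print the bytes rejected by the local cp1252 codec, in hex.
    print([hex(value) for value in undefined_bytes("cp1252")])
```

On an interpreter that follows CP1252.TXT, the printed list should contain only the five bytes listed above; on one that follows bestfit1252 it should be empty.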

In the Windows API, the function 'MultiByteToWideChar' exhibits the bestfit1252 behavior.

This issue and PR propose a correction for this behavior, updating the Windows code pages in which some code points were defined as "UNDEFINED" to use the corresponding bestfit mapping.
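Until the codecs themselves change, the bestfit behavior for CP1252 can be approximated in user code. The following is a minimal sketch, not the code from the PR: the function name `decode_cp1252_bestfit` and the constant `CP1252_UNDEFINED` are assumptions introduced here, and only the five bytes mentioned above are handled.

```python
# Bytes that CP1252.TXT marks as UNDEFINED (taken from the report above).
CP1252_UNDEFINED = frozenset({0x81, 0x8D, 0x8F, 0x90, 0x9D})


def decode_cp1252_bestfit(data):
    """Decode bytes as cp1252, mapping the UNDEFINED bytes as bestfit1252 does.

    Hypothetical helper: each UNDEFINED byte becomes the code point with the
    same numeric value (for example 0x81 -> U+0081); every other byte goes
    through the regular cp1252 codec.
    """
    chars = []
    for value in data:
        if value in CP1252_UNDEFINED:
            chars.append(chr(value))
        else:
            chars.append(bytes([value]).decode("cp1252"))
    return "".join(chars)


if __name__ == "__main__":
    # 0x80 is the euro sign in cp1252; 0x81 is one of the UNDEFINED bytes.
    print(ascii(decode_cp1252_bestfit(b"\x80\x81")))
```

Decoding byte by byte is slow but keeps the sketch independent of how the installed codec treats those five bytes.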

Related issue: https://bugs.python.org/issue28712

components: Demos and Tools, Library (Lib), Unicode, Windows
messages: 401181
nosy: ezio.melotti, lemburg, paul.moore, rafaelblsilva, steve.dower, tim.golden, vstinner, zach.ware
priority: normal
severity: normal
status: open
title: Windows cp encodings "UNDEFINED" entries update
type: behavior

Python tracker <report at bugs.python.org>
