Strange UnicodeEncodeError in Windows image on Azure DevOps and Github

Eryk Sun eryksun at gmail.com
Fri Nov 11 19:10:47 EST 2022


On 11/10/22, Jessica Smith <12jessicasmith34 at gmail.com> wrote:
>
> Weird issue I've found on Windows images in Azure Devops Pipelines and
> Github actions. Printing Unicode characters fails on these images because,
> for some reason, the encoding is mapped to cp1252. What is particularly
> weird about the code page being set to 1252 is that if you execute "chcp"
> it shows that the code page is 65001.

If stdout isn't a console (e.g. a pipe), it defaults to using the
process code page (i.e. CP_ACP), such as legacy code page 1252
(extended Latin-1). You can override just sys.std* to UTF-8 by setting
the environment variable `PYTHONIOENCODING=UTF-8`. You can override
all I/O to use UTF-8 by setting `PYTHONUTF8=1`, or by passing the
command-line option `-X utf8`.
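
As a rough illustration of the failure, the sketch below imitates a redirected stdout by wrapping an in-memory byte buffer in a text stream with a fixed encoding. The helper name `write_text` and the sample string are made up for this example; it isn't what the CI images run, just the same encode step in isolation.

```python
import io


def write_text(text, encoding):
    """Encode text the way a piped stdout with this encoding would."""
    buffer = io.BytesIO()
    # newline="" disables newline translation so the bytes are predictable.
    stream = io.TextIOWrapper(buffer, encoding=encoding, newline="")
    stream.write(text)
    stream.flush()
    return buffer.getvalue()


sample = "\u2713 done"  # U+2713 CHECK MARK has no mapping in cp1252

try:
    write_text(sample, "cp1252")
except UnicodeEncodeError as exc:
    print("cp1252 failed:", exc.reason)

# With UTF-8 the same text encodes without error.
print(write_text(sample, "utf-8"))
```

If changing the environment isn't an option, a script can also switch an already-open stream itself with `sys.stdout.reconfigure(encoding="utf-8")` (Python 3.7+).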

Background

The locale system in Windows supports a common system locale, plus a
separate locale for each user. By default the process code page is
based on the system locale, and the thread code page (i.e.
CP_THREAD_ACP) is based on the user locale. The default locale of the
Universal C runtime combines the user locale with the process code
page. (This combination may be inconsistent.)

In Windows 10 and later, the default process and thread code pages can
be configured to use CP_UTF8 (65001). Applications can also override
them to UTF-8 in their manifest via the "ActiveCodePage" setting. In
either case, if the process code page is UTF-8, the C runtime will use
UTF-8 for its default locale encoding (e.g. "en_US.utf8").
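
To see which of these values a given process actually ends up with, something like the following diagnostic can help. It's a minimal sketch: the function name `encoding_report` and the dictionary keys are invented here, and the Windows-only part assumes kernel32 can be loaded through ctypes. On other platforms it just reports the Python-level settings.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings and code pages that affect text I/O."""
    report = {
        "stdout": getattr(sys.stdout, "encoding", None),
        "locale": locale.getpreferredencoding(False),
        "utf8_mode": bool(sys.flags.utf8_mode),
    }
    if sys.platform == "win32":
        import ctypes

        kernel32 = ctypes.WinDLL("kernel32")
        # Process code page (CP_ACP), e.g. 1252 or 65001.
        report["GetACP"] = kernel32.GetACP()
        # Console code pages; both calls return 0 without a console.
        report["GetConsoleCP"] = kernel32.GetConsoleCP()
        report["GetConsoleOutputCP"] = kernel32.GetConsoleOutputCP()
    return report


if __name__ == "__main__":
    for key, value in encoding_report().items():
        print(f"{key}: {value}")
```

Running it once directly and once with its output piped to a file should show whether stdout switches to the process code page when it isn't a console.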

Unlike some frameworks, Python has never used the console input code
page or output code page as a locale encoding. Personally, I wouldn't
want Python to default to that old MS-DOS behavior. However, I'd be in
favor of supporting a "console" encoding that's based on the console
input code page that's returned by GetConsoleCP(). If the process
doesn't have a console session, the "console" encoding would fall back
on the process code page from GetACP().
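
The fallback I have in mind could be sketched roughly as below. Python has no such "console" encoding today, so the names `pick_codepage`, `codepage_to_encoding` and `console_encoding` are hypothetical. The non-Windows branch is only there to keep the sketch runnable elsewhere.

```python
import locale
import sys

CP_UTF8 = 65001


def pick_codepage(console_cp, process_cp):
    """Prefer the console input code page; 0 means there is no console."""
    return console_cp if console_cp != 0 else process_cp


def codepage_to_encoding(codepage):
    """Map a Windows code page number to a Python codec name."""
    return "utf-8" if codepage == CP_UTF8 else f"cp{codepage}"


def console_encoding():
    """Hypothetical "console" encoding with a GetACP() fallback."""
    if sys.platform != "win32":
        # No code pages here; use the locale encoding instead.
        return locale.getpreferredencoding(False)
    import ctypes

    kernel32 = ctypes.WinDLL("kernel32")
    codepage = pick_codepage(kernel32.GetConsoleCP(), kernel32.GetACP())
    return codepage_to_encoding(codepage)


if __name__ == "__main__":
    print(console_encoding())
```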

