[New-bugs-announce] [issue13560] Add PyUnicode_DecodeLocale and PyUnicode_DecodeLocaleAndSize

STINNER Victor report at bugs.python.org
Fri Dec 9 00:02:16 CET 2011

New submission from STINNER Victor <victor.stinner at haypocalc.com>:

To decode a byte string from the locale encoding (LC_CTYPE), PyUnicode_DecodeFSDefault() can be used, but this function uses a fixed encoding chosen at startup (the locale encoding at startup). The right method is currently to call _Py_char2wchar() and then PyUnicode_FromWideChar(). _Py_char2wchar() is a low-level function: it doesn't raise a proper Python exception, it just returns NULL on error and writes a message to stderr using fprintf() (!).

The attached patch adds PyUnicode_DecodeLocale() and PyUnicode_DecodeLocaleAndSize() to offer a high-level API to decode data from the *current* locale encoding. These functions fail with an OSError or MemoryError if decoding fails (instead of a generic ValueError), and no longer write to stderr. They take a surrogateescape argument to choose whether to escape undecodable bytes or to fail with an error.
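
As a rough illustration of what "decoding from the current locale encoding" involves, here is a minimal sketch in plain ISO C, without the CPython API. It is not the code from the patch. The helper name decode_current_locale() is hypothetical; it uses mbsrtowcs(), which consults the LC_CTYPE setting in effect when it is called, and it reports failure through NULL plus errno instead of printing to stderr.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, not part of CPython: decode a NUL-terminated byte
   string using the LC_CTYPE locale that is active at call time.
   Returns a malloc()ed wide string that the caller must free(), or NULL
   with errno set (EILSEQ for undecodable bytes, ENOMEM if out of memory).
   Nothing is written to stderr. */
static wchar_t *
decode_current_locale(const char *bytes)
{
    mbstate_t state;
    const char *src = bytes;
    wchar_t *out;
    size_t len;

    assert(bytes != NULL);

    /* First pass: with a NULL destination, only count the wide characters. */
    memset(&state, 0, sizeof(state));
    len = mbsrtowcs(NULL, &src, 0, &state);
    if (len == (size_t)-1) {
        errno = EILSEQ;
        return NULL;
    }

    out = malloc((len + 1) * sizeof(wchar_t));
    if (out == NULL) {
        errno = ENOMEM;
        return NULL;
    }

    /* Second pass: convert for real, including the terminating L'\0'. */
    memset(&state, 0, sizeof(state));
    src = bytes;
    if (mbsrtowcs(out, &src, len + 1, &state) == (size_t)-1) {
        free(out);
        errno = EILSEQ;
        return NULL;
    }
    return out;
}
```

A caller would typically run setlocale(LC_CTYPE, "") first, call the helper, and translate a NULL result into its own error type by inspecting errno. That mirrors the idea in the proposal of mapping a decoding failure to OSError and an allocation failure to MemoryError.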

The patch only uses the new functions in _localemodule.c, but other code may need to be changed to use them as well. For example, the tzname_encoding.patch of issue #5905 should perhaps use them.

components: Unicode
messages: 149060
nosy: ezio.melotti, haypo, loewis
priority: normal
severity: normal
status: open
title: Add PyUnicode_DecodeLocale and PyUnicode_DecodeLocaleAndSize
versions: Python 3.3

Python tracker <report at bugs.python.org>
