[issue34512] Document platform-specific strftime() behavior for non-ASCII format strings
New submission from Alexey Izbyshev <izbyshev@ispras.ru>:

If a format string contains code points outside the ASCII range, time.strftime() can behave in four different ways depending on the platform, the current locale and the code points:

* raise a UnicodeEncodeError
* return an empty string
* for surrogates in the \uDC80-\uDCFF range, replace them with different code points in the output (potentially mangling nearby parts of the output as well)
* round-trip them correctly

Some examples:

* Linux (glibc 2.27):

Python 3.6.4 (default, Jan 03 2018, 13:52:55) [GCC] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import time, locale
>>> locale.getlocale()
('en_US', 'UTF-8')
>>> time.strftime('\x80')
'\x80'
>>> time.strftime('\u044f')
'я' # '\u044f'
>>> time.strftime('\ud800')
'\ud800'
>>> time.strftime('\udcff')
'\udcff'
>>> locale.setlocale(locale.LC_CTYPE, 'C')
'C'
>>> time.strftime('\x80')
'\x80'
>>> time.strftime('\u044f')
'я' # '\u044f'
>>> time.strftime('\ud800')
'\ud800'
>>> time.strftime('\udcff')
'\udcff'

* macOS 10.13.6 and FreeBSD 11.1:

Python 3.7.0 (default, Jul 23 2018, 20:22:55) [Clang 9.1.0 (clang-902.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import time, locale
>>> locale.getlocale()
('en_US', 'UTF-8')
>>> time.strftime('\x80')
'\x80'
>>> time.strftime('\u044f')
'я' # '\u044f'
>>> time.strftime('\ud800')
''
>>> time.strftime('\udcff')
''
>>> locale.setlocale(locale.LC_CTYPE, 'C')
'C'
>>> time.strftime('\x80')
'\x80'
>>> time.strftime('\u044f')
''
>>> time.strftime('\ud800')
''
>>> time.strftime('\udcff')
''

* Windows 8.1:

Python 3.7.0 (v3.7.0:1bf9cc5093, Jun 27 2018, 04:59:51) [MSC v.1914 64 bit (AMD64)] on win32
>>> import time, locale
>>> locale.getlocale()
(None, None)
>>> time.strftime('\x80')
'\x80'
>>> time.strftime('\u044f')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'locale' codec can't encode character '\u044f' in position 0: encoding error
>>> time.strftime('\ud800')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'locale' codec can't encode character '\ud800' in position 0: encoding error
>>> time.strftime('\udcff')
'ÿ' # '\xff'
>>> locale.setlocale(locale.LC_CTYPE, '')
'Russian_Russia.1251'
>>> time.strftime('\x80')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'locale' codec can't encode character '\x80' in position 0: encoding error
>>> time.strftime('\u044f')
'я' # '\u044f'
>>> time.strftime('\ud800')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'locale' codec can't encode character '\ud800' in position 0: encoding error
>>> time.strftime('\udcff')
'я' # '\u044f'
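
The four outcomes listed at the top of this report can be probed on any given machine with a small script. This is only a sketch: the helper name classify and its labels are made up for illustration, and the result depends on the platform, the locale and the Python version in use.

```python
import time

def classify(fmt):
    """Label how time.strftime() treats a format string with no directives."""
    try:
        out = time.strftime(fmt)
    except UnicodeEncodeError:
        return 'error'        # raised a UnicodeEncodeError
    if out == fmt:
        return 'round-trip'   # came back unchanged
    if out == '':
        return 'empty'        # silently returned an empty string
    return 'altered'          # came back as different code points

# The same sample format strings as in the sessions above.
for sample in ('\x80', '\u044f', '\ud800', '\udcff'):
    print(ascii(sample), '->', classify(sample))
```

Running it once under each locale of interest (after locale.setlocale()) gives a quick table comparable to the sessions above.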

The reasons for these differences are the following:

* Reliance on either wcsftime() or strftime() from the C library depending on the platform.
* For strftime(), the input is encoded into the charset of the current locale with the 'surrogateescape' error handler, and the output is decoded back in the same way.
* Different handling by glibc and macOS/FreeBSD of code points that are unrepresentable in the charset of the current locale.

I suggest at least documenting that the format string, despite being a 'str', requires special care if it contains non-ASCII code points. The 'datetime' module docs warn about the locale-dependent output, but only with regard to particular format specifiers [1].

I'll submit a draft PR. Suggestions are welcome.

[1] https://docs.python.org/3.7/library/datetime.html#strftime-and-strptime-beha...

----------
assignee: docs@python
components: Documentation
messages: 324136
nosy: belopolsky, docs@python, izbyshev, p-ganssle, taleinat
priority: normal
severity: normal
status: open
title: Document platform-specific strftime() behavior for non-ASCII format strings
type: enhancement
versions: Python 3.6, Python 3.7, Python 3.8

_______________________________________
Python tracker <report@bugs.python.org>
<https://bugs.python.org/issue34512>
_______________________________________
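
The encode/decode round trip around the C strftime() that the report gives as a reason can be imitated in pure Python. This is a sketch under assumptions: Python's built-in 'cp1251' codec stands in for the 'locale' codec under Russian_Russia.1251, and round_trip is a hypothetical helper, not a CPython function.

```python
def round_trip(fmt, charset):
    """Imitate what happens to literal text: str -> locale bytes -> str."""
    raw = fmt.encode(charset, errors='surrogateescape')
    return raw.decode(charset, errors='surrogateescape')

# U+044F is representable in cp1251 (byte 0xFF), so it survives.
assert round_trip('\u044f', 'cp1251') == '\u044f'

# The surrogate U+DCFF is "escaped" to the raw byte 0xFF, which then
# decodes as U+044F: a different code point comes back.
assert round_trip('\udcff', 'cp1251') == '\u044f'

# U+0080 and U+D800 are neither representable in cp1251 nor in the
# U+DC80-U+DCFF range that 'surrogateescape' handles, so encoding fails.
for bad in ('\x80', '\ud800'):
    try:
        round_trip(bad, 'cp1251')
    except UnicodeEncodeError as exc:
        print(ascii(bad), 'fails:', exc.reason)
```

These three cases line up with the Windows session under Russian_Russia.1251; the empty-string results on macOS/FreeBSD come from the C library itself and are not reproduced by this imitation.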
Change by Alexey Izbyshev <izbyshev@ispras.ru>:

----------
keywords: +patch
pull_requests: +8424
stage: -> patch review
Change by Tal Einat <taleinat@gmail.com>:

----------
versions: +Python 2.7, Python 3.5
Change by Karthikeyan Singaravelan <tir.karthi@gmail.com>:

----------
nosy: +xtreak
Tal Einat <taleinat@gmail.com> added the comment:

New changeset 1cffd0eed313011c0c2bb071c8affeb4a7ed05c7 by Tal Einat (Alexey Izbyshev) in branch 'master':
bpo-34512: Document platform-specific strftime() behavior for non-ASCII format strings (GH-8948)
https://github.com/python/cpython/commit/1cffd0eed313011c0c2bb071c8affeb4a7e...

----------
Change by miss-islington <mariatta.wijaya+miss-islington@gmail.com>:

----------
pull_requests: +11134
Change by miss-islington <mariatta.wijaya+miss-islington@gmail.com>:

----------
pull_requests: +11134, 11135
Change by miss-islington <mariatta.wijaya+miss-islington@gmail.com>:

----------
pull_requests: +11134, 11135, 11136
Change by miss-islington <mariatta.wijaya+miss-islington@gmail.com>:

----------
pull_requests: +11134, 11135, 11136, 11137
Change by miss-islington <mariatta.wijaya+miss-islington@gmail.com>:

----------
pull_requests: +11135, 11136, 11137, 11139
Change by miss-islington <mariatta.wijaya+miss-islington@gmail.com>:

----------
pull_requests: +11135, 11136, 11137, 11138, 11139
miss-islington <mariatta.wijaya+miss-islington@gmail.com> added the comment:

New changeset 678c5c07521caca809b1356d954975e6234c49ae by Miss Islington (bot) in branch '3.7':
bpo-34512: Document platform-specific strftime() behavior for non-ASCII format strings (GH-8948)
https://github.com/python/cpython/commit/678c5c07521caca809b1356d954975e6234...

----------
nosy: +miss-islington
miss-islington <mariatta.wijaya+miss-islington@gmail.com> added the comment:

New changeset 77b80c956f39df34722bd8646cf5b83d149832c4 by Miss Islington (bot) in branch '2.7':
bpo-34512: Document platform-specific strftime() behavior for non-ASCII format strings (GH-8948)
https://github.com/python/cpython/commit/77b80c956f39df34722bd8646cf5b83d149...

----------
Change by Tal Einat <taleinat@gmail.com>:

----------
pull_requests: -11135
Change by Tal Einat <taleinat@gmail.com>:

----------
pull_requests: -11135, 11137
Change by Tal Einat <taleinat@gmail.com>:

----------
pull_requests: -11135, 11137, 11139
Change by Tal Einat <taleinat@gmail.com>:

----------
resolution: -> fixed
stage: patch review -> resolved
status: open -> closed
versions: -Python 3.5, Python 3.6
STINNER Victor <vstinner@redhat.com> added the comment:

A solution to make time.strftime() more portable would be to split the format string, format each "%xxx" substring separately, and not pass the text between "%xxx" substrings to strftime().

There is a similar discussion about a trailing "%": bpo-35066.

----------
nosy: +vstinner
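
The splitting approach suggested in this comment could look roughly like the following. It is a minimal sketch, not CPython's implementation: portable_strftime is a hypothetical name, and the sketch assumes every directive is "%" followed by a single ASCII letter, so platform extensions such as "%-d" or "%Ez" are left in the output as literal text.

```python
import re
import time

# Assumption: a directive is '%' plus one ASCII letter, or the escape '%%'.
_DIRECTIVE = re.compile(r'%[A-Za-z%]')

def portable_strftime(fmt, t=None):
    """Format only the directives; literal text never reaches strftime()."""
    if t is None:
        t = time.localtime()
    parts = []
    pos = 0
    for match in _DIRECTIVE.finditer(fmt):
        parts.append(fmt[pos:match.start()])   # literal text, kept untouched
        if match.group() == '%%':
            parts.append('%')                  # escaped percent sign
        else:
            parts.append(time.strftime(match.group(), t))
        pos = match.end()
    parts.append(fmt[pos:])                    # trailing literal text
    return ''.join(parts)

# Non-ASCII text and lone surrogates around a directive are preserved.
print(ascii(portable_strftime('\u044f %Y \ud800', time.gmtime(0))))
```

The directive output itself can still be locale-dependent (for example "%A" or "%B"); the sketch only keeps the literal parts of the format string away from the C library.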
participants (5)
- Alexey Izbyshev
- Karthikeyan Singaravelan
- miss-islington
- STINNER Victor
- Tal Einat