[Python-ideas] PEP 540: Add a new UTF-8 mode
INADA Naoki
songofacandy at gmail.com
Wed Jan 11 03:17:46 EST 2017
Here is one example of a locale pitfall.
---
# from http://unix.stackexchange.com/questions/169739/why-is-coreutils-sort-slower-than-python
$ cat letters.py
import string
import random
def main():
    for _ in range(1_000_000):
        c = random.choice(string.ascii_letters)
        print(c)
main()
$ python3 letters.py > letters.txt
$ LC_ALL=C time sort letters.txt > /dev/null
0.35 real 0.32 user 0.02 sys
$ LC_ALL=C.UTF-8 time sort letters.txt > /dev/null
0.36 real 0.33 user 0.02 sys
$ LC_ALL=ja_JP.UTF-8 time sort letters.txt > /dev/null
11.03 real 10.95 user 0.04 sys
$ LC_ALL=en_US.UTF-8 time sort letters.txt > /dev/null
11.05 real 10.97 user 0.04 sys
---
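As far as I know, the gap comes from collation: outside the C locale, sort compares lines using the locale's collation rules instead of raw byte values. A rough Python analogue, only as a sketch: the helper names are made up for illustration, and en_US.UTF-8 is assumed to be installed (the code falls back to a message if it is not).

```python
import locale
import random
import string


def sort_bytewise(lines):
    # Plain code point comparison, comparable to LC_ALL=C sort.
    return sorted(lines)


def sort_with_collation(lines, locale_name):
    # Hypothetical helper: sort using the collation rules of locale_name,
    # then restore the previous LC_COLLATE setting.
    old = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        return sorted(lines, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, old)


letters = [random.choice(string.ascii_letters) for _ in range(1000)]
print(sort_bytewise(letters)[:5])
try:
    print(sort_with_collation(letters, "en_US.UTF-8")[:5])
except locale.Error:
    print("en_US.UTF-8 is not installed here")
```

Timing the two functions with timeit on a larger list should show the same tendency, although the ratio will not necessarily match the sort numbers above.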
This is why some engineers, including me, use the C locale on Linux,
at least when there is no C.UTF-8 locale.
Of course, we can set LC_CTYPE=en_US.UTF-8 instead of LANG or LC_ALL.
(I wonder if we can use LC_CTYPE=UTF-8...)
But I dislike the current situation, where "people should learn
how to configure locales properly, and the pitfalls of non-C locales, just for
using UTF-8 in Python".
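For reference, the LC_CTYPE-only setup mentioned above could be built for a child process roughly like this. It is only a sketch: env_with_utf8_ctype is a made-up helper name, and en_US.UTF-8 is assumed to exist on the system. Note that LC_ALL has to be removed, because it overrides every other LC_* variable.

```python
import os


def env_with_utf8_ctype(base_env, ctype="en_US.UTF-8"):
    # Hypothetical helper: keep the C locale for everything except LC_CTYPE.
    env = dict(base_env)
    env.pop("LC_ALL", None)  # LC_ALL would override LC_CTYPE, so drop it
    env["LANG"] = "C"        # default for categories not set explicitly
    env["LC_CTYPE"] = ctype  # only character encoding/classification is UTF-8
    return env


env = env_with_utf8_ctype(os.environ)
print(env["LANG"], env["LC_CTYPE"])
```

The resulting dict can be passed as env= to subprocess.run(). Any other LC_* variable that is already set in the base environment (for example LC_COLLATE) is kept as-is and would still take effect.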