[Python-ideas] PEP 540: Add a new UTF-8 mode
Oleg Broytman
phd at phdru.name
Fri Jan 6 14:12:16 EST 2017
On Fri, Jan 06, 2017 at 10:15:52AM +0900, INADA Naoki <songofacandy at gmail.com> wrote:
> >> Always use UTF-8
> >> ----------------
> >>
> >> Python already always uses the UTF-8 encoding on Mac OS X, Android and Windows.
> >> Since UTF-8 became the de facto encoding, it makes sense to always use it on all
> >> platforms with any locale.
> >
> > Please don't! I use different locales and encodings; sometimes it's
> > utf-8, sometimes not. But I have properly configured LC_* settings and
> > I prefer Python to follow my command. It'd be disgusting if Python
> > started to bend me to its preferences.
>
> For stdio (including console), PYTHONIOENCODING can be used to
> support legacy systems.
> e.g. `export PYTHONIOENCODING=$(locale charmap)`
This means one more thing to reconfigure when I switch locales,
instead of Python catching up automatically.
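
To make "catching up automatically" concrete, here is a minimal sketch (not taken from PEP 540) that reports the encodings Python derives from the current environment. The helper name `describe_encodings` is hypothetical; the calls it wraps are standard library functions.

```python
import locale
import sys


def describe_encodings():
    """Return the encodings Python currently derives from the environment."""
    return {
        # Encoding implied by the LC_* settings of the running process.
        "locale": locale.getpreferredencoding(False),
        # Encoding used to convert file names and command-line arguments.
        "filesystem": sys.getfilesystemencoding(),
        # Encoding of standard output; PYTHONIOENCODING overrides this one.
        "stdout": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for name, encoding in describe_encodings().items():
        print(f"{name}: {encoding}")
```

Running the script under different LC_ALL values, with and without PYTHONIOENCODING set, shows which of the three values follow the locale and which need a separate setting.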
> For command-line arguments and file paths, UTF-8/surrogateescape can round-trip.
> But mojibake may happen when the path is passed to a GUI.
>
> If we choose "Always use UTF-8 for fs encoding", I think
> the PYTHONFSENCODING envvar should be
> added again. (It should be used from startup: decoding command-line arguments).
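
The round trip mentioned above can be sketched as follows. This is an illustration only: the file name and the `roundtrip` helper are made up for the example, and it calls the codec directly rather than going through whatever Python does at startup.

```python
def roundtrip(raw: bytes) -> bytes:
    """Decode bytes as UTF-8 with surrogateescape, then encode them back."""
    # Bytes that are not valid UTF-8 become lone surrogates (U+DC80..U+DCFF)
    # instead of raising UnicodeDecodeError.
    text = raw.decode("utf-8", errors="surrogateescape")
    # Encoding with the same error handler restores the original bytes.
    return text.encode("utf-8", errors="surrogateescape")


# A file name as a KOI8-R system would store it (hypothetical example name).
koi8_name = "файл.txt".encode("koi8-r")
assert roundtrip(koi8_name) == koi8_name

# The decoded string is not displayable: a strict encode fails, which is
# where a GUI or a terminal would run into trouble.
decoded = koi8_name.decode("utf-8", errors="surrogateescape")
try:
    decoded.encode("utf-8")
except UnicodeEncodeError:
    print("name round-trips as bytes but cannot be shown as text")
```

The bytes survive unchanged, but the intermediate string carries escaped bytes rather than readable characters, so anything that needs to display it still has to know the real encoding.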
>
> >
> >> The risk is to introduce mojibake if the locale uses a different encoding,
> >> especially for locales other than the POSIX locale.
> >
> > There is no such risk for me as I already have mojibake on my
> > systems. The two most notable sources of mojibake are:
> >
> > 1) FTP servers - people create files (both names and content) in
> > different encodings; w32 FTP clients usually send file names and
> > content in cp1251 (Russian Windows encoding), sometimes in cp866
> > (Russian Windows OEM encoding).
> >
> > 2) MP3 tags and playlists - almost always cp1251.
> >
> > So whatever my personal encoding is - koi8-r or utf-8 - I have to
> > deal with file names and content in different encodings.
>
> 3) Unzipping zip files sent from Windows. Windows users use non-ASCII filenames, and
> create legacy (non-UTF-8) zip files very often.
Good example, thank you! I forgot about it because I wrote my
own zip.py and unzip.py that encode/decode filenames.
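
As an illustration of the kind of filename decoding involved, here is a minimal sketch; it is not the zip.py/unzip.py mentioned above. It relies on the standard `zipfile` module decoding member names as cp437 when the archive's UTF-8 flag (bit 0x800 of `ZipInfo.flag_bits`) is not set. The helper names and the default of cp866 are assumptions made for the example.

```python
import zipfile

UTF8_FLAG = 0x800  # general-purpose flag bit marking a UTF-8 member name


def legacy_name(filename: str, flag_bits: int, encoding: str = "cp866") -> str:
    """Recover a member name that zipfile decoded as cp437 by default."""
    if flag_bits & UTF8_FLAG:
        # The archive declared the name as UTF-8; zipfile decoded it correctly.
        return filename
    # Undo the cp437 decoding to get the raw bytes, then decode them
    # with the encoding the archive's creator is assumed to have used.
    return filename.encode("cp437").decode(encoding)


def list_legacy_names(path, encoding: str = "cp866"):
    """List member names of a zip archive, re-decoding legacy ones."""
    with zipfile.ZipFile(path) as archive:
        return [
            legacy_name(info.filename, info.flag_bits, encoding)
            for info in archive.infolist()
        ]
```

The encoding cannot be read from the archive itself, so the caller has to supply it (cp866 here, but it could just as well be another legacy code page).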
> I think people using non-UTF-8 encodings should solve encoding issues by themselves.
> People should always use ASCII or UTF-8 if they don't want to see mojibake.
Impossible. Even if I always used UTF-8, I would still receive a lot
of cp1251/cp866.
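
A minimal sketch of coping with such mixed input, assuming the only candidates are UTF-8 and one known legacy encoding. The `decode_name` helper is hypothetical. Strict decoding cannot tell cp1251 from cp866, because almost any byte string decodes in both, so the fallback has to be chosen by the caller.

```python
def decode_name(raw: bytes, fallback: str = "cp1251") -> str:
    """Decode bytes as UTF-8 if they are valid UTF-8, else use the fallback."""
    try:
        # Text in a legacy 8-bit Cyrillic encoding is rarely valid UTF-8,
        # so a strict decode works as a cheap detector.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


assert decode_name("тест".encode("utf-8")) == "тест"
assert decode_name("тест".encode("cp1251")) == "тест"
assert decode_name("тест".encode("cp866"), fallback="cp866") == "тест"
```

The check works independently of the current locale, which is the point: the incoming data decides the encoding, not the receiver's settings.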
Oleg.
--
Oleg Broytman http://phdru.name/ phd at phdru.name
Programmers don't die, they just GOSUB without RETURN.