Python 3 is killing Python

Marko Rauhamaa marko at
Wed Jul 16 15:11:26 CEST 2014

Steven D'Aprano <steve+comp.lang.python at>:

> With a few exceptions, /etc is filled with text files, not binary
> files, and half the executables on the system are text (Python, Perl,
> bash, sh, awk, etc.).

Our debate seems to stem from a different idea of what text is. To me,
text in the Python sense is a sequence of UCS-4 character code points.
The opposite of text is not necessarily binary.

Most of those "text" files under /etc expect ASCII. In many contexts,
they tolerate UTF-8 or Latin-3 or whatever, but it's a bit iffy (how are
non-ASCII passwords encoded in /etc/shadow?). Also, the files under
/etc, /var/log, etc. should not depend on the locale since they are
typically interpreted by daemons, which typically don't possess locales.

> Relatively rare. Like, um, email, news, html, Unix config files,
> Windows ini files, source code in just about every language ever,
> SMSes, XML, JSON, YAML, instant messenger apps,

I would be especially wary of letting Python 3 interpret those files for
me. Python's [text] strings could be a wonderful tool on the inside of
my program, but I definitely would like to micromanage the I/O. Do I
obey the locale or not? That's too big (and painful) a question for
Python to answer on its own (and pretend like everything's under
control).
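
A minimal sketch of what micromanaging the I/O could look like: read
bytes, then decode with an encoding the caller names, so the locale
never gets a vote. The helper name read_text_explicit and the temporary
file are made up for illustration only.

```python
import os
import tempfile


def read_text_explicit(path, encoding):
    """Read bytes from path and decode them with the caller's encoding."""
    with open(path, "rb") as f:      # binary mode: no locale involved
        raw = f.read()
    return raw.decode(encoding)      # raises UnicodeDecodeError on bad input


# Write known UTF-8 bytes to a scratch file, then decode them two ways.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write("Hyvää yötä\n".encode("utf-8"))

as_utf8 = read_text_explicit(path, "utf-8")      # the intended text
as_latin1 = read_text_explicit(path, "latin-1")  # same bytes, wrong guess
os.remove(path)
```

The same bytes give two different strings, which is why I would rather
make that choice myself than inherit it from the environment.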

> word processors... even *graphic* applications invariably have a text
> tool.

Thing is, the serious text utilities like word processors probably need
lots of ancillary information so Python's [text] strings might be too
naive to represent even a single character.

>> More often, len(b'λ') is what I want.
> Oh really? Are you sure? What exactly is b'λ'?

That's something that ought to work in the UTF-8 paradise.
Unfortunately, Python only allows ASCII in bytes. ASCII only! In this
day and age! Even C is not so picky:

   #include <stdio.h>

   int main()
   {
       printf("Hyvää yötä\n");
       return 0;
   }
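
For comparison, a small Python 3 sketch of the restriction and the two
workarounds I know of. It uses compile() only so that the rejected
literal can be shown without breaking the script itself; the variable
names are just for illustration.

```python
# A bytes literal may contain only ASCII characters, so b'λ' is rejected
# at compile time. compile() lets us observe that from a running script.
try:
    compile("b'λ'", "<example>", "eval")
    literal_ok = True
except SyntaxError:
    literal_ok = False

# Workaround 1: encode a text string explicitly.
utf8_len = len("λ".encode("utf-8"))

# Workaround 2: spell out the UTF-8 bytes of λ (U+03BB) with escapes.
escaped_len = len(b"\xce\xbb")
```

Both workarounds give a length of 2, which is the number I was after
with len(b'λ').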

