[Python-ideas] Why decode()/encode() name is harmful

random832 at fastmail.us
Fri May 29 21:32:04 CEST 2015


On Fri, May 29, 2015, at 04:56, anatoly techtonik wrote:
> First, let me start with The Curse of Knowledge
> https://en.wikipedia.org/wiki/Curse_of_knowledge
> which can be summarized as:
> 
> "Once you get something, it becomes hard
> to think how it was to be without it".

Let's think about what it is like to be without _the idea that text is
a byte stream in the first place_ - an idea that some people here
learned from Python 2, some learned from C, and some may have learned
from some other language. It was the way things always were, after all,
before Unicode came along.

The language I used most immediately before I started using Python was
C#. And C# uses Unicode (well, UTF-16, but the important thing is that
it's not an ASCII-compatible sequence of bytes) for strings. One could
argue that this paradigm, together with the attendant "encode" and
"decode" concepts and the stream wrappers that take care of them in the
common cases, is _the future_, and that one day nobody will learn that
text's natural form is a sequence of ASCII-compatible bytes... even if
text files continue to be encoded that way on disk.

> Now imagine a person who has a text file. The
> person needs to process that with Python. That
> person is probably a journalist and doesn't know
> anything that "any developer should know about
> unicode". In Python 2 he just copy pastes regular
> expressions to match the letter and is happy. In
> Python 3 he needs to *convert* that text to unicode.

You don't have to do so explicitly if the text file's encoding matches
your locale. You can just open the file and read it: by default, open()
gives you a text-mode stream that decodes the bytes for you and returns
unicode strings (str in Python 3). It's a text file, so you open it in
text mode.
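
As a rough sketch of what that looks like in Python 3 (the file name,
its contents and the regular expression are made up for illustration;
the file is written with the same default encoding it is read with, so
the two match by construction):

```python
import os
import re
import tempfile

# Hypothetical sample file standing in for the journalist's text file.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")
with open(path, "w") as f:      # text mode, default encoding
    f.write("Posted on 29 May 2015\n")

# Text mode again: the stream decodes the bytes, no explicit decode().
with open(path) as f:
    text = f.read()

# The copy-pasted regular expression works on the str directly.
years = re.findall(r"\d{4}", text)
print(type(text).__name__, years)
```

Nothing in the script mentions bytes, encode() or decode().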

Even if it doesn't match your locale, the proper way is to pass an
"encoding" argument to the open function, rather than going so far as
to open the file in binary mode and decode the bytes yourself.
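
A minimal sketch of the difference, assuming a file that happens to be
encoded as Latin-1 (the file name, contents and choice of encoding are
invented for the example):

```python
import os
import tempfile

# Hypothetical Latin-1 file; written as raw bytes so that the encoding
# on disk is known regardless of the locale.
path = os.path.join(tempfile.mkdtemp(), "latin1.txt")
with open(path, "wb") as f:
    f.write("caf\u00e9\n".encode("latin-1"))

# The proper way: say which encoding the file uses and stay in text mode.
with open(path, encoding="latin-1") as f:
    text = f.read()

# The "too deep" way: binary mode plus a manual decode().
with open(path, "rb") as f:
    manual = f.read().decode("latin-1")

print(text == manual)
```

Both give the same str here; the first just leaves the decoding to the
stream wrapper.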

