[Python-Dev] Python3 "complexity"

Thu Jan 9 10:19:15 CET 2014

On 9 January 2014 09:01, Mark Shannon <mark at hotpy.org> wrote:
> On 09/01/14 00:07, Ben Finney wrote:
>>
>> Kristján Valur Jónsson <kristjan at ccpgames.com> writes:
>>
>>> Believe it or not, sometimes you really don't care about encodings.
>>> Sometimes you just want to parse text files.
>>
>>
>> Files don't contain text, they contain bytes. Bytes only become text
>> when filtered through the correct encoding.
>>
> I'm glad someone pointed this out.

Try working on Windows with PowerShell as your default shell for a
while. You learn that message *very* fast. You end up with a mix of
CP1250 and UTF-16 files, and you can no longer even assume that a file
of "simple text" is in an ASCII-compatible encoding. After tools like
grep fail to work often enough, you get a really strong sense of why
knowing the encoding matters (and you feel this urge to rewrite all
the GNU tools in Python 3 ;-)). And that's on a single PC in an
English-speaking locale :-( (You also get this fun with the £ sign
being encoded differently in the console and the GUI). So it's not
just people that "use funny foreign languages" (apologies to 99% of
the globe for that :-)) who are affected. I assume Kristján knows all
this, given the "á" in his name :-)
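
A minimal sketch of the kind of mismatch described above. The code page
names are assumptions chosen for illustration: cp850 stands in for a
console code page and cp1252 for a GUI one, and UTF-16-LE (without a BOM)
stands in for the UTF-16 files. The sample string is hypothetical.

```python
# The same text, encoded three different ways.
text = "price: £5"

console_bytes = text.encode("cp850")      # assumed console code page
gui_bytes = text.encode("cp1252")         # assumed GUI code page
utf16_bytes = text.encode("utf-16-le")    # UTF-16 without a BOM

# The £ sign is a different byte in each single-byte code page.
print(console_bytes)   # b'price: \x9c5'
print(gui_bytes)       # b'price: \xa35'

# Decoding with the wrong code page silently gives the wrong character.
print(console_bytes.decode("cp1252"))   # price: œ5

# A byte-oriented search for an ASCII word misses it in UTF-16 data,
# because every character is followed by a NUL byte.
print(b"price" in utf16_bytes)   # False
print(b"price" in gui_bytes)     # True
```

The last two lines are roughly what happens when a byte-oriented tool is
pointed at a UTF-16 file: the ASCII pattern is simply not there.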

But certainly just using open without specifying an encoding has
always served me fine in Python 3, in the sense that it does at least
as well as Python 2. So I think that if this discussion is to be of
any real benefit, a specific example is needed. I honestly don't think
I've ever encountered a case where "Sometimes [I] just want to parse
text files" and code that uses the default encoding (i.e., looks
pretty much identical to Python 2) has *failed* to do the job for me.
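
For concreteness, a sketch of what "just using open" looks like. The file
contents and variable names are hypothetical, and
locale.getpreferredencoding(False) is shown only as the value open()
typically falls back to when no encoding is given; the exact default
depends on the platform and Python version.

```python
import locale
import os
import tempfile

# The encoding open() typically uses when none is specified.
default_encoding = locale.getpreferredencoding(False)
print("default encoding:", default_encoding)

# Create a throwaway file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    # No encoding argument on either side: both use the same default,
    # so plain text written this way reads back unchanged.
    with open(path, "w") as f:
        f.write("alpha beta\ngamma\n")

    with open(path) as f:
        words = [word for line in f for word in line.split()]
finally:
    os.remove(path)

print(words)   # ['alpha', 'beta', 'gamma']
```

This only shows the case that works: a file written and read on the same
machine with the same default. A file produced elsewhere in a different
encoding is where an explicit encoding= argument becomes necessary.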

PEP 460 is addressing a very specific use case, and certainly isn't for
"just parsing text files" - at least as I understand it.

Paul.