[Python-Dev] Python 3.0.1 (io-in-c)

Wed Jan 28 12:19:50 CET 2009

2009/1/28 Antoine Pitrou <solipsis at pitrou.net>:
> When writing large chunks of text (4096, 1e6), bookkeeping costs become
> marginal and encoding costs dominate. 2.x has no encoding costs, which
> explains why it's so much faster.

Interesting. However, it's still "slower" in terms of perception. In
2.x, I regularly do the equivalent of

    f = open("filename", "r")
    ... read strings from f ...

Yes, I know this is byte I/O in reality, but for everything I do
(Latin-1 on input and output, and for most practical purposes
ASCII-only) it simply isn't relevant to me.

If Python 3.x makes this substantially slower (working in a naive mode
where I ignore encoding issues), claiming it's "encoding costs"
doesn't make any difference - in a practical sense, I don't get any
benefits and yet I pay the cost. (You can say my approach is wrong,
but so what? I'll just say that 2.x is faster for me, and not migrate.
Ultimately, this is about "marketing" 3.x...)
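
As a rough illustration of the "naive mode" described above, here is a
minimal 3.x sketch of the two obvious spellings: text mode with an
explicit Latin-1 codec, and plain binary mode, which skips decoding
altogether. The helper names and the sample file are hypothetical, and
Latin-1 is chosen only because it is the encoding mentioned above.

```python
import os
import tempfile


def read_text_latin1(path):
    """Read the whole file as str, decoding every byte as Latin-1."""
    # Latin-1 maps each byte to the code point of the same value,
    # so this decode cannot fail on any input.
    with open(path, "r", encoding="latin-1") as f:
        return f.read()


def read_bytes(path):
    """Read the whole file as bytes; no decoding step is involved."""
    with open(path, "rb") as f:
        return f.read()


def demo():
    """Write a small hypothetical sample file and read it both ways."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        with open(path, "wb") as f:
            f.write(b"caf\xe9\n")  # "cafe" with e-acute, in Latin-1
        return read_text_latin1(path), read_bytes(path)


if __name__ == "__main__":
    text, raw = demo()
    print(repr(text), repr(raw))
```

Whether the binary spelling is acceptable depends on how much of the
surrounding code expects str rather than bytes.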

It would be helpful to limit this cost as much as possible. Maybe that
simply means ensuring that the default encoding for open is (in the
majority of cases) a highly-optimised one whose costs *don't* dominate
in the way you describe. (If your figures are for UTF-8, which I'd
guess is the usual default on Linux, it looks like there's some work
needed there.) Hmm, I just checked and on Windows, it appears that
sys.getdefaultencoding() is UTF-8. That seems odd - I would have
thought the majority of Windows systems were NOT set to use UTF-8 by
default...
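
One way to check which encoding is actually in play is sketched below.
In 3.x, sys.getdefaultencoding() reports the interpreter's internal
default for str/bytes conversions, whereas open() in text mode with no
encoding argument takes its default from the locale, which is what
locale.getpreferredencoding(False) reports. The function name and the
probe file are hypothetical, and the values printed depend on the
platform and its settings.

```python
import locale
import os
import sys
import tempfile


def describe_encodings():
    """Collect the encodings relevant to text I/O on this system."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "probe.txt")
        # Open without an encoding argument to see what open() picks.
        with open(path, "w") as f:
            open_default = f.encoding
    return {
        "sys.getdefaultencoding()": sys.getdefaultencoding(),
        "locale.getpreferredencoding(False)":
            locale.getpreferredencoding(False),
        "encoding chosen by open()": open_default,
    }


if __name__ == "__main__":
    for name, value in describe_encodings().items():
        print(name, "=", value)
```

If the last two values differ from the first, the UTF-8 result above
says nothing about the codec that open() uses by default.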

Paul.