[Python-ideas] Python 3000 TIOBE -3%

Paul Moore p.f.moore at gmail.com
Sun Feb 12 00:14:23 CET 2012


On 11 February 2012 17:00, Masklinn <masklinn at masklinn.net> wrote:
>> Good example. I believe adding ", encoding='latin-1'" to open() is sufficient.
>
> Why not open the file in binary mode instead? (and replace `'*'` by `b'*'` in
> the startswith call)

In my view, that's less scalable to more complex cases. It's likely
you'll hit things you need to do that don't translate easily to bytes
sooner than if you stay in a string-only world. A simple example:
checking for a regex rather than a simple starting character.

The problem I have with encoding="latin-1" is that in many cases I
*know* that's a lie. From what's been said in this discussion so far,
I think that the "better" way to say "I know this file contains mostly
ASCII, but there's some other bits I'm not sure about but don't care
too much as long as they round-trip cleanly" is
encoding="ascii",errors="surrogateescape". But as we've seen here,
that's not the idiom that gets recommended by everyone (the "One
Obvious Way", if you like).
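
For illustration, a minimal sketch of the ascii/surrogateescape idiom.
The function name filter_starred, the sample bytes and the temporary
file names are made up for this example, and the "drop lines starting
with '*'" task is only assumed from the startswith call mentioned
above. Undecodable bytes become lone surrogates on reading and are
turned back into the original bytes on writing, so they round-trip
unchanged:

```python
import os
import tempfile


def filter_starred(src_path, dst_path):
    """Copy src to dst, dropping lines that start with '*'.

    Non-ASCII bytes are smuggled through as lone surrogates
    (errors="surrogateescape") and written back byte-for-byte.
    newline="" disables newline translation in both directions.
    """
    opts = dict(encoding="ascii", errors="surrogateescape", newline="")
    with open(src_path, **opts) as src, open(dst_path, "w", **opts) as dst:
        for line in src:
            if not line.startswith("*"):
                dst.write(line)


# Hypothetical sample: mostly ASCII, with a stray 0xE9 byte of unknown encoding.
raw = b"* comment \xe9\n" b"name=caf\xe9\n" b"plain\n"

with tempfile.TemporaryDirectory() as tmp:
    src_file = os.path.join(tmp, "in.txt")
    dst_file = os.path.join(tmp, "out.txt")
    with open(src_file, "wb") as f:
        f.write(raw)
    filter_starred(src_file, dst_file)
    with open(dst_file, "rb") as f:
        result = f.read()

# The starred line is gone; the 0xE9 byte in the kept line is intact.
print(result)
```

One caveat with this approach: the escaped characters are lone
surrogates, so encoding such a string with a strict codec (for
example, printing it to a UTF-8 terminal) raises UnicodeEncodeError.
That is arguably the honest behaviour for data whose encoding is
unknown.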

I suspect that if the community did embrace a "one obvious way", that
would reduce the "Python 3 makes me need to know Unicode" FUD that's
around. But as long as people get 3 different answers when they ask
the question, there's going to be uncertainty and doubt (and hence,
probably, fear...)

Paul.

PS I'm pretty confident that I have *my* answer now
(ascii/surrogateescape). So this thread was of benefit to me, if
nothing else, and my thanks for that.


