[Python-ideas] Python 3000 TIOBE -3%

Mon Feb 13 09:12:43 CET 2012

On 13 February 2012 05:12, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Paul Moore writes:
>
>  > I'm now 100% convinced that
>  > encoding="ascii",errors="surrogateescape" is the way to say this in
>  > code.
>
> It probably is, for you.  If that ever gives you a UnicodeError, you
> know how to find out how to deal with it.  And it probably won't.<wink/>

And yet, after your earlier posting on latin-1, and your comments
here, I'm less certain. Thank you so much :-)

Seriously, I find these discussions about Unicode immensely useful. I
now have a much better feel for how to deal with (and think about)
text in "unknown but mostly ASCII" format, which can only be a good
thing.
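For readers who haven't met the recipe quoted above, here is a minimal sketch of what encoding="ascii", errors="surrogateescape" does. The input bytes are a hypothetical "mostly ASCII" sample that mixes a stray Latin-1 byte with a UTF-8 sequence. The sketch shows three things: decoding never fails, the original bytes survive a round trip, and a UnicodeError can still appear later on a strict encode.

```python
# Sketch of the encoding="ascii", errors="surrogateescape" recipe.
# The sample bytes are hypothetical: mostly ASCII, with one Latin-1
# byte (0xE9) and one UTF-8 sequence (0xC3 0xA9) mixed in.
import os
import tempfile

raw = b"name: Ren\xe9e / Ren\xc3\xa9e\n"

# Write the sample to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(raw)
    path = f.name

try:
    # Decoding never raises: each non-ASCII byte 0xNN becomes the lone
    # surrogate code point U+DCNN instead of a UnicodeDecodeError.
    with open(path, encoding="ascii", errors="surrogateescape") as f:
        text = f.read()
finally:
    os.remove(path)

# ASCII-level processing works as usual on the decoded str.
key, _, value = text.partition(": ")
print(key)          # name
print(repr(value))  # 'Ren\udce9e / Ren\udcc3\udca9e\n'

# Encoding with the same settings restores the original bytes exactly.
assert text.encode("ascii", errors="surrogateescape") == raw

# A strict encode (e.g. printing to a UTF-8 terminal) is where the
# UnicodeError would surface, because lone surrogates are not valid text.
try:
    text.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict encode fails:", exc.reason)
```

The trade-off is that undecodable bytes are preserved exactly, but the resulting string is only safe to write back out using the same error handler.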

> I don't think either argument applies to everybody who needs such a
> recipe, though.  Many will be best served with encoding='latin-1' by
> some name.
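The latin-1 alternative can be sketched the same way, again with a hypothetical sample (bytes.decode is used here; open(path, encoding="latin-1") applies the same decoding). Latin-1 assigns a character to every byte value, so decoding cannot fail and the bytes round-trip. The cost is that non-ASCII data appears as the wrong characters rather than as escapes.

```python
# Sketch of the encoding="latin-1" alternative, using the same
# hypothetical mostly-ASCII sample bytes.
raw = b"name: Ren\xe9e / Ren\xc3\xa9e\n"

# Latin-1 maps every byte 0x00-0xFF to the code point with the same
# value, so decoding can never fail.
text = raw.decode("latin-1")
print(repr(text))  # 'name: Renée / RenÃ©e\n'

# Round-tripping through latin-1 restores the original bytes exactly.
assert text.encode("latin-1") == raw

# Unlike surrogateescape output, the string holds only ordinary
# characters, so a strict UTF-8 encode succeeds. However, the UTF-8
# bytes in the input now read as mojibake ("Ã©") and are re-encoded
# that way.
out = text.encode("utf-8")
print(out)  # b'name: Ren\xc3\xa9e / Ren\xc3\x83\xc2\xa9e\n'
```
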

Probably the key question is, how do we encapsulate this debate in a
simple form suitable for people to find out about *without* feeling
like they "have to learn all about Unicode"? A note in the Unicode
HOWTO seems worthwhile, but how do we get people to look there, given
that the audience is people who don't want to delve too deeply into
Unicode issues?

Just to be clear, my reluctance to "do the right thing" was *not*
because I didn't want to understand Unicode - far from it, I'm
interested in, and inclined towards, "doing Unicode right". The
problem is that I know enough to realise that "proper" handling of
files whose encoding I don't know, and where it seems to be
inconsistent at times (both between files, and even on occasion within
a file), is a seriously hard issue. And I don't want to get into really
hard Unicode issues for what is, in practical terms, a simple problem:
it's one-off code, and minor corruption isn't really an issue.

Paul.