[Python-ideas] Python 3000 TIOBE -3%
steve at pearwood.info
Thu Feb 16 05:08:39 CET 2012
On Thu, Feb 16, 2012 at 02:37:12PM +1300, Greg Ewing wrote:
> On 16/02/12 02:39, Oleg Broytman wrote:
> >On Wed, Feb 15, 2012 at 11:15:36AM +1100, Ben Finney wrote:
> >>If people want to remain wilfully ignorant of text encoding in the third
> > This returns us to the very beginning of the thread. The original
> >complaint was: Python3 requires users to learn too much about unicode,
> >more than they really need.
> I don't think it's helpful to label everyone who wants to use the
> techniques being discussed here as lazy or ignorant. As we've seen,
> there are cases where you truly *can't* know the true encoding,
> and at the same time it *doesn't matter*, because all you want to
> do is treat the unknown bytes as opaque data. To tell someone in
> that position that they're being lazy is both wrong and insulting.
In fairness, this thread was originally started with the scenario "I'm
reading files which are only mostly ASCII, but I don't want to learn
about Unicode" rather than "I know about Unicode, but it doesn't help me
in this situation because the encoding truly is unknown". So wilful
ignorance does apply, at least in the use-case the thread started with.
(If it helps, think of them as too busy to learn, not too lazy.)
If you already know about Unicode, then you probably don't need to be
given a simple recipe to follow, because you probably already have a
solution that works for you.
Which brings us back to the original use-case:
"I have a file which is only mostly ASCII, and I don't care to learn
about Unicode at this time to deal with it. I need a recipe I can
follow that will do the right thing so I can continue to ignore the
issue for a little longer."
I don't think we should insist that these people be forced to learn
Unicode, nor should we expect to be able to solve every possible
problem they might find.
A couple of recipes in the FAQs, and discussion of why you
might prefer one to the other, should be able to cover most simple use-cases:
open(filename, encoding='ascii', errors='surrogateescape')
Both recipes hint at the wider world of encodings and error handlers,
hence act as a non-threatening introduction to Unicode.
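A minimal sketch of the surrogateescape recipe in use, assuming Python 3 and a throwaway temporary file. The function name roundtrip_mostly_ascii and the sample bytes are hypothetical, chosen only for illustration. It shows that bytes which are not valid ASCII survive a read/write cycle untouched, while the ASCII parts behave as ordinary text:

```python
import os
import tempfile
from typing import Tuple


def roundtrip_mostly_ascii(raw: bytes) -> Tuple[str, bytes]:
    """Hypothetical helper: read mostly-ASCII bytes as text, write them back.

    Returns the decoded text and the bytes that ended up on disk.
    """
    with tempfile.TemporaryDirectory() as workdir:
        src = os.path.join(workdir, "input.txt")
        dst = os.path.join(workdir, "output.txt")
        with open(src, "wb") as f:
            f.write(raw)

        # newline="" disables newline translation, so \r\n and \n are
        # left exactly as they are on both the read and the write.
        with open(src, encoding="ascii", errors="surrogateescape",
                  newline="") as f:
            # Each byte 0x80..0xFF becomes a lone surrogate U+DC80..U+DCFF
            # instead of raising UnicodeDecodeError.
            text = f.read()

        with open(dst, "w", encoding="ascii", errors="surrogateescape",
                  newline="") as f:
            # The same error handler turns each surrogate back into the
            # byte it came from.
            f.write(text)

        with open(dst, "rb") as f:
            return text, f.read()


if __name__ == "__main__":
    sample = b"name: caf\xe9\nsize: 42 \xff\n"
    text, written = roundtrip_mostly_ascii(sample)
    print(repr(text))  # 'name: caf\udce9\nsize: 42 \udcff\n'
    print(written == sample)  # True
```

One caveat: the decoded string contains lone surrogates, so encoding it with a strict error handler (for example text.encode('utf-8')) raises UnicodeEncodeError. The same errors='surrogateescape' has to be used on the way out as on the way in.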