[Python-ideas] Python 3000 TIOBE -3%

Steven D'Aprano steve at pearwood.info
Thu Feb 16 05:08:39 CET 2012

On Thu, Feb 16, 2012 at 02:37:12PM +1300, Greg Ewing wrote:
> On 16/02/12 02:39, Oleg Broytman wrote:
> >On Wed, Feb 15, 2012 at 11:15:36AM +1100, Ben Finney wrote:
> >>If people want to remain wilfully ignorant of text encoding in the third
> >>millennium
> >
> >    This returns us to the very beginning of the thread. The original
> >complaint was: Python3 requires users to learn too much about unicode,
> >more than they really need.
> I don't think it's helpful to label everyone who wants to use the
> techniques being discussed here as lazy or ignorant. As we've seen,
> there are cases where you truly *can't* know the true encoding,
> and at the same time it *doesn't matter*, because all you want to
> do is treat the unknown bytes as opaque data. To tell someone in
> that position that they're being lazy is both wrong and insulting.

In fairness, this thread was originally started with the scenario "I'm 
reading files which are only mostly ASCII, but I don't want to learn 
about Unicode" rather than "I know about Unicode, but it doesn't help me 
in this situation because the encoding truly is unknown". So wilful 
ignorance does apply, at least in the use-case the thread started with. 
(If it helps, think of them as too busy to learn, not too lazy.)

If you already know about Unicode, then you probably don't need to be 
given a simple recipe to follow, because you probably already have a 
solution that works for you.

Which brings us back to the original use-case: 

"I have a file which is only mostly ASCII, and I don't care to learn 
about Unicode at this time to deal with it. I need a recipe I can 
follow that will do the right thing so I can continue to ignore the 
issue for a little longer."

I don't think that we should either insist that these people be forced 
to learn Unicode, or expect to be able to solve every possible problem 
they might find. 

A couple of recipes in the FAQs, and discussion of why you 
might prefer one to the other, should be able to cover most simple 
cases:

open(filename, encoding='ascii', errors='surrogateescape')
open(filename, encoding='latin1')

Both recipes hint at the wider world of encodings and error handlers, 
hence act as a non-threatening introduction to Unicode.
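
As a rough sketch of how the two recipes differ in practice, the 
following reads the same mostly-ASCII bytes both ways and writes them 
back out. The sample bytes and the temporary file are made up for 
illustration; nothing here is specific to any real data format.

```python
import os
import tempfile

# Hypothetical sample: mostly ASCII, plus two bytes (0xE9, 0xFF)
# whose real encoding we do not know.
raw = b"caf\xe9 menu\nprice: 5\xff\n"

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(raw)

# Recipe 1: ASCII with surrogateescape. Each non-ASCII byte becomes a
# lone surrogate code point (0xE9 -> U+DCE9) that marks it as "unknown".
with open(path, encoding="ascii", errors="surrogateescape") as f:
    text_se = f.read()

# Recipe 2: Latin-1. Every byte maps to the code point with the same
# number (0xE9 -> U+00E9), whether or not that is what the author meant.
with open(path, encoding="latin1") as f:
    text_l1 = f.read()

os.remove(path)

# Both recipes give back the original bytes, provided you encode with
# the same codec and error handler you used for reading.
roundtrip_se = text_se.encode("ascii", errors="surrogateescape")
roundtrip_l1 = text_l1.encode("latin1")

# The surrogateescape text cannot be encoded as strict UTF-8, so the
# unknown bytes cannot silently leak out as if they were real text.
try:
    text_se.encode("utf-8")
    leaked = True
except UnicodeEncodeError:
    leaked = False
```

The trade-off, roughly: latin1 never raises an error, but it may 
quietly turn unknown bytes into the wrong characters, while 
surrogateescape keeps them visibly marked as unknown at the cost of 
errors if you later write the text out with a different codec.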

