[Python-ideas] Python 3000 TIOBE -3%

Thu Feb 16 14:44:25 CET 2012

Paul Moore wrote:
> On 16 February 2012 04:08, Steven D'Aprano <steve at pearwood.info> wrote:
>> On 16/02/12 02:39, Oleg Broytman wrote:
>>> I don't think it's helpful to label everyone who wants to use the
>>> techniques being discussed here as lazy or ignorant. As we've seen,
>>> there are cases where you truly *can't* know the true encoding,
>>> and at the same time it *doesn't matter*, because all you want to
>>> do is treat the unknown bytes as opaque data. To tell someone in
>>> that position that they're being lazy is both wrong and insulting.
>> In fairness, this thread was originally started with the scenario "I'm
>> reading files which are only mostly ASCII, but I don't want to learn
>> about Unicode" rather than "I know about Unicode, but it doesn't help me
>> in this situation because the encoding truly is unknown". So wilful
>> ignorance does apply, at least in the use-case the thread started with.
>> (If it helps, think of them as too busy to learn, not too lazy.)
> 
> As the person who started the thread with this use case, I'd dispute
> that description of what I said.

I am sorry; I spoke poorly. Apologies if you feel I misrepresented you.

To be honest, this thread has been so large, and so rambling, and covering so 
much ground, I have no idea what the *actual* first mention of encoding 
related issues was. The oldest I can find was Giampaolo Rodolà on 9 Feb 2012 
20:16:00 +0100:

     I bet a lot of people don't want to upgrade for another reason:
     unicode. The impression I got is that python 3 forces the user to
     use and *understand* unicode and a lot of people simply don't want
     to deal with that.

two days before the first post from you mentioning encoding issues that I can 
find. Another mention of a similar use-case was by Stephen J Turnbull on 10 
Feb 2012 17:41:21 +0900:

     True, if one sticks to pure ASCII, there's no difference to notice,
     but that's just not possible for people who live outside of the U.S.,
     or who share text with people outside of the U.S.  They need currency
     symbols, they have friends whose names have little dots on them.
     Every single one of those is a backtrace waiting to happen.  A
     backtrace on

         f = open('text-file.txt')
         for line in f: pass

     is an imposition.  That doesn't happen in 2.x (for the wrong reasons,
     but it's very convenient 95% of the time).

     This is what Victor's "locale" codec is all about.  I think that's
     the wrong spelling for the feature, but there does need to be a way
     to express "don't bother me about Unicode" in most scripts for most
     people.  We don't have a decent boilerplate for that yet.

which I *paraphrased* as "I have text files that are mostly ASCII and I don't 
want to deal with Unicode yadda yadda yadda".

But in any case, I expressed myself poorly, and I'm sorry about that.

Regardless of who made the very first mention of the encoding problem in this 
thread, I think we should all be able to agree that laziness is *not* the only 
reason for having encoding problems. I thought I made it clear that I did not 
subscribe to that opinion.
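
For anyone who lands here looking for the concrete technique behind "treat the
unknown bytes as opaque data", here is a minimal sketch of one possible approach.
It is not the "decent boilerplate" Stephen says we still lack. It assumes Python
3.1 or later, where the "surrogateescape" error handler from PEP 383 exists. The
sample bytes and variable names are purely illustrative.

```python
import os
import tempfile

# Illustrative bytes in an unknown, mostly-ASCII encoding:
# a Latin-1 "cafe" with an accent and a UTF-8 "Mueller" with an umlaut.
raw = b"name: caf\xe9\nowner: M\xc3\xbcller\n"

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(raw)

# Decode as ASCII. Each non-ASCII byte becomes a lone surrogate
# (U+DC80..U+DCFF) instead of raising UnicodeDecodeError.
# newline="" disables newline translation so the round trip is byte-exact.
with open(path, encoding="ascii", errors="surrogateescape", newline="") as f:
    lines = f.readlines()

# The ASCII parts can be inspected and edited as ordinary text.
keys = [line.split(":", 1)[0] for line in lines]
print(keys)  # ['name', 'owner']

# Encoding with the same handler turns the surrogates back into the original bytes.
restored = "".join(lines).encode("ascii", errors="surrogateescape")
assert restored == raw
os.remove(path)
```

The undecodable bytes survive untouched as long as they are written back out
with the same error handler. Printing such a string may still raise
UnicodeEncodeError, depending on how the output stream is configured.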

-- 
Steven