[Python-ideas] Python 3000 TIOBE -3%

Sat Feb 11 13:53:40 CET 2012

Masklinn, 11.02.2012 13:41:
> On 2012-02-11, at 13:33 , Stefan Behnel wrote:
>> Paul Moore, 11.02.2012 11:47:
>>> On 11 February 2012 00:07, Terry Reedy wrote:
>>>>>> Nor is there in 3.x.
>>>>
>>>> I view that claim as FUD, at least for many users, and at least until the
>>>> persons making the claim demonstrate it. In particular, I claim that people
>>>> who use Python2 knowing nothing of unicode do not need to know much more to
>>>> do the same things in Python3.
>>>
>>> Concrete example, then.
>>>
>>> I have a text file, in an unknown encoding (yes, it does happen to
>>> me!) but opening in an editor shows it's mainly-ASCII. I want to find
>>> all the lines starting with a '*'. The simple
>>>
>>> with open('myfile.txt') as f:
>>>    for line in f:
>>>        if line.startswith('*'):
>>>            print(line)
>>>
>>> fails with encoding errors. What do I do? Short answer, grumble and go
>>> and use grep (or in more complex cases, awk) :-(
>>
>> Or just use the ISO-8859-1 encoding.
> 
> It's true that this requires handling encodings upfront, where Python 2
> allowed you to play fast and loose, though.

Well, except for the cases where that didn't work. Remember that implicit
encoding behaves in a platform-dependent way in Python 2, so even if your
code runs on your machine, that doesn't mean it will work for anyone else.

> And using latin-1 in that context looks and feels weird/icky: the file is not
> encoded using latin-1, the encoding just happens to work for manipulating
> bytes as ASCII text + non-ASCII stuff.

Correct. That's precisely the use case described above.

Besides, it's perfectly possible to process bytes in Python 3. You just
have to open the file in binary mode and do the processing at the byte
string level. But if you don't care (and if most of the data is really
ASCII-ish), using the ISO-8859-1 encoding in and out will work just fine
for problems like the above.
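
To make the two options concrete, here is a minimal sketch for Python 3.
The helper names and the sample bytes are made up for illustration, and
the newline='\n' argument is my own choice so that text mode splits lines
the same way binary mode does.

```python
import os
import tempfile


def starred_lines_bytes(path):
    """Binary mode: match against a bytes prefix, never decode anything."""
    with open(path, 'rb') as f:
        return [line for line in f if line.startswith(b'*')]


def starred_lines_latin1(path):
    """Text mode with ISO-8859-1: each byte maps to exactly one code point,
    so decoding cannot fail, whatever the real encoding is."""
    with open(path, encoding='iso-8859-1', newline='\n') as f:
        return [line for line in f if line.startswith('*')]


# Hypothetical sample: mostly ASCII, plus bytes that are not valid UTF-8.
sample = b'* first\nplain \xe9\xff line\n* caf\xe9\n'
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    as_bytes = starred_lines_bytes(path)
    as_text = starred_lines_latin1(path)
finally:
    os.remove(path)

# Encoding with the same codec on the way out restores the original bytes.
round_tripped = [line.encode('iso-8859-1') for line in as_text]
```

The non-ASCII characters in as_text are only meaningful if the file really
was Latin-1, but as long as you write the data back out with the same
encoding, the bytes come out unchanged.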

Stefan