[Python-ideas] Python 3000 TIOBE -3%

Sun Feb 12 04:09:58 CET 2012

2012/2/11 Paul Moore <p.f.moore at gmail.com>

> On 11 February 2012 00:07, Terry Reedy <tjreedy at udel.edu> wrote:
> >>>  Nor is there in 3.x.
> >
> > I view that claim as FUD, at least for many users, and at least until the
> > persons making the claim demonstrate it. In particular, I claim that people
> > who use Python2 knowing nothing of unicode do not need to know much more to
> > do the same things in Python3.
>
> Concrete example, then.
>
> I have a text file, in an unknown encoding (yes, it does happen to
> me!) but opening in an editor shows it's mainly-ASCII. I want to find
> all the lines starting with a '*'. The simple
>
> with open('myfile.txt') as f:
>    for line in f:
>        if line.startswith('*'):
>            print(line)
>
> fails with encoding errors. What do I do? Short answer, grumble and go
> and use grep (or in more complex cases, awk) :-(
>
> Paul.

I just looked at the Python 3 documentation (
http://docs.python.org/release/3.1.3/library/functions.html#open): the open
function has an "errors" parameter. Setting it to "ignore" or "replace"
will solve your problem, since undecodable bytes are then dropped or
replaced instead of raising an error.
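
A minimal sketch of that suggestion, applied to the example of finding
lines that start with '*'. The helper name starred_lines and the choice of
encoding="ascii" are assumptions made for illustration (the file was
described as mainly ASCII); only the "errors" argument comes from the
documentation linked above.

```python
import os
import tempfile


def starred_lines(path, errors="replace"):
    """Return the lines starting with '*', tolerating undecodable bytes."""
    # encoding="ascii" is an assumption: the file is said to be mainly ASCII.
    # errors="replace" substitutes U+FFFD for each bad byte;
    # errors="ignore" silently drops the bad bytes instead.
    with open(path, encoding="ascii", errors=errors) as f:
        return [line.rstrip("\n") for line in f if line.startswith("*")]


# Demo on a throwaway file: the Latin-1 byte 0xE9 is not valid ASCII.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"* caf\xe9\nplain text\n* ok\n")

replaced = starred_lines(tmp.name)
ignored = starred_lines(tmp.name, errors="ignore")
os.remove(tmp.name)

print(replaced)
print(ignored)
```

Both variants lose information about the non-ASCII bytes, which is fine
for a grep-like search but not for rewriting the file.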

Another way is to guess the encoding programmatically (I found the chardet
module, http://pypi.python.org/pypi/chardet) and use that guess to decode
your file of unknown encoding.
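
A rough sketch of the guessing approach, assuming the third-party chardet
package is installed. chardet.detect() takes bytes and returns a dict with
"encoding" and "confidence" keys. The helper name decode_with_guess and
the latin-1 fallback are assumptions, not part of chardet.

```python
try:
    import chardet  # third-party: pip install chardet
except ImportError:  # keep the sketch usable without it
    chardet = None


def decode_with_guess(raw, fallback="latin-1"):
    """Decode bytes using chardet's guess, or `fallback` when there is none."""
    guess = chardet.detect(raw) if chardet is not None else {}
    encoding = guess.get("encoding") or fallback
    try:
        return raw.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        # A guess can be wrong or unknown to Python. latin-1 maps every
        # byte to a character, so the default fallback never raises.
        return raw.decode(fallback)


with_accent = decode_with_guess(b"* caf\xe9\n")
plain = decode_with_guess(b"* plain ascii\n")
print(repr(plain))
```

The guess is statistical, so short or mostly-ASCII inputs may be detected
with low confidence or as the wrong encoding.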

Then why not allow an "auto" value for the "encoding" parameter, which
would make "open" call a detector before opening the file and raise an
error when the confidence of the guess is below a certain percentage?
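
To make the idea concrete, here is a sketch of it as a wrapper function
rather than a change to open itself. Everything in it is hypothetical:
open_auto, min_confidence and try_candidates are names invented for this
example. The default detector is a simple stdlib-only stand-in that tries
a list of encodings; it returns a dict shaped like the result of
chardet.detect(), so that function could be passed as detector instead.

```python
import os
import tempfile


def try_candidates(raw, candidates=("ascii", "utf-8")):
    """Stand-in detector: the first candidate that decodes cleanly wins."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return {"encoding": name, "confidence": 1.0}
    return {"encoding": None, "confidence": 0.0}


def open_auto(path, detector=try_candidates, min_confidence=0.8):
    """Open `path` as text with a detected encoding, or raise on a weak guess."""
    # Reads the whole file for detection, which is acceptable for small files.
    with open(path, "rb") as f:
        raw = f.read()
    guess = detector(raw)
    if not guess["encoding"] or guess["confidence"] < min_confidence:
        raise ValueError("cannot detect the encoding of %r reliably" % path)
    return open(path, encoding=guess["encoding"])


def demo(raw):
    """Write bytes to a throwaway file and report what open_auto does."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(raw)
    try:
        with open_auto(tmp.name) as f:
            return f.encoding, f.read()
    except ValueError:
        return None
    finally:
        os.remove(tmp.name)


print(demo("* caf\u00e9\n".encode("utf-8")))
print(demo(b"* caf\xe9\n"))  # Latin-1 bytes: neither ASCII nor valid UTF-8
```

Raising on a weak guess keeps the failure explicit, at the cost of reading
the file twice.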

Gabriel AHTUNE