2012/2/11 Paul Moore <p.f.moore@gmail.com>
On 11 February 2012 00:07, Terry Reedy <tjreedy@udel.edu> wrote:
>>>  Nor is there in 3.x.
>
> I view that claim as FUD, at least for many users, and at least until the
> persons making the claim demonstrate it. In particular, I claim that people
> who use Python2 knowing nothing of unicode do not need to know much more to
> do the same things in Python3.

Concrete example, then.

I have a text file, in an unknown encoding (yes, it does happen to
me!) but opening in an editor shows it's mainly-ASCII. I want to find
all the lines starting with a '*'. The simple

with open('myfile.txt') as f:
    for line in f:
        if line.startswith('*'):
            print(line)

fails with encoding errors. What do I do? Short answer, grumble and go
and use grep (or in more complex cases, awk) :-(

Paul.


I just looked at the Python 3 documentation (http://docs.python.org/release/3.1.3/library/functions.html#open): the open function has an "errors" parameter. Setting it to "ignore" or "replace" should solve your problem.
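For example, here is a minimal sketch of your loop with that parameter. Assumptions: since the file is mostly ASCII, I pass encoding="ascii" explicitly (otherwise open() uses the locale's default encoding), and the name star_lines is just for illustration.

```python
def star_lines(path, encoding="ascii", errors="replace"):
    """Return the lines of `path` that start with '*'.

    Bytes that cannot be decoded with `encoding` are handled according to
    `errors`: "replace" turns each one into U+FFFD, "ignore" drops it.
    """
    # Assumption: ASCII is a reasonable guess for a "mainly-ASCII" file.
    with open(path, encoding=encoding, errors=errors) as f:
        return [line for line in f if line.startswith("*")]
```

If you need to write the lines back out unchanged, errors="surrogateescape" (on both reading and writing) preserves the undecodable bytes instead of replacing them.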

Another way is to guess the encoding programmatically (I found the chardet module, http://pypi.python.org/pypi/chardet) and use that guess to decode your file.
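chardet is a third-party package, so here is a much cruder stdlib-only sketch of the same idea, swapped in for illustration: try a list of candidate encodings in order and keep the first one that decodes the bytes without error. The function name and the candidate list are my own assumptions, not anything chardet does.

```python
def guess_decode(raw, candidates=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes `raw`.

    Hypothetical helper. Strict decoding fails loudly on invalid bytes,
    so the first success is taken as the guess. latin-1 maps every byte,
    so with the default list this never reaches the final raise.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next one
    raise ValueError("no candidate encoding could decode the data")
```

Usage: read the file with open(path, "rb"), pass the bytes to guess_decode, then work on the returned text. A real detector such as chardet scores candidates statistically instead of accepting the first one that happens to decode.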

So why not accept "auto" as a value for the "encoding" parameter? It would make "open" call a detector before opening the file, and raise an error when the detector's confidence in its guess is below a certain percentage.
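To make the proposal concrete, here is a rough sketch of how it could behave, written as a separate helper since open() has no such option today. Everything here is hypothetical: the name open_auto, the detect callback returning (encoding, confidence), the 0.8 threshold and the sample size are all illustrative choices.

```python
def open_auto(path, detect, threshold=0.8, sample_size=64 * 1024):
    """Open `path` in text mode using the encoding reported by `detect`.

    Hypothetical sketch of an encoding="auto" mode: `detect` takes bytes
    and returns (encoding, confidence). If the detector has no answer or
    its confidence is below `threshold`, refuse instead of guessing.
    """
    # Read a sample of raw bytes for the detector to inspect.
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    encoding, confidence = detect(sample)
    if encoding is None or confidence is None or confidence < threshold:
        raise ValueError(
            f"could not guess encoding of {path!r} "
            f"(best guess {encoding!r}, confidence {confidence!r})"
        )
    # Caller is responsible for closing the returned file object.
    return open(path, encoding=encoding)
```

With chardet, chardet.detect(raw) returns a dict with "encoding" and "confidence" keys, so a small wrapper returning those two values could serve as the detect argument.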

Gabriel AHTUNE