[Python-ideas] Py3 unicode impositions
Eric Snow
ericsnowcurrently at gmail.com
Sun Feb 12 04:10:22 CET 2012
On Sat, Feb 11, 2012 at 7:27 PM, Carl M. Johnson
<cmjohnson.mailinglist at gmail.com> wrote:
>
> On Feb 11, 2012, at 12:40 AM, Paul Moore wrote:
>
>> In Python 2, I can ignore the issue. Sure, I can end up with mojibake,
>> but for my uses, that's not a disaster. Mostly-readable works. But in
>> Python 3, I get an error and can't process the file.
>>
>> I can just use latin-1, or surrogateescape. But that doesn't come
>> naturally to me yet. Maybe it will in time... Or maybe there's a
>> better solution I don't know about yet.
>
> I'm confused about what you're asking for. Setting errors to surrogateescape or encoding to Latin-1 causes Python 3 to behave the exact same way as Python 2: it's doing the "wrong" thing and may result in mojibake, but at least it isn't screwing up anything new so long as the stuff you add to the file is in ASCII. The only way to make Python 3 slightly more like Python 2 would be to set errors="surrogateescape" by default instead of asking the programmer to know to use it. I think that would be going too far, but it could be done. I think it would be simpler though to just publicize errors="surrogateescape" more.
>
> "Dear people who don't care about encodings and don't want to take the time to get them right, just put errors='surrogateescape' into your open commands and Python 3 will behave almost exactly like Python 2. The end."
So something like this:
import functools, builtins

# Caveat: encoding= is now passed for binary modes too, where open() raises ValueError.
open = builtins.open = functools.partial(open, encoding="ascii",
                                         errors="surrogateescape")
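
To make the effect concrete, here is a minimal sketch of the round trip that errors="surrogateescape" gives you. The file names and the sample bytes are made up for illustration; the only assumption is a file containing a stray non-ASCII byte whose real encoding you don't know or care about.

```python
import os
import tempfile

# Hypothetical input: "cafe" with a Latin-1 e-acute (0xE9), which is
# neither valid ASCII nor valid UTF-8.
raw = b"caf\xe9 au lait\n"

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "in.txt")
    dst = os.path.join(tmp, "out.txt")
    with open(src, "wb") as f:
        f.write(raw)

    # Strict ASCII decoding refuses the file outright.
    try:
        with open(src, encoding="ascii") as f:
            f.read()
        strict_failed = False
    except UnicodeDecodeError:
        strict_failed = True

    # With surrogateescape the bad byte is smuggled through as a lone
    # surrogate (0xE9 becomes U+DCE9) instead of raising.
    with open(src, encoding="ascii", errors="surrogateescape") as f:
        text = f.read()

    # Edit only the ASCII parts, then write back with the same settings.
    # newline="" keeps the line ending untouched on every platform.
    with open(dst, "w", encoding="ascii", errors="surrogateescape",
              newline="") as f:
        f.write(text.replace("lait", "LAIT"))

    with open(dst, "rb") as f:
        result = f.read()
```

The unknown byte comes out exactly as it went in, so nothing new gets broken as long as the edits themselves are ASCII. Printing text or encoding it strictly would still fail on the lone surrogate, which is the price of not knowing the real encoding.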
-eric