[Python-ideas] Py3 unicode impositions
Steven D'Aprano
steve at pearwood.info
Sun Feb 12 06:26:24 CET 2012
Nick Coghlan wrote:
> On Sun, Feb 12, 2012 at 1:19 PM, Carl M. Johnson
> <cmjohnson.mailinglist at gmail.com> wrote:
>> On Feb 11, 2012, at 5:10 PM, Eric Snow wrote:
>>
>>> So something like this:
>>>
>>> import functools, builtins
>>> open = builtins.open = functools.partial(open, encoding="ascii",
>>> errors="surrogateescape")
>>
>> We could pack it in and call it something like "python2open". :-)
>
> An open_ascii() builtin isn't as crazy as it may initially sound -
> it's not at all uncommon to have a file that's almost certainly in
> some ASCII compatible encoding like utf-8, latin-1 or one of the other
> extended ASCII encodings, but you don't know which one specifically.
To me, "open_ascii" suggests either:
- it opens ASCII files, and raises an error if they are not ASCII; or
- it opens non-ASCII files, and magically translates their content to ASCII
using some variant of "The Unicode Hammer" recipe:
http://code.activestate.com/recipes/251871-latin1-to-ascii-the-unicode-hammer/
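To make the two readings concrete, here is a small sketch that contrasts them with what the surrogateescape recipe actually does. The sample bytes are made up for illustration. For the second reading it uses NFKD normalisation as a stand-in for transliteration; that is NOT the Unicode Hammer recipe linked above, just a common approximation, and it only works once you already know the source encoding (assumed to be latin-1 here).

```python
import unicodedata

# A latin-1 encoded "naïve": every byte is ASCII except 0xEF.
data = b"na\xefve"

# Reading 1: open only genuine ASCII, and fail loudly on anything else.
try:
    data.decode("ascii")
    strict_result = "decoded"
except UnicodeDecodeError as exc:
    strict_result = exc.reason
print("strict:", strict_result)  # strict: ordinal not in range(128)

# Reading 2: transliterate non-ASCII text down to ASCII. NFKD is a stand-in
# here, not the Unicode Hammer recipe, and it needs the real encoding first.
folded = (unicodedata.normalize("NFKD", data.decode("latin-1"))
          .encode("ascii", "ignore")
          .decode("ascii"))
print("folded:", folded)  # folded: naive

# The surrogateescape recipe does neither: it does not raise, and it does
# not translate. The unknown byte 0xEF is carried through as U+DCEF.
smuggled = data.decode("ascii", errors="surrogateescape")
print("smuggled:", ascii(smuggled))  # smuggled: 'na\udcefve'
```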
We should not be discouraging developers from learning even the most trivial
basics of Unicode. I'm not suggesting that we try to force people to become
Unicode experts (they wouldn't, even if we tried) but making this a built-in
is dumbing things down too much. I don't believe that it is an imposition for
people to explicitly use open(filename, encoding='ascii',
errors='surrogateescape') if that's what they want.
If they want open_ascii, let them define this at the top of their modules:
open_ascii = (lambda name:
              open(name, encoding='ascii', errors='surrogateescape'))
A one-liner, if you don't mind long lines.
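A sketch of what that one-liner buys you: ASCII text behaves normally, while bytes in an unknown ASCII-compatible encoding survive untouched and are restored when written back with the same encoding and error handler. The file names and the mixed UTF-8/latin-1 sample are hypothetical. The write side needs its own open() call, since the lambda only opens for reading, and the sample deliberately has no newlines because text-mode newline translation can change line endings on some platforms.

```python
import os
import tempfile

# The one-liner from the post: decode as ASCII, smuggle any other byte
# through as a lone surrogate instead of raising.
open_ascii = (lambda name:
              open(name, encoding='ascii', errors='surrogateescape'))

# Hypothetical file mixing UTF-8 ("café") and latin-1 ("naïve") bytes:
# no single encoding decodes it cleanly, but both are ASCII compatible.
raw = b"caf\xc3\xa9 na\xefve"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mixed.txt")
    with open(path, "wb") as f:
        f.write(raw)

    with open_ascii(path) as f:
        text = f.read()  # each non-ASCII byte b becomes chr(0xDC00 + b)

    # Ordinary string operations work on the ASCII parts; lone surrogates
    # have no case mapping, so upper() leaves them alone.
    out = os.path.join(tmp, "copy.txt")
    with open(out, "w", encoding="ascii", errors="surrogateescape") as f:
        f.write(text.upper())
    with open(out, "rb") as f:
        copied = f.read()

print(copied)  # b'CAF\xc3\xa9 NA\xefVE'
```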
I'm not entirely happy with the surrogateescape solution, but I can see it's
possibly the least worst *simple* solution for the case where you don't know
the source encoding. (Encoding guessing heuristics are awesome but hardly
simple.) So put the recipe in the FAQs, in the docs, and the docstring for
open[1], and let people copy and paste the recipe. That's a pretty gentle
introduction to Unicode.
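The post does not say why the surrogateescape solution is unsatisfying. One drawback worth knowing about (offered here as an illustration, not as the author's argument) is that the smuggled bytes are lone surrogates, which a strict UTF-8 encoder rejects, so the problem can resurface at output time. The sketch below shows that, and then how the original bytes can be recovered once the real encoding is known (assumed to be latin-1 for this made-up sample).

```python
# Bytes from a file in some unknown ASCII-compatible encoding.
raw = b"na\xefve"
text = raw.decode("ascii", errors="surrogateescape")

# The smuggled byte is a lone surrogate, which strict UTF-8 refuses to encode.
try:
    text.encode("utf-8")
    encode_result = "encoded"
except UnicodeEncodeError as exc:
    encode_result = exc.reason
print(encode_result)  # surrogates not allowed

# Once the real encoding is known (assumed latin-1 here), re-encode with the
# same error handler to get the original bytes back, then decode properly.
recovered = text.encode("ascii", errors="surrogateescape").decode("latin-1")
print(ascii(recovered))  # 'na\xefve'
```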
[1] Which is awfully big and complex in Python 3.1, but that's another story.
--
Steven