[I18n-sig] encoding support for Docutils: please review
Martin v. Loewis
martin@v.loewis.de
29 Jun 2002 21:43:01 +0200
David Goodger <goodger@users.sourceforge.net> writes:
> - Try the encoding specified by a command-line option, if any.
>
> - Try the locale's encoding.
>
> - Try UTF-8.
>
> - Try platform-specific encodings: CP-1252 on Windows, Mac-Roman on
> MacOS, perhaps Latin-9 (iso-8859-15) otherwise.
>
> Does this look right, or am I missing something?
I'd reorder this: after the command-line option, try ASCII first, then
UTF-8. If ASCII passes, the input most likely is ASCII. If not, and
UTF-8 passes, it most likely is UTF-8. Only then try the locale's
encoding.
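
A rough sketch of that order, written against current Python 3 rather
than 2.2. The function name and its parameters are made up for
illustration; the locale's encoding is passed in by the caller instead
of being looked up here.

```python
def decode_with_fallbacks(data, cli_encoding=None, locale_encoding=None):
    """Return (text, encoding) for the first candidate that decodes cleanly.

    Hypothetical helper: `cli_encoding` stands for the command-line option,
    `locale_encoding` for whatever the caller determined from the locale.
    """
    # Order suggested in the reply: command line, ASCII, UTF-8, locale.
    candidates = [cli_encoding, "ascii", "utf-8", locale_encoding]
    tried = []
    for enc in candidates:
        if not enc or enc in tried:
            continue  # skip missing or duplicate candidates
        tried.append(enc)
        try:
            return data.decode(enc), enc
        except (UnicodeError, LookupError):
            # UnicodeError: bytes are not valid in this encoding.
            # LookupError: Python has no codec by this name.
            continue
    raise UnicodeError("unable to decode input; tried: " + ", ".join(tried))
```

Strict decoding is what makes the ASCII and UTF-8 steps useful as tests:
both reject most input that was written in some other 8-bit encoding.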
> - Does the application have to call
> ``locale.setlocale(locale.LC_ALL, '')``, and if so, where? Is it OK
> to call setlocale from within the decoding function, or should it be
> left up to the client application?
At least on Solaris, you need this to get nl_langinfo to work correctly.
> - Should I use the result of ``locale.getlocale()``? On
> Win2K/Python2.2.1, I get this::
>
> >>> import locale
> >>> locale.getlocale()
> (None, None)
> >>> locale.getdefaultlocale()
> ('en_US', 'cp1252')
>
> Looks good so far.
No; this is broken beyond repair. On Unix, try nl_langinfo(CODESET)
(requires Python 2.2). On Windows, try _getdefaultlocale. If either
fails, you may then fall back to getlocale, but expect it to fail with
exceptions, and to err.
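
A minimal sketch of the lookup, assuming current Python 3. The function
name is hypothetical. Where nl_langinfo is unavailable, the sketch
substitutes locale.getpreferredencoding(False), which is not the
_getdefaultlocale call named above. Note that setlocale changes
process-wide state, so an application may prefer to call it once at
startup instead.

```python
import locale


def guess_locale_encoding():
    """Best-effort name of the locale's encoding, or None if unknown."""
    try:
        # Needed on some platforms before nl_langinfo reports the
        # user's codeset; this alters process-wide locale state.
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        pass  # unsupported locale settings: carry on with the "C" locale

    # Unix: ask the C library for the codeset directly.
    if hasattr(locale, "nl_langinfo") and hasattr(locale, "CODESET"):
        try:
            name = locale.nl_langinfo(locale.CODESET)
            if name:
                return name
        except (ValueError, locale.Error):
            pass

    # Elsewhere (e.g. Windows): assumed substitute for _getdefaultlocale.
    try:
        return locale.getpreferredencoding(False) or None
    except Exception:
        return None
```

The returned name is whatever the platform reports, so it still has to
be checked against the codecs Python actually knows.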
> How can I use ``locale.getlocale()`` when it doesn't return a
> known encoding? Or put another way, how can I get a known
> encoding out of ``locale.getlocale()``?
[Don't use getlocale]. If nl_langinfo gives an unknown codeset,
produce a warning message, asking the user to report that as a
bug. Keep a list of additional aliases for codesets that occur in the
wild and map to known codecs. Also keep a list of known unsupported
codesets (again, restrict yourself to those occurring in the wild).
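
One way this could look, as a sketch only. The function name and both
tables are hypothetical, and the table entries are placeholders, not
codeset names actually observed on any platform.

```python
import codecs
import warnings

# Placeholder entries; a real table would hold names seen in the wild.
EXTRA_ALIASES = {
    "example-latin1-alias": "iso8859-1",
}
KNOWN_UNSUPPORTED = {
    "example-unsupported",
}


def resolve_codeset(name):
    """Map a codeset name to a Python codec name, or return None."""
    key = name.strip().lower()
    if key in KNOWN_UNSUPPORTED:
        return None  # known to have no codec: nothing to report
    key = EXTRA_ALIASES.get(key, key)
    try:
        return codecs.lookup(key).name
    except LookupError:
        # Unknown codeset: ask the user to report it.
        warnings.warn(
            "unknown codeset %r; please report this as a bug" % name
        )
        return None
```

Names in the unsupported table are skipped silently, so the warning is
reserved for codesets nobody has reported yet.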
> - Does ``locale.getdefaultlocale()[1]`` reliably produce the
> platform-specific encoding?
No.
Regards,
Martin