[Tutor] UnicodeDecodeError while parsing a .csv file.
eryksun
eryksun at gmail.com
Tue Oct 29 14:35:42 CET 2013
On Tue, Oct 29, 2013 at 6:33 AM, Albert-Jan Roskam <fomcl at yahoo.com> wrote:
> Why is do_setlocale=False here? Actually, what does this parameter do?
> It seems strange that a getter function has a 'set' argument.
On Windows, getpreferredencoding doesn't use setlocale. It calls
WinAPI GetACP to fetch the ANSI codepage.
On POSIX, calling setlocale(LC_CTYPE, "") ensures the locale is
initialized from the LC_* and LANG environment variables.
With do_setlocale=True, getpreferredencoding saves the current
LC_CTYPE locale (e.g. "C") and restores it before returning, so the
change isn't permanent.
For example, in 2.x on my Debian system, calling
locale.getpreferredencoding(False) returns "ANSI_X3.4-1968" (i.e. the
"C" locale uses ASCII), while locale.getpreferredencoding(True)
returns "UTF-8".
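
The comparison above can be reproduced with a short script. This is only
a sketch: the helper name show_preferred_encodings is made up for
illustration, and the values you get depend on your platform,
environment variables and Python version. In 3.x the two results will
usually match. The third value checks that the LC_CTYPE locale is the
same before and after the calls.

```python
import locale


def show_preferred_encodings():
    """Return (enc_without_setlocale, enc_with_setlocale, locale_restored).

    Hypothetical helper, for illustration only.
    """
    # Passing no second argument queries the LC_CTYPE locale without changing it.
    before = locale.setlocale(locale.LC_CTYPE)
    # do_setlocale=False: report the encoding of the locale as it is right now.
    enc_false = locale.getpreferredencoding(False)
    # do_setlocale=True: on POSIX, temporarily apply the LC_*/LANG settings,
    # read the encoding, then put the previous locale back.
    enc_true = locale.getpreferredencoding(True)
    after = locale.setlocale(locale.LC_CTYPE)
    return enc_false, enc_true, before == after


if __name__ == "__main__":
    print(show_preferred_encodings())
```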
setlocale isn't guaranteed to be thread safe, so 3.x io.TextIOWrapper
uses do_setlocale=False. That's OK; Py_Initialize in CPython 3.x has
already called setlocale(LC_CTYPE, ""). On the other hand, 2.x
io.TextIOWrapper has to use do_setlocale=True, as demonstrated by the
previous example.
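
To see which default a 3.x io.TextIOWrapper actually picks, you can wrap
an in-memory byte stream without passing an encoding and compare the
result with locale.getpreferredencoding(False). A minimal sketch,
assuming CPython 3.x; the helper names default_text_encoding and
same_codec are invented here, and codecs.lookup is used only so that
aliases and case differences (e.g. "UTF-8" vs "utf-8") don't matter.

```python
import codecs
import io
import locale


def default_text_encoding():
    """Return the encoding TextIOWrapper chooses when none is given."""
    # No encoding argument is passed, so the wrapper falls back to its default.
    wrapper = io.TextIOWrapper(io.BytesIO())
    return wrapper.encoding


def same_codec(a, b):
    """Compare two encoding names, ignoring aliases and letter case."""
    return codecs.lookup(a).name == codecs.lookup(b).name


if __name__ == "__main__":
    default = default_text_encoding()
    preferred = locale.getpreferredencoding(False)
    print(default, preferred, same_codec(default, preferred))
```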
> Other remark: I have not read this entire thread, but I was thinking the
> OP might use codecs.open to open the file in the correct encoding.
The OP is using 3.2.
For some time there's been talk of deprecating codecs.open in 3.x
(even the Stream* classes); see issue 8796 or PEP 400. It's still in
3.4, however.
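
In 3.x the built-in open takes an encoding argument itself, so
codecs.open isn't needed just to pick the encoding. Here is a rough
sketch with the csv module. The "latin-1" encoding, the sample data and
the helper names read_csv_rows and demo are assumptions for the sake of
a runnable example, not the OP's actual file.

```python
import csv
import os
import tempfile


def read_csv_rows(path, encoding):
    """Read all rows of a CSV file, decoding it with the given encoding."""
    # newline="" is what the csv docs recommend for files handed to csv.reader.
    with open(path, encoding=encoding, newline="") as f:
        return list(csv.reader(f))


def demo():
    """Write a small Latin-1 file with non-ASCII text and read it back."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write("name,city\r\nJos\u00e9,M\u00e1laga\r\n".encode("latin-1"))
        # Decoding with the same encoding the file was written in.
        return read_csv_rows(path, "latin-1")
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Reading that same file with a mismatched encoding such as "utf-8" would
raise UnicodeDecodeError, since the Latin-1 byte for "é" followed by
"," is not a valid UTF-8 sequence.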