[Python-ideas] Py3 unicode impositions

Jim Jewett jimjjewett at gmail.com
Sat Feb 11 00:33:42 CET 2012


On Fri, Feb 10, 2012 at 3:41 AM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Terry Reedy writes:
>
>  > > In Python 2 there was no such strong imposition [of Unicode
>  > > awareness on users].

>  > Nor is there in 3.x.

> Sorry, Terry, but you're basically wrong here.  True, if one sticks to
> pure ASCII, there's no difference to notice, but that's just not
> possible for people who live outside of the U.S., or who share text
> with people outside of the U.S.  They need currency symbols, they
> have friends whose names have little dots on them.  Every single
> one of those is a backtrace waiting to happen.  A backtrace on

>    f = open('text-file.txt')
>    for line in f: pass

> is an imposition.  That doesn't happen in 2.x (for the wrong reasons,
> but it's very convenient 95% of the time).

I may be missing something, but as best I can tell

(1)  That uses an implicit encoding of None.
(2)  encoding=None is documented as being platform-dependent.
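
To make (1) and (2) concrete, here is a minimal sketch of how the same
two-line loop can succeed or raise depending only on which encoding ends up
being used. The sample text, the temporary file, and the helper names are
made up for illustration, and 'ascii' merely stands in for an unlucky
platform choice; it is not a claim about what any particular platform picks.

```python
import os
import tempfile

# Hypothetical sample text: a name with "little dots" (e-diaeresis) and a
# euro sign, written with escapes so the source itself stays ASCII.
SAMPLE = "Zo\u00eb owes \u20ac5\n"


def read_lines(path, encoding):
    """Run the loop from the quoted example, with an explicit encoding."""
    with open(path, encoding=encoding) as f:
        return [line for line in f]


def demo():
    """Return (lines decoded as UTF-8, whether decoding as ASCII failed)."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        # Write raw UTF-8 bytes so the file contents do not depend on locale.
        with os.fdopen(fd, "wb") as f:
            f.write(SAMPLE.encode("utf-8"))
        good = read_lines(path, "utf-8")
        try:
            read_lines(path, "ascii")
            ascii_failed = False
        except UnicodeDecodeError:
            ascii_failed = True
        return good, ascii_failed
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Passing encoding= explicitly, as read_lines() does, takes the platform out
of the picture; leaving it as None is what makes the outcome depend on the
machine.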

Are you saying that some (many?  all?) platforms make a bad choice there?

Does that only happen when sys.getdefaultencoding() !=
sys.getfilesystemencoding(), or when one of them gives bad
information?  (FWIW, on a mostly ASCII Windows machine, the default is
utf-8 but the filesystem encoding is mbcs, so merely being different
doesn't always provoke problems.)
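
For anyone who wants to compare their own machine, a small sketch that
prints the values in question. The function name encoding_report is made up
for this example. It also includes locale.getpreferredencoding(False),
which, as far as I can tell from the 3.x documentation for open(), is the
value encoding=None has been documented to fall back to; treat that as my
reading rather than a guarantee for every version.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings this Python process reports."""
    return {
        # Default for str.encode()/bytes.decode(); 'utf-8' on Python 3.
        "default": sys.getdefaultencoding(),
        # Used for file *names*, not for file contents.
        "filesystem": sys.getfilesystemencoding(),
        # What the locale says text should be encoded as.
        "locale_preferred": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for name, value in encoding_report().items():
        print(name, value)
```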

Would it cause problems to make the default be whatever the locale
reports, or whatever it reports the first time open() is called?
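
A rough sketch of the "first time open is called" variant, only to show what
I have in mind. Both first_call_encoding and open_text are hypothetical
names, not existing or proposed APIs, and the sketch assumes
locale.getpreferredencoding(False) is the right thing to ask.

```python
import functools
import locale


@functools.lru_cache(maxsize=None)
def first_call_encoding():
    """Ask the locale once; every later call reuses the cached answer."""
    return locale.getpreferredencoding(False)


def open_text(path, mode="r", encoding=None, **kwargs):
    """Hypothetical wrapper around open() that pins the default encoding."""
    # Only text mode takes an encoding; binary mode is passed through as-is.
    if encoding is None and "b" not in mode:
        encoding = first_call_encoding()
    return open(path, mode, encoding=encoding, **kwargs)
```

With this, a later change of locale inside the process would not change how
files are decoded, which may or may not be what people want.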

-jJ


