[Python-3000] locale-aware strings ?

Guido van Rossum guido at python.org
Mon Sep 4 04:11:02 CEST 2006

On 9/3/06, Jim Jewett <jimjjewett at gmail.com> wrote:
> On 9/1/06, Nick Coghlan <ncoghlan at gmail.com> wrote:
> > Fredrik Lundh wrote:
> > > today's Python supports "locale aware" 8-bit strings ...
> > > to what extent should this be supported by Python 3000 ?
> > Since all strings will be Unicode by then:
> >  >>> u"åäö".isalpha()
> > True
> Two followup questions, then ...
> (1)  To what extent should python support files (including stdin,
> stdout) in local (non-unicode) encodings?  (not at all, per-file,
> settable global default?)

I've always said (can someone find a quote perhaps?) that there ought
to be a sensible default encoding for files (including but not limited
to stdin/out/err), perhaps influenced by personalized settings,
environment variables, the OS, etc.
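
To make that concrete, here is a rough sketch of one possible lookup
order. It is only an illustration, not a description of what Python
does or will do: the function name default_file_encoding and the
environment variable MYAPP_FILE_ENCODING are made up for this example,
and the order (explicit override, then locale, then a fixed fallback)
is an assumption.

```python
import locale
import os


def default_file_encoding(environ=None):
    """Pick a default text encoding (hypothetical policy, for illustration)."""
    environ = os.environ if environ is None else environ

    # 1. A personalized setting; the variable name is invented for this sketch.
    override = environ.get("MYAPP_FILE_ENCODING")
    if override:
        return override

    # 2. Whatever the OS / locale configuration suggests.
    preferred = locale.getpreferredencoding(False)
    if preferred:
        return preferred

    # 3. Last-resort fallback (an assumption, not a recommendation).
    return "utf-8"
```

A caller could then pass the result along explicitly, e.g.
open(path, encoding=default_file_encoding()), and still override it
per file where needed.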

> (2)  To what extent will strings have an opaque (or at least
> on-demand) backing store, so that decoding/encoding could be delayed?
> (For example, Swedish text could be stored in single-byte characters,
> and only converted to standard unicode on the rare occasions when it
> met strings in an incompatible encoding.)

That seems to be a bit of a leading question. Talin is currently
championing strings with different fixed-width storage, and others
have proposed even more flexible "polymorphic strings". You might want
to learn about the NSString type in Apple's Objective-C.
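
As a toy illustration of the "different fixed-width storage" idea (a
sketch only, not Talin's actual proposal): pick the narrowest width
that fits the highest code point in the string, and store every
character at that width. The names storage_width and pack are
hypothetical, as is the choice of codecs used to produce the bytes.

```python
def storage_width(text):
    """Bytes per character needed to store text at a fixed width."""
    highest = max(map(ord, text), default=0)
    if highest < 0x100:
        return 1  # fits in one byte per character
    if highest < 0x10000:
        return 2  # fits in two bytes per character
    return 4      # needs four bytes per character


def pack(text):
    """Return (width, raw bytes) using the narrowest fixed width.

    Assumes text contains no lone surrogates; the two-byte codec
    would reject those.
    """
    width = storage_width(text)
    codec = {1: "latin-1", 2: "utf-16-le", 4: "utf-32-le"}[width]
    return width, text.encode(codec)
```

With this scheme the Swedish example stays at one byte per character,
and a string only pays for wider storage once it actually contains a
character that needs it.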

BTW the term "backing store" is typically used for *disk-based*
storage of large amounts of data -- but (despite that your first
question is about files) I don't believe this is what you're referring
to.

--Guido van Rossum (home page: http://www.python.org/~guido/)
