[Python-ideas] Processing surrogates in

Thu May 7 07:55:07 CEST 2015

On 7 May 2015 at 15:27, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 7 May 2015 at 11:41, Rob Cliffe <rob.cliffe at btinternet.com> wrote:
>> Or is there really some fundamental reason why things can't be simpler?
>> (Like, REALLY, REALLY simple?)
>
> Yep, there are around 7 billion fundamental reasons currently alive,
> and I have no idea how many that have gone before us: humans :)

Heh, a message from Stephen off-list made me realise that an info dump
of all the reasons the edge cases are hard probably wasn't a good way
to answer your question :)

What "we're" working towards (where "we" ~= the Unicode consortium +
operating system designers + programming language designers) is a
world where everything "just works", and computers talk to humans in
each human's preferred language (or a collection of languages,
depending on what the human is doing), and to each other in Unicode.
There are then a whole host of technical and political reasons why
it's taking decades to get from the historical point A (where
computers talk to humans in at most one language at a time, and don't
talk to each other at all) to that desired point B.

We'll know we're done with that transition when Unicode becomes almost
invisible, and the vast majority of programmers are once
again able to just deal with "text" without worrying too much about
how it's represented internally (but also having their programs be
readily usable in languages other than their own).

Python 3 is already a lot closer to that ideal than Python 2 was, but
there are still some rough edges to iron out. The ones I'm personally
aware of affecting 3.4+ (including the one Serhiy started this thread
about) are listed as dependencies of http://bugs.python.org/issue22555
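
As a minimal sketch of the general kind of rough edge involved (an
illustrative assumption, not necessarily one of the cases tracked in
that issue): the "surrogateescape" error handler smuggles undecodable
bytes into a str as lone surrogates, which round-trip fine but fail as
soon as the text is encoded strictly. The sample bytes and variable
names below are made up for illustration.

```python
# Hypothetical input: Latin-1 encoded bytes that are not valid UTF-8.
raw = b"caf\xe9"

# surrogateescape maps each undecodable byte to a lone surrogate
# in the range U+DC80..U+DCFF instead of raising an error.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes.
round_tripped = text.encode("utf-8", errors="surrogateescape")

# A strict encode rejects the lone surrogate.
try:
    text.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

Text like this can look like an ordinary str until it reaches an API
that encodes strictly, which is where the surprise tends to show up.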

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia