[Python-ideas] Proposal for default character representation

Chris Angelico rosuav at gmail.com
Fri Oct 14 04:26:21 EDT 2016


On Fri, Oct 14, 2016 at 7:18 PM, Cory Benfield <cory at lukasa.co.uk> wrote:
> The many glyphs that exist for writing various human languages are not inefficiency to be optimised away. Further, I should note that most places do not legislate about what character sets are acceptable to transcribe their languages. Indeed, plenty of non-romance-language-speakers have found ways to transcribe their languages of choice into the limited 8-bit character sets that the Anglophone world propagated: take a look at Arabish for the best kind of example of this behaviour, where "الجو عامل ايه النهارده فى إسكندرية؟" will get rendered as "el gaw 3amel eh elnaharda f eskendereya?"
>

I've worked with transliterations enough to have built myself a
dedicated translit tool. It's pretty straightforward to come up with
something you can type on a US-English keyboard (e.g. "a\o" for "å",
and "d\-" for "đ"), and in some cases, it helps with visual/audio
synchronization, but nobody would ever claim that it's the best way to
represent that language.

https://github.com/Rosuav/LetItTrans/blob/master/25%20languages.srt
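
To give a feel for the idea, here is a minimal sketch of a
backslash-style transliterator. It is not the actual tool: the table
name, the function name, and the sample phrase are made up for
illustration, and only the two key sequences mentioned above come from
this post.

```python
# Hypothetical lookup table. Only these two sequences appear in the post;
# a real tool would need many more entries per language.
TRANSLIT = {
    r"a\o": "å",
    r"d\-": "đ",
}


def translit(text, table=TRANSLIT):
    """Replace each typed key sequence in text with its target character."""
    # Longest keys first, so a short sequence never pre-empts a longer one
    # that happens to contain it.
    for key in sorted(table, key=len, reverse=True):
        text = text.replace(key, table[key])
    return text


print(translit(r"Pa\o gjensyn"))
```

Plain ASCII passes through untouched, so the same function can be run
over text that mixes typed sequences with ordinary words.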

> But I think you’re in a tiny minority of people who believe that all languages should be rendered in the same script. I can think of only two reasons to argue for this:
>
> 1. Dealing with lots of scripts is technologically tricky and it would be better if we didn’t bother. This is the anti-Unicode argument, and it’s a weak argument, though it has the advantage of being internally consistent.
> 2. There is some genuine harm caused by learning non-ASCII scripts.

#1 does carry a decent bit of weight, but only if you start with the
assumption that "characters are bytes". Once you shed that
assumption (and the related assumption that "characters are 16-bit
numbers"), the only weight it carries is "right-to-left text is
hard"... and let's face it, that *is* hard, but there are far, far
harder problems in computing.

Oh wait. Naming things. In Hebrew.

That's hard.
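
For anyone who wants to see why "characters are bytes" and "characters
are 16-bit numbers" both fall over, here is a small sketch. The helper
name `sizes` is made up for this example; it only uses `len()` and
`str.encode()` to compare code points, UTF-8 bytes, and UTF-16 code
units for the same string.

```python
def sizes(s):
    """Return (code points, UTF-8 bytes, UTF-16 code units) for s."""
    code_points = len(s)
    utf8_bytes = len(s.encode("utf-8"))
    # Each UTF-16 code unit is two bytes; "utf-16-le" avoids the BOM that
    # the plain "utf-16" codec would prepend.
    utf16_units = len(s.encode("utf-16-le")) // 2
    return code_points, utf8_bytes, utf16_units


samples = {
    "ASCII letter": "a",
    "Latin letter with ring": "\u00e5",           # å
    "Hebrew word": "\u05e9\u05dc\u05d5\u05dd",    # four Hebrew letters
    "astral character": "\U0001F40D",             # outside the BMP
}

for label, s in samples.items():
    print(label, sizes(s))
```

The three numbers agree only for ASCII. The byte count diverges as soon
as you leave ASCII, and the 16-bit count diverges for anything outside
the Basic Multilingual Plane.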

ChrisA

