Unicode [was Re: Cult-like behaviour]

Chris Angelico rosuav at gmail.com
Mon Jul 16 12:22:59 EDT 2018


On Tue, Jul 17, 2018 at 2:05 AM, Mark Lawrence <breamoreboy at gmail.com> wrote:
> On 16/07/18 15:17, Dan Sommers wrote:
>>
>> On Mon, 16 Jul 2018 10:39:49 +0000, Steven D'Aprano wrote:
>>
>>> ... people who think that if ISO-8859-7 was good enough for Jesus ...
>>
>>
>> It may have been good enough for his disciples, but Jesus spoke Aramaic.
>>
>> Also, ISO-8859-7 doesn't cover ancient polytonic Greek; it only covers
>> modern monotonic Greek.
>>
>> See also the Unicode Greek FAQ (https://www.unicode.org/faq/greek.html).
>>
>
> Out of curiosity where does my mum's Welsh come into the equation as I
> believe that it is not recognised by the EU as a language?
>

What characters does it use? Mostly Latin letters? If so, it's easy -
most Western European languages are covered by the basic Latin
letters (the ASCII ones), plus the combining diacritical marks (U+0300
and following), plus a small handful of language-specific characters
(e.g. U+0130/U+0131, the Turkish dotted and dotless I). Many
letter-plus-diacritic pairs also have precomposed single-code-point
forms, which NFC normalization will produce for you, and a few
languages have ligatures, but by and large, that's all you need for
most languages written in the Latin script.
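
To make the normalization point concrete, here's a minimal sketch
using the standard library's unicodedata module. The choice of the
Welsh w-circumflex as the example letter is my own assumption about
what such text might contain, not something taken from the thread.

```python
import unicodedata

# Assumed example: Welsh w-circumflex written as a base letter plus a
# combining mark (U+0077 LATIN SMALL LETTER W, U+0302 COMBINING
# CIRCUMFLEX ACCENT).
decomposed = "w\u0302"

# NFC composes the pair into the single precomposed code point U+0175.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))   # 2 1
print(unicodedata.name(composed))       # LATIN SMALL LETTER W WITH CIRCUMFLEX

# NFD goes the other way and splits it back into base + combining mark.
assert unicodedata.normalize("NFD", composed) == decomposed

# The Turkish-specific letters mentioned above.
for ch in "\u0130\u0131":
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

Both spellings render the same, but they compare unequal as strings
until you normalize them to the same form, so pick one form (NFC is
the usual choice) before comparing or storing text.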

ChrisA

