Thanks for writing this Petr! A few comments below.

On Mon, Nov 01, 2021 at 01:17:02PM +0100, Petr Viktorin wrote:

ASCII-only Considerations
-------------------------
ASCII is a subset of Unicode
While issues with the ASCII character set are generally well understood, they're presented here to help better understanding of the non-ASCII cases.
You should mention that some very common typefaces (fonts) are more confusable than others. For instance, Arial (a common font on Windows systems) makes the two-letter combination 'rn' virtually indistinguishable from the single letter 'm'.
Before the age of computers, most mechanical typewriters lacked the keys for the digits ``0`` and ``1``
I'm not sure that "most" is justified here. One of the most popular typewriters in history, the Underwood #5 (from 1900 to 1920), lacked the 1 key but had a 0 distinct from O.

https://i1.wp.com/curiousasacathy.com/wp-content/uploads/2016/04/underwood-n...

The Oliver 5 (1894 – 1928) had both a 0 and a 1, as did the 1895 Ford Typewriter. As did possibly the best-selling typewriter in history, the IBM Selectric (introduced in 1961).

http://www.technocrazed.com/the-interesting-history-of-evolution-of-typewrit...

Perhaps you should say "many older mechanical typewriters"?
Bidirectional Text
------------------
The section on bidirectional text is interesting, because reading it in my email client mutt, all the examples are left to right. You might like to note that not all applications support bidirectional text.
Unicode includes alorithms to *normalize* variants like these to a single form, and Python identifiers are normalized.
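A minimal sketch of what that normalization means in practice, assuming the NFKC form that the Python language reference specifies for identifiers. The names `ligature_name` and `namespace` are made up for illustration; the ligature U+FB01 is just one convenient compatibility variant.

```python
import unicodedata

# U+FB01 LATIN SMALL LIGATURE FI is a compatibility variant of "fi".
ligature_name = "\ufb01le"  # displays as "file" with a ligature

# NFKC folds the ligature into the two plain ASCII letters.
normalized = unicodedata.normalize("NFKC", ligature_name)

# The parser normalizes identifiers the same way, so assigning to the
# ligature spelling creates a variable under the plain ASCII name.
namespace = {}
exec(ligature_name + " = 1", namespace)
has_plain = "file" in namespace
has_ligature = ligature_name in namespace
```

After this runs, `normalized` equals `"file"`, and the namespace holds the plain spelling only, so two identifiers that look different in source can be the same name.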
Typo: "algorithms".

This is a good and useful document, thank you again.

--
Steve