Newbie question about text encoding

Chris Angelico rosuav at gmail.com
Fri Mar 6 08:39:26 EST 2015


On Sat, Mar 7, 2015 at 12:33 AM,  <random832 at fastmail.us> wrote:
> However, when do you _really_ want the number of characters? You may
> want to use it for, for example, the number of columns in a 'monospace'
> font, which you've already screwed up because you haven't accounted for
> double-wide characters or combining marks. Or you may want the position
> that pressing an arrow key or backspace or forward-delete a number of
> times will reach, which has its own rules in e.g. Indic languages (and
> also fails on Latin with combining marks).

The number of code points is the most logical basis for a length
limit. If you want to let users set their display names but not make
them arbitrarily long, limiting them to X code points is the safest
way (preferably after an NFC or NFD normalization, so that the count
is consistent); this rules out pathological cases where every base
character carries an arbitrary number of combining marks.
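
A minimal sketch of that check, assuming Python 3.3 or later (where
len() on a str counts code points). The names MAX_CODE_POINTS,
normalized_length and is_acceptable_display_name are made up for
illustration, and the limit of 32 is an arbitrary stand-in for X:

```python
import unicodedata

# Hypothetical limit; stands in for the "X code points" above.
MAX_CODE_POINTS = 32


def normalized_length(name, form="NFC"):
    """Return the number of code points in name after normalization."""
    return len(unicodedata.normalize(form, name))


def is_acceptable_display_name(name, limit=MAX_CODE_POINTS):
    """Accept a name only if its normalized form fits within the limit."""
    return normalized_length(name) <= limit


# "e" followed by COMBINING ACUTE ACCENT composes to one code point
# under NFC, so both spellings of the same text count the same.
decomposed = "e\u0301"
precomposed = "\u00e9"
same_count = normalized_length(decomposed) == normalized_length(precomposed)

# A base character buried under 100 combining marks is rejected,
# even though it might render as a single (very tall) glyph.
pathological = "a" + "\u0301" * 100
rejected = not is_acceptable_display_name(pathological)
```

Whether to count in NFC or NFD is a policy choice: NFD gives a larger
count for the same text, so the same limit is stricter. What matters
is picking one form and applying it before every count.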

ChrisA


