[Python-ideas] Proposal for default character representation
Steven D'Aprano
steve at pearwood.info
Fri Oct 14 20:18:10 EDT 2016
On Fri, Oct 14, 2016 at 07:56:29AM -0400, Random832 wrote:
> On Fri, Oct 14, 2016, at 01:54, Steven D'Aprano wrote:
> > Good luck with that last one. Even if you could convince the Chinese and
> > Japanese to swap to ASCII, I'd like to see you pry the emoji out of the
> > young folk's phones.
>
> This is actually probably the one part of this proposal that *is*
> feasible. While encoding emoji as a single character each makes sense
> for a culture that already uses thousands of characters; before they
> existed the English-speaking software industry already had several
> competing "standards" emerging for encoding them as sequences of ASCII
> characters.
It really isn't feasible to use emoticons instead of emoji, not if
you're serious about it. To put it bluntly, emoticons are amateur hour.
Emoji implemented as dedicated code points are what professionals use.
Why do you think phone manufacturers are standardising on dedicated code
points instead of using emoticons?
Anyone who has ever posted (say) source code on IRC, Usenet, email or
many web forums has probably seen unexpected smileys in the middle of
their code (false positives). That's because some sequence of characters
is being wrongly interpreted as an emoticon by the client software.
The more emoticons you support, the greater the chance this will
happen. A concrete example: bash code in Pidgin (IRC) will often show
unwanted smileys.
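
A rough sketch of how such a false positive can arise, assuming a client
that blindly substitutes every known sequence. The EMOTICONS table and the
render_emoticons() helper are hypothetical and only for illustration; they
are not taken from Pidgin or any other real client.

```python
# Hypothetical emoticon table; real clients ship their own, larger lists.
EMOTICONS = {
    ":)": "\u263A",      # WHITE SMILING FACE
    ":(": "\u2639",      # WHITE FROWNING FACE
    "8)": "\U0001F60E",  # SMILING FACE WITH SUNGLASSES
}


def render_emoticons(text):
    """Naively replace every emoticon sequence, with no regard for context."""
    for sequence, emoji in EMOTICONS.items():
        text = text.replace(sequence, emoji)
    return text


source_line = "for i in range(8): print(i)"
rendered = render_emoticons(source_line)

# The "8)" inside "range(8)" is swallowed and replaced by a sunglasses face,
# so the rendered line is no longer the code that was posted.
print(rendered)
```

Every entry added to the table is one more sequence that ordinary code can
contain by accident.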
The quality of applications can vary greatly: once the false emoticon is
displayed as a graphic, you may not be able to copy the source code
containing the graphic and paste it into a text editor unchanged.
There are false negatives as well as false positives: if your :-)
happens to fall on the boundary of a line, and your software breaks the
sequence with a soft line break, instead of seeing the smiley face you
expected, you might see a line ending with :- and a new line starting
with ).
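
The false negative can be sketched the same way. The hard_wrap() helper
below is a hypothetical stand-in for software that breaks lines at a fixed
width without looking at the content; real wrapping algorithms differ.

```python
def hard_wrap(text, width):
    """Split text into fixed-width chunks, ignoring word boundaries."""
    return [text[i:i + width] for i in range(0, len(text), width)]


# The three-character emoticon straddles the break at column 15.
lines = hard_wrap("Works for me :-)", 15)
found = any(":-)" in line for line in lines)

# A single code point cannot be split by a break between characters.
emoji_lines = hard_wrap("Works for me \u263A", 13)
emoji_found = any("\u263A" in line for line in emoji_lines)

print(lines, found)
print(emoji_lines, emoji_found)
```

This holds for an emoji that is a single code point such as U+263A; a
multi-character sequence has the same exposure to breaks as :-) does.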
It's hard to use punctuation or brackets around emoticons without
risking them being misinterpreted as an invalid or different sequence.
If you are serious about offering smileys, snowmen and piles of poo to
your users, you are much better off supporting real emoji (dedicated
Unicode characters) instead of emoticons. It is much easier to support ☺
than :-) and you don't need any special software apart from fonts that
support the emoji you care about.
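
As a small illustration of the difference, using only the standard
unicodedata module: the emoji is one code point with a standard name, while
the emoticon is three unrelated punctuation characters that only mean
something by convention.

```python
import unicodedata

smiley = "\u263A"   # the single character shown above
emoticon = ":-)"    # three separate ASCII characters

# Smiley, snowman and pile of poo each have their own code point and name.
for ch in (smiley, "\u2603", "\U0001F4A9"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

print(len(smiley), len(emoticon))   # 1 3
print(smiley.encode("utf-8"))       # b'\xe2\x98\xba'
```

No pattern matching is involved: the character travels through str
operations, encoding and decoding like any other character.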
--
Steve