Unicode Support in Ruby, Perl, Python, Emacs Lisp

Sean McAfee eefacm at gmail.com
Sat Oct 9 18:45:42 EDT 2010


Xah Lee <xahlee at gmail.com> writes:
> Perl's exceedingly lousy unicode support hack is well known. In fact
> it is the primary reason i “switched” to python for my scripting needs
> in 2005. (See: Unicode in Perl and Python)

I think your assessment is antiquated.  I've been doing Unicode
programming with Perl for about three years, and it has generally been
wonderfully transparent.

On the programmers' web site stackoverflow.com, I watch questions
carrying the "unicode" tag, and among those that mention a specific
language, Python and C++ seem to come up most often.

> I'll have to say, as far as text processing goes, the most beautiful
> lang with respect to unicode is emacs lisp. In elisp code (e.g.
> Generate a Web Links Report with Emacs Lisp ), i don't have to declare
> none of the unicode or encoding stuff. I simply write code to process
> string or buffer text, without even having to know what encoding it
> is. Emacs the environment takes care of all that.

It's not quite perfect, though.  I recently discovered that if I enter a
Chinese character using my Mac's Chinese input method, and then enter
the same character using a Japanese input method, Emacs regards them as
different characters, even though they have the same Unicode code point.
For example, from describe-char:

  character: 一 (43323, #o124473, #xa93b, U+4E00)
  character: 一 (55404, #o154154, #xd86c, U+4E00)

On saving and reverting a file containing such text, the characters are
"normalized" to the Japanese version.
  
I suppose this might conceivably be the correct behavior, but it sure
was a surprise that (equal "一" "一") can be nil.
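
To make the distinction concrete, here is a rough Python sketch.  It
assumes the first number in each describe-char line is an
editor-internal character code and that equality is decided on that
code rather than on the U+ value; I haven't checked how Emacs actually
implements this.  The EditorChar type and the helper names are made up
for illustration.

```python
from typing import NamedTuple


class EditorChar(NamedTuple):
    """Hypothetical model of one describe-char line."""
    internal_code: int  # first number shown (assumed to be editor-internal)
    code_point: int     # the U+XXXX value


# The two characters from the describe-char output above.
chinese_input = EditorChar(internal_code=0xA93B, code_point=0x4E00)
japanese_input = EditorChar(internal_code=0xD86C, code_point=0x4E00)


def same_internal(a: EditorChar, b: EditorChar) -> bool:
    """Compare by internal code, which is what seems to make `equal` return nil."""
    return a.internal_code == b.internal_code


def same_unicode(a: EditorChar, b: EditorChar) -> bool:
    """Compare by Unicode code point only."""
    return a.code_point == b.code_point


def save_and_revert(ch: EditorChar, preferred: EditorChar) -> EditorChar:
    """Model a round trip through a file: only the code point survives,
    and reading it back picks one preferred internal form for it."""
    return preferred if ch.code_point == preferred.code_point else ch


print(same_internal(chinese_input, japanese_input))  # False
print(same_unicode(chinese_input, japanese_input))   # True

# After the round trip both characters share the Japanese-side internal code.
reverted = save_and_revert(chinese_input, preferred=japanese_input)
print(same_internal(reverted, japanese_input))       # True
```

In Python itself a string is just a sequence of code points, so
chr(0x4E00) == "一" is True no matter which input method produced the
character.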


