Unicode Support in Ruby, Perl, Python, Emacs Lisp
eefacm at gmail.com
Sun Oct 10 00:45:42 CEST 2010
Xah Lee <xahlee at gmail.com> writes:
> Perl's exceedingly lousy unicode support hack is well known. In fact
> it is the primary reason i “switched” to python for my scripting needs
> in 2005. (See: Unicode in Perl and Python)
I think your assessment is antiquated. I've been doing Unicode
programming with Perl for about three years, and it's generally quite
On the programmers' web site stackoverflow.com, I flag questions with
the "unicode" tag, and of questions that mention a specific language,
Python and C++ seem to come up the most often.
> I'll have to say, as far as text processing goes, the most beautiful
> lang with respect to unicode is emacs lisp. In elisp code (e.g.
> Generate a Web Links Report with Emacs Lisp ), i don't have to declare
> none of the unicode or encoding stuff. I simply write code to process
> string or buffer text, without even having to know what encoding it
> is. Emacs the environment takes care of all that.
It's not quite perfect, though. I recently discovered that if I enter a
Chinese character using my Mac's Chinese input method, and then enter
the same character using a Japanese input method, Emacs regards them as
different characters, even though they have the same Unicode code point.
For example, from describe-char:
    character: 一 (43323, #o124473, #xa93b, U+4E00)
    character: 一 (55404, #o154154, #xd86c, U+4E00)
On saving and reverting a file containing such text, the characters are
"normalized" to the Japanese version.
I suppose this might conceivably be the correct behavior, but it sure
was a surprise that (equal "一" "一") can be nil.
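
For comparison, here is a minimal Python 3 sketch of the behavior I would have expected. It assumes (and this is only a guess on my part) that Emacs is remembering which legacy charset each input method is associated with. The codec list and the helper name round_trip are my own choices for illustration, not anything taken from Emacs. The sketch encodes U+4E00 into several legacy Chinese and Japanese encodings, decodes each one back, and checks that the results all compare equal as Unicode strings.

```python
import unicodedata

CHAR = "\u4e00"  # the character from the describe-char output above

# Legacy encodings chosen for illustration; which charset (if any) an
# input method is tied to is an assumption here, not something verified.
LEGACY_CODECS = ("gb2312", "big5", "shift_jis", "euc_jp")


def round_trip(char=CHAR, codecs=LEGACY_CODECS):
    """Return {codec: (legacy_bytes, decoded_text)} for a single character."""
    result = {}
    for codec in codecs:
        raw = char.encode(codec)       # byte sequence in the legacy encoding
        result[codec] = (raw, raw.decode(codec))  # back to a Unicode string
    return result


if __name__ == "__main__":
    for codec, (raw, text) in round_trip().items():
        # The bytes differ per codec, but the decoded text is the same string.
        print(codec, raw.hex(), "U+%04X" % ord(text), text == CHAR)
    print(unicodedata.name(CHAR))
```

In Python the origin of the bytes is discarded once they are decoded, so strings with the same code points always compare equal, whichever encoding they came from.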