[Python-Dev] GSoC: Replace MS Windows Console with Unicode UI
Glenn Linderman
v+python at g.nevcal.com
Wed Mar 25 01:02:30 CET 2009
On approximately 3/24/2009 10:16 AM, came the following characters from
the keyboard of INADA Naoki:
> Hi. I'm Japanese and a user of non-ASCII characters (cp932).
>
> We have to use an "IME" to input non-ASCII characters in Windows.
> After running "chcp 65001" in cmd.exe, we cannot use the IME in cmd.exe.
>
> So setting the code page to 65001 makes output universal but makes input ASCII-only.
> Sit!!!
>
> I hope PyQtShell <http://code.google.com/p/pyqtshell/> becomes a good
> IDLE alternative.
Thanks for the feedback.
So at least one version of the code I posted shows that the code page
can be set programmatically to different values for input and output,
although the last version set both to 65001. It seems that chcp 65001
always sets both. If the IME only works with cp932, then why not leave
input at cp932 and set output to 65001?
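For anyone who wants to try that split from Python, here is a minimal sketch using ctypes. It assumes that the Win32 functions SetConsoleCP (input only) and SetConsoleOutputCP (output only) are what chcp sets together; the helper name split_console_code_pages is hypothetical, and I have not tested it with an IME.

```python
import ctypes
import sys

CP932 = 932      # Japanese code page the IME is assumed to need for input
CP_UTF8 = 65001  # the code page that "chcp 65001" selects


def split_console_code_pages(input_cp=CP932, output_cp=CP_UTF8):
    """Hypothetical helper: set console input and output code pages separately.

    Returns the previous (input_cp, output_cp) pair so the caller can restore
    it later, or None when not running on Windows.
    """
    if sys.platform != "win32":
        return None  # the console code page functions only exist on Windows
    kernel32 = ctypes.windll.kernel32
    previous = (kernel32.GetConsoleCP(), kernel32.GetConsoleOutputCP())
    # SetConsoleCP changes only the input code page...
    if not kernel32.SetConsoleCP(input_cp):
        raise ctypes.WinError()
    # ...and SetConsoleOutputCP changes only the output code page.
    if not kernel32.SetConsoleOutputCP(output_cp):
        error = ctypes.WinError()           # capture the error before another call
        kernel32.SetConsoleCP(previous[0])  # undo the input change
        raise error
    return previous
```

A caller would keep the returned pair and pass it back as split_console_code_pages(*previous) when done, so the console is left the way it was found.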
I have no idea if that could be a solution for you, but I would be
interested in your results if you find that it is, or isn't, as that
would add to the collective knowledge base about the subject. This is
idea 2, below, where I tried to cover the solution space more broadly.
Looking briefly at the definition of cp932, it seems that nearly all of
its characters have Unicode equivalents... so perhaps one or more of the
following could happen:
1) the IME could be converted to produce UTF-8 instead of cp932,
allowing use of 65001 for input and output
2) the split code page could be used to avoid the conversion of Unicode
to cp932 for output.
3) Unicode could be converted to cp932 for output, allowing use of cp932
for both input and output; characters with no cp932 equivalent would
have to be replaced.
These are listed in order of increasing overhead for character handling.
Perhaps you could enlighten us all as to the issues with each of these
ideas.
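To make the difference between ideas 2 and 3 concrete, here is a rough Python sketch. It assumes input stays on cp932, so the program receives cp932 bytes from the IME; the function names are hypothetical. The only difference between the two ideas is which encoding the output passes through: UTF-8 (code page 65001) keeps every character, while cp932 must replace the ones it cannot represent.

```python
def read_ime_input(raw: bytes) -> str:
    """Decode bytes typed through an IME while the input code page is 932."""
    return raw.decode("cp932")


def encode_output(text: str, output_cp: int) -> bytes:
    """Encode text for the console's output code page (hypothetical helper)."""
    if output_cp == 65001:
        # Idea 2: output code page 65001, so any Unicode character survives.
        return text.encode("utf-8")
    if output_cp == 932:
        # Idea 3: output stays cp932; unmappable characters become "?".
        return text.encode("cp932", errors="replace")
    raise ValueError(f"unsupported output code page: {output_cp}")


typed = read_ime_input(b"\x93\xfa\x96\x7b")  # cp932 bytes for "日本"
message = typed + " / 한국"                   # Hangul is not in cp932
utf8_bytes = encode_output(message, 65001)    # decodes back to "日本 / 한국"
cp932_bytes = encode_output(message, 932)     # decodes back to "日本 / ??"
```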
I realize the IME exists today, and is likely coded to use cp932, and
that it would take some work to convert it to produce Unicode. However,
there seems to be a straightforward conversion chart between cp932 and
Unicode at Wikipedia, so perhaps that isn't a huge effort.
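As a small illustration that the chart is mechanical, Python's own cp932 codec already carries the mapping, so one row of it can be generated by decoding every two-byte code with a given lead byte. This is only a sketch: the helper name cp932_chart_row is hypothetical, and lead_byte is assumed to be a cp932 double-byte lead byte (0x81-0x9F or 0xE0-0xFC).

```python
def cp932_chart_row(lead_byte: int) -> dict:
    """Map each valid two-byte cp932 code with this lead byte to its character.

    Hypothetical helper; the mapping itself comes from Python's cp932 codec.
    """
    row = {}
    # cp932 trail bytes lie in 0x40-0xFC (0x7F is never a valid trail byte).
    for trail_byte in range(0x40, 0xFD):
        try:
            char = bytes([lead_byte, trail_byte]).decode("cp932")
        except UnicodeDecodeError:
            continue  # unassigned or invalid code
        row[(lead_byte << 8) | trail_byte] = char
    return row


row = cp932_chart_row(0x93)
print(f"0x93FA -> U+{ord(row[0x93FA]):04X}")  # 0x93FA -> U+65E5
```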
It seems that the long-term goal of having all software speak Unicode
would increase the efficiency of all software when dealing with
multilingual issues, as a common solution could be applied universally,
rather than reinventing solutions that only work for particular code pages.
But I don't fully know whether the design or implementation
of Unicode precludes universal solutions: I have heard rumors that
certain characters must be interpreted differently in different locale
contexts, which seems to run counter to the "one solution fits all"
possibility.
--
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking