[issue1602] windows console doesn't print or input Unicode

STINNER Victor report at bugs.python.org
Mon Mar 21 15:25:20 CET 2011


STINNER Victor <victor.stinner at haypocalc.com> added the comment:

I did some tests with WriteConsoleW():
 - with raster fonts, U+00E9 is displayed as é, U+0141 as L and U+042D as ? => good (works as expected)
 - with TrueType font (Lucida), U+00E9 is displayed as é, U+0141 as Ł and U+042D as Э => perfect! (all characters are rendered correctly)

Now I agree that WriteConsoleW() is the best solution to fix this issue.

My test code (added to Python/sysmodule.c):
---------
static PyObject *
sys_write_stdout(PyObject *self, PyObject *args)
{
    PyObject *textobj;
    wchar_t *text;
    DWORD written, total;
    Py_ssize_t len, chunk;
    HANDLE console;
    BOOL ok;

    if (!PyArg_ParseTuple(args, "U:write_stdout", &textobj))
        return NULL;

    console = GetStdHandle(STD_OUTPUT_HANDLE);
    if (console == INVALID_HANDLE_VALUE) {
        PyErr_SetFromWindowsErr(GetLastError());
        return NULL;
    }

    text = PyUnicode_AS_UNICODE(textobj);
    len = PyUnicode_GET_SIZE(textobj);
    total = 0;
    while (len != 0) {
        /* WriteConsoleW() is limited to 64 KB (32,768 UTF-16 units), but
           this limit depends on the heap usage. Use a safe limit of 10,000
           UTF-16 units.
           http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1232 */
        if (len > 10000)
            chunk = 10000;
        else
            chunk = len;
        /* chunk is at most 10,000, so the cast to DWORD cannot truncate */
        ok = WriteConsoleW(console, text, (DWORD)chunk, &written, NULL);
        if (!ok)
            break;
        text += written;
        len -= written;
        total += written;
    }
    return PyLong_FromUnsignedLong(total);
}
---------
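
The chunking loop can also be exercised away from a Windows console. The following is only a sketch under assumptions: write_fn, stub_write() and write_chunked() are hypothetical names, not part of Python or the Windows API, and the stub merely imitates a WriteConsoleW() call that accepts at most 4,096 units at a time. Unlike the test code, the loop also stops when a call reports success but writes nothing, so it cannot spin forever.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

#define MAX_CHUNK 10000  /* same safe limit as in the test code */

/* Hypothetical stand-in for WriteConsoleW(): returns nonzero on success
   and stores the number of units actually written in *written. */
typedef int (*write_fn)(const wchar_t *text, size_t count,
                        size_t *written, void *ctx);

/* Stub writer (assumption for illustration): accepts at most 4,096 units
   per call to simulate partial writes, and counts its calls via ctx. */
static int
stub_write(const wchar_t *text, size_t count, size_t *written, void *ctx)
{
    size_t *calls = (size_t *)ctx;
    (void)text;
    (*calls)++;
    *written = count > 4096 ? 4096 : count;
    return 1;
}

/* Write len units in chunks of at most MAX_CHUNK, advancing by the number
   of units the writer reports. Returns the total number of units written. */
static size_t
write_chunked(write_fn writer, void *ctx, const wchar_t *text, size_t len)
{
    size_t total = 0;

    while (len != 0) {
        size_t chunk = len > MAX_CHUNK ? MAX_CHUNK : len;
        size_t written = 0;

        /* stop on failure, or on a "successful" write of zero units */
        if (!writer(text, chunk, &written, ctx) || written == 0)
            break;
        assert(written <= chunk);
        text += written;
        len -= written;
        total += written;
    }
    return total;
}
```

Passing the writer as a function pointer keeps the loop independent of the console, so the partial-write handling can be checked on any platform before the real WriteConsoleW() call is plugged in.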


The question now is how to integrate WriteConsoleW() into Python without breaking the API, for example:
 - Should sys.stdout be a TextIOWrapper or not?
 - Should sys.stdout.fileno() return 1 or raise an error?
 - What about sys.stdout.buffer: should sys.stdout.buffer.write() call WriteConsoleA(), or should sys.stdout not have a buffer attribute at all? I think that many modules and programs now rely on sys.stdout.buffer to write bytes directly to stdout. There is at least python -m base64.
 - Should we use ReadConsoleW() for stdin?

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue1602>
_______________________________________

