[New-bugs-announce] [issue20049] string.lowercase and string.uppercase can contain garbage

Alexander Pyhalov report at bugs.python.org
Sat Dec 21 22:38:37 CET 2013


New submission from Alexander Pyhalov:

When Python 2.6 (or 2.7) is compiled with _XOPEN_SOURCE=600 on illumos, string.lowercase and string.uppercase contain garbage when a UTF-8 locale is used
(OpenIndiana bug report: https://www.illumos.org/issues/4411).
The reason is that in a UTF-8 locale, islower()/isupper() and similar functions are not expected to work with non-ASCII symbols.
So code like

    n = 0;
    for (c = 0; c < 256; c++) {
        if (islower(c))
            buf[n++] = c;
    }

is expected to fail, because it calls islower() on values 128-255, which are not legal UTF-8 symbols on their own. It should be changed to something like

    n = 0;
    for (c = 0; c < 256; c++) {
        if (isascii(c) && islower(c))
            buf[n++] = c;
    }

or to 

    n = 0;
    for (c = 0; c < 128; c++) {
        if (islower(c))
            buf[n++] = c;
    }

Before doing this, you should check whether the locale is UTF-8. However, almost all non-C locales on illumos are UTF-8.
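
The check described above can be modeled roughly as follows. This is only a Python sketch of the decision logic, not the C change itself; the helper names is_utf8_codeset, scan_limit and current_codeset are made up for illustration, and the assumption is that the codeset name reported by the platform looks like "UTF-8" or "utf8".

```python
import locale


def is_utf8_codeset(codeset):
    """Return True if a codeset name such as 'UTF-8' or 'utf8' denotes UTF-8."""
    # Assumption: platforms differ only in case and in '-' / '_' separators.
    return codeset.replace('-', '').replace('_', '').lower() == 'utf8'


def scan_limit(codeset):
    """Exclusive upper bound for the character scan: 128 for UTF-8, else 256."""
    return 128 if is_utf8_codeset(codeset) else 256


def current_codeset():
    """Codeset of the current locale (nl_langinfo is available on Unix only)."""
    return locale.nl_langinfo(locale.CODESET)
```

On a Unix system, scan_limit(current_codeset()) would then give the loop bound to use for the locale that is currently set.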


Example of incorrect behavior: 

Python 2.6.9 (unknown, Nov 12 2013, 13:54:48) 
[GCC 4.7.3] on sunos5
Type "help", "copyright", "credits" or "license" for more information.
>>> import string
>>> string.lowercase
'abcdefghijklmnopqrstuvwxyz\xaa\xb5\xba\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
>>> string.uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd8\xd9\xda\xdb\xdc\xdd\xde'
>>>
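
Until the interpreter is fixed, calling code can sidestep the problem. A minimal sketch, assuming it is enough to drop every character with a code of 128 or above (the helper name ascii_only and the shortened sample value are made up for illustration); string.ascii_lowercase and string.ascii_uppercase are locale-independent and can be used directly instead.

```python
import string


def ascii_only(letters):
    """Keep only the characters whose code is below 128."""
    return ''.join(ch for ch in letters if ord(ch) < 128)


# Shortened sample of the kind of value shown in the session above.
polluted = 'abcdefghijklmnopqrstuvwxyz\xaa\xb5\xba\xdf'
cleaned = ascii_only(polluted)
```

Here cleaned contains only the 26 ASCII letters, the same value as string.ascii_lowercase.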

----------
components: Unicode
messages: 206786
nosy: Alexander.Pyhalov, ezio.melotti, haypo
priority: normal
severity: normal
status: open
title: string.lowercase and string.uppercase can contain garbage
type: behavior
versions: Python 2.7

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue20049>
_______________________________________

