convert Unicode to lower/uppercase?
Peter Otten
__peter__ at web.de
Tue Sep 23 13:24:36 EDT 2003
jallan wrote:
> I don't see any particular reason why Python "cannot handle case
> mappings that increase string lengths".
Now that's a long post. I think it essentially boils down to the above
statement.
Looking into stringobject.c (judging from a first impression,
unicodeobject.c has essentially the same algorithm, but with a few
indirections):
static PyObject *
string_upper(PyStringObject *self)
{
    char *s = PyString_AS_STRING(self), *s_new;
    int i, n = PyString_GET_SIZE(self);
    PyObject *new;

    new = PyString_FromStringAndSize(NULL, n);
    if (new == NULL)
        return NULL;
    s_new = PyString_AsString(new);
    for (i = 0; i < n; i++) {
        int c = Py_CHARMASK(*s++);
        if (islower(c)) {
            *s_new = toupper(c);
        } else
            *s_new = c;
        s_new++;
    }
    return new;
}
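The whole routine builds on the assumption that len(s) == len(s.upper()):
the result is allocated up front with the input's length n and then filled
with exactly one character per input character. Nothing short of a complete
rewrite will fix that. But if you volunteer...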
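For illustration only, here is a minimal sketch of what a length-changing variant might look like on plain C strings. It is not the CPython code and not a proposed patch: upper_len() and upper_expanding() are hypothetical names, and the mapping is a toy one that treats the Latin-1 byte 0xDF (sharp s) as the single character that expands, to "SS". The point is the shape of the rewrite: one pass to measure the result, a second pass to fill it.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: number of bytes the uppercase form of c needs.
   Toy rule (an assumption for this sketch): Latin-1 sharp s (0xDF)
   becomes "SS"; every other byte maps to exactly one byte. */
static size_t upper_len(unsigned char c)
{
    return c == 0xDF ? 2 : 1;
}

/* Returns a malloc'ed, NUL-terminated uppercase copy of s[0..n),
   or NULL if allocation fails. The caller frees the result. */
char *upper_expanding(const char *s, size_t n, size_t *out_len)
{
    size_t i, total = 0;
    char *result, *p;

    /* Pass 1: measure, since the output may be longer than the input. */
    for (i = 0; i < n; i++)
        total += upper_len((unsigned char)s[i]);

    result = malloc(total + 1);
    if (result == NULL)
        return NULL;

    /* Pass 2: fill, writing one or two bytes per input byte. */
    p = result;
    for (i = 0; i < n; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c == 0xDF) {
            *p++ = 'S';
            *p++ = 'S';
        } else {
            *p++ = (char)toupper(c);
        }
    }
    *p = '\0';

    /* Both passes must agree on the size. */
    assert((size_t)(p - result) == total);
    if (out_len != NULL)
        *out_len = total;
    return result;
}
```

Called on the six Latin-1 bytes of "stra\xDF" "e" this should hand back the seven-byte string "STRASSE". A real implementation would need a proper case-mapping table instead of the single special case, which is where the work is.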
Personally, I think it's a long way to go for a little s, sharp as it may be
:-)
Peter