[Python-3000] How will unicode get used?
Marcin 'Qrczak' Kowalczyk
qrczak at knm.org.pl
Thu Sep 21 00:34:40 CEST 2006
Josiah Carlson <jcarlson at uci.edu> writes:
> Regardless of our choice, *some platform* is going to be angry. Why?
> GTK takes utf-8 encoded strings. (I don't know what Qt or linux system
> calls take) Windows takes utf-16.
The representation of QChar in Qt-3.3.5:
    ushort ucs;
#if defined(QT_QSTRING_UCS_4)
    ushort grp;
#endif
The representation of QStringData in Qt-3.3.5:
    QChar *unicode;
    char *ascii;
#ifdef Q_OS_MAC9
    uint len;
#else
    uint len : 30;
#endif
    uint issimpletext : 1;
#ifdef Q_OS_MAC9
    uint maxl;
#else
    uint maxl : 30;
#endif
    uint islatin1 : 1;
I would say that it's silly. It seems the transition from UCS-2 to
UCS-4 in Qt is incomplete: almost no code is prepared for
QT_QSTRING_UCS_4. For example, here is the implementation of the
function that shows what issimpletext means:
void QString::checkSimpleText() const
{
    QChar *p = d->unicode;
    QChar *end = p + d->len;
    while ( p < end ) {
        ushort uc = p->unicode();
        // sort out regions of complex text formatting
        if ( uc > 0x058f && ( uc < 0x1100 || uc > 0xfb0f ) ) {
            d->issimpletext = FALSE;
            return;
        }
        p++;
    }
    d->issimpletext = TRUE;
}
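
To illustrate why a check like this is tied to 16-bit units, here is a
minimal standalone sketch (plain C++, no Qt). It applies the same range
test to each 16-bit unit of a code point. The helpers toUtf16() and
looksSimple() are hypothetical names invented for this illustration, not
Qt API, and the sketch assumes a character above U+FFFF would be stored
as a UTF-16 surrogate pair. Since the surrogate range 0xD800..0xDFFF lies
between 0x1100 and 0xfb0f, every such character passes as "simple"
whatever script it belongs to.

```cpp
#include <cstdint>
#include <vector>

// Same range test as the loop body of checkSimpleText() quoted above,
// applied to a single 16-bit unit.
bool unitLooksSimple(std::uint16_t uc) {
    return !(uc > 0x058f && (uc < 0x1100 || uc > 0xfb0f));
}

// Hypothetical helper (not Qt API): encode one code point as UTF-16.
// Code points above U+FFFF become a high/low surrogate pair.
std::vector<std::uint16_t> toUtf16(std::uint32_t cp) {
    if (cp < 0x10000)
        return { static_cast<std::uint16_t>(cp) };
    cp -= 0x10000;
    return { static_cast<std::uint16_t>(0xD800 + (cp >> 10)),
             static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF)) };
}

// Hypothetical helper: true when every 16-bit unit passes the range test,
// i.e. what a unit-by-unit loop would conclude for this one code point.
bool looksSimple(std::uint32_t cp) {
    for (std::uint16_t uc : toUtf16(cp))
        if (!unitLooksSimple(uc))
            return false;
    return true;
}
```

With these definitions, looksSimple(0x0041) is true and
looksSimple(0x0627) (an Arabic letter) is false, as intended by the
original test, while looksSimple(0x10400) is true only because both of
its surrogates, 0xD801 and 0xDC00, happen to fall in the "simple" range.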
The QChar documentation says:
Unicode characters are (so far) 16-bit entities without any markup or
structure. This class represents such an entity. It is lightweight, so
it can be used everywhere. Most compilers treat it like a "short int".
(In a few years it may be necessary to make QChar 32-bit when more
than 65536 Unicode code points have been defined and come into use.)
Bleh...
--
__("< Marcin Kowalczyk
\__/ qrczak at knm.org.pl
^^ http://qrnik.knm.org.pl/~qrczak/