[Python-3000] How will unicode get used?

Marcin 'Qrczak' Kowalczyk qrczak at knm.org.pl
Thu Sep 21 00:34:40 CEST 2006


Josiah Carlson <jcarlson at uci.edu> writes:

> Regardless of our choice, *some platform* is going to be angry.  Why? 
> GTK takes utf-8 encoded strings.  (I don't know what Qt or linux system
> calls take) Windows takes utf-16.

The representation of QChar in Qt-3.3.5:

    ushort ucs;
#if defined(QT_QSTRING_UCS_4)
    ushort grp;
#endif

The representation of QStringData in Qt-3.3.5:

    QChar *unicode;
    char *ascii;
#ifdef Q_OS_MAC9
    uint len;
#else
    uint len : 30;
#endif
    uint issimpletext : 1;
#ifdef Q_OS_MAC9
    uint maxl;
#else
    uint maxl : 30;
#endif
    uint islatin1 : 1;

I would say that it's silly. It seems that the transition from UCS-2
to UCS-4 in Qt is incomplete: almost no code is prepared for
QT_QSTRING_UCS_4. For example, here is the implementation of the
function that shows what issimpletext means:

void QString::checkSimpleText() const
{
    QChar *p = d->unicode;
    QChar *end = p + d->len;
    while ( p < end ) {
        ushort uc = p->unicode();
        // sort out regions of complex text formatting
        if ( uc > 0x058f && ( uc < 0x1100 || uc > 0xfb0f ) ) {
            d->issimpletext = FALSE;
            return;
        }
        p++;
    }
    d->issimpletext = TRUE;
}

QChar documentation says:

   Unicode  characters are (so far) 16-bit entities without any markup or
   structure. This class represents such an entity. It is lightweight, so
   it can be used everywhere. Most compilers treat it like a "short int".
   (In  a  few  years  it may be necessary to make QChar 32-bit when more
   than 65536 Unicode code points have been defined and come into use.)

Bleh...

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/

