[Python-Dev] PEP 393 Summer of Code Project
Stefan Behnel
stefan_ml at behnel.de
Tue Aug 23 14:14:39 CEST 2011
Torsten Becker, 22.08.2011 20:58:
> I have implemented an initial version of PEP 393 -- "Flexible String
> Representation" as part of my Google Summer of Code project. My patch
> is hosted as a repository on bitbucket [1] and I created a related
> issue on the bug tracker [2]. I posted documentation for the current
> state of the development in the wiki [3].
One thing that occurred to me regarding the object struct:
typedef struct {
    PyObject_HEAD
    Py_ssize_t length;      /* Number of code points in the string */
    void *str;              /* Canonical, smallest-form Unicode buffer */
    Py_hash_t hash;         /* Hash value; -1 if not set */
    int state;              /* != 0 if interned. In this case the two
                             * references from the dictionary to this
                             * object are *not* counted in ob_refcnt.
                             * See SSTATE_KIND_* for other bits */
    Py_ssize_t utf8_length; /* Number of bytes in utf8, excluding the
                             * terminating \0. */
    char *utf8;             /* UTF-8 representation (null-terminated) */
    Py_ssize_t wstr_length; /* Number of code points in wstr, possible
                             * surrogates count as two code points. */
    wchar_t *wstr;          /* wchar_t representation (null-terminated) */
} PyUnicodeObject;
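As a rough illustration of what reading from the untyped `void *str` field involves, here is a minimal, self-contained sketch. It is not the patch's code: `demo_str`, `demo_read_cast`, and the `KIND_*` values are hypothetical, the `Py_UCS2`/`Py_UCS4` typedefs are stand-ins for CPython's own, and the kind is kept in a separate field instead of in the `SSTATE_KIND_*` bits of `state`, whose exact layout is not shown above. The point is that every access site has to cast `str` to the right width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for CPython's types; the real definitions live in Python.h. */
typedef uint16_t Py_UCS2;
typedef uint32_t Py_UCS4;

/* Hypothetical kind tags (bytes per code point). Assumption: the patch
 * encodes the kind in the SSTATE_KIND_* bits of `state` instead. */
enum str_kind { KIND_1BYTE = 1, KIND_2BYTE = 2, KIND_4BYTE = 4 };

/* Simplified model of the struct above: only the fields needed here. */
typedef struct {
    ptrdiff_t length;     /* number of code points */
    void *str;            /* canonical buffer; width given by `kind` */
    enum str_kind kind;   /* simplified: stored directly, not in state bits */
} demo_str;

/* Read code point i. Each branch must cast `str` to the matching width. */
static Py_UCS4 demo_read_cast(const demo_str *s, ptrdiff_t i)
{
    assert(i >= 0 && i < s->length);
    switch (s->kind) {
    case KIND_1BYTE: return ((const unsigned char *)s->str)[i];
    case KIND_2BYTE: return ((const Py_UCS2 *)s->str)[i];
    case KIND_4BYTE: return ((const Py_UCS4 *)s->str)[i];
    }
    assert(!"invalid kind");
    return 0;
}
```

A caller would wrap an existing buffer, e.g. `demo_str s = {3, buf, KIND_1BYTE};`, and then call `demo_read_cast(&s, 1)`.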
Wouldn't the "normal" approach be to use a union for the str field? I.e.

    union {
        unsigned char *latin1;
        Py_UCS2 *ucs2;
        Py_UCS4 *ucs4;
    } str;
Given that they're all pointers, all members have the same size, but I find
it more readable to write

    u.str.latin1

than

    ((const unsigned char*)u.str)

Plus, the three pointer types would be declared once by the struct, rather
than by a cast at each use site.
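A minimal sketch of the union variant, for comparison: the same hypothetical simplified model (the names `union_str`, `union_read`, and the `KIND_*` values are illustrative, the `Py_UCS2`/`Py_UCS4` typedefs are stand-ins for CPython's, and the kind again sits in its own field rather than in the `SSTATE_KIND_*` bits), but with `str` declared as a union of typed pointers. The casts disappear and each access names its width, as in `s->str.latin1[i]`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for CPython's types; the real definitions live in Python.h. */
typedef uint16_t Py_UCS2;
typedef uint32_t Py_UCS4;

/* Hypothetical kind tags (bytes per code point). */
enum str_kind { KIND_1BYTE = 1, KIND_2BYTE = 2, KIND_4BYTE = 4 };

/* Simplified model: `str` is a union of typed pointers instead of void *. */
typedef struct {
    ptrdiff_t length;          /* number of code points */
    union {
        unsigned char *latin1; /* used when kind == KIND_1BYTE */
        Py_UCS2 *ucs2;         /* used when kind == KIND_2BYTE */
        Py_UCS4 *ucs4;         /* used when kind == KIND_4BYTE */
    } str;
    enum str_kind kind;        /* simplified: stored directly */
} union_str;

/* Read code point i; the member name selects the width, no cast needed. */
static Py_UCS4 union_read(const union_str *s, ptrdiff_t i)
{
    assert(i >= 0 && i < s->length);
    switch (s->kind) {
    case KIND_1BYTE: return s->str.latin1[i];
    case KIND_2BYTE: return s->str.ucs2[i];
    case KIND_4BYTE: return s->str.ucs4[i];
    }
    assert(!"invalid kind");
    return 0;
}
```

Since every member is a pointer, on common platforms the union has the same size as a plain `void *`, so the struct layout is unchanged; only the spelling at the access sites differs.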
Has this been considered before? Was there a reason to decide against it?
Stefan