[MAL]
> BTW, wouldn't it be possible to take pcre and have it
> use Py_Unicode instead of char ? [Of course, there would have to
> be some extensions for character classes etc.]

No, alas. The assumption that characters are 8 bits is ubiquitous, in
both obvious and subtle ways.

    if ((start_bits[c/8] & (1 << (c&7))) == 0) start_match++; else break;
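To see why that one line is a problem, here is a minimal standalone sketch. It assumes that start_bits is a 32-byte (256-bit) table of possible first characters, with one bit per 8-bit code. The names startset, startset_has8, startset_has_wide and skip_to_candidate are hypothetical and are not pcre's API. The "conservative widening" for codes of 256 and above is one possible workaround. It is not something pcre does. The sketch uses 16-bit code units (unsigned short here) to stand in for wide characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the start_bits test: one bit per possible
   first character, 32 bytes * 8 bits = 256 members. */
typedef struct {
    unsigned char bits[32];
} startset;

static void startset_add(startset *s, unsigned int c)
{
    assert(c < 256);                      /* table only covers 8-bit codes */
    s->bits[c / 8] |= (unsigned char)(1u << (c & 7));
}

/* Same expression as the quoted line. For c >= 256 (e.g. the 16-bit
   code 0x4E2D), c/8 is 2501, far past the end of bits[]. */
static int startset_has8(const startset *s, unsigned int c)
{
    assert(c < 256);
    return (s->bits[c / 8] & (1u << (c & 7))) != 0;
}

/* Assumed widening, not pcre's behavior: any code >= 256 is treated
   as a possible match start, so the skip loop stops there and the
   full matcher decides. */
static int startset_has_wide(const startset *s, unsigned int c)
{
    return c >= 256 ? 1 : startset_has8(s, c);
}

/* Advance past positions that cannot begin a match, mirroring
   "if (...) start_match++; else break;". Returns the index of the
   first candidate position, or len if there is none. */
static size_t skip_to_candidate(const startset *s,
                                const unsigned short *subject, size_t len)
{
    size_t i = 0;
    while (i < len) {
        if (!startset_has_wide(s, subject[i])) i++; else break;
    }
    return i;
}
```

With this widening, every character of 256 or above stops the skip loop, so the optimization loses its value on text that is mostly outside Latin-1. The alternative is a full bitmap over 16-bit codes, which needs 65536 bits (8 KB) per compiled pattern instead of 32 bytes.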