[Python-ideas] Add "has_surrogates" flags to string object

Masklinn masklinn at masklinn.net
Tue Oct 8 13:38:19 CEST 2013


On 2013-10-08, at 13:17 , Serhiy Storchaka wrote:

> Here is an idea: add a mark to the PyUnicode object that gives a fast answer to the question of whether a string contains surrogate code points. This mark has one of three possible states:
> 
> * String doesn't contain surrogates.
> * String contains surrogates.
> * It is still unknown.
> 
> We can combine this with the "is_ascii" flag in a 2-bit value:
> 
> * String is ASCII-only (and doesn't contain surrogates).
> * String is not ASCII-only and doesn't contain surrogates.
> * String is not ASCII-only and contains surrogates.
> * String is not ASCII-only and it is still unknown whether it contains surrogates.
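
The four combined states quoted above can be sketched in pure Python. This is only an illustration of the proposal, not CPython's implementation: the names StrState, contains_surrogate and classify are hypothetical, and the numeric values assigned to the states are an arbitrary assumption.

```python
from enum import IntEnum


class StrState(IntEnum):
    """Hypothetical 2-bit state; the numeric values are arbitrary."""
    ASCII = 0            # ASCII-only, so no surrogates either
    NO_SURROGATES = 1    # not ASCII-only, known to have no surrogates
    HAS_SURROGATES = 2   # not ASCII-only, known to contain surrogates
    UNKNOWN = 3          # not ASCII-only, not scanned yet


def contains_surrogate(s: str) -> bool:
    # Surrogate code points occupy the range U+D800..U+DFFF.
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in s)


def classify(s: str) -> StrState:
    """Resolve the state by scanning the string (what a lazy flag would cache)."""
    if s.isascii():
        return StrState.ASCII
    if contains_surrogate(s):
        return StrState.HAS_SURROGATES
    return StrState.NO_SURROGATES
```

A string would start in the UNKNOWN state and move to one of the two resolved non-ASCII states the first time something needs the answer, so the scan is paid at most once.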

Isn't that redundant with the kind under the shortest-form representation?
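
One way to probe that question is to compute the narrowest storage width for a few strings and compare it with whether they contain surrogates. The sketch below is a pure-Python approximation, assuming the kind is 1, 2 or 4 bytes per code point chosen from the largest code point in the string; shortest_kind and contains_surrogate are hypothetical helper names, not CPython APIs.

```python
def shortest_kind(s: str) -> int:
    """Bytes per code point needed for the largest code point in s."""
    top = max(map(ord, s), default=0)
    if top <= 0xFF:
        return 1
    if top <= 0xFFFF:
        return 2
    return 4


def contains_surrogate(s: str) -> bool:
    # Surrogate code points occupy the range U+D800..U+DFFF.
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in s)


samples = ["abc", "caf\xe9", "\u20ac", "\ud800", "\U0001F600", "\U0001F600\udc80"]
for s in samples:
    print(ascii(s), shortest_kind(s), contains_surrogate(s))
```

Under these assumptions a 1-byte kind does rule out surrogates, since they all lie above U+00FF, but strings of the 2-byte and 4-byte kinds can go either way.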

