2018-01-26 12:17 GMT+01:00 INADA Naoki
No, because you can pass in maxchar to PyUnicode_New() and the implementation will take this as a hint to the max code point used in the string. No check is done on whether maxchar is indeed the minimum upper bound of the code point ordinals.
The API doc says:

"""
maxchar should be the true maximum code point to be placed in the string. As an approximation, it can be rounded up to the nearest value in the sequence 127, 255, 65535, 1114111.
"""
https://docs.python.org/3/c-api/unicode.html#c.PyUnicode_New
Since the doc says *should*, strings created with a wrong maxchar are considered invalid objects.
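
To make the rounding rule in the quoted doc concrete, here is a minimal pure-Python sketch. The helper name rounded_maxchar is hypothetical and is not part of any CPython API; it only mirrors the arithmetic that a C caller of PyUnicode_New() is expected to get right.

```python
# Hypothetical helper, for illustration only: it is not a CPython API.
# The bounds are the sequence quoted from the PyUnicode_New() documentation.
MAXCHAR_BOUNDS = (127, 255, 65535, 1114111)


def rounded_maxchar(text: str) -> int:
    """Return the smallest documented bound that covers every code point in text."""
    true_max = max(map(ord, text), default=0)  # an empty string has no code points
    for bound in MAXCHAR_BOUNDS:
        if true_max <= bound:
            return bound
    # Unreachable for a real str, whose code points never exceed 0x10FFFF.
    raise ValueError("code point out of Unicode range")
```

Passing anything larger than this value for the same contents is the "wrong maxchar" case discussed above.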
PyUnicode objects must always use the most efficient storage. This is a very strong requirement of PEP 393. As Naoki wrote, many functions rely on this assumption to implement fast paths. The assumption is even asserted in the debug check _PyUnicode_CheckConsistency():
https://github.com/python/cpython/blob/e76daebc0c8afa3981a4c5a8b54537f756e80...

Victor
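
The "most efficient storage" rule can be observed from Python itself. The sketch below assumes CPython 3.3 or later, where sys.getsizeof() on a str reflects the PEP 393 layout; the helper name bytes_per_char is hypothetical, and other implementations may report different numbers.

```python
import sys


def bytes_per_char(ch: str, n: int = 16) -> int:
    """Estimate the per-character storage width of a string made only of ch.

    Assumes CPython's PEP 393 layout: growing the string by one character
    grows the reported size by exactly one storage unit.
    """
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)


# A string with one astral character uses the widest storage, but a slice
# that drops that character goes back to the narrowest storage.
wide = "a" * 10 + "\U0001F600"
narrow_again = wide[:10]
```

Under those assumptions, narrow_again has the same size as a freshly built ASCII string of the same length, which is the invariant the fast paths depend on.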