On 26.01.2018 12:17, INADA Naoki wrote:
>> No, because you can pass in maxchar to PyUnicode_New() and the implementation will take this as a hint to the max code point used in the string. No check is done whether maxchar is indeed the minimum upper bound to the code point ordinals.
>
> API doc says:
>
> """
> maxchar should be the true maximum code point to be placed in the string. As an approximation, it can be rounded up to the nearest value in the sequence 127, 255, 65535, 1114111.
> """
> https://docs.python.org/3/c-api/unicode.html#c.PyUnicode_New
>
> Since the doc says *should*, strings created with a wrong maxchar are considered invalid objects.

Not really: "should" means should, not must :-) Objects created with PyUnicode_New() are valid and ready (readiness only has a meaning for legacy strings). You can set maxchar to 64k and still just use ASCII as content.

In some cases, you may want the internal string representation to be wchar_t compatible or work with Py_UCS2/4, so both 64k and sys.maxunicode are reasonable and valid values.

Overall, I'm starting to believe that a str.maxchar() function would be a better choice than only going for ASCII.

This could have an optional parameter "exact" to force scanning the string and returning the actual max code point ordinal when set to True (default), or return the approximation based on the used kind if not set (which in many cases will give you a good hint).

For checking ASCII, you'd then write:

    def isascii(s):
        if s.maxchar(exact=False) < 128:
            return True
        if s.maxchar() < 128:
            return True
        return False

-- 
Marc-Andre Lemburg
eGenix.com
Professional Python Services directly from the Experts (#1, Jan 26 2018)
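
For illustration only, the str.maxchar() idea from the message above can be emulated in pure Python. This is a sketch under stated assumptions, not an existing API: maxchar() here is a hypothetical free function (str has no such method), and because pure Python cannot read a string's internal kind, the exact=False result is emulated by scanning and rounding up to the sequence quoted from the PyUnicode_New() documentation. A C implementation could presumably answer that case in constant time instead.

```python
import bisect

# Upper bounds quoted from the PyUnicode_New() documentation.
_BOUNDS = (127, 255, 65535, 1114111)


def maxchar(s, exact=True):
    """Hypothetical stand-in for the proposed str.maxchar()."""
    # Pure Python has to scan; this only emulates the proposed semantics.
    top = max(map(ord, s), default=0)
    if exact:
        return top
    # Approximation: round up to the nearest documented bound.
    return _BOUNDS[bisect.bisect_left(_BOUNDS, top)]


def isascii(s):
    # Same shape as the isascii() in the message, using the free function.
    if maxchar(s, exact=False) < 128:
        return True
    # Fallback for strings whose stored maxchar is wider than their content.
    if maxchar(s) < 128:
        return True
    return False
```

In this emulation the approximate value is always the tightest bound, so the second check in isascii() never changes the result. It would only matter for strings created through the C API with a maxchar larger than needed, which is the case discussed in this thread.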