Adding str.isascii()?

Hi. Currently, int(), str.isdigit(), str.isalnum(), etc. accept non-ASCII strings.

But sometimes we want to accept only ASCII strings. For example, the ipaddress module uses:

    _DECIMAL_DIGITS = frozenset('0123456789')
    ...
    if _DECIMAL_DIGITS.issuperset(str):

ref: https://github.com/python/cpython/blob/e76daebc0c8afa3981a4c5a8b54537f756e80...

If str had a str.isascii() method, this could be simpler:

    if s.isascii() and s.isdigit():

I want to add it in Python 3.7 if there are no opposing opinions.

Regards,
-- INADA Naoki <songofacandy@gmail.com>
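For a concrete illustration of why isdigit() alone is no guarantee of ASCII input: both str.isdigit() and int() happily accept, say, Arabic-Indic digits:

    >>> s = '\u0661\u0662\u0663'   # ARABIC-INDIC digits one, two, three
    >>> s.isdigit()
    True
    >>> int(s)
    123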

On Fri, Jan 26, 2018 at 7:42 PM, INADA Naoki <songofacandy@gmail.com> wrote:
I'm not sure that the decimal-digit check is actually improved by this, but nonetheless, I am in favour of this feature. In CPython, this method can simply look at the object headers to see if it has the 'ascii' flag set; otherwise, it'd be effectively equivalent to:

    def isascii(self):
        return ord(max(self)) < 128

Would be handy when working with semi-textual protocols, where ASCII text is trivially encoded, but non-ASCII text may require negotiation or a protocol header.

ChrisA
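One wrinkle with the ord(max(...)) one-liner above: max('') raises ValueError, whereas an empty string should presumably count as ASCII. A sketch that also covers the empty case:

    def isascii(s):
        # Vacuously true for the empty string; otherwise every
        # code point must be below 128.
        return all(ord(c) < 128 for c in s)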

On 26.01.2018 09:53, Chris Angelico wrote:
+1 Just a note: checking the header in CPython will only give a hint, since strings created using higher order kinds can still be 100% ASCII. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Jan 26 2018)

Oh, really? I think checking the header is enough for all ready unicode objects. For example, this is the _PyUnicode_EqualToASCIIString implementation:

    if (PyUnicode_READY(unicode) == -1) {
        /* Memory error or bad data */
        PyErr_Clear();
        return non_ready_unicode_equal_to_ascii_string(unicode, str);
    }
    if (!PyUnicode_IS_ASCII(unicode))
        return 0;

And I think str.isascii() can be implemented as:

    if (PyUnicode_READY(unicode) == -1) {
        return NULL;
    }
    if (PyUnicode_IS_ASCII(unicode)) {
        Py_RETURN_TRUE;
    }
    else {
        Py_RETURN_FALSE;
    }

On 26.01.2018 10:44, INADA Naoki wrote:
No, because you can pass in maxchar to PyUnicode_New() and the implementation will take this as a hint to the max code point used in the string. There is no check done on whether maxchar is indeed the minimum upper bound on the code point ordinals.

The reason for doing this is simple: you don't want to have to scan the string every time you create a Unicode object. CPython itself often does such a scan before calling PyUnicode_New(), so in many cases the header will be set to ASCII, but not always.
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Jan 26 2018)

The API doc says:

    """
    maxchar should be the true maximum code point to be placed in the
    string. As an approximation, it can be rounded up to the nearest
    value in the sequence 127, 255, 65535, 1114111.
    """

https://docs.python.org/3/c-api/unicode.html#c.PyUnicode_New

Since the doc says *should*, strings created with a wrong maxchar are considered invalid objects. We already ignore strings with a wrong maxchar in some places. Even "a" == "a" may fail for such an invalid string object. So I don't think str.isascii() should take them into account.

Regards,

On 26.01.2018 12:17, INADA Naoki wrote:
Not really: "should" means should, not must :-)

Objects created with PyUnicode_New() are valid and ready (this only has a meaning for legacy strings). You can set maxchar to 64k and still just use ASCII as content. In some cases, you may want the internal string representation to be wchar_t compatible or work with Py_UCS2/4, so both 64k and sys.maxunicode are reasonable and valid values.

Overall, I'm starting to believe that a str.maxchar() function would be a better choice than to only go for ASCII. This could have an optional parameter "exact" to force scanning the string and returning the actual max code point ordinal when set to True (the default), or return the approximation based on the kind used if not set (which in many cases will give you a good hint).

For checking ASCII, you'd then write:

    def isascii(s):
        if s.maxchar(exact=False) < 128:
            return True
        if s.maxchar() < 128:
            return True
        return False

-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Jan 26 2018)

Do you mean we should fix *all* of CPython's unicode handling, not only str.isascii()?

At least, the equality test doesn't care about a wrong kind:

https://github.com/python/cpython/blob/master/Objects/stringlib/eq.h
https://github.com/python/cpython/blob/e76daebc0c8afa3981a4c5a8b54537f756e80...
https://github.com/python/cpython/blob/e76daebc0c8afa3981a4c5a8b54537f756e80...

There may be many others, but I'm not sure.

On Fri, Jan 26, 2018 at 10:02 PM, M.-A. Lemburg <mal@egenix.com> wrote:
-- INADA Naoki <songofacandy@gmail.com>

On Fri, Jan 26, 2018 at 10:17 PM, INADA Naoki <songofacandy@gmail.com> wrote:
Can you create a simple test-case that proves this? If so, I would say that this is a bug in the docs, and recommend rewording it somewhat, thus:

    maxchar is either the actual maximum code point to be placed in the
    string, or (as an approximation) rounded up to the nearest value in
    the sequence 127, 255, 65535, 1114111.

Failing a basic operation like equality checking would be considered a total failure.

ChrisA

Can you create a simple test-case that proves this?
Sure.

    $ git diff
    diff --git a/Modules/_testcapimodule.c b/Modules/_testcapimodule.c
    index 2ad4322eca..475d5219e1 100644
    --- a/Modules/_testcapimodule.c
    +++ b/Modules/_testcapimodule.c
    @@ -5307,6 +5307,12 @@ PyInit__testcapi(void)
         Py_INCREF(&PyInstanceMethod_Type);
         PyModule_AddObject(m, "instancemethod", (PyObject *)&PyInstanceMethod_Type);

    +    PyObject *wrong_unicode = PyUnicode_New(1, 65535);
    +    PyUnicode_WRITE(PyUnicode_2BYTE_KIND,
    +                    PyUnicode_DATA(wrong_unicode),
    +                    0, 'a');
    +    PyModule_AddObject(m, "wrong_unicode", wrong_unicode);
    +
         PyModule_AddIntConstant(m, "the_number_three", 3);
     #ifdef WITH_PYMALLOC
         PyModule_AddObject(m, "WITH_PYMALLOC", Py_True);

    $ ./python
    Python 3.7.0a4+ (heads/master-dirty:e76daebc0c, Jan 26 2018, 22:31:18)
    [GCC 7.2.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.

On Fri, 26 Jan 2018 22:33:36 +0900 INADA Naoki <songofacandy@gmail.com> wrote:
Can you create a simple test-case that proves this?
Sure.
I think the question assumed "without writing custom C or ctypes code that deliberately builds a non-conformant unicode object" ;-) Regards Antoine.

On 26.01.2018 15:58, Antoine Pitrou wrote:
I think his example is spot on, since this is how you'd expect to use the APIs. Even more so, if you don't know the maximum code point used in the data you write to the object upfront. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Jan 26 2018)

No. See this mail: https://mail.python.org/pipermail/python-ideas/2018-January/048748.html

The point is whether we should support invalid Unicode created via the C API, and I assume no.

On 2018/01/26 11:58 PM, "Antoine Pitrou" <solipsis@pitrou.net> wrote:
I think the question assumed "without writing custom C or ctypes code that deliberately builds a non-conformant unicode object" ;-)

2018-01-26 12:17 GMT+01:00 INADA Naoki <songofacandy@gmail.com>:
PyUnicode objects must always use the most efficient storage. It's a very strong requirement of PEP 393. As Naoki wrote, many functions rely on this assumption to implement fast paths. The assumption is even enforced by the debug check _PyUnicode_CheckConsistency():

https://github.com/python/cpython/blob/e76daebc0c8afa3981a4c5a8b54537f756e80...

Victor
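The "most efficient storage" rule is observable from pure Python; a small sketch (exact sizes vary by platform and version, so only the growth pattern matters):

    import sys

    # PEP 393: each string uses the narrowest kind (1, 2 or 4 bytes
    # per character) that fits its maximum code point.
    for s in ("abcd", "abc\u0100", "abc\U00010000"):
        print(hex(ord(max(s))), sys.getsizeof(s))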

On 26.01.2018 14:31, Victor Stinner wrote:
If that's indeed being used as an assumption, the docs must be fixed and PyUnicode_New() should verify this assumption as well - not only in debug builds using C asserts() :-)

Going through the code, I saw a lot of calls to find_maxchar_surrogates() before calling PyUnicode_New(). This call would then have to be moved inside PyUnicode_New() instead.

C extensions can easily create strings using PyUnicode_New() which do not adhere to such a requirement and then write arbitrary content using PyUnicode_WRITE(). In some cases, this may even be necessary, say in case the extension doesn't know what data is being written, reading it from some external source.

I'm not too familiar with the new Unicode code, but it seems that this requirement is not checked everywhere, e.g. the resize code doesn't seem to have such checks either (only in debug versions).

-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Jan 26 2018)

2018-01-26 14:43 GMT+01:00 M.-A. Lemburg <mal@egenix.com>:
Like PyUnicode_FromStringAndSize(NULL, size), PyUnicode_New(size, maxchar) only allocates memory, with uninitialized characters. I don't see how PyUnicode_New() could check the string content since the content is unknown at that point...

The new public C APIs added by PEP 393 are hard to use correctly, but they are the most efficient. Functions like PyUnicode_FromString() are simple to use and very hard to misuse :-)

PyPy developers asked me to simply drop all these new public C APIs, or make them private. At least, deprecate them. But I never looked in depth at the new API. I don't know if Cython uses it, for example.

Some APIs are still private, like _PyUnicodeWriter, which allows creating a string in multiple steps with a smart strategy to reduce or even avoid realloc() and conversions between the different storage types (UCS1, UCS2, UCS4). This API is very efficient, but also hard to use.
It would be a bug in the C extension.
It must be checked everywhere. If it's not the case, it's an obvious bug in CPython. If you spotted a bug, please report a bug ;-) Victor

On 26.01.2018 14:55, Victor Stinner wrote:
You do have a point there ;-) I guess making the assumption very clear in the docs would be a good first step - as Chris suggested.
Dropping them would most likely seriously limit the usefulness of the Unicode API. If you always have to copy strings to create objects, this would make text-intensive work very slow. The usual approach is a three-step process:

1. create a container object of sufficient size
2. write data
3. resize the container to the actual size

I guess marking objects returned by PyUnicode_New() as "not ready" would help resolve the issue. Whenever the maxchar check is applied, the ready flag could then be set. The resize operations would then have to apply the maxchar check as well. Unfortunately, many of the readiness checks are only available in debug builds, but at least it's a way forward to make the API more robust.
Is there a way to call an API which fixes the setting (a public version of unicode_adjust_maxchar())? Without this, how would an extension be able to provide a correct value upfront without knowing the content?
Yes, will do. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Jan 26 2018)

On Fri, Jan 26, 2018, at 09:18, M.-A. Lemburg wrote:
It obviously has to know the content before it can finally return the string (or pass it to any other function, etc), because strings are immutable. Why not then do all the intermediate work in an array of int32's (or perhaps a UCS-4 PyUnicode to be returned only if needed), then afterward scan and build the string?

On 26.01.2018 16:16, Random832 wrote:
The create, write data, resize approach is a standard way to build (longer) Python string objects in the Python C API, since it avoids temporary copies. E.g. you don't want to first build a buffer to hold 100MB of XML, then scan it for the max code point being used, create a Python string from it (which copies the data into a second 100MB buffer) and then deallocate the first buffer again. Instead you create an uninitialized Python Unicode object and use PyUnicode_WRITE() to write the data directly into the object, avoiding the 100MB temp buffer.

PS: Strings are immutable in Python, but they are not in C. You can manipulate string objects provided you own the only reference.

-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Jan 26 2018)

We have _PyUnicodeWriter for such use cases. We may be able to expose it as a public API, but please start another thread for that.

Unicode objects created with a wrong maxchar have not been supported since Python 3.3: == and hash() don't work properly for such objects. So str.isascii() doesn't have to support them either.
-- INADA Naoki <songofacandy@gmail.com>

+1 The idea is not new and I like it. Naoki created https://bugs.python.org/issue32677 Victor 2018-01-26 11:22 GMT+01:00 Antoine Pitrou <solipsis@pitrou.net>:

On Fri, Jan 26, 2018 at 05:42:31PM +0900, INADA Naoki wrote:
I have no objection to isascii, but I don't think it goes far enough. Sometimes I want to know whether a string is compatible with Latin-1 or UCS-2 as well as ASCII. For that, I used a function that exposes the size of code points in bits:

    @property
    def size(self):
        # This can be implemented much more efficiently in CPython.
        c = ord(max(self)) if self else 0
        if c <= 0x7F:
            return 7
        elif c <= 0xFF:
            return 8
        elif c <= 0xFFFF:
            return 16
        else:
            assert c <= 0x10FFFF
            return 21

A quick test for ASCII will be:

    string.size == 7

and to test that it is entirely within the BMP (Basic Multilingual Plane):

    string.size <= 16

-- Steve

2018-01-26 13:39 GMT+01:00 Steven D'Aprano <steve@pearwood.info>:
Really? I have never needed such a check in practice. Would you mind elaborating on your use case? ASCII is very, very common and hardcoded in many file formats and protocols. Other character sets are much rarer.
An efficient, O(1) complexity, implementation can be annoying to implement. I don't think that it's worth it. Python doesn't have this method, and I have never seen any user request this feature. IMHO this size() idea comes from the PEP 393 design, not from a real use case.

In CPython, str.isascii() would be an O(1) operation since the result is "cached" by design in the implementation of PyUnicode. PEP 393 is an implementation detail: PyPy now uses UTF-8 internally, not PEP 393 (UCS1, UCS2 or UCS4). PyPy might want to use a bit to cache whether the string is ASCII or not, but I'm not sure that it's worth it to cache the maximum character or the size() result.

Victor

On Fri, Jan 26, 2018 at 02:37:14PM +0100, Victor Stinner wrote:
tcl/tk and JavaScript only support UCS-2 (16-bit) Unicode strings. Dealing with the Supplementary Unicode Planes has the same problems that older "narrow" builds of Python suffered from: single code points were counted as len 2 instead of len 1, slicing could be wrong, etc.

There are still many applications which assume Latin-1 data. For instance, I use a media player which displays mojibake when passed anything outside of Latin-1.

Sometimes it is useful to know in advance when text you pass to another application is going to run into problems because of the other application's limitations.

-- Steve
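A quick way to see the len-2-vs-len-1 mismatch from Python 3, using an arbitrary astral code point for illustration:

    >>> s = '\U0001F600'                  # outside the BMP
    >>> len(s)                            # one code point in Python 3
    1
    >>> len(s.encode('utf-16-le')) // 2   # but two UTF-16 code units
    2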

IMO the special status for isascii() matches the special status of ASCII as an encoding (yeah, I know, it's not the default encoding anywhere, but it still comes up regularly in standards and as a common subset of other encodings). Should you wish to check for compatibility with other ranges, IMO some expression involving max(<the_string>) should cut it. (FWIW there should be a special place in hell for those people who say "ASCII" when they mean "Latin-1".)

On Fri, Jan 26, 2018 at 5:27 PM, Steven D'Aprano <steve@pearwood.info> wrote:
-- --Guido van Rossum (python.org/~guido)

On 1/27/2018 2:01 AM, Guido van Rossum wrote:
It occurred to me that this might be an issue. Rather than define a LBYL scanner in Python, I think I will try wrapping inserts of user-supplied strings into widgets with try: insert; except: <do whatever else>. -- Terry Jan Reedy
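A minimal sketch of that EAFP wrapper, assuming a Text widget; the broad except and the replacement fallback are placeholders, since the sketch above uses a bare except:

    import tkinter as tk

    def insert_user_text(widget, text):
        try:
            widget.insert("end", text)
        except Exception:
            # Hypothetical fallback: replace code points Tk cannot
            # display (non-BMP) and retry with the sanitized text.
            safe = "".join(c if ord(c) <= 0xFFFF else "\ufffd" for c in text)
            widget.insert("end", safe)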

On Fri, Jan 26, 2018 at 5:27 PM, Steven D'Aprano <steve@pearwood.info> wrote:
I'm confused -- isn't the way to do this to encode your text into the encoding the other application accepts? If you really want to know in advance, is it so hard to run it through an encode/decode sandwich?

Wait -- I can't find UCS-2 in the built-in encodings -- am I dense, or is it not there? Shouldn't it be? If only for this reason?

-CHB

-- Christopher Barker, Ph.D. Oceanographer, NOAA/NOS/OR&R, Chris.Barker@noaa.gov

On 30 January 2018 at 06:54, Chris Barker <chris.barker@noaa.gov> wrote:
If you're wanting to check whether or not something lies entirely within the BMP, check for:

    2*len(text) == len(text.encode("utf-16"))  # True iff text is UCS-2

If there's an astral code point in there, then the encoded version will need more than 2 bytes for at least one element, so the result will end up being longer than it would for UCS-2 data.

You can also check for pure ASCII in much the same way:

    len(text) == len(text.encode("utf-8"))  # True iff text is 7-bit ASCII

So this is partly an optimisation question:

- folks want to avoid allocating a bytes object just to throw it away
- folks want to avoid running the equivalent of "max(map(ord, text))"
- folks know that CPython (at least) tracks this kind of info internally to manage its own storage allocations

But it's also a readability question: "is_ascii()" and "is_UCS2()/is_BMP()" just require knowing what 7-bit ASCII and UCS-2 (or the basic multilingual plane) *are*, whereas the current ways of checking for them require knowing how they *behave*.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
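One caveat with the BMP check as written: the "utf-16" codec prepends a 2-byte BOM, which throws the length comparison off. A sketch using the BOM-less endian-specific codec instead:

    def is_bmp(text):
        # Every BMP code point encodes to one 2-byte UTF-16 code unit;
        # astral code points need two units (a surrogate pair).
        return 2 * len(text) == len(text.encode("utf-16-le"))

    def is_ascii(text):
        # ASCII code points encode to exactly one UTF-8 byte each.
        return len(text) == len(text.encode("utf-8"))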

On Tue, Jan 30, 2018 at 03:12:52PM +1000, Nick Coghlan wrote: [...]
Agreed with all of those. However, given how niche the varieties other than is_ascii() are, I'm not going to push for them. I use them rarely enough, or on small enough strings, that doing an O(N) max(string) is not that great a burden. I can continue using a helper function. -- Steve

On Tue, Jan 30, 2018 at 12:00 AM, Steven D'Aprano <steve@pearwood.info> wrote:
This is important. Agreed with all of those.
Sure, but adding is_ascii() and is_bmp() are pretty small additions as well. I'd say for the newbies among us, it would be a nice feature: +1

As for is_bmp() -- yes, UCS-2 is "deprecated", but there are plenty of systems that don't handle UTF-16 well, so it's nice to know, and not hard to write.

I also think a UCS-2 encoding would be handy -- but I won't personally use it, so I'll wait for someone that has a use case to ask for it.

-CHB

-- Christopher Barker, Ph.D. Oceanographer, NOAA/NOS/OR&R, Chris.Barker@noaa.gov

On Mon, Jan 29, 2018 at 12:54:41PM -0800, Chris Barker wrote:
I'm confused -- isn't the way to do this to encode your text into the encoding the other application accepts?
It's more about warning the user of *my* application that the data they're exporting could generate mojibake, or even fail, in the other application.
If you really want to know in advance, is it so hard to run it through an encode/decode sandwich?
See Nick's answer.
Wait -- I can't find UCS-2 in the built-in encodings -- am I dense or is it not there? Shouldn't it be? If only for this reason?
Strictly speaking, UCS-2 is an obsolete standard more or less equivalent to UTF-16, except that it doesn't support "astral characters" encoded by a pair of surrogate code points.

However, in practice, some languages' nominal UTF-16 handling is less than 100% conformant, in that they treat a surrogate pair as two undefined characters of one code point each, instead of a single defined character of two code points.

So I guess I'm using UCS-2 in an informal sense of "like UTF-16, without the astral characters". I'm not asking for an explicit UCS-2 codec.

-- Steve

26.01.18 10:42, INADA Naoki wrote:
There were discussions about this. See for example https://bugs.python.org/issue18814.

In short, there are two considerations that prevented adding this feature:

1. This function can have constant computational complexity in CPython (just check a single bit), but other implementations may provide only linear computational complexity.

2. In many cases, just after getting the answer to this question, we encode the string to bytes (or decode bytes to a string). Thus the most natural way of determining whether the string is ASCII-only is trying to encode it to ASCII.

And adding a new method to a basic type has a high bar.

The code in ipaddress:

    if not _BaseV4._DECIMAL_DIGITS.issuperset(prefixlen_str):
        cls._report_invalid_netmask(prefixlen_str)
    try:
        prefixlen = int(prefixlen_str)
    except ValueError:
        cls._report_invalid_netmask(prefixlen_str)
    if not (0 <= prefixlen <= cls._max_prefixlen):
        cls._report_invalid_netmask(prefixlen_str)
    return prefixlen

can be rewritten as:

    if not prefixlen_str.isdigit():
        cls._report_invalid_netmask(prefixlen_str)
    try:
        prefixlen = int(prefixlen_str.encode('ascii'))
    except UnicodeEncodeError:
        cls._report_invalid_netmask(prefixlen_str)
    except ValueError:
        cls._report_invalid_netmask(prefixlen_str)
    if not (0 <= prefixlen <= cls._max_prefixlen):
        cls._report_invalid_netmask(prefixlen_str)
    return prefixlen

Another possibility is adding support for a boolean argument to str.isdigit() and similar predicates that switches them to an ASCII-only mode. Such an option would be very useful for the str.strip(), str.split() and str.splitlines() methods. Currently they split on all Unicode whitespace and line separators, but there is a need to split only on ASCII whitespace and the line separators CR, LF and CRLF. In the case of str.strip() and str.split() you can just pass a string of whitespace characters, but there is no such option for str.splitlines().
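To illustrate the splitlines() point: U+2028 (LINE SEPARATOR) is one of the Unicode line boundaries it honors, which ASCII-oriented code usually does not want:

    >>> 'spam\u2028eggs'.splitlines()
    ['spam', 'eggs']
    >>> 'spam\u2028eggs'.split('\n')   # splitting on ASCII LF alone
    ['spam\u2028eggs']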

Hm, it seems I was too hasty in implementing it...
Yes, there is no O(1) guarantee about .isascii(). But I expect the UTF-8 based string implementation PyPy will have can achieve O(1): just test len(s) == __internal_utf8_len(s). I think that if *some* implementations can achieve O(1), it's beneficial to implement.
Yes, but ASCII is so special. Someone may want to check for ASCII before passing a string to int(), float(), decimal.Decimal(), etc... But I don't think there is a real use case for encodings other than ASCII.
And adding a new method to the basic type has a high bar.
Agree.
Yes, but .isascii() will be much faster than try ... .encode('ascii') ... except UnicodeEncodeError on most Python implementations.
That sounds like a good idea. Maybe a keyword-only argument `ascii=False`? But if we revert adding str.isascii() in Python 3.7, the same keyword-only argument should be added to int(), float(), decimal.Decimal(), fractions.Fraction(), etc... That's a bit hard. So I think adding .isascii() is beneficial even if all the str.is***() methods get an `ascii=False` flag.
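A sketch of the semantics the proposed flag would presumably have, expressed with the two existing methods (the ascii parameter itself is the proposal under discussion, not an existing API):

    def isdigit_ascii(s):
        # Equivalent of the proposed s.isdigit(ascii=True):
        # only digits, and all of them in the ASCII range.
        return s.isascii() and s.isdigit()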

31.01.18 13:18, INADA Naoki wrote:
In this case it doesn't matter, since this is an exceptional case, and in any case an exception is raised for a non-ASCII string. But you are right that str.isascii() can be faster than str.encode(), and encoding is not required for converting to int.
There is an issue for str.splitlines() (I don't remember the number). The main problem was that I was not sure about an obvious argument name.
Ah, it is already committed. Then I think it is too late to revert this. I had doubts about this feature and was -0 on adding it (until we discussed it more), but since it is added I don't see much benefit in removing it.

I like the idea of str.isdigit(ascii=True): it would behave like str.isdigit() and str.isascii() combined. It's easy to implement and likely to be very efficient. I'm just not sure that it's so commonly required? At least, I guess that some users may be surprised that str.isdigit() is "Unicode aware" and accepts non-ASCII digits, as int(str) does.

Victor

2018-01-31 12:18 GMT+01:00 INADA Naoki <songofacandy@gmail.com>:
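The "Unicode aware" surprise cuts both ways: str.isdigit() accepts some characters that int() then rejects. For example:

    >>> '\u00b2'.isdigit()   # SUPERSCRIPT TWO
    True
    >>> int('\u00b2')
    Traceback (most recent call last):
      ...
    ValueError: invalid literal for int() with base 10: '²'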

participants (12)
- Antoine Pitrou
- Chris Angelico
- Chris Barker
- Guido van Rossum
- INADA Naoki
- M.-A. Lemburg
- Nick Coghlan
- Random832
- Serhiy Storchaka
- Steven D'Aprano
- Terry Reedy
- Victor Stinner