[Python-ideas] Adding str.isascii() ?
Serhiy Storchaka
storchaka at gmail.com
Wed Jan 31 05:49:42 EST 2018
26.01.18 10:42, INADA Naoki wrote:
> Currently, int(), str.isdigit(), str.isalnum(), etc. accept
> non-ASCII strings.
>
>>>> s = "１２３"
>>>> s
> '１２３'
>>>> s.isdigit()
> True
>>>> print(ascii(s))
> '\uff11\uff12\uff13'
>>>> int(s)
> 123
>
> But sometimes, we want to accept only ASCII strings. For example,
> the ipaddress module uses:
>
> _DECIMAL_DIGITS = frozenset('0123456789')
> ...
> if _DECIMAL_DIGITS.issuperset(str):
>
> ref: https://github.com/python/cpython/blob/e76daebc0c8afa3981a4c5a8b54537f756e805de/Lib/ipaddress.py#L491-L494
>
> If str has str.isascii() method, it can be simpler:
>
> `if s.isascii() and s.isdigit():`
>
> I want to add it in Python 3.7 if there are no objections.

There have been discussions about this before. See for example
https://bugs.python.org/issue18814.

In short, two considerations prevented adding this feature:

1. This function can have constant time complexity in CPython
   (just check a single bit), but other implementations may be able to
   provide only linear time complexity.

2. In many cases, right after getting the answer to this question, we
   encode the string to bytes (or decode bytes to a string). Thus the
   most natural way to determine whether the string is ASCII-only is to
   try to encode it to ASCII.

Also, adding a new method to a basic type has a high bar.
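
The second consideration can be sketched as follows. This is only a
minimal illustration; the helper name is_ascii_only is hypothetical and
is not part of any existing API.

```python
def is_ascii_only(s: str) -> bool:
    """Return True if every character of s is in the ASCII range.

    Hypothetical helper: it relies on the 'ascii' codec raising
    UnicodeEncodeError for any code point above U+007F.
    """
    try:
        s.encode('ascii')
    except UnicodeEncodeError:
        return False
    return True
```

When the encoded bytes are needed anyway, the result of s.encode('ascii')
can be kept instead of discarded, so the check costs nothing extra.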

The code in ipaddress

    if not _BaseV4._DECIMAL_DIGITS.issuperset(prefixlen_str):
        cls._report_invalid_netmask(prefixlen_str)
    try:
        prefixlen = int(prefixlen_str)
    except ValueError:
        cls._report_invalid_netmask(prefixlen_str)
    if not (0 <= prefixlen <= cls._max_prefixlen):
        cls._report_invalid_netmask(prefixlen_str)
    return prefixlen

can be rewritten as:

    if not prefixlen_str.isdigit():
        cls._report_invalid_netmask(prefixlen_str)
    try:
        prefixlen = int(prefixlen_str.encode('ascii'))
    except UnicodeEncodeError:
        cls._report_invalid_netmask(prefixlen_str)
    except ValueError:
        cls._report_invalid_netmask(prefixlen_str)
    if not (0 <= prefixlen <= cls._max_prefixlen):
        cls._report_invalid_netmask(prefixlen_str)
    return prefixlen
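
For trying this outside the ipaddress class, here is a standalone sketch
of the same logic. The function name parse_prefixlen and the
max_prefixlen parameter are hypothetical, and a plain ValueError stands
in for cls._report_invalid_netmask(), which is assumed here to raise.

```python
def parse_prefixlen(prefixlen_str: str, max_prefixlen: int = 32) -> int:
    """Parse an ASCII-only decimal prefix length (hypothetical helper)."""
    # Rejects the empty string and anything that is not a Unicode digit.
    if not prefixlen_str.isdigit():
        raise ValueError(f'invalid prefix length: {prefixlen_str!r}')
    try:
        # Encoding to ASCII rejects non-ASCII digits such as U+FF11;
        # int() accepts the resulting bytes object.
        prefixlen = int(prefixlen_str.encode('ascii'))
    except ValueError:
        # UnicodeEncodeError is a subclass of ValueError, so one
        # clause covers both failure modes.
        raise ValueError(f'invalid prefix length: {prefixlen_str!r}') from None
    if not (0 <= prefixlen <= max_prefixlen):
        raise ValueError(f'prefix length out of range: {prefixlen_str!r}')
    return prefixlen
```

With this sketch, parse_prefixlen('24') returns 24, while the fullwidth
string '\uff12\uff14' raises ValueError even though int() alone would
accept it.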

Another possibility is adding support for a boolean argument to
str.isdigit() and similar predicates that switches them to ASCII-only
mode. Such an option would also be very useful for the str.strip(),
str.split() and str.splitlines() methods. Currently they split on all
Unicode whitespace and line separators, but there is a need to split
only on ASCII whitespace and the line separators CR, LF and CRLF. In the
case of str.strip() and str.split() you can just pass the string of
whitespace characters, but there is no such option for str.splitlines().
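
Until such an option exists, the ASCII-only behaviour can be
approximated by hand. The sketch below is one possible workaround, not
an existing API: ASCII_WHITESPACE, ascii_split and ascii_splitlines are
hypothetical names. Note that str.split() treats its argument as a
single separator string rather than a set of characters, so the sketch
uses a regular expression for splitting.

```python
import re

# The six ASCII whitespace characters (same set as string.whitespace).
ASCII_WHITESPACE = ' \t\n\r\x0b\x0c'

_ASCII_SPACE_RUN = re.compile(r'[ \t\n\r\x0b\x0c]+')
# CRLF must come first so that it is consumed as one separator.
_ASCII_LINE_BREAK = re.compile(r'\r\n|\r|\n')


def ascii_split(s: str) -> list:
    """Like s.split(), but only runs of ASCII whitespace separate words."""
    stripped = s.strip(ASCII_WHITESPACE)
    return _ASCII_SPACE_RUN.split(stripped) if stripped else []


def ascii_splitlines(s: str) -> list:
    """Like s.splitlines(), but only CR, LF and CRLF end a line."""
    lines = _ASCII_LINE_BREAK.split(s)
    # str.splitlines() does not report an empty line after a final
    # line break, so drop the trailing empty string.
    if lines and lines[-1] == '':
        lines.pop()
    return lines
```

For example, 'a\u2028b\r\nc'.splitlines() gives ['a', 'b', 'c'], while
ascii_splitlines('a\u2028b\r\nc') gives ['a\u2028b', 'c'].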