
Serhiy Storchaka wrote:
As shown in issue #15016 [1], there are use cases where it is useful to determine whether a string can be encoded in ASCII or Latin1. When working with Tk or Windows console applications, it can also be useful to determine whether a string can be encoded in UCS2. The C API provides an interface for this, but it is not available at the Python level.
I propose to add new methods to the str class: isascii(), islatin1() and isbmp() (in addition to existing methods such as isalpha() or isdigit()). The implementation will be trivial.
Pro: The current trick of trying to encode has O(n) complexity plus the overhead of raising and catching an exception.
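For readers unfamiliar with it, the "current trick" referred to above looks roughly like this: attempt the encode and treat failure as the answer. This is a sketch; the helper name is mine, not anything in the stdlib.

```python
def is_ascii(s):
    """Return True if s contains only ASCII characters,
    using the encode-and-catch trick discussed above."""
    try:
        s.encode('ascii')
        return True
    except UnicodeEncodeError:
        return False

print(is_ascii("hello"))   # True
print(is_ascii("héllo"))   # False
```

The exception machinery is what the proposed methods would avoid, though the scan itself remains O(n) either way.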
Are you suggesting that isascii and friends would be *better* than O(n)? How can that work -- wouldn't it have to scan the string and look at each character?

Why just ASCII, Latin1 and BMP (whatever that is, googling has not come up with anything relevant)? It seems to me that adding these three tests will open the doors to a steady stream of requests for new methods is<insert encoding name here>.

I suggest that a better API would be a method that takes the name of an encoding (perhaps defaulting to 'ascii') and returns True|False:

string.encodable(encoding='ascii') -> True|False
    Return True if string can be encoded using the named encoding,
    otherwise False.

One last pedantic issue: strings aren't ASCII or Latin1, etc., but Unicode. There is enough confusion between Unicode text strings and bytes without adding methods whose names blur the distinction slightly.

-- Steven
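The suggested encodable() API is easy to prototype today as a plain function (str has no such method; the name and default are only Steven's proposal):

```python
def encodable(string, encoding='ascii'):
    """Return True if string can be encoded using the named encoding,
    otherwise False.  A pure-Python sketch of the proposed API."""
    try:
        string.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

print(encodable("spam"))              # True
print(encodable("héllo"))             # False: é is not ASCII
print(encodable("héllo", 'latin-1'))  # True:  é is in Latin-1
```

One encoding-name parameter covers ASCII, Latin-1 and every other codec uniformly, which is the point of the counter-proposal: no per-encoding method explosion.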