Terry Reedy wrote:
On 6/30/2012 8:59 PM, Steven D'Aprano wrote:
I suggest that a better API would be a method that takes the name of an encoding (perhaps defaulting to 'ascii') and returns True|False:
string.encodable(encoding='ascii') -> True|False
Return True if string can be encoded using the named encoding, otherwise False.
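As a rough sketch of those semantics, the proposed method can be approximated today with a free function. The name encodable and its signature are hypothetical here, taken from the proposal above. This is not an existing str method, and this version simply performs the trial encoding rather than using any encoding-specific shortcut:

```python
def encodable(text, encoding='ascii'):
    """Return True if text can be encoded using the named encoding.

    Hypothetical helper mirroring the proposed str.encodable();
    it performs a trial encode and discards the result.
    """
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == '__main__':
    for sample in ('abc', 'caf\u00e9', '\u20ac'):
        print(repr(sample), [enc for enc in ('ascii', 'latin1', 'utf-8')
                             if encodable(sample, enc)])
```

An unknown encoding name still raises LookupError from str.encode, which seems like reasonable behaviour for a predicate of this kind.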
But then one might as well try the encoding and check for exception. The point of the proposal is to avoid things like

    try:
        body = text.encode('ascii')
        header = 'ascii'  # abbreviating here
    except UnicodeEncodeError:
        try:
            body = text.encode('latin1')
            header = 'latin1'
        except UnicodeEncodeError:
            body = text.encode('utf-8')
            header = 'utf-8'

Right. And re-written with the hypothetical encodable method, you have the usual advantage of LBYL that it is slightly more concise:

    body = header = None
    for encoding in ('ascii', 'latin1', 'utf-8'):
        if text.encodable(encoding):
            body = text.encode(encoding)
            header = encoding
            break  # stop at the first encoding that works

instead of:

    body = header = None
    for encoding in ('ascii', 'latin1', 'utf-8'):
        try:
            body = text.encode(encoding)
            header = encoding
            break  # stop at the first encoding that works
        except UnicodeEncodeError:
            pass

As far as expressibility goes, it is not much of an advantage. But:

- if there are optimizations that apply to some encodings but not others, the encodable method can take advantage of them without it being a promise of the language;

- it only adds a single string method (and presumably a single bytes method, decodable) rather than a plethora of methods.

So, I don't care much either way for a LBYL test, but if there is a good use case for such a test, better for it to be a single method taking the encoding name rather than a multitude of tests, or exposing an implementation-specific value that the coder then has to interpret themselves.

-1 on isascii, islatin1, isbmp
-1 on exposing max_code_point
+0.5 on encodable

--
Steven