Re: [Python-ideas] isascii()/islatin1()/isbmp()

1 Jul 2012

Terry Reedy wrote:
...
On 6/30/2012 8:59 PM, Steven D'Aprano wrote:
...
...
I suggest that a better API would be a method that takes the name of an
encoding (perhaps defaulting to 'ascii') and returns True|False:
string.encodable(encoding='ascii') -> True|False
Return True if string can be encoded using the named encoding, otherwise
False.
But then one might as well try the encoding and check for exception. The 
point of the proposal is to avoid things like
try:
  body = text.encode('ascii')
  header = 'ascii'  #abbreviating here
except UnicodeEncodeError:
  try:
    body = text.encode('latin1')
    header = 'latin1'
  except UnicodeEncodeError:
    body = text.encode('utf-8')
    header = 'utf-8'
Right. And re-written with the hypothetical encodable method, you have the 
usual advantage of LBYL that it is slightly more concise:

body = header = None
for encoding in ('ascii', 'latin1', 'utf-8'):
    if text.encodable(encoding):  # hypothetical method, as proposed above
        body = text.encode(encoding)
        header = encoding
        break  # stop at the first (narrowest) encoding that works

instead of:

body = header = None
for encoding in ('ascii', 'latin1', 'utf-8'):
    try:
        body = text.encode(encoding)
        header = encoding
        break  # stop at the first (narrowest) encoding that works
    except UnicodeEncodeError:
        pass

As far as expressibility goes, it is not much of an advantage. But:

- if there are optimizations that apply to some encodings but not others,
   the encodable method can take advantage of them without it being a
   promise of the language;

- it only adds a single string method (and presumably a single bytes
   method, decodable) rather than a plethora of methods.

So, I don't care much either way for a LBYL test, but if there is a good use 
case for such a test, better for it to be a single method taking the encoding 
name rather than a multitude of tests, or exposing an implementation-specific 
value that the coder then has to interpret themselves.
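
To make the idea concrete, here is a rough sketch of what encodable and
decodable might do, written as plain functions rather than str/bytes methods.
Neither exists in Python; the names and signatures are only the ones
suggested above, and the ASCII shortcut is just one assumed example of an
encoding-specific optimization (it relies on str.isascii(), which needs
Python 3.7 or later). Everything else falls back to trying the codec.

```python
import codecs


def encodable(text, encoding='ascii'):
    """Return True if text can be encoded using the named encoding.

    Hypothetical helper: a stand-in for the proposed str.encodable method.
    """
    # Normalize aliases such as 'us-ascii'; an unknown name raises
    # LookupError, the same as str.encode would.
    name = codecs.lookup(encoding).name
    if name == 'ascii':
        # Assumed fast path: no bytes object has to be built.
        # str.isascii() requires Python 3.7+.
        return text.isascii()
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def decodable(data, encoding='ascii'):
    """Return True if data (bytes) can be decoded using the named encoding.

    Hypothetical helper: a stand-in for the proposed bytes.decodable method.
    """
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True
```

The caller never learns which path was taken, which is the point of the
first bullet above: the shortcut is an implementation detail, not a promise.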

-1 on isascii, islatin1, isbmp
-1 on exposing max_code_point
+0.5 on encodable

-- 
Steven