[Python-ideas] isascii()/islatin1()/isbmp()

Steven D'Aprano steve at pearwood.info
Sun Jul 1 05:21:32 CEST 2012


Terry Reedy wrote:
> On 6/30/2012 8:59 PM, Steven D'Aprano wrote:

>> I suggest that a better API would be a method that takes the name of an
>> encoding (perhaps defaulting to 'ascii') and returns True|False:
>>
>> string.encodable(encoding='ascii') -> True|False
>>
>> Return True if string can be encoded using the named encoding, otherwise
>> False.
> 
> But then one might as well try the encoding and check for exception. The 
> point of the proposal is to avoid things like
> 
> try:
>   body = text.encode('ascii')
>   header = 'ascii'  #abbreviating here
> except UnicodeEncodeError:
>   try:
>     body = text.encode('latin1')
>     header = 'latin1'
>   except UnicodeEncodeError:
>     body = text.encode('utf-8')
>     header = 'utf-8'

Right. And re-written with the hypothetical encodable method, you have the 
usual advantage of LBYL that it is slightly more concise:

body = header = None
for encoding in ('ascii', 'latin1', 'utf-8'):
     if text.encodable(encoding):
         body = text.encode(encoding)
         header = encoding
         break  # stop at the first (narrowest) encoding that works


instead of:

body = header = None
for encoding in ('ascii', 'latin1', 'utf-8'):
     try:
         body = text.encode(encoding)
         header = encoding
         break  # stop at the first (narrowest) encoding that works
     except UnicodeEncodeError:
         pass
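
To make the comparison concrete, here is a minimal runnable sketch. Note that
str has no encodable method, so this assumes a hypothetical module-level
helper of the same name, implemented in the most naive way (try the encoding
and catch the error). The function pick_encoding is likewise just an
illustrative name, not part of any proposal.

```python
def encodable(text, encoding='ascii'):
    """Hypothetical stand-in for the proposed str.encodable method.

    Naive implementation: attempt the encoding and report whether it worked.
    """
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def pick_encoding(text, candidates=('ascii', 'latin1', 'utf-8')):
    """Return (body, header) for the first candidate that can encode text.

    Returns (None, None) if no candidate works, for example when the
    string contains a lone surrogate that even UTF-8 rejects.
    """
    for encoding in candidates:
        if encodable(text, encoding):
            return text.encode(encoding), encoding
    return None, None


if __name__ == '__main__':
    for sample in ('abc', 'caf\xe9', '\u20ac'):
        print(repr(sample), pick_encoding(sample))
```

With this naive helper the LBYL form encodes the text twice for the winning
encoding, so any real benefit would have to come from a smarter test inside
encodable.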


As far as expressibility goes, it is not much of an advantage. But:

- if there are optimizations that apply to some encodings but not others,
   the encodable method can take advantage of them without it being a
   promise of the language;

- it only adds a single string method (and presumably a single bytes
   method, decodable) rather than a plethora of methods;
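
As a rough illustration of both points, the sketch below shows how a single
test could hide per-encoding shortcuts behind one interface. Everything here
is an assumption for illustration: encodable and decodable are hypothetical
module-level functions, not existing str or bytes methods, and the particular
fast paths are just ones that happen to be easy to write. The ASCII shortcut
relies on str.isascii(), which only exists in Python 3.7 and later.

```python
import codecs

# Canonical codec names, looked up once so that aliases such as
# 'latin1', 'latin-1' and 'iso-8859-1' all take the same fast path.
_ASCII = codecs.lookup('ascii').name
_LATIN1 = codecs.lookup('latin-1').name


def encodable(text, encoding='ascii'):
    """Hypothetical test: can text be encoded with the named encoding?"""
    name = codecs.lookup(encoding).name
    if name == _ASCII:
        # Fast path: no encoding attempt needed (Python 3.7+).
        return text.isascii()
    if name == _LATIN1:
        # Fast path: Latin-1 covers exactly code points 0 through 255.
        return all(ch <= '\xff' for ch in text)
    # Generic fallback: try the encoding and see whether it fails.
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def decodable(data, encoding='ascii'):
    """Hypothetical bytes counterpart, with no fast paths at all."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True
```

Callers see only True or False, so the shortcuts can be added, changed or
dropped without affecting any code that uses the test.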


So, I don't care much either way for a LBYL test, but if there is a good use 
case for such a test, better for it to be a single method taking the encoding 
name rather than a multitude of tests, or exposing an implementation-specific 
value that the coder then has to interpret themselves.

-1 on isascii, islatin1, isbmp
-1 on exposing max_code_point
+0.5 on encodable




-- 
Steven


