[Python-ideas] isascii()/islatin1()/isbmp()
Steven D'Aprano
steve at pearwood.info
Sun Jul 1 05:21:32 CEST 2012
Terry Reedy wrote:
> On 6/30/2012 8:59 PM, Steven D'Aprano wrote:
>> I suggest that a better API would be a method that takes the name of an
>> encoding (perhaps defaulting to 'ascii') and returns True|False:
>>
>> string.encodable(encoding='ascii') -> True|False
>>
>> Return True if string can be encoded using the named encoding, otherwise
>> False.
>
> But then one might as well try the encoding and check for exception. The
> point of the proposal is to avoid things like
>
>     try:
>         body = text.encode('ascii')
>         header = 'ascii' #abbreviating here
>     except UnicodeEncodeError:
>         try:
>             body = text.encode('latin1')
>             header = 'latin1'
>         except UnicodeEncodeError:
>             body = text.encode('utf-8')
>             header = 'utf-8'
Right. And re-written with the hypothetical encodable method, you have the
usual advantage of LBYL that it is slightly more concise:
    body = header = None
    for encoding in ('ascii', 'latin1', 'utf-8'):
        if text.encodable(encoding):
            body = text.encode(encoding)
            header = encoding
            break  # stop at the first encoding that works
instead of:
    body = header = None
    for encoding in ('ascii', 'latin1', 'utf-8'):
        try:
            body = text.encode(encoding)
            header = encoding
            break  # stop at the first encoding that works
        except UnicodeEncodeError:
            pass
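Since str has no encodable method, the LBYL loop can only be tried out with a stand-in. A minimal sketch, assuming a hypothetical module-level function named encodable (not part of Python) that simply wraps the try/except, plus a hypothetical helper choose_encoding that returns the same body/header pair as the loops above:

```python
def encodable(text, encoding='ascii'):
    """Hypothetical stand-in for the proposed str.encodable method."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def choose_encoding(text, candidates=('ascii', 'latin1', 'utf-8')):
    """Return (body, header) for the first candidate that can encode text.

    Returns (None, None) if no candidate works.
    """
    for encoding in candidates:
        if encodable(text, encoding):
            return text.encode(encoding), encoding
    return None, None


body, header = choose_encoding('caf\xe9')
print(header, body)
```

Note that this stand-in encodes the text twice whenever the test succeeds, so it only illustrates the shape of the API, not any performance benefit.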
As far as expressibility goes, it is not much of an advantage. But:
- if there are optimizations that apply to some encodings but not others,
the encodable method can take advantage of them without it being a
promise of the language;
- it only adds a single string method (and presumably a single bytes
method, decodable) rather than a plethora of methods;
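To make the two points above concrete, here is a rough sketch, assuming hypothetical module-level functions encodable and decodable standing in for the proposed methods. The fast paths for ASCII and Latin-1 are only examples of the kind of per-encoding shortcut an implementation could take; str.isascii() is available from Python 3.7 on, and everything else falls back to a real encode or decode:

```python
import codecs


def encodable(text, encoding='ascii'):
    """Hypothetical stand-in for str.encodable, with per-encoding shortcuts."""
    # codecs.lookup normalizes aliases ('latin1', 'latin-1' -> 'iso8859-1').
    name = codecs.lookup(encoding).name
    if name == 'ascii':
        return text.isascii()
    if name == 'iso8859-1':
        # Latin-1 covers exactly the code points 0..255.
        return all(ord(ch) < 256 for ch in text)
    # Generic fallback: attempt the encoding and discard the result.
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def decodable(data, encoding='ascii'):
    """Hypothetical stand-in for the matching bytes method."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


print(encodable('caf\xe9'), encodable('caf\xe9', 'latin1'))
print(decodable(b'\xff'), decodable(b'\xff', 'latin1'))
```

The caller sees one function per direction and never needs to know which encodings have a shortcut, which is the point of not making the optimization a promise of the language.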
So, I don't care much either way for a LBYL test, but if there is a good use
case for such a test, better for it to be a single method taking the encoding
name rather than a multitude of tests, or exposing an implementation-specific
value that the coder then has to interpret themselves.
-1 on isascii, islatin1, isbmp
-1 on exposing max_code_point
+0.5 on encodable
--
Steven