xhtml encoding question

Peter Otten __peter__ at web.de
Thu Feb 2 06:02:21 EST 2012


Ulrich Eckhardt wrote:

> On 01.02.2012 10:32, Peter Otten wrote:
>> It doesn't matter for the OP (see Stefan Behnel's post), but if you want
>> to replace characters in a unicode string the best way is probably the
>> translate() method:
>>
>>>>> print u"\xa9\u2122"
>> ©™
>>>>> u"\xa9\u2122".translate({0xa9: u"&copy;", 0x2122: u"&trade;"})
>> u'&copy;&trade;'
>>
> 
> Yes, this is both more expressive and at the same time probably even
> more efficient.
> 
> 
> Question though:
> 
>  >>> u'abc'.translate({u'a': u'A'})
> u'abc'
> 
> I would call this a chance to improve Python. According to the
> documentation, using a string is invalid, but it neither raises an
> exception nor does it do the obvious and accept single-character strings
> as keys.
> 
> 
> Thoughts?

How could this raise an exception? You'd either need a typed dictionary (int
--> unicode) or translate() would have to verify that all keys are indeed
integers. The former would go against the grain of Python; the latter would
make the method less flexible, as the set of keys currently need not be
predefined:

>>> class A(object):
...     def __getitem__(self, key):
...         return unichr(key).upper()
...
>>> u"alpha".translate(A())
u'ALPHA'
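
To illustrate why a string key is silently ignored rather than rejected, here
is a rough sketch that records which keys translate() actually asks for. It is
written in Python 3 spelling (str instead of unicode), and the Recorder class
is a hypothetical helper made up for this example, not anything in the
standard library. On CPython the lookups should all be integer code points, so
the key "a" is simply never requested:

```python
class Recorder(dict):
    """Hypothetical helper: remembers every key that translate() looks up."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.seen = []

    def __getitem__(self, key):
        self.seen.append(key)
        # A missing key raises KeyError, which translate() treats as
        # "leave this character unchanged".
        return super().__getitem__(key)


table = Recorder({"a": "A"})   # string key, as in Ulrich's example
result = "abc".translate(table)

print(result)       # the text comes back untranslated
print(table.seen)   # the keys that were requested; none of them is "a"
```

Since translate() only ever sees the keys it asks for, it has no cheap way to
notice that the table also holds keys of the wrong type.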

Using unicode instead of integer keys would be nice, but it would break
backwards compatibility; accepting both could double the number of dictionary
lookups.
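
One way around the extra lookups, sketched here under my own assumptions, is
to convert character keys to code points once, before calling translate().
The function name char_table is hypothetical, and the code uses Python 3
spelling (str, ord); Python 3's str.maketrans() does a similar conversion when
given a single dict argument.

```python
def char_table(mapping):
    """Hypothetical helper: build a translate() table keyed by code point.

    Accepts integer keys as-is and converts single-character string keys
    with ord(); anything else is rejected up front.
    """
    table = {}
    for key, value in mapping.items():
        if isinstance(key, str):
            if len(key) != 1:
                raise ValueError("string keys must be exactly one character")
            key = ord(key)
        elif not isinstance(key, int):
            raise TypeError("keys must be integers or one-character strings")
        table[key] = value
    return table


print("abc".translate(char_table({"a": "A"})))   # prints Abc
```

The conversion cost is paid once per table rather than once per character,
and translate() itself stays unchanged.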



