xhtml encoding question

Ulrich Eckhardt ulrich.eckhardt at dominolaser.com
Thu Feb 2 13:40:22 CET 2012


On 02.02.2012 12:02, Peter Otten wrote:
> Ulrich Eckhardt wrote:
>>
>>   >>>  u'abc'.translate({u'a': u'A'})
>> u'abc'
>>
>> I would call this a chance to improve Python. According to the
>> documentation, using a string [as key] is invalid, but it neither raises
>> an exception nor does it do the obvious and accept single-character
>> strings as keys.
>>
>>
>> Thoughts?
>
> How could this raise an exception? You'd either need a typed dictionary (int
> -->  unicode) or translate() would have to verify that all keys are indeed
> integers.

The latter is exactly what I would have done, i.e. scan the dictionary
for invalid keys, in the spirit of not letting errors pass unnoticed.
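
A minimal sketch of such a check, written as a wrapper in Python 3 syntax
(where every str is Unicode). The name strict_translate is made up for
illustration and is not part of the standard library:

```python
def strict_translate(text, table):
    """Like text.translate(table), but reject dict tables with bad keys."""
    if isinstance(table, dict):
        for key in table:
            # translate() looks characters up by code point, so any
            # non-integer key can never match and is almost surely a bug.
            if not isinstance(key, int):
                raise TypeError(
                    "translate() table keys must be integers, got %r" % (key,)
                )
    return text.translate(table)
```

With this, strict_translate('abc', {ord('a'): 'A'}) translates as usual,
while strict_translate('abc', {'a': 'A'}) raises a TypeError instead of
silently returning the input unchanged.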


> The former would go against the grain of Python, the latter would
> make the method less flexible as the set of keys currently need not be
> predefined:
>
> >>> class A(object):
> ...     def __getitem__(self, key):
> ...             return unichr(key).upper()
> ...
> >>> u"alpha".translate(A())
> u'ALPHA'

Supporting objects that only provide __getitem__ is a fair point. I'm
not sure whether it is reasonable to expect this to work, though; I'm -0
on that. I could also imagine completely separate code paths for
iterable and non-iterable mappings.
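
Roughly, the two paths could look like the sketch below (Python 3 syntax).
It assumes that "iterable" is taken to mean "is a collections.abc.Mapping",
because a plain iter() test would also succeed for objects that only define
__getitem__. The names translate_two_paths and Upper are made up for
illustration:

```python
from collections.abc import Mapping


def translate_two_paths(text, table):
    """Validate keys when they can be enumerated, otherwise pass through."""
    if isinstance(table, Mapping):
        # Path 1: a real mapping, so every key can be checked up front.
        bad = [key for key in table if not isinstance(key, int)]
        if bad:
            raise TypeError("non-integer translate() keys: %r" % (bad,))
    # Path 2: a __getitem__-only object has no keys to enumerate,
    # so it is handed to translate() unchecked, as happens today.
    return text.translate(table)


class Upper:
    """Peter's example: computes the replacement on demand."""

    def __getitem__(self, key):
        return chr(key).upper()
```

Here translate_two_paths('alpha', Upper()) still gives 'ALPHA', while a
dictionary with string keys is rejected.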


> Using unicode instead of integer keys would be nice but breaks backwards
> compatibility, using both could double the number of dictionary lookups.

Dictionary lookups are constant time on average and well optimized, so
I'd actually go for allowing both and paying that price. I could even
imagine preprocessing the supplied dictionary while checking for invalid
keys. The result could be a structure that exploits the fact that
Unicode code points fit in 21 bits (the maximum is U+10FFFF) and that
makes the path from each element of the source sequence to its map entry
as short as possible (I'm not sure whether code points or
single-character strings are faster as keys). However, those are
premature optimizations, and I'm not sure they are worth it.
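
A sketch of the preprocessing step, again in Python 3 syntax: it accepts
both integer and single-character keys, rejects everything else, and
produces a table keyed by code point only, so translate() needs a single
lookup per character. The name normalize_table is made up for
illustration:

```python
def normalize_table(table):
    """Return a copy of table whose keys are all integer code points."""
    normalized = {}
    for key, value in table.items():
        if isinstance(key, str):
            if len(key) != 1:
                raise ValueError(
                    "string keys must have length 1, got %r" % (key,)
                )
            key = ord(key)  # single character -> code point
        elif not isinstance(key, int):
            raise TypeError(
                "keys must be integers or single characters, got %r" % (key,)
            )
        normalized[key] = value
    return normalized
```

For example, 'abc'.translate(normalize_table({'a': 'A', ord('b'): 'B'}))
gives 'ABc'. As far as I can tell, str.maketrans() in Python 3 performs a
similar key conversion when called with a single dictionary argument.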

Anyway, thanks for your thoughts, they are always appreciated!

Uli



