xhtml encoding question
Ulrich Eckhardt
ulrich.eckhardt at dominolaser.com
Thu Feb 2 07:40:22 EST 2012
On 02.02.2012 12:02, Peter Otten wrote:
> Ulrich Eckhardt wrote:
>>
>> >>> u'abc'.translate({u'a': u'A'})
>> u'abc'
>>
>> I would call this a chance to improve Python. According to the
>> documentation, using a string [as key] is invalid, but it neither raises
>> an exception nor does it do the obvious and accept single-character
>> strings as keys.
>>
>>
>> Thoughts?
>
> How could this raise an exception? You'd either need a typed dictionary (int
> --> unicode) or translate() would have to verify that all keys are indeed
> integers.
The latter is exactly what I would have done, i.e. scan the dictionary
for invalid keys, in the spirit of not letting errors pass unnoticed.
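A minimal sketch of such a check, assuming Python 3 (where str is the
Unicode type) and a hypothetical wrapper named checked_translate that is
not part of Python itself. Only real dicts are scanned; anything else is
passed through unchanged:

```python
def checked_translate(text, table):
    """Translate text, rejecting dict tables that contain non-integer keys.

    Hypothetical helper for illustration only, not an existing API.
    """
    if isinstance(table, dict):
        for key in table:
            # translate() looks characters up by code point, so any other
            # key type can never match and is almost certainly a mistake.
            if not isinstance(key, int):
                raise TypeError(
                    "translate table keys must be integers, got %r" % (key,))
    return text.translate(table)


print(checked_translate(u'abc', {ord(u'a'): u'A'}))
```

With this wrapper, passing {u'a': u'A'} raises a TypeError instead of
silently returning the input unchanged.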
> The former would go against the grain of Python, the latter would
> make the method less flexible as the set of keys currently need not be
> predefined:
>
> >>> class A(object):
> ...     def __getitem__(self, key):
> ...         return unichr(key).upper()
> ...
> >>> u"alpha".translate(A())
> u'ALPHA'
Supporting objects that only define __getitem__ is a fair point. I'm not
sure it is reasonable to expect this to work, though. I'm -0 on that. I
could also imagine completely separate code paths for iterable and
non-iterable mappings.
> Using unicode instead of integer keys would be nice but breaks backwards
> compatibility, using both could double the number of dictionary lookups.
Dictionary lookups are constant time on average and well-optimized, so I'd
actually go for allowing both and paying that price. I could even imagine
preprocessing the supplied dictionary while checking for invalid keys.
The result could be a structure that exploits the fact that Unicode
code points fit in 21 bits and that makes the path from each element of the
source sequence to its corresponding map entry as short as possible (I'm not
sure whether using code points or single-character strings is faster).
However, those are premature optimizations, and I'm not sure they are
worth it.
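To make the preprocessing idea concrete, here is a rough sketch, again
assuming Python 3. The name normalize_table is hypothetical. It accepts
both integer and single-character string keys, converts everything to
code points, and rejects anything else, so translate() itself still does
only one lookup per character:

```python
def normalize_table(table):
    """Return a new dict keyed by code point only.

    Hypothetical helper: integer keys are kept, single-character string
    keys are converted with ord(), and every other key is rejected.
    """
    normalized = {}
    for key, value in table.items():
        if isinstance(key, int):
            codepoint = key
        elif isinstance(key, str) and len(key) == 1:
            codepoint = ord(key)
        else:
            raise TypeError("invalid translate key: %r" % (key,))
        normalized[codepoint] = value
    return normalized


table = normalize_table({u'a': u'A', ord(u'b'): u'B'})
print(u'abc'.translate(table))
```

Python 3's str.maketrans() performs a similar conversion when given a
single dictionary argument, so this sketch mainly illustrates the idea
rather than proposing new machinery.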
Anyway, thanks for your thoughts, they are always appreciated!
Uli