Unicode 7
MRAB
python at mrabarnett.plus.com
Tue Apr 29 14:12:43 EDT 2014
On 2014-04-29 18:37, wxjmfauth at gmail.com wrote:
> Let's see how ready Python is for the next Unicode version
> (Unicode 7.0.0.Beta).
>
>
>>>> timeit.repeat("(x*1000 + y)[:-1]", setup="x = 'abc'; y = 'z'")
> [1.4027834829454946, 1.38714224331963, 1.3822586635296261]
>>>> timeit.repeat("(x*1000 + y)[:-1]", setup="x = 'abc'; y = '\u0fce'")
> [5.462776291480395, 5.4479432055423445, 5.447874284053398]
>>>>
>>>>
>>>> # more interesting
>>>> timeit.repeat("(x*1000 + y)[:-1]",\
> ... setup="x = 'abc'.encode('utf-8'); y = '\u0fce'.encode('utf-8')")
> [1.3496489533188765, 1.328654286266783, 1.3300913977710707]
>>>>
>
Although the third example is the fastest, it's also the wrong way to
handle Unicode:
>>> x = 'abc'.encode('utf-8'); y = '\u0fce'.encode('utf-8')
>>> t = (x*1000 + y)[:-1].decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 3000-3001: unexpected end of data
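
To spell out why the decode fails: '\u0fce' takes three bytes in UTF-8, so
slicing one byte off the encoded data leaves an incomplete sequence at the
end, whereas slicing the str removes one whole code point. The following is
only a minimal sketch of that difference; the names text, data, failed and
safe are made up for illustration and are not from the original post.

```python
x = 'abc'
y = '\u0fce'  # a single code point that encodes to three bytes in UTF-8

# Slicing the str drops one whole code point.
text = (x * 1000 + y)[:-1]

# Slicing the encoded bytes drops only the last byte of the
# three-byte sequence, leaving two dangling bytes at the end.
data = (x.encode('utf-8') * 1000 + y.encode('utf-8'))[:-1]

try:
    data.decode('utf-8')
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    print("decode failed:", exc.reason)

# Slicing first and encoding afterwards keeps the bytes valid.
safe = text.encode('utf-8')
print(len(text), len(data), len(safe))
```

If the bytes are needed at all, doing the text operations on str and
encoding only at the boundary avoids cutting a character in half.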
> Note 1: "lookup" is not the problem.
>
> Note 2: From Unicode.org : "[...] We strongly encourage [...] and test
> them with their programs [...]"
>
> -> Done.
>
> jmf
>