How to waste computer memory?
BartC
bc at freeuk.com
Sat Mar 19 08:24:33 EDT 2016
On 19/03/2016 11:07, Marko Rauhamaa wrote:
> Chris Angelico <rosuav at gmail.com>:
>
>> On Sat, Mar 19, 2016 at 8:31 PM, Marko Rauhamaa <marko at pacujo.net> wrote:
>>> Unicode made several (understandable but grave) mistakes along the way:
>>>
>>> * normalization
>>
>> Elaborate please? What's such a big mistake here?
>
> Unicode shouldn't have allowed multiple equivalent variants for a
> string.
>
> Now Python falls victim to:
>
> >>> '\u006e\u0303' == '\u00f1'
> False
>
> <URL: https://en.wikipedia.org/wiki/Unicode_equivalence>:
>
> For example, the code point U+006E (the Latin lowercase "n") followed
> by U+0303 (the combining tilde "◌̃") is defined by Unicode to be
> canonically equivalent to the single code point U+00F1 (the lowercase
> letter "ñ" of the Spanish alphabet). Therefore, those sequences
> should be displayed in the same manner, should be treated in the same
> way by applications such as alphabetizing names or searching, and may
> be substituted for each other.
>
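For what it's worth, the comparison quoted above only fails on the raw code point sequences. A minimal sketch, assuming both sides are first normalized with the standard library's unicodedata.normalize (the variable names are just for illustration):

```python
import unicodedata

decomposed = '\u006e\u0303'   # 'n' followed by U+0303 COMBINING TILDE
precomposed = '\u00f1'        # single code point U+00F1

# The raw strings hold different code point sequences, so == is False.
print(decomposed == precomposed)

# NFC composes the pair into the single code point.
print(unicodedata.normalize('NFC', decomposed) == precomposed)

# NFD goes the other way and decomposes the single code point.
print(unicodedata.normalize('NFD', precomposed) == decomposed)
```

Python does not normalize for you when comparing str objects, so the caller has to choose a form and apply it to both sides.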
So a string that looks like:
"ññññññññññññññññññññññññññññññññññññññññññññññññññ"
can have 2**50 different representations? And occupy somewhere between
50 and 200 bytes? Or is that 400?
OK...
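A rough sketch of the arithmetic, assuming each of the 50 characters is independently either U+00F1 or the pair U+006E U+0303, and ignoring any other equivalent sequences. The sizes are encoded payloads only, not what any particular Python build uses internally:

```python
import unicodedata

precomposed = '\u00f1' * 50                              # 50 code points
decomposed = unicodedata.normalize('NFD', precomposed)   # 100 code points

print(len(precomposed), len(decomposed))   # 50 100

# Two choices for each of 50 positions.
print(2 ** 50)                             # 1125899906842624

# Encoded size of the shortest and longest variants, without a BOM.
for codec in ('utf-8', 'utf-16-le', 'utf-32-le'):
    print(codec, len(precomposed.encode(codec)), len(decomposed.encode(codec)))
# utf-8 100 150
# utf-16-le 100 200
# utf-32-le 200 400

# Any mixture of the two forms normalizes back to the same string.
mixed = ''.join('\u00f1' if i % 2 else 'n\u0303' for i in range(50))
print(unicodedata.normalize('NFC', mixed) == precomposed)   # True
```

The 50-byte figure only appears with a one-byte encoding such as Latin-1, and only for the fully precomposed variant, since U+0303 has no Latin-1 encoding.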
--
Bartc