hex dump w/ or w/out utf-8 chars

Chris Angelico rosuav at gmail.com
Thu Jul 11 15:32:00 CEST 2013

On Thu, Jul 11, 2013 at 11:18 PM,  <wxjmfauth at gmail.com> wrote:
> Just to stick with this funny character ẞ, a ucs-2 char
> in the Flexible String Representation nomenclature.
> It seems to me that, when one needs more than ten bytes
> to encode it,
>>>> sys.getsizeof('a')
> 26
>>>> sys.getsizeof('ẞ')
> 40
> this is far away from the perfection.

A better comparison is to see how much space one copy of it uses,
and how much two copies use:

>>> sys.getsizeof('aa')-sys.getsizeof('a')
1
>>> sys.getsizeof('ẞẞ')-sys.getsizeof('ẞ')
2

String objects have overhead. Big deal.
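The same measurement can be generalized in a short sketch. It assumes CPython 3.3 or later, where PEP 393 stores each string with 1, 2 or 4 bytes per character depending on its widest code point. The helper `bytes_per_char` is hypothetical and written only for this illustration. It subtracts the sizes of two freshly built strings so that the fixed object header cancels out. The absolute numbers from `sys.getsizeof` vary across versions and platforms; only the per-character cost is expected to be stable.

```python
import sys


def bytes_per_char(ch: str, n: int = 100) -> float:
    """Marginal storage cost of one more copy of `ch` in a str (CPython 3.3+).

    Both strings are built fresh by repetition, so they share the same
    header layout; subtracting their sizes leaves n characters' worth
    of storage, which is then divided by n.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) / n


# ASCII, Latin-1, a BMP character (the German capital sharp s) and an astral emoji.
for ch in ("a", "\u00e9", "\u1e9e", "\U0001f600"):
    print(f"U+{ord(ch):04X}: {bytes_per_char(ch):g} byte(s) per char")
```

On such a build the loop should report 1 byte per character for U+0061 and U+00E9, 2 for U+1E9E (ẞ) and 4 for U+1F600. Most of the gap between `sys.getsizeof('a')` and `sys.getsizeof('ẞ')` therefore comes from the larger header that non-ASCII strings carry, not from the character itself.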

> BTW, for a modern language, is not ucs2 considered
> as obsolete since many, many years?

Clearly. And similarly, the 16-bit integer has been completely
obsoleted, as there is no reason anyone should ever bother to use it.
Same with the float type - everyone uses double or better these days.

Nope, nobody uses small integers any more, they're clearly completely obsolete.
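The sarcasm rests on a concrete point: narrow types are still used wherever compact storage matters. A minimal sketch with the standard `struct` and `array` modules makes this visible. The 10,000 sample values are arbitrary illustration data. The `'<'`-prefixed `struct` sizes are fixed by the module, while the `array` item sizes follow the platform's C types (2 bytes for `'h'` and 8 for `'q'` on common platforms).

```python
import struct
from array import array

# With the '<' prefix, struct uses standard sizes that do not depend on the platform.
sizes = {code: struct.calcsize("<" + code) for code in "hifd"}
print(sizes)  # {'h': 2, 'i': 4, 'f': 4, 'd': 8}

# The same small values stored as signed short ('h') vs. signed long long ('q').
values = [i % 1000 for i in range(10_000)]
short_buf = array("h", values)
long_buf = array("q", values)

# Raw payload size in bytes for each buffer (the values themselves are identical).
print(short_buf.itemsize * len(short_buf), long_buf.itemsize * len(long_buf))
```

Both buffers hold exactly the same numbers; the 16-bit version simply needs a fraction of the memory, which is the whole reason such types have not gone away.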
