hex dump w/ or w/out utf-8 chars

Chris Angelico rosuav at gmail.com
Thu Jul 11 15:32:00 CEST 2013


On Thu, Jul 11, 2013 at 11:18 PM,  <wxjmfauth at gmail.com> wrote:
> Just to stick with this funny character ẞ, a ucs-2 char
> in the Flexible String Representation nomenclature.
>
> It seems to me that, when one needs more than ten bytes
> to encode it,
>
>>>> sys.getsizeof('a')
> 26
>>>> sys.getsizeof('ẞ')
> 40
>
> this is far away from the perfection.

A better comparison is to see how much space one copy of it takes,
and how much two copies take:

>>> import sys
>>> sys.getsizeof('aa')-sys.getsizeof('a')
1
>>> sys.getsizeof('ẞẞ')-sys.getsizeof('ẞ')
2

String objects have overhead. Big deal.
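
To make the same measurement repeatable, here is a minimal sketch that
separates the per-character cost from the fixed object overhead. It
assumes CPython 3.3 or later (the Flexible String Representation of
PEP 393); per_char_cost is a hypothetical helper name of my own, not
anything in the standard library, and the absolute header sizes vary
between versions and builds, so only the differences are worth reading.

```python
import sys

def per_char_cost(ch, n=1000):
    """Marginal bytes used by one extra copy of ch in a long string.

    Assumes CPython 3.3+ (PEP 393). The strings are built at runtime so
    that no cached UTF-8 copy inflates the reported size.
    """
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)

for ch in ('a', '\u1e9e', '\U0001F600'):
    # Fixed overhead = size of a one-character string minus that character's cost.
    overhead = sys.getsizeof(ch * 1) - per_char_cost(ch)
    print('U+%04X: %d byte(s) per character, %d bytes of overhead'
          % (ord(ch), per_char_cost(ch), overhead))
```

On such a build the per-character figures should come out as 1 for 'a',
2 for U+1E9E and 4 for a non-BMP character; the overhead is paid once
per string, not once per character.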

> BTW, for a modern language, is not ucs2 considered
> as obsolete since many, many years?

Clearly. And similarly, the 16-bit integer has been completely
obsoleted, as there is no reason anyone should ever bother to use it.
Same with the float type - everyone uses double or better these days,
right?

http://www.postgresql.org/docs/current/static/datatype-numeric.html
http://www.cplusplus.com/doc/tutorial/variables/

Nope, nobody uses small integers any more, they're clearly completely obsolete.

ChrisA


