RE Module Performance

Chris Angelico rosuav at
Thu Jul 25 21:18:44 CEST 2013

On Fri, Jul 26, 2013 at 5:07 AM,  <wxjmfauth at> wrote:
> Let start with a simple string \textemdash or \texttendash
>>>> sys.getsizeof('–')
> 40
>>>> sys.getsizeof('a')
> 26

Most of the cost is in those two apostrophes, look:

>>> sys.getsizeof('a')
>>> sys.getsizeof(a)

Okay, that's slightly unfair (bonus points: figure out what I did to
make this work; there are at least two right answers) but still, look
at what an empty string costs:

>>> sys.getsizeof('')

Or look at the difference between one of these characters and two:

>>> sys.getsizeof('aa')-sys.getsizeof('a')
>>> sys.getsizeof('––')-sys.getsizeof('–')
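The same differencing trick can be sketched as a small script. This is a minimal sketch, assuming CPython 3.3 or later, where the PEP 393 representation stores a string at 1, 2 or 4 bytes per character depending on its widest code point. Other implementations, such as PyPy, will report different numbers. The helper name `per_char_cost` is hypothetical, and the 'é' and emoji cases are added only for comparison. The strings are built by repetition so that neither measured object is a shared one-character string that might carry a cached UTF-8 copy, which `sys.getsizeof` would also count.

```python
import sys

def per_char_cost(ch: str) -> int:
    """Marginal bytes per character for a string made of repeated `ch`.

    Hypothetical helper. `ch` should be a single character. Both measured
    strings are freshly built by repetition, so a cached UTF-8 copy on a
    shared one-character string cannot skew the difference.
    """
    small = ch * 2
    large = ch * 102
    return (sys.getsizeof(large) - sys.getsizeof(small)) // 100

# ASCII, Latin-1, BMP beyond Latin-1 (the en dash), and a non-BMP emoji.
for ch in ("a", "\u00e9", "\u2013", "\U0001F600"):
    print(f"U+{ord(ch):04X}: {per_char_cost(ch)} byte(s) per character")
```

On CPython 3.3+ this reports 1, 1, 2 and 4 bytes per character respectively. Note that 'é' still costs one byte, because it fits in Latin-1.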

That's what the characters really cost. The overhead is fixed. It is,
in fact, almost completely insignificant. The storage requirement for
a BMP-only string containing at least one character beyond Latin-1
(such as that en dash) converges to two bytes per character.
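The "fixed overhead, converging to two bytes per character" claim can be checked directly. This is a minimal sketch, again assuming CPython 3.3 or later (PEP 393). The helper names `overhead` and `bytes_per_char` are hypothetical. The first subtracts two bytes per character from `sys.getsizeof` and should give the same number for every length. The second divides the total by the length and should fall toward 2 as the string grows.

```python
import sys

EN_DASH = "\u2013"  # the en dash from the quoted example: outside Latin-1, inside the BMP

def overhead(n: int) -> int:
    """Bytes reported by getsizeof() beyond 2 bytes per character (hypothetical helper)."""
    return sys.getsizeof(EN_DASH * n) - 2 * n

def bytes_per_char(n: int) -> float:
    """Average bytes per character, fixed overhead included (hypothetical helper)."""
    return sys.getsizeof(EN_DASH * n) / n

# Start at n=2 so every measured string is freshly built by repetition.
for n in (2, 10, 100, 10_000):
    print(f"n={n:>6}  overhead={overhead(n)}  bytes/char={bytes_per_char(n):.4f}")
```

The overhead column stays constant. It covers the object header plus the two-byte terminator that CPython stores after the character data, while the bytes/char column approaches 2.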
