chunking a long string?
steve+comp.lang.python at pearwood.info
Sat Nov 9 01:46:32 CET 2013
On Fri, 08 Nov 2013 12:43:43 -0800, wxjmfauth wrote:
> "(say, 1 kbyte each)": one "kilo" of characters or bytes?
> Glad to read some users are still living in an ascii world, at the
> "Unicode time" where an encoded code point size may vary between 1-4
> bytes.
> Oops, sorry, I'm wrong,
That part is true.
> it can be much more.
That part is false. You're measuring the overhead of the object
structure, not the per-character storage. This has been the case going
back to at least Python 2.2: strings are objects, and objects have
overhead.

py> sys.getsizeof('ab')
27

27 bytes for two characters! Except it isn't: it's actually 25 bytes for
the object header and two bytes for the two characters.

And here you have four bytes each for the characters and a 40 byte
header:
py> c = '\U0001d11e'
py> sys.getsizeof(2*c) - sys.getsizeof(c)
4
py> sys.getsizeof(1000*c) - sys.getsizeof(999*c)
4
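The same subtraction trick makes the per-character cost visible for each of
the three widths CPython 3.3+'s flexible string representation (PEP 393) can
choose; a small sketch (exact getsizeof figures vary by version and platform,
but the differences do not):

```python
import sys

# Per-character cost of a string: grow it by one character and see how
# much sys.getsizeof() increases -- the fixed header cancels out.
def per_char(c):
    return sys.getsizeof(1000 * c) - sys.getsizeof(999 * c)

# Under PEP 393 the storage width is chosen per string, based on the
# widest code point that string contains:
print(per_char('a'))           # pure ASCII: 1 byte per character
print(per_char('\u20ac'))      # BMP (euro sign): 2 bytes per character
print(per_char('\U0001d11e'))  # astral (G clef): 4 bytes per character
```

Note the width is chosen per string, not per character: a single astral
code point widens every character in that string to 4 bytes.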
How big is the object overhead on a (say) thousand character string? Just
one percent:

py> (sys.getsizeof(1000*c) - 4000)/4000
0.01
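The header is a fixed cost whatever the length, so its relative share
vanishes as the string grows; a quick sketch (assumes CPython's flexible
representation, where the exact header size depends on version and
platform):

```python
import sys

c = '\U0001d11e'  # stored at 4 bytes per character (UCS-4)

for n in (10, 100, 1000, 10000):
    s = n * c
    header = sys.getsizeof(s) - 4 * n      # fixed object overhead
    print(n, header, header / (4 * n))     # relative overhead shrinks with n
```

Doubling the length doubles the character storage but leaves the header
untouched, so the ratio falls off as 1/n.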