String changing size on failure?
Grant Edwards
grant.b.edwards at gmail.com
Wed Nov 1 16:39:43 EDT 2017
On 2017-11-01, Ned Batchelder <ned at nedbatchelder.com> wrote:
> On 11/1/17 4:17 PM, MRAB wrote:
>> On 2017-11-01 19:26, Ned Batchelder wrote:
>>> From David Beazley
>>> (https://twitter.com/dabeaz/status/925787482515533830):
>>>
>>> >>> a = 'n'
>>> >>> b = 'ñ'
>>> >>> sys.getsizeof(a)
>>> 50
>>> >>> sys.getsizeof(b)
>>> 74
>>> >>> float(b)
>>> Traceback (most recent call last):
>>> File "<stdin>", line 1, in <module>
>>> ValueError: could not convert string to float: 'ñ'
>>> >>> sys.getsizeof(b)
>>> 77
>>>
>>> Huh?
>>>
>> It's all explained in PEP 393.
>>
>> It's creating an additional representation (UTF-8 + zero-byte
>> terminator) of the value and is caching that, so there'll then be the
>> bytes for 'ñ' and the bytes for the UTF-8 (0xC3 0xB1 0x00).
>>
>> When the string is ASCII, the bytes of the UTF-8 representation are
>> identical to those of the original string, so it can share them.
>
> That explains why b is larger than a to begin with

No, that size difference is due to the additional bytes required for
the internal representation of the string.

> but it doesn't explain why float(b) is changing the size of b.

The additional UTF-8 representation isn't being created and cached
until the float() call is made.
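
For anyone who wants to poke at this themselves, here is a rough sketch
(not taken from the thread). The helper names utf8_cache_cost and
size_before_and_after_float are made up for illustration. The first one
just does the arithmetic MRAB described: UTF-8 bytes plus one zero
terminator. The second reports sys.getsizeof() before and after a
float() attempt. The sizes are a CPython implementation detail, and
whether float() triggers the cached copy may differ between versions,
so no output is shown for the last two lines.

```python
import sys

def utf8_cache_cost(s):
    """Bytes a cached UTF-8 copy of s would need: encoded bytes plus a zero terminator."""
    return len(s.encode("utf-8")) + 1

def size_before_and_after_float(s):
    """Return sys.getsizeof(s) measured before and after attempting float(s)."""
    before = sys.getsizeof(s)
    try:
        float(s)
    except ValueError:
        pass  # expected for non-numeric text such as 'ñ'
    return before, sys.getsizeof(s)

a = 'n'
b = 'ñ'

print(b.encode("utf-8"))    # b'\xc3\xb1', the two UTF-8 bytes for 'ñ'
print(utf8_cache_cost(b))   # 3, the same as the 74 -> 77 jump quoted above

# Interpreter-dependent: the second number may or may not be larger.
print(size_before_and_after_float(a))
print(size_before_and_after_float(b))
```

If the second number for b comes out 3 larger than the first, that is
the cached UTF-8 copy being attached to the string object.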
--
Grant Edwards               grant.b.edwards at gmail.com
Yow! ONE LIFE TO LIVE for ALL MY CHILDREN in ANOTHER WORLD all THE DAYS OF OUR LIVES.