How is unicode implemented behind the scenes?
python at mrabarnett.plus.com
Sun Mar 9 03:40:28 CET 2014
On 2014-03-09 02:08, Dan Stromberg wrote:
> OK, I know that Unicode data is stored in an encoding on disk.
> But how is it stored in RAM?
> I realize I shouldn't write code that depends on any relevant
> implementation details, but knowing some of the more common
> implementation options would probably help build an intuition for
> what's going on internally.
> I've heard that characters are no longer all c bytes wide internally,
> so is it sometimes utf-8?
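From Python 3.3 (PEP 393), CPython stores each string as a fixed-width
array of 1, 2 or 4 bytes per codepoint. The width is chosen per string,
based on the largest codepoint it contains.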
In Python terms:
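    if all(c <= '\xFF' for c in string):
        bytes_per_codepoint = 1    # every codepoint fits in Latin-1
    elif all(c <= '\uFFFF' for c in string):
        bytes_per_codepoint = 2    # every codepoint fits in the BMP
    else:
        bytes_per_codepoint = 4    # at least one codepoint above U+FFFF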
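
If you want to see this for yourself, here is a rough sketch that compares the rule above with what the interpreter reports. It assumes CPython 3.3 or later and relies on sys.getsizeof, so it is an observation of one implementation, not something to depend on. The helper names predicted_width and measured_width are made up for this example.

```python
import sys

def predicted_width(text):
    """Bytes per codepoint according to the rule described above."""
    largest = max(map(ord, text), default=0)
    if largest <= 0xFF:
        return 1
    if largest <= 0xFFFF:
        return 2
    return 4

def measured_width(text, extra=1000):
    """Estimate bytes per codepoint from how sys.getsizeof grows.

    CPython-specific; text must be non-empty.
    """
    # Repeating the string leaves its largest codepoint unchanged, so the
    # width stays the same and only the character array grows.
    small = text * extra
    large = text * (2 * extra)
    # The fixed object header cancels out in the difference.
    return (sys.getsizeof(large) - sys.getsizeof(small)) // len(small)

for sample in ("abc", "caf\xe9", "\u20ac", "\U0001F600"):
    print(ascii(sample), predicted_width(sample), measured_width(sample))
```

Measuring the difference between two sizes, rather than a single size, avoids having to know the object header size, which varies between CPython versions.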