On Sat, Jan 11, 2014 at 04:28:34PM -0500, Terry Reedy wrote:
The problem with some criticisms of using 'unicode in Python 3' is that there really is no such thing. Unicode in 3.0 to 3.2 used the old internal model inherited from 2.x. Unicode in 3.3+ uses a different internal model that is a game changer with respect to certain issues of space and time efficiency (and cross-platform correctness and portability). So at least some the valid criticisms based on the old model are out of date and no longer valid.
While there are definitely performance savings (particularly of memory) regarding the FSR in Python 3.3, for the use-case we're talking about, Python 3.1 and 3.2 (and for that matter, 2.2 through 2.7) Unicode strings should be perfectly adequate. The textual data being used is ASCII, and the binary blobs are encoded to Latin-1, so everything is a subset of Unicode, namely U+0000 to U+00FF. That means there are no astral characters, and no behavioural differences between wide and narrow builds (apart from memory use). -- Steven