[Python-Dev] Replacement for array.array('u')?
Steven D'Aprano
steve at pearwood.info
Fri Mar 22 05:24:23 EDT 2019
On Fri, Mar 22, 2019 at 08:31:33PM +1300, Greg Ewing wrote:
> A poster on comp.lang.python is asking about array.array('u').
> He wants an efficient mutable collection of unicode characters
> that can be initialised from a string.
>
> According to the docs, the 'u' code is deprecated and will be
> removed in 4.0, but no alternative is suggested.
>
> Why is this being deprecated, instead of keeping it and making
> it always 32 bits? It seems like useful functionality that can't
> be easily obtained another way.
I can't answer any of those questions, but perhaps the poster can do
this instead:
py> from array import array
py> a = array('L', 'ℍℰâѵÿ Ϻεταł'.encode('utf-32be'))
py> a
array('L', [220266496, 807469056, 3791650816, 1963196416, 4278190080,
536870912, 4194500608, 3036872704, 3288530944, 2969763840, 1107361792])
Getting the string out again is no harder:
py> bytes(a).decode('utf-32be')
'ℍℰâѵÿ Ϻεταł'
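Two caveats about the session above. The ints shown are not code points: on a little-endian machine the big-endian bytes come out swapped, so ℍ (U+210D) appears as 220266496, which is 0x0D210000. And 'L' is 8 bytes wide on 64-bit Unix builds, where a 44-byte string raises ValueError. A minimal sketch of a more portable variant follows, assuming a 4-byte typecode exists on the platform; the names to_codepoints and to_text are made up for illustration and are not part of any library.

```python
import sys
from array import array

# Assumption: one of these typecodes is exactly 4 bytes wide on this platform
# ('I' is on all mainstream ones; 'L' is 8 bytes on 64-bit Unix).
TYPECODE = next(code for code in "IL" if array(code).itemsize == 4)

# Encode in the machine's own byte order so each item is a real code point.
CODEC = "utf-32-le" if sys.byteorder == "little" else "utf-32-be"


def to_codepoints(text):
    """Return a mutable array of 32-bit code points for text (hypothetical helper)."""
    return array(TYPECODE, text.encode(CODEC))


def to_text(codepoints):
    """Rebuild the string from an array made by to_codepoints() (hypothetical helper)."""
    return codepoints.tobytes().decode(CODEC)


a = to_codepoints('ℍℰâѵÿ Ϻεταł')
a[0] = ord('H')      # items are plain code points, so ord() and chr() apply directly
result = to_text(a)  # the string with its first character replaced
```

Because the items are genuine code points, edits can go through ord() and chr() without any byte swapping.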
But having said that, it would be nice to have an array code which
treated the values as single UTF-32 characters:
array('?', ['ℍ', 'ℰ', 'â', 'ѵ', 'ÿ', ' ', 'Ϻ', 'ε', 'τ', 'α', 'ł'])
if for no other reason than it looks nicer than a bunch of 32-bit ints.
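Until such a typecode exists, the nicer display can be approximated with a thin wrapper. This is only a sketch: CharArray is a hypothetical name, not an existing class, and it assumes array('I') items are wide enough to hold any code point (true where 'I' is 4 bytes).

```python
from array import array
from collections.abc import MutableSequence


class CharArray(MutableSequence):
    """Hypothetical mutable sequence of characters stored as code points."""

    def __init__(self, text=""):
        # Assumption: 'I' is 4 bytes wide, as on mainstream platforms.
        self._data = array("I", map(ord, text))

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return "".join(map(chr, self._data[index]))
        return chr(self._data[index])

    def __setitem__(self, index, value):
        if isinstance(index, slice):
            self._data[index] = array("I", map(ord, value))
        else:
            self._data[index] = ord(value)

    def __delitem__(self, index):
        del self._data[index]

    def insert(self, index, value):
        self._data.insert(index, ord(value))

    def __str__(self):
        return "".join(map(chr, self._data))

    def __repr__(self):
        return f"CharArray({str(self)!r})"


chars = CharArray('ℍℰâѵÿ Ϻεταł')
chars[0] = 'H'     # assign a character, not an int
chars.append('!')  # append() comes from the MutableSequence mixin
```

The storage is still an array of ints; only the interface deals in one-character strings.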
--
Steven