What is happening with array.array('u') in Python 4?
Hi all,

What will happen to array.array('u') in Python 4? It is deprecated right now. I remember reading about mutable strings somewhere, but I forgot, and I can't find the discussion.

In any case, I need to have a mutable character array, for efficient manipulations. (Not a byte array.) And I need to be able to use the "re" module to search through it. array.array('u') works great in Python 3. Will we still have something like this in Python 4?

Jonathan
Jonathan Slenders wrote on 08.05.2015 at 14:16:
> What will happen to array.array('u') in Python 4? It is deprecated right now. I remember reading about mutable strings somewhere, but I forgot, and I can't find the discussion.
> In any case, I need to have a mutable character array, for efficient manipulations. (Not a byte array.) And I need to be able to use the "re" module to search through it. array.array('u') works great in Python 3.
Well, for some value of "great" and "works". The problems are that 1) 'u' has a platform-dependent size of 16 or 32 bits and 2) it does not match the internal representation of Unicode strings. It will thus use surrogate pairs on some platforms and not on others, and converting between Unicode strings and arrays requires an encoding/decoding step.

It also does not seem like the "re" module currently supports searching in Unicode arrays (anything else would have been very surprising).

ISTM that your best bet is currently to look for a suitable module on PyPI that implements mutable character arrays. I'm sure you're not the only one who needs something like that. The usual suspect would be NumPy, but there may be smaller and simpler tools available.

Stefan
On Fri, May 8, 2015 at 5:50 AM, Stefan Behnel wrote:
> ISTM that your best bet is currently to look for a suitable module on PyPI that implements mutable character arrays. I'm sure you're not the only one who needs something like that. The usual suspect would be NumPy, but there may be smaller and simpler tools available.
Numpy does have mutable character arrays -- and the Unicode version uses 4 bytes per char, regardless of platform (and so should array.array!)

But I don't think you get many of the features of strings, and I doubt that the re module would work with it.

A "real" mutable string type might be pretty nice to have, but I think it would be pretty hard to get it to do everything a string can do. (Or maybe not -- I suppose you could cut and paste the regular string code, and simply add the mutable part....)

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov
Thanks a lot,
So, apparently it is possible to use a re bytes pattern to search through
array.array('u'), and it works for numpy.chararray as well.
However, I suppose that to do this you need to know the
internal encoding, because re.search will actually compare bytes (from the
pattern) to Unicode chars (from the array). So the pattern bytes have to be
UTF-32-encoded strings, I suppose.

I currently don't know enough about how Python strings are
implemented. I'm convinced that it's a good thing to have mutable strings,
but I guess it could indeed be hard to implement.
Cheers,
Jonathan
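The bytes-pattern idea can be sketched without depending on the platform width of 'u'. This example swaps in a plain bytearray holding explicitly UTF-32-LE-encoded text, which is not the array.array('u') setup described above. `find_chars` is a hypothetical helper, and the alignment check is an assumption about what a caller would want: a bytes match that starts in the middle of a 4-byte unit is not a character match.

```python
import re

UNIT = 4  # bytes per character in UTF-32


def find_chars(buf, text):
    """Return the character index of the first aligned match, or -1.

    Hypothetical helper: buf is a bytes-like object holding UTF-32-LE text.
    """
    # Encode the needle the same way as the buffer and match it literally.
    pattern = re.compile(re.escape(text.encode("utf-32-le")))
    pos = 0
    while True:
        match = pattern.search(buf, pos)
        if match is None:
            return -1
        if match.start() % UNIT == 0:
            return match.start() // UNIT
        # Unaligned hit: the bytes straddle two characters, so keep looking.
        pos = match.start() + 1


buf = bytearray("héllo wörld".encode("utf-32-le"))
buf[4:8] = "e".encode("utf-32-le")   # overwrite character 1 in place
idx = find_chars(buf, "wörld")
text = buf.decode("utf-32-le")       # copying conversion back to str
print(idx, text)
```

Anything beyond a literal search (character classes, case-insensitive matching) would have to be expressed in terms of the encoded bytes, which is where this approach gets awkward.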