Getting an array of indexes from a unicode string
Shane Holloway
shane.holloway at ieee.org
Fri Nov 3 11:27:30 EST 2006
I'm looking for a better way to map the characters of a unicode
string to indexes into an array of geometry. The following code is
functional, but it seems sub-optimal with all that numpy has to offer::
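    import numpy as np
    # Unicode code points of each character (encoding to UTF-8 would give byte values instead)
    textOrds = np.array([ord(ch) for ch in text], dtype=np.intp)
    idx = indexMap[textOrds]  # ordinal -> row index into geometry
    textGeo = geometry[idx]   # geometry rows for each character of text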
text is a simple Python string coming in. I manually convert it
to Unicode ordinals, which are then mapped through indexMap, a
1-to-1 mapping between Unicode ordinals and valid indexes into
geometry. Finally, I use the idx array to select the rows of
geometry for the text.
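The manual conversion to ordinals can be done without a Python-level loop. A minimal sketch, assuming Python 3 `str` input (the helper name `text_ordinals` is hypothetical): encode the text as UTF-32-LE, where every code point occupies exactly four little-endian bytes with no byte-order mark, and let numpy reinterpret that buffer as unsigned 32-bit integers.

```python
import numpy as np

def text_ordinals(text):
    """Return the Unicode code points of `text` as a uint32 array.

    UTF-32-LE stores each code point in exactly 4 little-endian bytes
    (no byte-order mark), so viewing the encoded bytes as '<u4' yields
    one ordinal per character without looping in Python.
    """
    return np.frombuffer(text.encode('utf-32-le'), dtype='<u4')

ords = text_ordinals("Ab\u00e9\u4e2d")
print(ords.tolist())  # [65, 98, 233, 20013]
```

The resulting array can index indexMap directly, e.g. `indexMap[text_ordinals(text)]`. Note that `np.frombuffer` returns a read-only view of the encoded bytes, which is fine for use as an index array.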
As I mentioned before, this works all right; however, two things seem
inefficient. First is the manual mapping to Unicode ordinals: is
there a way to have numpy do that for me? Second is the mapping
through indexMap, because it is only sparsely populated: usually
only 2 to 5 thousand entries out of the 64 thousand allocated. I've
thought of using unicode.translate, but characters cannot be used as
indexes in numpy.
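For the sparsely populated indexMap, one alternative (a sketch, not a measured improvement) is to keep only the sorted ordinals that actually have geometry and locate each character with `np.searchsorted`. The names `build_lookup` and `lookup_rows`, the `missing=-1` sentinel, and the layout where `glyph_ords[i]` is the code point whose geometry is row i are all hypothetical assumptions for illustration.

```python
import numpy as np

def build_lookup(glyph_ords):
    """Build a compact lookup from the ordinals that actually have geometry.

    Assumes (hypothetically) that glyph_ords[i] is the code point whose
    geometry is stored in row i. Returns the ordinals in sorted order and,
    for each sorted ordinal, the geometry row it belongs to.
    """
    glyph_ords = np.asarray(glyph_ords)
    order = np.argsort(glyph_ords)
    return glyph_ords[order], order

def lookup_rows(sorted_ords, rows, text_ords, missing=-1):
    """Map code points to geometry rows using one binary search per character."""
    pos = np.searchsorted(sorted_ords, text_ords)
    pos = np.clip(pos, 0, len(sorted_ords) - 1)  # keep positions in bounds
    found = sorted_ords[pos] == text_ords         # ordinal actually present?
    return np.where(found, rows[pos], missing)

sorted_ords, rows = build_lookup([ord(c) for c in "zab"])
text_ords = np.array([ord(c) for c in "abz?"])
print(lookup_rows(sorted_ords, rows, text_ords).tolist())  # [1, 2, 0, -1]
```

Since -1 is a valid numpy index (the last row), missing characters should be filtered or replaced before indexing geometry. The dense 64K-entry table gives O(1) lookups and costs only a few hundred kilobytes, so whether a binary search over 2 to 5 thousand entries is actually faster is worth measuring on real text.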
What are your collective thoughts on making this cleaner and more
efficient?
Thanks,
-Shane Holloway
More information about the NumPy-Discussion mailing list