[Numpy-discussion] numpy sort is not working
Jadhav, Alok
alok.jadhav at credit-suisse.com
Tue Sep 11 03:03:23 EDT 2012
Anthony, Travis,
I understand how an example that generates the data would be useful, but
it will be difficult to provide one in my case for the following reasons:
- Data is read from hdf5 files (>50 MB)
- I am attaching the code which is used to generate the data. It may
shed some more light.
- The numpy.sort() example on the site for a small array works fine, so
my guess is that only my data has the issue.
- FYI, I am on a Windows 7 machine with Python 2.6 and numpy 1.6.2
for j in range(0,len(rics)):
    ric=rics[j]
    trd=h5.getTrades(ric)
    idx=np.ones(len(trd))*j
    opened=np.zeros(len(trd))
    time=trd[0:,0]
    trdp1=trd[0:,1]
    trdp0=np.insert(trd[1:,1], 0, basePx[ric])
    dt=trd[0:,0]- np.insert(trd[0:-1,0], 0, trd[0,0])
    value=trd[0:,1]*trd[0:,2]
    ricData=np.array([idx,opened,time,trdp1,trdp0,dt,value])
    ricData=np.transpose(ricData)
    if allRics is None:
        allRics=ricData
    else:
        allRics=np.vstack((allRics, ricData))
allRics.dtype=[('idx', np.float), ('opened', np.float), ('time', np.float),
               ('trdp1', np.float), ('trdp0', np.float), ('dt', np.float),
               ('value', np.float)]
allRics=np.sort(allRics,order='time') # This doesn't work
Please notice that I am using vstack to generate the array.
I just found out that I am able to sort the numpy array before I set the
dtype, in the following crude way:
allRics=allRics[allRics[:,2].argsort()] # This works
I am able to continue with my code for now, but I am not sure why the
structured array could not be sorted.
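One likely explanation, which nobody in this thread confirms: viewing an (N, 7) float array with a seven-field dtype gives a structured array of shape (N, 1), and np.sort sorts along the last axis by default. That axis has length 1, so nothing moves. The argsort workaround succeeds because allRics[:,2] is a plain 1-D column. The sketch below uses a small made-up array (the values and the three-field dtype are assumptions for illustration) and uses view(), which reinterprets the rows much as assigning .dtype does in the post.

```python
import numpy as np

# Made-up data: 4 rows, 3 float columns; 'time' is the middle column.
plain = np.array([[0.0, 3.0, 10.0],
                  [0.0, 1.0, 20.0],
                  [1.0, 4.0, 30.0],
                  [1.0, 2.0, 40.0]])

# Reinterpret each row as one record (illustrative field names).
fields = [('idx', np.float64), ('time', np.float64), ('value', np.float64)]
rec = plain.view(fields)
print(rec.shape)  # still 2-D, with a trailing axis of length 1

# Default axis=-1 sorts within each length-1 row, so nothing moves.
unchanged = np.sort(rec, order='time')

# Option 1: sort along axis 0 instead.
by_axis0 = np.sort(rec, axis=0, order='time')

# Option 2: flatten to a 1-D record array, then sort.
flat_sorted = np.sort(rec.ravel(), order='time')

print(unchanged['time'].ravel())
print(by_axis0['time'].ravel())
print(flat_sorted['time'])
```

Either option should order the records by 'time'; flattening first also makes later field access return 1-D arrays rather than (N, 1) columns.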
Hi Travis,
Very strange. I am on version 1.6.2 :(
What could I be missing? I started using numpy quite recently. Is there
a way to share the data with you?
Hi Alok,
Typically, a self-contained example which reproduces the error and
generates its own data is more useful / valuable than sharing datasets.
If you could come up with this, that'd be great!
Be Well
Anthony
Hey Alok,
This is worth taking a look at. What version of NumPy are you
using?
It is not directly related to the issue you referenced, as that
was an endianness issue and your data is native-order.
Your example seems to work for me (with a simulated case on
1.6.1).
Hi everyone,
I have a numpy array of dimensions
>>> allRics.shape
(583760, 1)
To sort the array, I set the dtype of the array as follows:
allRics.dtype = [('idx', np.float), ('opened', np.float), ('time', np.float),
                 ('trdp1', np.float), ('trdp0', np.float), ('dt', np.float),
                 ('value', np.float)]
>>> allRics.dtype
dtype([('idx', '<f8'), ('opened', '<f8'), ('time', '<f8'),
('trdp1', '<f8'), ('trdp0', '<f8'), ('dt', '<f8'), ('value', '<f8')])
I checked and the endianness in dtype is correct.
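For readers who want to repeat that endianness check programmatically, here is a minimal sketch. The two-field dtype is a shortened stand-in for the one above, and the byte-swapped dtype is constructed only to show what a non-native result looks like.

```python
import numpy as np

# Shortened stand-in for the record dtype used in this post.
fields = [('idx', np.float64), ('time', np.float64)]
native = np.dtype(fields)

# isnative is True when every field uses the platform's byte order.
print(native.isnative)
# Per-field byte order: '=' means native.
print(native['time'].byteorder)

# Build a byte-swapped variant purely for comparison.
swapped = native.newbyteorder('S')
print(swapped.isnative)

# Data stored in a non-native order can be converted with astype.
arr = np.zeros(3, dtype=swapped)
arr_native = arr.astype(native)
print(arr_native.dtype.isnative)
```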
When I sort the array, the output of sort is the same as the
original array, without any change. I want to sort the allRics numpy
array on 'time'. I do the following:
>>> x=np.sort(allRics,order='time')
>>> x[17330:17350]['time']
array([[ 61184.4 ],
[ 61188.51 ],
[ 61188.979],
[ 61188.979],
[ 61189.989],
[ 61191.66 ],
[ 61194.35 ],
[ 61194.35 ],
[ 61198.79 ],
[ 61198.145],
[ 36126.217],
[ 36126.217],
[ 36126.218],
[ 36126.218],
[ 36126.219],
[ 36126.271],
[ 36126.271],
[ 36126.271],
[ 36126.293],
[ 36126.293]])
The time column doesn't change its order. Could someone please
advise what is missing here? Is this related to the bug
http://www.mail-archive.com/numpy-discussion@scipy.org/msg23060.html
(from 2010)?
