[Numpy-discussion] Outer join ?

bernhard.voigt at gmail.com bernhard.voigt at gmail.com
Thu Feb 12 09:20:03 EST 2009


You might consider the groupby from the itertools module.

Do you have only two keys? I would group on the first column. For
groupby you then need to sort the array by that column first.

from itertools import groupby
import numpy

a.sort(order='col1')

# target array: first column holds the unique dates, the second and
# third the values for key1 and key2 (missing entries stay zero)
data = numpy.zeros(len(numpy.unique(a['col1'])),
                   dtype=dict(names=['dates', 'key1', 'key2'],
                              formats=[long, float, float]))

for i, (date, items) in enumerate(groupby(a, lambda item: item['col1'])):
    data[i]['dates'] = date
    for col1, col2, col3 in items:
        data[i][col2] = col3  # col2 is the field name, e.g. 'key1'

Hope this works! Bernhard
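P.S. For completeness, here is a self-contained sketch of the same idea
that runs on Python 3 and a current NumPy (the sample records are made
up to mirror the table from your post; note that the dtype keyword is
`formats`, not `types`, and `long` becomes `numpy.int64` on Python 3):

```python
from itertools import groupby
import numpy as np

# sample records mirroring the table from the original post
rows = [(20080101, 'key1', 4.0),
        (20080201, 'key1', 6.0),
        (20080301, 'key1', 5.0),
        (20080301, 'key2', 3.4),
        (20080601, 'key2', 5.6)]
a = np.array(rows, dtype=[('col1', np.int64), ('col2', 'U8'), ('col3', float)])

# groupby needs the records sorted by the grouping column
a.sort(order='col1')
dates = np.unique(a['col1'])

# one row per unique date; missing (date, key) pairs stay 0
data = np.zeros(len(dates),
                dtype=dict(names=['dates', 'key1', 'key2'],
                           formats=[np.int64, float, float]))

for i, (date, items) in enumerate(groupby(a, lambda r: r['col1'])):
    data[i]['dates'] = date
    for _, key, value in items:
        data[i][key] = value  # key is the field name, e.g. 'key1'

print(data['key1'])  # [4. 6. 5. 0.]
print(data['key2'])  # [0.  0.  3.4 5.6]
```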

On Feb 12, 6:24 am, A B <python6... at gmail.com> wrote:
> Hi,
>
> I have the following data structure:
>
> col1 | col2 | col3
>
> 20080101|key1|4
> 20080201|key1|6
> 20080301|key1|5
> 20080301|key2|3.4
> 20080601|key2|5.6
>
> For each key in the second column, I would like to create an array
> where for all unique values in the first column, there will be either
> a value or zero if there is no data available. Like so:
>
> # 20080101, 20080201, 20080301, 20080601
>
> key1 - 4, 6, 5,    0
> key2 - 0, 0, 3.4, 5.6
>
> Ideally, the results would end up in a 2d array.
>
> What's the most efficient way to accomplish this? Currently, I am
> getting a list of uniq col1 and col2 values into separate variables,
> then looping through each unique value in col2
>
> a = loadtxt(...)
>
> dates = unique(a[:]['col1'])
> keys = unique(a[:]['col2'])
>
> for key in keys:
>     b = a[where(a[:]['col2'] == key)]
>     ???
>
> Thanks in advance.
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discuss... at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
