[Tutor] Extract several arrays from a large 2D array

Fri Jan 22 05:03:43 EST 2016

Ek Esawi wrote:

> Thank you all for your help. I am a decent programmer in another language
> but new to Python and I have some issues with a project I am working on.
> Some suggested using pandas but I am barely starting on NumPy. The
> suggestions were very helpful, however, I decided to replace the 2D array
> with several single arrays b/c the arrays are of different data types. I
> ran into another problem and am almost done but got stuck.
> 
> 
> 
> Part of my code is below. The question is how to put the variables for
> each j into a 14 by 6 array by a statement at the end of this code. I was
> hoping to get an array like this one below:
> 
> [2013 TT1 TT2 TT3 TT4 TT5 TT6]
> [2012 TT1 TT2 TT3 TT4 TT5 TT6]
> .
> .
> [1999 TT1 TT2 TT3 TT4 TT5 TT6]
> 
> for j in range(14):
>     for i in range(200):

As a rule of thumb, if you are iterating over individual array entries in 
numpy you are doing something wrong ;)

>         if TYear[i]==2013-j:
>             if TTrea[i]=='T1':
>                 TT1+=TTemp[i]
>             elif TTrea[i]=='T2':
>                 TT2+=TTemp[i]
>             elif TTrea[i]=='T3':
>                 TT3+=TTemp[i]
>             elif TTrea[i]=='T4':
>                 TT4+=TTemp[i]
>             elif TTrea[i]=='T5':
>                 TT5+=TTemp[i]
>             elif TTrea[i]=='T6':
>                 TT6+=TTemp[i]
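
To make the rule of thumb concrete, here is a minimal sketch of the same sums without a loop over individual entries. The array names follow your code, but the sample values and the `years`/`treatments` lists are made up for illustration; your real arrays have 200 entries, 14 years and six treatments.

```python
import numpy as np

# Made-up sample data; the names mirror TYear/TTrea/TTemp from the question.
TYear = np.array([2013, 2013, 2012, 2012, 2012, 2012, 2012])
TTrea = np.array(["T1", "T2", "T1", "T1", "T2", "T3", "T4"])
TTemp = np.array([42, 12, 1, 2, 10, 11, 12])

years = range(2013, 2011, -1)            # newest year first
treatments = ["T1", "T2", "T3", "T4"]

# One boolean mask per (year, treatment) cell selects the matching
# entries of TTemp; .sum() then adds them up in one step.
table = np.array([
    [TTemp[(TYear == year) & (TTrea == t)].sum() for t in treatments]
    for year in years
])
print(table)
```

There is still a loop over years and treatments, but the loop over the individual array entries is gone, and the result is already a 2D array with one row per year.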

Your loop is manually building a pivot table. If you don't want 
to use a spreadsheet (like Excel or Libre/OpenOffice Calc) you can do it 
with pandas, too:

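import pandas

df = pandas.DataFrame(
    [
        [2013, "T1", 42],
        [2013, "T2", 12],
        [2012, "T1", 1],
        [2012, "T1", 2],
        [2012, "T2", 10],
        [2012, "T3", 11],
        [2012, "T4", 12],
    ],
    columns=["year", "t", "v"])

# One row per year, one column per "t" label, summing "v" in each cell.
# Current pandas names these arguments index/columns (older releases
# used rows/cols). Cells without data come out as NaN.
print(
    pandas.pivot_table(df, values="v", index="year", columns="t", aggfunc="sum")
)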

Here's a generic Python solution for educational purposes. It uses nested 
dicts to model the table. The outer dict is used for the rows, the inner 
dicts for the cells in a row. An extra set keeps track of the column labels.

data = [
        [2013, "T1", 42],
        [2013, "T2", 12],
        [2012, "T1", 1],
        [2012, "T1", 2],
        [2012, "T2", 10],
        [2012, "T3", 11],
        [2012, "T4", 12],
]

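table = {}        # year -> {column label -> running sum}
columns = set()   # every column label seen so far
for year, t, v in data:
    columns.add(t)
    row = table.setdefault(year, {})   # create the row the first time a year shows up
    row[t] = row.get(t, 0) + v         # add to the cell, starting from 0

columns = sorted(columns)
# Header line: four spaces to line up with the year, then the labels.
print("    ", *["{:>5}".format(c) for c in columns])
# One line per year, newest first; missing cells print as 0.
for year, row in sorted(table.items(), reverse=True):
    print(year, *["{:5}".format(row.get(c, 0)) for c in columns])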
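
If you want the result as an actual 2D NumPy array, as in your question, the same bookkeeping can be sketched with np.unique() and np.add.at(). This is only one possible approach, not the only way to do it; the variable names (row_idx, col_idx, result) are my own, and the data is the same made-up sample as above.

```python
import numpy as np

data = [
    [2013, "T1", 42],
    [2013, "T2", 12],
    [2012, "T1", 1],
    [2012, "T1", 2],
    [2012, "T2", 10],
    [2012, "T3", 11],
    [2012, "T4", 12],
]
years = np.array([row[0] for row in data])
labels = np.array([row[1] for row in data])
values = np.array([row[2] for row in data])

# np.unique returns the sorted distinct values and, for every entry,
# the position of its value in that sorted list.
row_labels, row_idx = np.unique(years, return_inverse=True)
col_labels, col_idx = np.unique(labels, return_inverse=True)

result = np.zeros((len(row_labels), len(col_labels)), dtype=values.dtype)
# Unbuffered add: repeated (row, column) pairs accumulate instead of
# overwriting each other, which plain result[row_idx, col_idx] += values
# would not guarantee.
np.add.at(result, (row_idx, col_idx), values)

# Reverse the rows so the newest year comes first.
result = result[::-1]
row_labels = row_labels[::-1]

print(col_labels)
for year, row in zip(row_labels, result):
    print(year, row)
```

The row and column labels are kept in separate arrays because a NumPy array holds a single data type, which is the problem you ran into with the mixed 2D array.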