[CentralOH] N-dimensional array slicing/crosscutting

Andrew Kubera andrewkubera at gmail.com
Thu Dec 8 23:00:06 EST 2016


This is exactly the problem that pandas DataFrames solve.


Initialization may be done in multiple ways:

>>> import pandas as pd


>>> table = pd.DataFrame(
...     {'forecast': {'not_occurred': 0, 'occurred': 0},
...      'not_forecast': {'not_occurred': 0, 'occurred': 0}})

>>> table = pd.DataFrame([[0, 0], [0, 0]],
...                      columns=('forecast', 'not_forecast'),
...                      index=('not_occurred', 'occurred'))
>>> table
              forecast  not_forecast
not_occurred         0             0
occurred             0             0
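
When every cell starts at the same value, the labels do not have to be repeated at all. A minimal sketch, assuming a fill value of 0 (the names `rows` and `cols` are only for illustration and are not part of the session above):

```python
import pandas as pd

rows = ['not_occurred', 'occurred']    # index labels, typed once
cols = ['forecast', 'not_forecast']    # column labels, typed once

# A scalar passed as the data is broadcast to every cell.
table = pd.DataFrame(0, index=rows, columns=cols)

# Set a single cell by (row label, column label).
table.loc['occurred', 'forecast'] = 123
```

This should address the "duplicate keys" complaint in the original question, since adding a dimension value means adding one label rather than one nested dict.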


A DataFrame supports both attribute and item syntax for column access:

>>> table.forecast.equals(table['forecast'])
True
>>> table.forecast
not_occurred    0
occurred        0
Name: forecast, dtype: int64

At this point 'occurred' and 'not_occurred' are index labels, not addressable columns.
To select by them with the same syntax, take the transpose (.T) of the table:

>>> table.T
              not_occurred  occurred
forecast                 0         0
not_forecast             0         0

>>> table.T.not_occurred
forecast        0
not_forecast    0
Name: not_occurred, dtype: int64
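
Rows can also be selected by label without transposing, using .loc. A small sketch, assuming made-up cell values 1 through 4 so that the cells can be told apart (they are not from the original example):

```python
import pandas as pd

# Illustrative values only; the original table holds zeros.
table = pd.DataFrame([[1, 2], [3, 4]],
                     columns=['forecast', 'not_forecast'],
                     index=['not_occurred', 'occurred'])

# A whole row by index label, no transpose needed.
row = table.loc['not_occurred']

# A single cell, addressed from either direction.
a = table.loc['occurred', 'forecast']
b = table.T.loc['forecast', 'occurred']
```

Here `a` and `b` refer to the same cell, which is roughly the "either order" access the original question asks about.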

There are also the stack and unstack methods, which linearize the matrix: stack
moves 'forecast' down into the index, while unstack moves 'occurred' up to the columns.
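
A short sketch of both methods on a flat 2x2 frame, again assuming made-up values 1 through 4. On a frame with single-level index and columns, both calls return a Series with a two-level index; only the order of the levels differs:

```python
import pandas as pd

# Illustrative values only; the original table holds zeros.
table = pd.DataFrame([[1, 2], [3, 4]],
                     columns=['forecast', 'not_forecast'],
                     index=['not_occurred', 'occurred'])

# stack(): column labels move down into the index, keyed (row, column).
stacked = table.stack()

# unstack(): on a flat frame this also yields a Series, keyed (column, row).
unstacked = table.unstack()

cell = stacked['occurred', 'forecast']
same_cell = unstacked['forecast', 'occurred']
```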

You can create higher-dimensional frames with a MultiIndex for double columns
of forecast+occurred, and an index like time or location.
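
One way this could look, as a sketch only: the level names `prediction` and `outcome`, the location names, and the value 5 are all assumptions made up for illustration.

```python
import pandas as pd

# Two-level columns: every (prediction, outcome) combination.
columns = pd.MultiIndex.from_product(
    [['forecast', 'not_forecast'], ['not_occurred', 'occurred']],
    names=['prediction', 'outcome'])

# Hypothetical third dimension: one row per location.
locations = ['Columbus', 'Dayton']

counts = pd.DataFrame(0, index=locations, columns=columns)
counts.loc['Columbus', ('forecast', 'occurred')] = 5

# Slice on the outer column level ...
forecast_part = counts['forecast']
# ... or take a cross-section on the inner level.
occurred_part = counts.xs('occurred', axis=1, level='outcome')
```

Both `forecast_part` and `occurred_part` are ordinary DataFrames indexed by location, so either dimension can be cut with about the same effort.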

Good Luck

- Andrew Kubera


> On Dec 8, 2016, at 9:19 PM, Eric Floehr <eric at intellovations.com> wrote:
> 
> I was wondering if anyone knows of any way in Python (functional or not), or a module, that would allow the following:
> 
> Let's say I have an n-dimensional array, with keys (numeric or otherwise) for each dimension. I'd like to be able to set up and grab the elements in any row in any dimension equally easily.
> 
> Here's a simple example, a 2x2 contingency table, with rows labelled "forecast" and "not forecast" and columns labelled "occurred" and "not occurred". Within each of the 4 cells, there is some value.
> 
> If I set it up traditionally, as a dict within the dict:
> 
> table = {}
> table['forecast'] = {'occurred': 0, 'not occurred': 0}
> table['not forecast'] = {'occurred': 0, 'not occurred': 0}
> 
> and access it traditionally:
> 
> table['forecast']['occurred'] = 123
> 
> The problems are:
> 
> 1. Set up is hard... I have to duplicate keys, and in an n-dimensional array, this is a real pain.
> 
> 2. I always have to access a cell in the right order, and have to remember that order. It would be great if I could access a cell in the example above as table['forecast']['occurred'] or table['occurred']['forecast'] (syntax doesn't matter, table('forecast', 'occurred') is fine too).
> 
> 3. It's hard to slice... in a 2-dimensional array, it's easy to get the cells in the outermost dict via table['forecast'].values() but how would I just as easily (equivalently) get table['not occurred'].values()?
> 
> Thoughts? Am I missing something obvious?
> 
> _______________________________________________
> CentralOH mailing list
> CentralOH at python.org
> https://mail.python.org/mailman/listinfo/centraloh
