[CentralOH] N-dimensional array slicing/crosscutting
Andrew Kubera
andrewkubera at gmail.com
Thu Dec 8 23:00:06 EST 2016
This is exactly the problem that pandas DataFrames solve.
Initialization may be done in multiple ways:
>>> import pandas as pd
>>> table = pd.DataFrame(
...     {'forecast': {'occurred': 0, 'not_occurred': 0},
...      'not_forecast': {'occurred': 0, 'not_occurred': 0}})
>>> table = pd.DataFrame([[0, 0], [0, 0]],
...                      columns=('forecast', 'not_forecast'),
...                      index=('not_occurred', 'occurred'))
>>> table
              forecast  not_forecast
not_occurred         0             0
occurred             0             0
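For larger tables, you can avoid repeating keys entirely by passing a scalar fill value along with the labels for each axis; pandas broadcasts the scalar to every cell. A minimal sketch, assuming the label lists from this example:

```python
import pandas as pd

# Each axis's labels are written once; the scalar 0 fills every cell.
outcomes = ['occurred', 'not_occurred']
forecasts = ['forecast', 'not_forecast']
table = pd.DataFrame(0, index=outcomes, columns=forecasts)
print(table)
```

This addresses the setup problem from the question: adding a third outcome means adding one string to one list.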
It supports both attribute and item syntax for column access:
>>> table.forecast is table['forecast']
True
>>> table.forecast
not_occurred    0
occurred        0
Name: forecast, dtype: int64
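Item syntax is also what you need when a label isn't a valid Python identifier, such as 'not occurred' (with a space) in the question quoted below. To write a single cell, `.loc` (or the scalar-only `.at`) takes the row and column labels in one step, avoiding the chained `table['forecast']['occurred'] = 123` form, which pandas discourages because it may not update the original table. A minimal sketch, with the spaced labels assumed for illustration:

```python
import pandas as pd

# Labels containing spaces work with item syntax, not attribute syntax.
table = pd.DataFrame(0,
                     index=['occurred', 'not occurred'],
                     columns=['forecast', 'not forecast'])

# Set one cell by (row label, column label) in a single indexing step.
table.loc['occurred', 'forecast'] = 123
# .at is the scalar-only equivalent of .loc.
table.at['not occurred', 'not forecast'] = 45

print(table['not forecast'])
```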
At this point ‘occurred’ and ‘not_occurred’ are index (row) labels, not columns, so they can't be addressed the same way.
One way to query them is to take the transpose (.T) of the table:
>>> table.T
              not_occurred  occurred
forecast                 0         0
not_forecast             0         0
>>> table.T.not_occurred
forecast        0
not_forecast    0
Name: not_occurred, dtype: int64
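Transposing isn't the only option: `.loc` selects a row directly by its index label and returns the same Series as `table.T['not_occurred']`. That makes row slices as easy as column slices, which is point 3 of the question. A small sketch with illustrative, non-zero values so the two axes can be told apart:

```python
import pandas as pd

table = pd.DataFrame([[1, 2], [3, 4]],
                     columns=['forecast', 'not_forecast'],
                     index=['occurred', 'not_occurred'])

# Column slice: item syntax on the frame.
col = table['forecast']
# Row slice: .loc by index label, no transpose needed.
row = table.loc['not_occurred']

print(row.equals(table.T['not_occurred']))  # True
print(row.tolist())  # [3, 4]
```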
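There are also the stack and unstack methods, which linearize the matrix by moving labels between the columns and the index:
stack moves the column labels (‘forecast’) down into the index, giving a Series keyed by (row, column) pairs, while unstack pivots the index labels (‘occurred’) up alongside the column labels, giving a Series keyed by (column, row) pairs.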
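Stacking gives the order-independent cell access asked about in point 2 of the question: the same cell can be looked up as (outcome, forecast) or (forecast, outcome), depending on which reshaped Series you hold. A sketch with illustrative values:

```python
import pandas as pd

table = pd.DataFrame([[1, 2], [3, 4]],
                     columns=['forecast', 'not_forecast'],
                     index=['occurred', 'not_occurred'])

# stack(): Series keyed by (row label, column label)
by_outcome = table.stack()
# unstack(): Series keyed by (column label, row label)
by_forecast = table.unstack()

print(by_outcome[('occurred', 'forecast')])
print(by_forecast[('forecast', 'occurred')])
```

Both lookups print the same cell, 1.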
You can create higher-dimensional frames with a MultiIndex: for example, two column levels for forecast and occurred,
and an index such as time or location.
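One way that might look, as a sketch: the column MultiIndex carries both the forecast and outcome axes, the row index is a location, and `xs` slices along any named level, so pulling out every ‘occurred’ column is as easy as pulling out every ‘forecast’ column. The level names ('prediction', 'outcome', 'location') and station codes below are hypothetical, chosen only for illustration:

```python
import pandas as pd

# Two column levels: prediction x outcome (4 combinations).
columns = pd.MultiIndex.from_product(
    [['forecast', 'not_forecast'], ['occurred', 'not_occurred']],
    names=['prediction', 'outcome'])
# Hypothetical station codes used as the row index.
locations = pd.Index(['CMH', 'CLE', 'CVG'], name='location')
table = pd.DataFrame(0, index=locations, columns=columns)

# Set one cell: row label plus a (prediction, outcome) column tuple.
table.loc['CMH', ('forecast', 'occurred')] = 123

# Cross-sections along either column level, selected by level name.
occurred = table.xs('occurred', level='outcome', axis=1)
forecast = table.xs('forecast', level='prediction', axis=1)
print(occurred)
print(forecast)
```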
Good Luck
- Andrew Kubera
> On Dec 8, 2016, at 9:19 PM, Eric Floehr <eric at intellovations.com> wrote:
>
> I was wondering if anyone knows of any way in Python (functional or not), or a module, that would allow the following:
>
> Let's say I have an n-dimensional array, with keys (numeric or otherwise) for each dimension. I'd like to be able to set up and grab the elements in any row in any dimension equally easily.
>
> Here's a simple example, a 2x2 contingency table, with rows labelled "forecast" and "not forecast" and columns labelled "occurred" and "not occurred". Within each of the 4 cells, there is some value.
>
> If I set it up traditionally, as a dict within a dict:
>
> table = {}
> table['forecast'] = {'occurred': 0, 'not occurred': 0}
> table['not forecast'] = {'occurred': 0, 'not occurred': 0}
>
> and access it traditionally:
>
> table['forecast']['occurred'] = 123
>
> The problems are:
>
> 1. Setup is hard... I have to duplicate keys, and in an n-dimensional array, this is a real pain.
>
> 2. I always have to access a cell in the right order, and have to remember that order. It would be great if I could access a cell in the example above as table['forecast']['occurred'] or table['occurred']['forecast'] (syntax doesn't matter, table('forecast', 'occurred') is fine too).
>
> 3. It's hard to slice... in a 2-dimensional array, it's easy to get the cells in the outermost dict via table['forecast'].values() but how would I just as easily (equivalently) get table['not occurred'].values()?
>
> Thoughts? Am I missing something obvious?
>
> _______________________________________________
> CentralOH mailing list
> CentralOH at python.org
> https://mail.python.org/mailman/listinfo/centraloh