[CentralOH] N-dimensional array slicing/crosscutting

Thu Dec 8 23:48:31 EST 2016

On Thu, 8 Dec 2016 21:19:23 -0500
Eric Floehr <eric at intellovations.com> wrote:
> I was wondering if anyone knows of any way in Python (functional or not),
> or a module, that would allow the following:
> 
> Let's say I have an n-dimensional array, with keys (numeric or otherwise)
> for each dimension. I'd like to be able to set up and grab the elements in
> any row in any dimension equally easily.
> 
> Here's a simple example, a 2x2 contingency table, with rows labelled
> "forecast" and "not forecast" and columns labelled "occurred" and "not
> occurred". Within each of the 4 cells, there is some value.
> 
> If I set it up traditionally, as a dict within the dict:
> 
> table = {}
> table['forecast'] = {'occurred': 0, 'not occurred': 0}
> table['not forecast'] = {'occurred': 0, 'not occurred': 0}
> 
> and access it traditionally:
> 
> table['forecast']['occurred'] = 123

I've had similar issues, e.g. when pulling data from persisted JSON
documents.  Transforming the data into relational form (one row per
cell) has always worked out well.

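from collections import namedtuple

# One row per cell: the key columns plus the cell's value.
Row = namedtuple('Row', 'forecast occurred count')
table = [
    Row(True, True, 0),
    Row(True, False, 0),
    Row(False, True, 0),
    Row(False, False, 0),
]

def select(**key_value):
    """Return the rows matching every column=value keyword argument."""
    s = table[:]
    for key, value in key_value.items():
        s = [row for row in s
             if getattr(row, key) == value]
    return s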
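
For example, with some made-up counts (the numbers below are purely
illustrative), keyword arguments make the order of the dimensions
irrelevant, and slicing along either dimension is the same call.  The
table and select() are repeated so the snippet runs on its own:

```python
from collections import namedtuple

Row = namedtuple('Row', 'forecast occurred count')
# Hypothetical counts, just to make the slices distinguishable.
table = [
    Row(True, True, 123),
    Row(True, False, 4),
    Row(False, True, 7),
    Row(False, False, 50),
]

def select(**key_value):
    """Return the rows matching every column=value keyword argument."""
    rows = list(table)
    for key, value in key_value.items():
        rows = [row for row in rows if getattr(row, key) == value]
    return rows

# Problem 2: the order of the keyword arguments doesn't matter.
cell = select(forecast=True, occurred=True)
same_cell = select(occurred=True, forecast=True)

# Problem 3: slicing the rows or the columns is the same operation.
forecast_counts = [row.count for row in select(forecast=True)]
not_occurred_counts = [row.count for row in select(occurred=False)]

print(cell == same_cell)      # True
print(forecast_counts)        # [123, 4]
print(not_occurred_counts)    # [4, 50]
```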

- Wrap the table list in a class.
- Add insert, update, delete.  Note that count is not a key column.
- Use the standard library bisect.bisect_right to keep the rows
  sorted and speed up lookups and modifications.
- The internal table list doesn't need to contain tuples, especially
  if the count column is updated frequently (namedtuples are
  immutable, so every update would replace the whole row).  It's still
  nice to return or yield tuples for named, immutable key columns.
- Add convenience methods for non-trivial queries.  A little functional
  programming with the internal rows can accomplish a lot.
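
A rough sketch of the first few points, assuming the two-key layout
above.  CountTable is a hypothetical name, and keeping sorted key
tuples and mutable counts in two parallel lists is just one way to do
it.  It uses bisect.bisect_right for insertion as suggested, plus
bisect.bisect_left for exact-match lookups:

```python
import bisect
from collections import namedtuple

Row = namedtuple('Row', 'forecast occurred count')


class CountTable:
    """Hypothetical wrapper: rows kept sorted by their key columns."""

    def __init__(self):
        # Parallel lists: sorted (forecast, occurred) key tuples and
        # plain mutable counts, so updates don't rebuild any tuples.
        self._keys = []
        self._counts = []

    def _find(self, key):
        # Binary search for an exact key; returns its index or None.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return i
        return None

    def insert(self, forecast, occurred, count=0):
        key = (forecast, occurred)
        if self._find(key) is not None:
            raise KeyError(f'duplicate key {key!r}')
        # bisect_right finds the insertion point that keeps _keys sorted.
        i = bisect.bisect_right(self._keys, key)
        self._keys.insert(i, key)
        self._counts.insert(i, count)

    def update(self, forecast, occurred, count):
        # count is not a key column, so only it can be changed in place.
        i = self._find((forecast, occurred))
        if i is None:
            raise KeyError((forecast, occurred))
        self._counts[i] = count

    def delete(self, forecast, occurred):
        i = self._find((forecast, occurred))
        if i is None:
            raise KeyError((forecast, occurred))
        del self._keys[i]
        del self._counts[i]

    def select(self, **key_value):
        # Yield immutable named tuples whose columns match the filters.
        for key, count in zip(self._keys, self._counts):
            row = Row(*key, count)
            if all(getattr(row, k) == v for k, v in key_value.items()):
                yield row


t = CountTable()
for f in (True, False):
    for o in (True, False):
        t.insert(f, o)
t.update(True, True, 123)
print(list(t.select(occurred=True)))
```

Rows come back in key order, so the (False, ...) rows sort first.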

Also consider SQL or your favorite ORM, possibly running with an in-
memory DB.
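
For the SQL route, a minimal sketch with the standard library's sqlite3
module and an in-memory database.  The table name cells and the column
cell_count are made up for the example, and booleans are stored as
1/0:

```python
import sqlite3
from itertools import product

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE cells ('
    ' forecast INTEGER NOT NULL,'
    ' occurred INTEGER NOT NULL,'
    ' cell_count INTEGER NOT NULL DEFAULT 0,'
    ' PRIMARY KEY (forecast, occurred))'
)

# Set-up: one row per combination of key values (1 = yes, 0 = no).
conn.executemany(
    'INSERT INTO cells (forecast, occurred) VALUES (?, ?)',
    product([1, 0], repeat=2),
)
conn.execute(
    'UPDATE cells SET cell_count = ? WHERE forecast = ? AND occurred = ?',
    (123, 1, 1),
)

# Slicing along either dimension is just a different WHERE clause.
occurred = conn.execute(
    'SELECT forecast, cell_count FROM cells'
    ' WHERE occurred = 1 ORDER BY forecast DESC'
).fetchall()
not_occurred = conn.execute(
    'SELECT forecast, cell_count FROM cells'
    ' WHERE occurred = 0 ORDER BY forecast DESC'
).fetchall()
print(occurred)      # [(1, 123), (0, 0)]
print(not_occurred)  # [(1, 0), (0, 0)]
```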

> The problems are:
> 
> 1. Set up is hard... I have to duplicate keys, and in an n-dimensional
> array, this is a real pain.
> 
> 2. I always have to access a cell in the right order, and have to remember
> that order. It would be great if I could access a cell in the example above
> as table['forecast']['occurred'] or table['occurred']['forecast'] (syntax
> doesn't matter, table('forecast', 'occurred') is fine too).
> 
> 3. It's hard to slice... in a 2-dimensional array, it's easy to get the
> cells in the outermost dict via table['forecast'].values() but how would I
> just as easily (equivalently) get table['not occurred'].values()?
> 
> Thoughts? Am I missing something obvious?

Using tuples as keys is valid syntax, but it only addresses (1.):

	table['forecast', 'not occurred']
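
A sketch of what that buys you, assuming the labels for each dimension
are listed once (dims is a hypothetical name).  itertools.product
builds every cell, but slicing still depends on knowing each
dimension's position in the key:

```python
from itertools import product

# Hypothetical labels for each dimension of the 2x2 table.
dims = [('forecast', 'not forecast'), ('occurred', 'not occurred')]

# Set-up: every combination of labels becomes a key, no duplication.
table = {key: 0 for key in product(*dims)}
table['forecast', 'occurred'] = 123

# Slicing still needs the position of the dimension (1 = occurred axis).
not_occurred = {k: v for k, v in table.items() if k[1] == 'not occurred'}
print(not_occurred)
# {('forecast', 'not occurred'): 0, ('not forecast', 'not occurred'): 0}
```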

Maybe, if you're really adventurous, an order-independent key:

	table[frozenset(['not occurred', 'forecast'])]
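
A sketch of how far the frozenset idea goes, assuming every label is
unique across all dimensions (otherwise two different cells could map
to the same set).  The helpers cell() and cells_with() are hypothetical
names:

```python
from itertools import product

dims = [('forecast', 'not forecast'), ('occurred', 'not occurred')]

# Order-independent keys: a frozenset holding one label per dimension.
# Assumes no label appears in more than one dimension.
table = {frozenset(labels): 0 for labels in product(*dims)}

def cell(*labels):
    """Key for one cell; the labels may be given in any order."""
    return frozenset(labels)

def cells_with(label):
    """All cells whose key contains the label, whatever its dimension."""
    return {key: value for key, value in table.items() if label in key}

table[cell('forecast', 'occurred')] = 123
print(table[cell('occurred', 'forecast')])        # 123
print(sorted(cells_with('not occurred').values()))  # [0, 0]
print(sorted(cells_with('forecast').values()))      # [0, 123]
```

Set membership is an exact match, so 'forecast' does not pick up the
'not forecast' cells.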