[CentralOH] N-dimensional array slicing/crosscutting

Eric Floehr eric at intellovations.com
Fri Dec 9 20:38:07 EST 2016


Thanks Jim, Andrew, Erik, and Neil!

I think that xarray is the solution that most closely matches my needs.
It looks really powerful and fairly easy to use. I'll definitely have to
look at how it does out-of-core and parallel processing... that could
prove to be useful.

Cheers,
Eric


On Fri, Dec 9, 2016 at 2:53 AM, Erik Welch <erik.n.welch at gmail.com> wrote:

> Eric, I whipped up an example with four dimensions using xarray
> <http://xarray.pydata.org/>.  Check it out:
>
> https://anaconda.org/eriknw/xarray-starting/notebook
>
> Nothing beats getting your hands dirty when learning, and I hope the above
> is enough to get you started.  I think you may like xarray given the data
> sets that (I think) you work with.  I like its spelling for simple
> operations, and advanced things are possible.  It even supports out-of-core
> and parallel processing, which is great when data doesn't fit in
> memory--don't you have a few image files sitting around? ;-)
>
> btw, I think SQL, pandas, and pure Python data structures and functions
> are all great too.  Good question and good responses!
>
> Cheers,
> Erik
>
> On Thu, Dec 8, 2016 at 10:48 PM, Neil Ludban <nludban at columbus.rr.com>
> wrote:
>
>> On Thu, 8 Dec 2016 21:19:23 -0500
>> Eric Floehr <eric at intellovations.com> wrote:
>> > I was wondering if anyone knows of any way in Python (functional or
>> > not), or a module, that would allow the following:
>> >
>> > Let's say I have an n-dimensional array, with keys (numeric or
>> > otherwise) for each dimension. I'd like to be able to set up and grab
>> > the elements in any row in any dimension equally easily.
>> >
>> > Here's a simple example, a 2x2 contingency table, with rows labelled
>> > "forecast" and "not forecast" and columns labelled "occurred" and "not
>> > occurred". Within each of the 4 cells, there is some value.
>> >
>> > If I set it up traditionally, as a dict within a dict:
>> >
>> > table = {}
>> > table['forecast'] = {'occurred': 0, 'not occurred': 0}
>> > table['not forecast'] = {'occurred': 0, 'not occurred': 0}
>> >
>> > and access it traditionally:
>> >
>> > table['forecast']['occurred'] = 123
>>
>> I've had similar issues, e.g. when pulling data from persisted JSON
>> documents.  Transforming it into relational data has always worked
>> out well.
>>
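>> from collections import namedtuple
>>
>> # One row per cell: two key columns plus the stored value.
>> Row = namedtuple('Row', 'forecast occurred count')
>> table = [
>>    Row(True, True, 0),
>>    Row(True, False, 0),
>>    Row(False, True, 0),
>>    Row(False, False, 0),
>> ]
>>
>> def select(**key_value):
>>    # Return the rows that match every column=value pair given.
>>    s = table[:]
>>    for key, value in key_value.items():
>>       s = [ row for row in s
>>             if getattr(row, key) == value ]
>>    return s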
>>
>> - Wrap the table list in a class.
>> - Add insert, update, delete.  Note that count is not a key column.
>> - Use the standard library bisect.bisect_right to keep the rows
>>   sorted and speed up lookups and modifications.
>> - The internal table array doesn't need to contain tuples, especially
>>   if the count column is updated frequently.  Still nice to return or
>>   yield tuples for named, immutable key columns.
>> - Add convenience methods for non-trivial queries.  A little functional
>>   programming with the internal rows can accomplish a lot.
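The list of suggestions above could be sketched roughly as follows. This is only one possible reading of it, not Neil's code: the class name `Table`, the helper `_find`, and the two parallel lists are all assumptions made for illustration. It uses `bisect.bisect_left` rather than the `bisect_right` mentioned above, because the leftmost insertion point makes the exact-match check a one-liner.

```python
from bisect import bisect_left
from collections import namedtuple

Row = namedtuple("Row", "forecast occurred count")


class Table:
    """Hypothetical wrapper: (forecast, occurred) is the key, count the value."""

    def __init__(self):
        self._keys = []    # sorted list of (forecast, occurred) tuples
        self._counts = []  # mutable counts, parallel to self._keys

    def _find(self, key):
        """Binary-search for key; return (index, found)."""
        i = bisect_left(self._keys, key)
        return i, i < len(self._keys) and self._keys[i] == key

    def insert(self, forecast, occurred, count=0):
        key = (forecast, occurred)
        i, found = self._find(key)
        if found:
            raise KeyError(f"duplicate key: {key!r}")
        # Inserting at the bisect position keeps both lists sorted.
        self._keys.insert(i, key)
        self._counts.insert(i, count)

    def update(self, forecast, occurred, count):
        i, found = self._find((forecast, occurred))
        if not found:
            raise KeyError((forecast, occurred))
        self._counts[i] = count

    def delete(self, forecast, occurred):
        i, found = self._find((forecast, occurred))
        if not found:
            raise KeyError((forecast, occurred))
        del self._keys[i]
        del self._counts[i]

    def select(self, **key_value):
        """Yield immutable Row tuples matching every column=value pair."""
        for key, count in zip(self._keys, self._counts):
            row = Row(*key, count)
            if all(getattr(row, k) == v for k, v in key_value.items()):
                yield row


table = Table()
for forecast in (True, False):
    for occurred in (True, False):
        table.insert(forecast, occurred)
table.update(True, True, 123)

# Slicing along either dimension uses the same call.
forecast_rows = list(table.select(forecast=True))
not_occurred_rows = list(table.select(occurred=False))
```

Storing the counts in a plain list keeps updates cheap, while `select` still hands back named, immutable tuples.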
>>
>> Also consider SQL or your favorite ORM, possibly running with an
>> in-memory DB.
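For the in-memory database route, a minimal sketch with the standard library's `sqlite3` module might look like this. The table name `contingency`, the column name `cnt`, and the helper `counts_where` are assumptions made for the example, not anything specified in the thread.

```python
import sqlite3

# ":memory:" gives a private database that lives only as long as the connection.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contingency ("
    " forecast INTEGER NOT NULL,"
    " occurred INTEGER NOT NULL,"
    " cnt INTEGER NOT NULL DEFAULT 0,"
    " PRIMARY KEY (forecast, occurred))"
)

# Set up all four cells without spelling out nested keys.
conn.executemany(
    "INSERT INTO contingency (forecast, occurred) VALUES (?, ?)",
    [(f, o) for f in (1, 0) for o in (1, 0)],
)
conn.execute(
    "UPDATE contingency SET cnt = ? WHERE forecast = ? AND occurred = ?",
    (123, 1, 1),
)


def counts_where(column, value):
    """Return the counts along one dimension, whichever column it is."""
    # Column names cannot be bound as parameters, so whitelist them.
    if column not in ("forecast", "occurred"):
        raise ValueError(f"unknown column: {column!r}")
    cur = conn.execute(
        f"SELECT cnt FROM contingency WHERE {column} = ? "
        "ORDER BY forecast, occurred",
        (value,),
    )
    return [cnt for (cnt,) in cur]


forecast_counts = counts_where("forecast", 1)
not_occurred_counts = counts_where("occurred", 0)
```

Here every dimension is just another column, so a `WHERE` clause slices along any of them, in any order.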
>>
>>
>> > The problems are:
>> >
>> > 1. Setup is hard... I have to duplicate keys, and in an n-dimensional
>> > array, this is a real pain.
>> >
>> > 2. I always have to access a cell in the right order, and have to
>> > remember that order. It would be great if I could access a cell in the
>> > example above as table['forecast']['occurred'] or
>> > table['occurred']['forecast'] (syntax doesn't matter,
>> > table('forecast', 'occurred') is fine too).
>> >
>> > 3. It's hard to slice... in a 2-dimensional array, it's easy to get the
>> > cells in the outermost dict via table['forecast'].values() but how
>> > would I just as easily (equivalently) get
>> > table['not occurred'].values()?
>> >
>> > Thoughts? Am I missing something obvious?
>>
>> Using tuples as keys is valid syntax, but it only addresses (1.):
>>
>>         table['forecast', 'not occurred']
>>
>> Maybe if you're really adventurous:
>>
>>         table[frozenset(['not occurred', 'forecast'])]
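The frozenset idea can be pushed a little further to cover order-independent access and slicing as well. The following is a rough sketch that assumes every label is unique across all dimensions (true for the 2x2 example, not in general); the names `dims`, `cell`, and `slice_by` are made up for illustration.

```python
from itertools import product

# Labels for each dimension; assumed not to repeat across dimensions.
dims = {
    "forecast": ["forecast", "not forecast"],
    "occurred": ["occurred", "not occurred"],
}

# One flat dict: each key is the set of labels that identifies a cell,
# so setup never duplicates keys no matter how many dimensions there are.
table = {frozenset(labels): 0 for labels in product(*dims.values())}


def cell(*labels):
    """Build a cell key; the order of the labels does not matter."""
    return frozenset(labels)


def slice_by(label):
    """Return the values of every cell that carries the given label."""
    return [value for key, value in table.items() if label in key]


table[cell("forecast", "occurred")] = 123

same_cell = table[cell("occurred", "forecast")]  # reversed label order
forecast_values = slice_by("forecast")
not_occurred_values = slice_by("not occurred")
```

Slicing here scans every cell, which is fine for small tables; a larger one would want an index per label.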
>> _______________________________________________
>> CentralOH mailing list
>> CentralOH at python.org
>> https://mail.python.org/mailman/listinfo/centraloh
>>