Hello all,

I'm developing a custom computational application which I chose to base on NumPy. Already quite in love with Python, and with proprietary software making me increasingly sick (through forced exposure to stupid errors I can't correct), NumPy was the way to go. It is at times like this that I regret not going on to graduate studies in computation - I'm a bit locked into the old paradigms that my [Fortran] generation learned. Since my application is only vaguely required to be 'generic', I had to dive into the wonderful world of computer science - a previous post to this group led to some very interesting solutions for the application, which, while doing nothing, is capable of doing everything :)

A bit of context: the application is supposed to process telemetry, outputting charts, alarms, etc. Raw data is obtained through plugin-like objects, which provide a uniform interface to distinct sources. The processing routines are objects as well, but operate on data as if they were functions (sort of like sin(x)). This way, I don't need to define anything other than the interfaces - the core remains flexible.

I ran into a problem, though, while trying to define some structure for data in transit. At first I imagined I could keep both raw data and results inside the same object; unfortunately, if I want to use those results in a second stage, my flexibility is rather impaired. Then I thought about getting raw data into one object, passing that to the processing core, and finally storing its output in another object. While this has the advantage of clearing raw data out of memory as soon as I finish chewing on it, I lose the relation between the raw and result data sets - which I then have to maintain somewhere else.

Yet another issue crops up with very large data sets. If there's not enough memory to cope with the data set, one either relies on swapping or changes the algorithm - and in that case having 'intelligent' data objects allows good, textbook encapsulation.

My question is thus: does anyone have experience with related problems, or could point to literature/code where they are addressed? I understand that my application may be suffering from excessive 'generality', but surely this problem has surfaced elsewhere.

Looking forward to your answers,

Renato
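For concreteness, here is a minimal sketch of the two interfaces described above - the names (RawTelemetry, Detrend) are hypothetical illustrations, not from the actual application:

    import numpy as np

    class RawTelemetry:
        """Hypothetical plugin-like source: a uniform read() interface."""
        def __init__(self, samples):
            self._samples = np.asarray(samples, dtype=float)
        def read(self):
            return self._samples

    class Detrend:
        """Hypothetical processing object: callable, used like sin(x)."""
        def __call__(self, data):
            return data - data.mean()

    source = RawTelemetry([1.0, 2.0, 3.0])
    process = Detrend()
    print(process(source.read()))    # [-1.  0.  1.]

Any source exposing read() and any callable processor can be swapped in, so only the interfaces are fixed.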
It sounds like you want a new class X that does three things:

- knows where the data is and how to access it,
- knows how the data are to be processed and can do this when asked,
- is able to provide a "results" object when requested.

The results object can store who made it and with what processing module, thus indirectly linking to the data and techniques (which the X instance may or may not store, as is convenient).

fwiw,
Alan Isaac
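A minimal sketch of such a class X, under the assumptions above (all names are hypothetical):

    import numpy as np

    class Results:
        """Hypothetical results object: output plus provenance."""
        def __init__(self, data, maker, processor):
            self.data = data
            self.maker = maker            # the X instance that produced it
            self.processor = processor    # the processing module used

    class X:
        """Knows where the data is, how to process it, and yields Results."""
        def __init__(self, fetch, processor):
            self.fetch = fetch            # callable that returns the raw array
            self.processor = processor    # callable that crunches it
        def run(self):
            raw = self.fetch()                # access the data
            out = self.processor(raw)         # process when asked
            return Results(out, maker=self, processor=self.processor)

    res = X(lambda: np.arange(5.0), lambda a: a - a.mean()).run()
    print(res.data)                           # [-2. -1.  0.  1.  2.]

Because each Results instance carries its maker and processor, the raw-to-result relation is preserved even after the raw array itself is dropped.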
Hello there,

indeed, the tasks you described correspond to what I'm seeking to implement. The thing is, for the sake of encapsulation (and laziness, in the programming sense), I'm keeping responsibilities well separated across several objects. I guess this type of design is pretty much ordinary for an OO person - it's just me having trouble with the philosophy. So, upon much thought, it breaks down like this:

- Crunchers: the lowest-level objects, which encapsulate the things I do with NumPy/SciPy functions on NumPy objects - say, take data from arguments, unbias the data, zero-stuff it, FFT the set, etc. They are meant to be written as needed.

- DataContainers: an abstraction layer over data sources (DBs, files, etc.) and over other data objects still in memory. Data returned by Crunchers is stored inside - in practice, piped here by an Analysis object. So far, I see no need for nesting DCs inside other DCs.

- Analyses: the glue between Crunchers, DataContainers and the user (batch, GUI, CLI). An Analysis is instantiated by the user, and directs data flow both into DCs and out of them. While each Analysis has one and only one 'results' attribute, which points to some place within a DataContainer, I imagine larger Analyses being made by concatenating several smaller ones - just call Analysis.result() to access the data at a certain stage of processing.

Well, so it is. Hopefully this setup will lend a good degree of flexibility to my application - the Crunchers are the hard part to develop, since I haven't seen the data yet.

Nadav: I had looked into PyTables while investigating low-level interfaces that would have to be supported. It's a lot lower-level than what I was looking for - my DataContainers do obtain their nature from other classes which are responsible for talking to DBs, files and the like - but it is the design of these containers that's hard to conceive!

Cheers,

Renato
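A minimal sketch of the Cruncher/DataContainer/Analysis split described above - the concrete names and signatures are assumptions for illustration, not the actual code:

    import numpy as np

    def unbias(data):
        """A Cruncher: a plain function on NumPy arrays, written as needed."""
        return data - data.mean()

    class DataContainer:
        """Abstraction over a data source; here just an in-memory array."""
        def __init__(self, data):
            self._data = np.asarray(data, dtype=float)
        def get(self):
            return self._data

    class Analysis:
        """Glue: pipes data from a source through a Cruncher into a result DC."""
        def __init__(self, source, cruncher):
            self._source = source        # a DataContainer or another Analysis
            self._cruncher = cruncher
            self._result = None
        def run(self):
            data = (self._source.result() if isinstance(self._source, Analysis)
                    else self._source.get())
            self._result = DataContainer(self._cruncher(data))
            return self
        def result(self):
            return self._result.get()    # data at this stage of processing

    raw = DataContainer([3.0, 5.0, 7.0])
    first = Analysis(raw, unbias).run()
    second = Analysis(first, np.abs).run()   # Analyses concatenated
    print(second.result())                   # [2. 0. 2.]

Since an Analysis accepts another Analysis as its source, stages chain naturally, and each stage's result() exposes the data at that point in the pipeline.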