Greetings,

Enthought, Inc. is very pleased to announce the newest release of the Enthought Python Distribution (EPD) Py2.5 v4.1.30101:

http://www.enthought.com/epd

The size of the installer has been reduced by about half. Also, this is the first release to include a 3.1.0 version of the Enthought Tool Suite (http://code.enthought.com/), featuring Mayavi 3.1.0. This is also the first release to use Enthought's enhanced version of setuptools, Enstaller (http://code.enthought.com/projects/enstaller/). Notable fixes include Windows installation enhancements, matplotlib and wx issues, and menu consistency across platforms.

The full release notes for this release can be found here:

https://svn.enthought.com/epd/wiki/Py25/4.1.30101/RelNotes

Many thanks to the EPD team for putting this release together, and to the community of folks who have provided all of the valuable tools bundled here.

Best Regards,
Chris

---------
About EPD
---------

The Enthought Python Distribution (EPD) is a "kitchen-sink-included" distribution of the Python™ Programming Language, including over 80 additional tools and libraries. The EPD bundle includes NumPy, SciPy, IPython, 2D and 3D visualization, database adapters, and a lot of other tools right out of the box.

http://www.enthought.com/products/epd.php

It is currently available as an easy, single-click installer for Windows XP (x86), Mac OS X (a universal binary for Intel 10.4 and above) and RedHat EL3 (x86 and amd64).

EPD is free for 30-day trial use and for use in degree-granting academic institutions. An annual Subscription and installation support are available for commercial use (http://www.enthought.com/products/epddownload.php), including an Enterprise Subscription with support for particular deployment environments (http://www.enthought.com/products/enterprise.php).

_______________________________________________
Enthought-dev mailing list
Enthought-dev@mail.enthought.com
https://mail.enthought.com/mailman/listinfo/enthought-dev
Hi,

This mailing list is full of people spending their time writing non-trivial numerical code. This is why I would like to share my questions about a code smell that I notice a lot in my numerical code: it revolves around persisting to disk often, and the mess that results. It is a bit hard to describe and it has been on my mind for a couple of months. I have finally written a blog post in an attempt to share my thoughts:

http://gael-varoquaux.info/blog/?p=83

Pointing to a blog post on a mailing list seems to me almost rude, and I hope you'll forgive me, but I'd love any feedback. It seems to me I am missing a pattern, or simply some insight on a recurrent problem.

Cheers,

Gaël
Interesting topic indeed. I think I have been hit with similar problems on
toy experimental scripts. So far the solution was always ad hoc FS caches of
NumPy arrays with manual filename management. Maybe the first step for
designing a generic solution would be to list some representative yet simple
enough use cases with real sample Python code so as to focus on concrete
matters and avoid over-engineering a general solution for philosophical
problems.
--
Olivier
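For concreteness, the kind of ad hoc filesystem cache mentioned above might look roughly like the following. This is only a minimal sketch under stated assumptions, not code from anyone in this thread: the names `cached`, `CACHE_DIR` and `slow_square_sum` are hypothetical, results are pickled plain Python values rather than NumPy arrays, and the file name is derived from a hash of the function name and arguments instead of being managed by hand.

```python
import hashlib
import os
import pickle
import tempfile

CACHE_DIR = tempfile.mkdtemp()  # hypothetical cache location
call_count = 0  # counts real computations, to show when the cache is hit


def cached(func):
    """Cache func's results on disk, keyed by a hash of its arguments."""
    def wrapper(*args):
        # The cache file name is derived from the function name and arguments.
        key = hashlib.sha1(pickle.dumps((func.__name__, args))).hexdigest()
        path = os.path.join(CACHE_DIR, key + ".pkl")
        if os.path.exists(path):
            with open(path, "rb") as handle:
                return pickle.load(handle)
        result = func(*args)
        with open(path, "wb") as handle:
            pickle.dump(result, handle)
        return result
    return wrapper


@cached
def slow_square_sum(values):
    """Stand-in for an expensive numerical computation."""
    global call_count
    call_count += 1
    return sum(v * v for v in values)


first = slow_square_sum((1.0, 2.0, 3.0))   # computed, then written to disk
second = slow_square_sum((1.0, 2.0, 3.0))  # read back from disk
```

Even this small version shows where the mess tends to come from: nothing invalidates the cache when the function body changes, and the hash of the arguments is only as reliable as their pickled representation.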
On Dec 23, 2008 1:40 AM, "Gael Varoquaux"
On Tue, Dec 23, 2008 at 02:10:50AM +0100, Olivier Grisel wrote:
Interesting topic indeed. I think I have been hit with similar problems on toy experimental scripts. So far the solution was always ad hoc FS caches of NumPy arrays with manual filename management. Maybe the first step for designing a generic solution would be to list some representative yet simple enough use cases with real sample Python code so as to focus on concrete matters and avoid over-engineering a general solution for philosophical problems.
Yes, that's clearly a first step: list the use cases, and the way we would like them solved: think about the API.

My internet connection is quite random currently, and I'll probably lose it for a week any time soon. Do you want to start such a page on the wiki? Mark it as a scratch page, and we'll delete it later.

I should point out that joblib (on PyPI and Launchpad) was a first attempt to solve this problem, so you could have a look at it. I have already identified things that are wrong with joblib (more on the API side than actual bugs), so I know it is not a final solution. Figuring out what was wrong only came from using it heavily in my work. I think the only way forward is to start something, use it, figure out what's wrong, and start again...

Looking forward to your input,

Gaël
I prototyped an approach last year that worked out well. I don't really know what to call it - maybe something like "property based persistence." It is kind of strange and I am not sure how broadly applicable it is - I have only used it for financial time series data.

I'll try to explain how the idea works. I start with a Python object that has a number of properties and an associated large data set (in my case, financial instruments and their associated time series in the form of NumPy arrays). I then created infrastructure that allowed me to define a simple "mapper" function that used a subset of the object's properties to define a "path" (expressible in the same form either as a file system path or as a path in HDF to a table). Then I persisted the bulky data set (again, time series in my case) at that location.

This little piece of infrastructure is very lightweight and cuts the client side persistence code down to only the small "mapper" functions. The mapper functions don't actually build up paths - they just specify the properties and ordering that you want to use to build up the paths. It also makes querying very simple and fast because you don't really query at all - instead the properties associated with the query directly express the path at which the data is located.

The drawback of this simplistic approach is that you need to add a second level of path addressing if you deal with datasets so large that you cannot really persist them under a single path. If you have single multi-GB or TB arrays you probably want to chunk things up a bit more in the style of GFS and its open source counterparts.

I still have the Python code for this properties based time series database. It is a very small and simple piece of code, but I am happy to give it a quick polish and open source it if anyone is interested in taking a look.

I am also about to try this model using F# and db4o for a .NET project.
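To make the idea above easier to discuss, here is a minimal sketch of how such "property based persistence" could look. It is not the code described in this message, which has not been posted: `Instrument`, `instrument_mapper` and `PropertyStore` are hypothetical names, the data is a plain list pickled to the file system (rather than NumPy arrays or HDF tables), and the second level of addressing for very large data sets is left out.

```python
import os
import pickle
import tempfile
from dataclasses import dataclass


@dataclass
class Instrument:
    """Hypothetical object whose properties identify a bulky data set."""
    asset_class: str
    exchange: str
    symbol: str


def instrument_mapper():
    """Name the properties, in order, that address the data.

    The mapper does not build a path itself; it only lists property names.
    """
    return ("asset_class", "exchange", "symbol")


class PropertyStore:
    """Persist one data set per object at a path derived from its properties."""

    def __init__(self, root, mapper):
        self.root = root
        self.keys = mapper()

    def path_for(self, **props):
        # Join the property values in mapper order; a missing key raises KeyError.
        parts = [str(props[key]) for key in self.keys]
        return os.path.join(self.root, *parts) + ".pkl"

    def save(self, obj, data):
        path = self.path_for(**{key: getattr(obj, key) for key in self.keys})
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as handle:
            pickle.dump(data, handle)
        return path

    def load(self, **props):
        # No search is needed: the properties are the address.
        with open(self.path_for(**props), "rb") as handle:
            return pickle.load(handle)


store = PropertyStore(tempfile.mkdtemp(), instrument_mapper)
ibm = Instrument("equity", "NYSE", "IBM")
saved_path = store.save(ibm, [101.5, 102.0, 99.75])
prices = store.load(asset_class="equity", exchange="NYSE", symbol="IBM")
```

In this sketch the only client-side persistence code is `instrument_mapper`, and a "query" is just a path computation followed by a read, which matches the description above. The same path tuple could presumably address a node in an HDF file instead of a directory tree.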
On Wed, Dec 24, 2008 at 2:21 PM, Gael Varoquaux < gael.varoquaux@normalesup.org> wrote:
On Tue, Dec 23, 2008 at 02:10:50AM +0100, Olivier Grisel wrote:
Interesting topic indeed. I think I have been hit with similar problems on toy experimental scripts. So far the solution was always ad hoc FS caches of NumPy arrays with manual filename management. Maybe the first step for designing a generic solution would be to list some representative yet simple enough use cases with real sample Python code so as to focus on concrete matters and avoid over-engineering a general solution for philosophical problems.
Yes, that's clearly a first step: list the use cases, and the way we would like them solved: think about the API.
My internet connection is quite random currently, and I'll probably lose it for a week any time soon. Do you want to start such a page on the wiki? Mark it as a scratch page, and we'll delete it later.
I should point out that joblib (on PyPI and Launchpad) was a first attempt to solve this problem, so you could have a look at it. I have already identified things that are wrong with joblib (more on the API side than actual bugs), so I know it is not a final solution. Figuring out what was wrong only came from using it heavily in my work. I think the only way forward is to start something, use it, figure out what's wrong, and start again...
Looking forward to your input,
Gaël
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
On Sat, Dec 27, 2008 at 04:59:25PM +0100, Bradford Cross wrote:
I prototyped an approach last year that worked out well. I don't really know what to call it - maybe something like "property based persistence." It is kind of strange and I am not sure how broadly applicable it is - I have only used it for financial time series data.
Yeah, that's exactly what I had in mind for my second try. I thought I would call this special object some kind of execution context.
I still have the Python code for this properties based time series database. It is a very small and simple piece of code, but I am happy to give it a quick polish and open source it if anyone is interested in taking a look.
I am very interested in both your code and anything you can tell us about what worked well and what you would do differently.
I am also about to try this model using F# and db4o for a .NET project.
Functional languages are clearly a very interesting avenue to explore for these problems. I am working in Python right now, and staying there for a while, but I believe I can learn a lot from functional languages.

Thanks for your feedback,

Gaël
participants (4)
- Bradford Cross
- Chris Casey
- Gael Varoquaux
- Olivier Grisel