Greetings,
I have just finished an extensive overhaul of the HaloProfiler that adds
a number of new features and completely changes the running syntax.
Beyond the various new features, the main goals of this overhaul were:
1) to make the tool more general (mostly by removing hard-coded filtering
by virial quantities), essentially allowing someone to profile an arbitrary
list of points that may or may not be actual halos.
Easy filtering of halos by virial quantities remains an option, but is now
simply a specific instance of something more general and far more powerful:
a halo filtering device that allows the user to create their own filter
functions to comb through radial profile data and decide whether a halo
meets certain criteria.
2) to remove the dependence on an additional parameter file for the various
options.
The parameter file is now gone. Most of the HaloProfiler parameters have
been turned into instantiation keyword args. The keyword list is now
considerably longer, but the benefit is that the number of files needed to
run this thing has been reduced from 2 (running script and par file) to just
1 (running script). A large number of keyword options specific to either
the profile or projection routines are taken in at instantiation and stored
as attributes; this was done to keep those function calls simple. I'm
curious to know people's thoughts on whether these keyword args should stay
put or move to the individual function calls. Adding fields for profiling
and projections has been moved to the functions addProfile and
addProjection.
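Roughly, a running script now looks something like this (a sketch only:
everything except the class name and the two add functions is a
hypothetical placeholder; see runHaloProfiler.py, linked below, for the
real syntax):

from yt.extensions.HaloProfiler import HaloProfiler  # assumed import path

hp = HaloProfiler("DD0042/DD0042",       # hypothetical dataset name
                  n_profile_bins=50)     # hypothetical keyword arg
hp.addProfile("Temperature", weight_field="CellMassMsun")  # guessed kwargs
hp.addProjection("Density")              # guessed signature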
Here is a brief list of the new features that I can remember:
- the halo list read routine can be easily customized to read in columned
ascii data of varying formats through the use of a dictionary (see attribute
self.halo_list_format)
- ability to change the function call, args, and kwargs for the halo finder
Profiles:
- filter the halo list with user-written filter functions (see the new file
HaloFilters.py for an example of a function that filters based on virial
quantities; a sketch of the idea appears after this list)
- extract and output scalar quantities from halos using the filter
functions
- pre-filter halos based on values in the initial halo list (skipping the
profiling altogether and saving a load of time)
Projections:
- choose the axes to be projected (instead of hard-coded to all three)
- easily select the list of halos to be projected (the total list, filtered
list, a new file, or an actual list)
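To make the filter functions concrete, here is a sketch of what one might
look like (the real example is in HaloFilters.py, linked below; the field
names and return convention here are placeholders):

def quantity_filter(profile, field="TotalMassMsun", threshold=1e12):
    # profile: the radial profile data for one candidate halo.
    # Returns whether the halo passes, plus any scalars worth writing out.
    value = profile[field][-1]  # outermost bin; hypothetical layout
    return value >= threshold, {field: value}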
Before I commit all this, I'd like a little feedback, mostly on the
migration of the par file parameters to keyword args. I put three files in
the pastebin for people to download. I chose not to submit a diff since the
changes were so sweeping.
The new HaloProfiler.py: http://paste.enzotools.org/show/178/
HaloFilters.py (should go in yt/extensions with HaloProfiler.py)
http://paste.enzotools.org/show/179/
runHaloProfiler.py (an example script to run the new HaloProfiler)
http://paste.enzotools.org/show/180/
I will try my best to write up full documentation for this as soon as
possible, perhaps even today.
Please let me know what you think.
Britton
Hi everyone,
The rewritten hierarchy now passes all the unit tests. You can
inspect this branch in this repository:
http://hg.enzotools.org/yt/
in the named branch hierarchy-opt. The Orion hierarchy will need to
be converted (Jeff and I can talk about this offline), but overall I
think the new mechanism for instantiating and creating hierarchies is
now extremely straightforward and will be better suited to new AMR
formats. You can see the outline here:
http://hg.enzotools.org/yt/file/hierarchy-opt/yt/lagos/HierarchyType.py#l201
This is the base class; other hierarchies just have to implement
those functions to be usable. The Enzo hierarchy instantiation should
be marginally faster and might use a bit less RAM. However, I still
need to work on reorganizing both the DataQueue and the Grid objects,
which will become smaller (eliminating some data members) and faster.
I intend to eliminate at least one dictionary in the grid object.
(Each dictionary removal drops 1K per object -- for the L7, this would
be 300 megs per dict.)
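For reference, the standard mechanism for dropping the per-object
dictionary is __slots__. This is only an illustration of the trick, not
necessarily what the grid objects will end up doing:

class SlimGrid(object):
    # __slots__ replaces the per-instance __dict__ with fixed storage,
    # so each instance no longer carries its own ~1K dictionary.
    __slots__ = ("id", "LeftEdge", "RightEdge", "Level")
    def __init__(self, id, left, right, level):
        self.id = id
        self.LeftEdge = left
        self.RightEdge = right
        self.Level = level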
If you'd like to take a look, please do so, and let me know how it
works out. You can do this with:
hg clone http://hg.enzotools.org/yt/ ./experimental-yt
cd experimental-yt
hg up -C hierarchy-opt
Best,
-Matt
Hi guys,
Just as a note, if you're doing heavily performance-intensive stuff,
you can use the latest version of Cython to auto-cythonize (i.e.,
compile down to C) your code. This requires that you have Cython >
0.11.2 or so installed.
http://docs.cython.org/docs/tutorial.html#pyximport-cython-compilation-the-…
At the top of your script using yt, put this line:
import pyximport; pyximport.install(pyimport=True)
And it'll try compiling to C all of your code before executing.
Without annotations, speedups of 10-30% have been reported.
Currently, even after hacking out the "exec" calls, it doesn't work
with yt ... but I'm going to play with this some more later.
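For a script that doesn't touch yt, the whole recipe is just the
following (a sketch; the module and function names are made up):

# hotloop.py -- a deliberately loop-heavy, pure-Python module, the kind
# of code that benefits most from being compiled down to C.
def pairwise_count(xs, eps=0.1):
    n = 0
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            if abs(xs[i] - xs[j]) < eps:
                n += 1
    return n

# driver.py -- the install line goes at the very top, before any other
# imports; modules imported after it get compiled to C first.
import pyximport; pyximport.install(pyimport=True)
import hotloop
print(hotloop.pairwise_count([0.05 * i for i in range(200)]))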
-Matt
As some of you know, yt's GUI right now is based on hand-coded
wxPython. The next generation, available in the yt hg repo, is based
on TraitsUI. Right now it uses some wx-specific code for generating
Matplotlib canvases, but it's largely portable. It looks like going
forward TraitsUI will primarily use Qt. I've used the nextgen GUI
with Qt (switching toolkits is trivial with TraitsUI) and it looks
good, but this will still be something of a change, when it happens --
I anticipate we'll track TraitsUI on this front. It's not clear to me
how easy installing Qt is from an automated standpoint.
The second message contains Eric Jones's response about PyQt/PySide
and the future of TraitsUI.
Forwarded conversation
Subject: [Enthought-Dev] PySide to replace PyQt?
------------------------
From: Glenn Tarbox, PhD <glenn(a)tarbox.org>
Date: Tue, Aug 25, 2009 at 8:31 PM
To: IPython-dev(a)scipy.org, Enthought-Dev(a)enthought.com
Sooner or later, something was gonna need to happen WRT Riverbank and
the PyQt licensing. I had hoped that an entirely new project wasn't
going to be necessary.... but apparently it is.
PySide Released to the Wild:
http://labs.trolltech.com/blogs/2009/08/25/pyside-released-to-the-wild/
From the PySide FAQ:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
What about PyQt?
Nokia's initial research into Python bindings for Qt involved speaking
with Riverbank Computing, the makers of PyQt. We had several
discussions with them to see if it was possible to use PyQt to achieve
our goals. Unfortunately, a common agreement could not be found, so
in the end we decided to proceed with PySide.
We will however maintain API compatibility with PyQt (you can use the
same method names but can't inter-operate with PyQt), at least for the
initial release. To import PySide you have to use "import PySide"
instead of "import PyQt4".
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
I didn't know where to post this. PySide needs to mature a bit
(support more than Linux, for example), but both Matplotlib and
Enthought are affected. PyQt will likely need to be replaced in both
packages once PySide becomes more mature, as the licensing of PyQt is
problematic now that Qt is LGPL. It's also likely that with Nokia's
backing, the PySide API will eventually dominate.
Hopefully, the above statement regarding the similarity of the API
will make moving over easy. Personally, I'd like to see a focus on Qt
vs. Wx by Enthought, as I believe it to be much more powerful... but
that's my personal opinion and what I use.
As a side note, I've successfully nailed my C++ Qt code to IPython
using a Cython shim. The PyQt event loop is available and all seems
to work great.
-glenn
--
Glenn H. Tarbox, PhD || 206-274-6919
http://www.tarbox.org
_______________________________________________
Enthought-Dev mailing list
Enthought-Dev(a)enthought.com
https://mail.enthought.com/mailman/listinfo/enthought-dev
----------
From: Eric Jones <eric(a)enthought.com>
Date: Wed, Aug 26, 2009 at 1:04 AM
To: enthought-dev(a)enthought.com
Here are my thoughts and comments:
1. We are switching our focus to Qt and away from wx for all new work.
That happened when the Qt LGPL was announced. We aren't abandoning wx
in the near future, though. Too much of our legacy stuff relies on it.
2. Evan Patterson has done excellent work this summer, and switching
between wx and qt backends on traits.ui/Envisage applications works
pretty dang well. At least two of our large (consulting) applications
work well under either one without a line of code changed to switch
between the two.
3. We're going to use PyQt (purchasing the requisite licenses) for
consulting project releases later this year.
4. We are pained to see that Nokia and Riverbank didn't come to an
agreement.
   a. It has led to an awkward in-between period where Nokia's solution
      isn't ready, and Phil's is inevitably losing part of its mind share.
   b. We like Phil and appreciate the huge contributions he has made with
      PyQt and also all the work he put into traits.ui for the Qt backend.
      I really would like to see the code purchased by Nokia. With the
      limited information we all have, it does seem that all parties
      (Nokia, Phil, the community) would have come out ahead had they
      reached an agreement. Of course, limited information is of limited
      value...
   c. We feel there are several iterations before PySide performs as well
      (cpu and memory) and is as robust as PyQt. That leads to more work
      for all of us to use PyQt now and switch to/test on PySide later.
   d. It just hurts to think about talented engineers re-solving a
      well-solved problem with little engineering benefit. The planet is
      losing many man-years of engineering effort that could be applied
      to new problems. Talented man-years are harder to come by than
      people think...
5. QScintilla is part of PyQt, but I am not sure (even doubt) it will be
in PySide. It is a useful widget in the software IDE world, and it'll
take a lot of work to reproduce the wrapper in a clean room. I'm not
sure QTextEdit will suffice as a replacement. We'll have to see how
this pans out.
6. Inevitably the licensing issue will lead to a larger community using
PySide.
7. Much work is being done on IPython to make it easier to fit into a
GUI. We also have a version that Gael put together last year for wx. We
are waiting to move this to Qt until the huge IPython re-factor going on
is finished. There are many subtleties in the corners of IPython and
GUIs that are hard to get right. The wx version we have is passable, but
not robust enough to be an IPython-in-console replacement. Sigh.
eric
_______________________________________________
Enthought-Dev mailing list
Enthought-Dev(a)enthought.com
https://mail.enthought.com/mailman/listinfo/enthought-dev
Hi all,
In the spirit started by Britton of announcing active development, some of
you may not know that I have been working on a highly parallel halo finder
based on the HOP method.
The current method of parallelizing HOP that is in 1.5 and svn-trunk
essentially runs HOP serially on each subregion and glues the results back
together at the very end. It works wonderfully when it works. Its biggest
constraint is that every halo must exist fully inside at least one of the
subregions. For high-particle-count, small-cosmology simulations, one can
end up with too many particles in one region, and this method breaks down.
The new halo finder, called 'chainHOP' (for now) for lack of a better
term, takes a different approach to parallelizing halo finding. Without
getting into the heavy details, chainHOP parallelizes and glues haloes
together based on the membership of particles in the chains that make up
the haloes. A single halo may exist in several subregions.
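To give a flavor of the gluing step (this is only a conceptual sketch,
not the actual chainHOP code): if every subregion reports, for each
particle it owns, the chain that particle landed in, then any two chains
that share a particle must belong to the same halo, and a union-find pass
stitches them together across subregion boundaries.

from collections import defaultdict

def glue_chains(particle_chains):
    # particle_chains: particle id -> list of (subregion, chain id)
    # pairs that claimed that particle.
    parent = {}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for chains in particle_chains.values():
        for c in chains:
            parent.setdefault(c, c)
        # Chains that co-own a particle get merged.
        for a, b in zip(chains, chains[1:]):
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb
    groups = defaultdict(list)
    for c in parent:
        groups[find(c)].append(c)
    return list(groups.values())

# Chains (0, 3) and (1, 7) share particle 42, so they merge into one halo.
print(glue_chains({42: [(0, 3), (1, 7)], 43: [(1, 7)], 99: [(2, 5)]}))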
Unfortunately, the results are not identical to good old HOP, and never
will be. A primary reason is that I'm using a different kd tree from HOP,
and the kd tree HOP uses gives wrong answers (it calculates the distances
between particles incorrectly, starting at something like the 6th decimal
place, but it's been a while since I compared the two). However, the
haloes found are very similar in what counts: size, center of mass, and
number of haloes.
The upshot of all of this is that I can, for example, find the haloes in
the z=0 L7 (512**3 particles) dataset in about 16 minutes on 128 cores on
Triton. I have done the same for a 1024**3 unigrid dataset on Triton
(large nodes) on 512 cores in about 6 hours, and since then I have made a
couple of improvements, so it should be more like 5 hours now. Much of
that increase in time comes from a 100x increase in the number of 'chains'
that need to be merged.
My work is not ready for prime-time yet, but I thought you might be interested to know!
_______________________________________________________
sskory(a)physics.ucsd.edu o__ Stephen Skory
http://physics.ucsd.edu/~sskory/ _.>/ _Graduate Student
________________________________(_)_\(_)_______________
Hi everyone,
Stephen and I are at the SciPy 2009 workshop. I saw a talk yesterday
on time series analysis of neuroimaging data (all the SciPy talks are
online here: http://www.archive.org/search.php?query=scipy but this is
the talk in question:
http://www.archive.org/details/scipy09_day1_04-Ariel_Rokem and Peter
Norvig's is especially good:
http://www.archive.org/details/scipy09_day1_03-Peter_Norvig ) and got
really jazzed up about time series analysis.
So, starting out on my own (with plans to later incorporate Britton's
EnzoSimulation code), I've written up a first pass at a time series
analysis framework. This, as with all the experimental stuff that's not
ready for trunk, is in the hg repo in the branch timeseries.
Right now it works as follows -- you instantiate an EnzoTimeSeries,
with an optional name and an output log. (Currently the entire thing
is populated via the "OutputLog" which is in Enzo1.5 and beyond.) It
reads in all the parameter files and instantiates them (see below for
why this is not awesome) and then provides some mechanisms for getting
data about and from them.
The architecture is separated into three main sections -- time series
objects, data objects, and analysis tasks. The first two shouldn't
ever really have to be written by the user; anything that exists as a
data object in yt already (spheres, regions, etc.) is automatically
converted into a time-series data object. The analysis tasks are a bit
more user-approachable. You can see what's already been created here:
http://hg.enzotools.org/yt/file/timeseries/yt/data_objects/analyzer_objects…
For example, the MaximumValue task:
class MaximumValue(AnalysisTask):
    _params = ['field']

    def eval(self, data_object):
        v = data_object.quantities["MaxLocation"](
                self.field, lazy_reader=True)[0]
        return v
What this does is say, I will accept a 'field' parameter, and then
inside 'eval' it returns the appropriate value. (Note that the time
series framework introspects the 'eval' method, so if it accepts a
'pf' instead of 'data_object' it knows what to do.)
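That introspection can be as simple as looking at the argument names on
eval (a sketch of the idea, not the code in the branch):

import inspect

def dispatch(task, pf, data_object):
    # Hand eval() whichever object its signature asks for.
    arg_names = inspect.getargspec(task.eval).args  # e.g. ['self', 'pf']
    if "pf" in arg_names:
        return task.eval(pf)
    return task.eval(data_object)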
So, you might say, whatever, right? How would I use this? Well, I've
placed an example file here:
http://hg.enzotools.org/test_scripts/file/tip/time_series.py
from yt.mods import *
from yt.data_objects import *

ts = EnzoTimeSeries("OutputLog")
ret1 = ts.eval([MaximumValue("Density"),
                SlicePlotDataset("Density", 0, [0.5, 0.5, 0.5])])
sp = ts.sphere([0.5, 0.5, 0.5], 1.0)
ret2 = sp.eval([MaximumValue("Density"), CurrentTimeYears()])
So let's examine this one at a time.
The first line:
ts = EnzoTimeSeries("OutputLog")
initializes the time series. Here I've screwed up the arguments, so
the name is OutputLog, and the location of the OutputLog itself goes
to the default value of ... "OutputLog".
The next line:
ret1 = ts.eval([MaximumValue("Density"),
                SlicePlotDataset("Density", 0, [0.5, 0.5, 0.5])])
Creates two tasks, MaximumValue and SlicePlotDataset, each with its own
arguments. MaximumValue is given "Density", and SlicePlotDataset accepts
"Density", the axis, and the center. These are handed to the time series,
which is told to go ahead and run them. It does, and it returns a list of
lists of values as ret1 -- the first being the max values, the second
being the slice filenames.
Okay, now, if we keep going:
sp = ts.sphere([0.5, 0.5, 0.5], 1.0)
ret2 = sp.eval([MaximumValue("Density"), CurrentTimeYears()])
The first line here constructs a 'time series data operator' -- a
sphere. The sphere is centered at (0.5, 0.5, 0.5) and has a radius of
1.0. We can then call eval on it, just like we did on the time series
object, and it creates and operates on that sphere at every step in
the time series.
This is basically what I was hoping to move toward with yt-2.0 -- a
disconnected operator/data model. The framework isn't sufficient yet
to do this, but it's getting there. I'll be merging in Britton's work
with EnzoSimulation for grabbing and selecting cosmology information.
Additionally, I'll be splitting out the reliance on OutputLog into a
generalized time series creator, which CAN but NEEDN'T accept an
OutputLog. The population step and the time series will be separate,
and Britton's got a mechanism for populating from an input parameter
file.
Shortcomings:
* All the hierarchies stick around because the parameter files are
stored. This won't work for large datasets, as it'll quickly blow out
the RAM. I'm working on a workaround that will also handle slicing,
etc.
* The plotting isn't done yet, but it will be.
* Everything is in lists right now, because the return values need
not be floating point, but can be strings, objects, 3D arrays,
whatever.
* Some basic tasks should be built in -- mostly things like
"filename" and "time" and "redshift" which you should be able to just
access and grab.
* Storing results of evaluation tasks is also not clear to me yet.
Recarrays would be great for fixed-size items, but not objects. Maybe
the user should just have the option to eval-and-store.
* Adding tasks requires some object instantiation, and usage of
class names. Do y'all have any thoughts on this? Should we move to
the same paradigm used in the callback method, where a plaintext
addressor name is used? (i.e. modify["grids"] instead of
add_callback(GridBoundaryCallback())...)
I really feel like moving toward time series analysis is the next
major step, and I'd like to continue working on this, as I'll be using
it more and more in the future. Additionally, I think we could move
toward something of a pipeline, as well -- stringing together inputs
and outputs. Something like:
my_pipeline = TaskPipeline(MaximumValueLocation("Density"), VirialRadius(), ...)
ts.eval(my_pipeline)
and then have the outputs of tasks fed from one to the next.
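Under the hood that could be as simple as folding each task's output into
the next task's input; a sketch, assuming tasks keep the eval interface
above:

class TaskPipeline(object):
    def __init__(self, *tasks):
        self.tasks = tasks

    def eval(self, data_object):
        result = data_object
        # Each task consumes the previous task's output.
        for task in self.tasks:
            result = task.eval(result)
        return result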
Anyway, what do you all think?
-Matt
PS The reorganization should happen around the time we focus on time
series stuff -- which is why it's under data_objects.
Hi everyone,
After talking to Sam and Devin at PiTP and then to Dave yesterday (at
UCSD, my new residence) it's become clear to me that the PlotTypes and
PlotCollections files -- which are remarkably unchanged over the last
2+ years of development, since they were first implemented -- have
become a bit stale and have some creeping bugs. People have worked
around them, but it's time for an overhaul.
I'm writing with a new plan for how to handle plots, and I'm writing
because it might affect some of you. The basic idea is that plots
will have valid and invalid states; invalid plots will be redrawn as
necessary. Additionally, pixelization into buffers will occur only
inside FixedResolutionBuffers, and every plot will have an associated
FRB.
So, a plot starts out, and it sets itself up and marks itself and its
colorbar as invalid. We then save it, and it sees that it's invalid,
so it redraws. This marks the colorbar as invalid (even though it
already is), and then the colorbar gets redrawn. Both are now valid.
We then change the zlim, which marks the colorbar as invalid while the
image remains valid. (Colorbar and colormap are essentially
inextricable here.)
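In code, I imagine the bookkeeping looking roughly like this (a sketch of
the proposed scheme, not anything in the repo yet):

class ValidatingPlot(object):
    def __init__(self):
        self.image_valid = False
        self.colorbar_valid = False

    def set_zlim(self, zmin, zmax):
        # Changing limits only touches the colorbar/colormap pairing.
        self.zlim = (zmin, zmax)
        self.colorbar_valid = False

    def set_width(self, width):
        # A new width means the image must be re-pixelized from the FRB.
        self.width = width
        self.image_valid = False

    def save(self, filename):
        if not self.image_valid:
            self._redraw_image()         # pixelize via the associated FRB
            self.colorbar_valid = False  # an image redraw invalidates it
        if not self.colorbar_valid:
            self._redraw_colorbar()
        self.image_valid = self.colorbar_valid = True
        # ... write out filename ...

    def _redraw_image(self):
        pass  # placeholder for the FRB-based pixelization

    def _redraw_colorbar(self):
        pass  # placeholder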
So far, I think it's fine -- but the question comes in during user intervention.
So we have a valid state for our plot. (The user can then mess with
it however they like, and on the next save, that messing with it will
still be there -- because it won't be marked as invalid -- which is a
huge advantage!) When we change the width of the plot, then the image
becomes invalid. But my real question is, what do we do if the user
then sets the zlim and then changes the width?
Set zlim: invalidate colorbar
Save: revalidate both
Set width: invalidate image
But, what happens to the colorbar? Do we invalidate it, and then
reset it on next save? That is, when we change the width, does that
override the user setting the colorbar?
Does anyone have any thoughts? Thanks!
-Matt
I forgot to forward this along. Let me know if you have problems!
_______________________________________________________
sskory(a)physics.ucsd.edu o__ Stephen Skory
http://physics.ucsd.edu/~sskory/ _.>/ _Graduate Student
________________________________(_)_\(_)_______________
----- Forwarded Message ----
> From: "help(a)teragrid.org" <help(a)teragrid.org>
> To: sskory(a)physics.ucsd.edu
> Sent: Wednesday, August 12, 2009 8:13:00 AM
> Subject: Re: Project Space on Kraken
>
> FROM: Lucio, Daniel
> (Concerning ticket No. 174658)
>
> Hi,
>
> I want to let you know that you have two project directories:
>
> lucio@krakenpf5(XT5):~> ls -ld /lustre/scratch/proj/yt_common
> drwxr-x--- 2 sskory ytcommon 4.0K 2009-08-11 15:57
> /lustre/scratch/proj/yt_common/
>
> and
>
> lucio@krakenpf5(XT5):~> ls -ld /nics/a/proj/yt_common
> drwxr-x--- 2 sskory ytcommon 4.0K 2009-08-11 15:18 /nics/a/proj/yt_common/
>
>
> Please let us know in the future if you need more assistance with this.
>
> Best regards!
>
> Daniel
> NICS Team
>
>
> Stephen Skory writes:
> >Hi Daniel,
> >
> >
> >> please provide us with the following:
> >>
> >> - desired project directory name
> >> - who or what groups of people you want to provide access to.
> >>
> >> Best regards!
> >>
> >> Daniel.
> >> NICS Team
> >
> >Can it please be named 'yt_common'? I'd like these usernames to have write
> access:
> >
> >sskory
> >turk
> >skillman
> >jsoishi
> >collins
> >
> >Thanks!
> >
> >Stephen Skory
see attached...
so what do we say?
_______________________________________________________
sskory(a)physics.ucsd.edu o__ Stephen Skory
http://physics.ucsd.edu/~sskory/ _.>/ _Graduate Student
________________________________(_)_\(_)_______________
----- Forwarded Message ----
> From: "help(a)teragrid.org" <help(a)teragrid.org>
> To: sskory(a)physics.ucsd.edu
> Sent: Friday, August 7, 2009 9:05:56 AM
> Subject: Re: Project Space on Kraken
>
> FROM: Lucio, Daniel
> (Concerning ticket No. 174658)
>
> Hi,
>
> please provide us with the following:
>
> - desired project directory name
> - who or what groups of people you want to provide access to.
>
> Best regards!
>
> Daniel.
> NICS Team
>
> Stephen Skory writes:
> >Hi,
> >
> >We would like to request some project space on Kraken's lustre scratch disk. We
> would like to have space to install the Enzo [1] data analysis toolkit yt [2]
> that can be accessed by our colleagues. yt uses Python, which must be built
> statically due to the specialized CNL kernel. It is not trivial to build Python
> statically, and having this installation made permanent on lustre would make
> things easier for us.
> >
> >We don't need very much space. A few gigabytes (10GB is more than we would ever
> need) will be plenty.
> >
> >Thanks!
> >
> >Stephen Skory (on behalf of the yt developers)
> >
> >
> >[1] http://lca.ucsd.edu/projects/enzo/wiki/Enzo1.5
> >[2] http://yt.enzotools.org/