Hi all,
This last Friday I had a chance to talk to Tom Abel and Oliver Hahn
(both CC'd on this message) about their experiences with using yt, and
they brought up some points which I've now had a chance to think
about, and which I find very interesting, certainly as something to
discuss. Here are my notes on it, along with a proposal for moving
forward.
As a quick note, what really hit home that we need better
documentation was trying to make a thin projection. The definition of
what a 'source' could be wasn't there, there were no examples, and I
had to go look at the source to figure out what the parameters were
even called. I think that's not ... good.
Python Inline Documentation
===========================
One of the coolest things about Python is the help() function, which
prints out the function signature and the contents of the doc string.
In the source code, the docstring is inline in the function, like so:
def some_function(a, b, c):
"""
This function does something.
"""
return a+b+c
The output of help(some_function) would look like this:
>>> help(some_function)
Help on function some_function in module __main__:
some_function(a, b, c)
This function does something.
>>>
Generated Documentation
=======================
The yt docs are generated using an extension to Sphinx called autodoc.
What this does, as you can see by going to the API docs and clicking
"view source" (which, counterintuitively, displays the doc source and
not the source code of the functions) is at documentaion build time,
pull all the docstrings from the source and render them in the
document. Ideally, we would want something that renders nicely as
well as looks good in the inline help -- and to maximize the detail
without becoming encumbering.
For most of the functions in yt that have docstrings, they have been
written in a narrative style, with parameters inside asterisks, so
that they would render nicely in the API docs:
http://yt.enzotools.org/doc/modules/amrcode.html#yt-lagos-outputtypes-outpu…
But, it's becoming clear that perhaps this is not the best approach.
I think a combination of narrative and explicit parameter declaration
would be better. The NumPy/SciPy projects have a CodingStandards
description:
http://projects.scipy.org/numpy/wiki/CodingStyleGuidelines
that covers docstrings, with a very detailed example of a completely
filled out docstring here:
http://svn.scipy.org/svn/numpy/trunk/doc/example.py
As an example, the 'tensorsolve' function is defined here:
http://svn.scipy.org/svn/numpy/trunk/numpy/linalg/linalg.py
and the API docs are here:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.tensorsolv…
This looks great, I think. yt is a bit more class-oriented than
NumPy, but I believe that we should strive for a similar level of
detail as well as a similar style: presenting parameters, what those
parameters can be, and a brief word on the return type.
Ideal Type Of Documentation
===========================
A few weeks ago, Tom and I were chatting and he mentioned to me a
Pascal manual. In this manual, there was a single function on every
page: a description, parameters (often repeated between functions, but
explicitly listed for each), and an example. My first Unix manual was
exactly like this, and I remember it being one of the best sets of
documentation I've ever used. I believe this is the model NumPy and
SciPy are striving for, as well.
I think this is what yt should strive for, too. One page per class or
function, with a description, parameters, and examples -- just like
mentioned above. In doing so, I think that the online help -- which
right now is sort of helpful, but not amazingly helpful, would become
much more useful.
The fact that on the mailing lists we get questions asking us about
fundamental operations in yt is, I think, an indictment of the way
it's presented. As the Enzo Workshop revs up, a couple of us will be
writing talks about using Enzo, using yt, etc, and I think this is a
time to harness that momentum to reorganize and rewrite some of the
doc strings. Of course, I would take the lead on the initial rewrite,
as I'm the one who wrote all the bad docstrings.
What does everyone think about this?
Action Items
============
(It wouldn't be a long email about procedures if we didn't use a
buzzword like 'action items' :)
Firstly: a vote and a request for comments.
Do we want to agree on the NumPy standard for docstrings? What does
everyone think about this idea, of a set of docstring guidelines, and
trying to focus on a better set of API documentation, to be used both
in generated form and inline via help()?
If we can agree on the NumPy standard, I believe that I should be able
to convert most of the docstrings with some relative ease; it's mostly
going to be a matter of typing, copy/pasting, etc. I will copy a
style guide into doc/, which will be largely taken from the NumPy
style guide, but I will additionally add a document with examples for
common strings: I would prefer we have a single, consistent manner for
referring to things like AMR3DData as a source, for instance. I will
then go through and convert all the doc strings that I am familiar
with. This would leave us with three files:
* Example docstring, which can be read in verbatim and edited.
* List of yt idioms for cross-referencing and describing things.
* File describing this standard, largely pulling from the NumPy standard.
The next thing will be, going forward, how do we ensure that the doc
strings are correctly inserted with new code? I am more guilty of
this than I would care to admit (I sometimes fall into the camp of
thinking that functions with well-named parameters are
self-documenting, which is probably a mistake!) but I think having
someone agree to review incoming changesets for documentation updates,
and then to email the committer if they do not have a sufficient
docstring. My inclination is to suggest that someone who already
reviews incoming changesets to do this, which I think means either me,
Sam or Stephen. Sam, would you be willing to take this on? It should
be relatively straightforward.
Additionally, would anyone volunteer to help me out with rewriting
some of the existing docstrings? In particular, for code you have
contributed?
The End
=======
I think that if we really take the docstrings seriously, then the
documentation on the whole will vastly improve. I am in the process
of rewriting some sections, removing the old-style tutorial and trying
to better walk the user through the process of getting up and running.
The current documentation has a lot of information, but it's not very
good at getting people up and running in anything other than the most
simple manner. I think that getting started on improving the
docstrings will also help refocus efforts toward better documentation
on the whole. And, I'd like to end by admitting culpability for the
sorry state of the docstrings we currently have. But I think this
might be good, in the long run, because it'll help out with getting us
on track for a better code that's much easier to use!
And finally, thanks to Tom and Oliver for taking the time to chat with
me about this -- I really appreciate their thoughtful feedback on
this.
Best,
Matt

Hi all,
Dave and I were chatting a couple weeks ago, and it became clear to me
that yt currently does something incorrectly with ghost zones and
smoothed covering grids. As it stands, in the smoothed covering grids
and the vertex centered data, the fields that are interpolated between
points are logged, then interpolated, then 10^values is returned.
This is not how Enzo does it, and I think yt should change too.
This may change some user-facing behavior, so I wanted to clear it
with everyone here first. This would change primarily the generation
of ghost zones, which are used for some finite stencils and for the
volume rendering. (However, I have been unable in my tests to see any
changes in the volume rendering.)
What do you all think?
-Matt

Just as a quick note, as I suspect at some point the future this will
be useful to someone else, as well.
Forwarded conversation
Subject: Summation of large float32/float64 arrays
------------------------
From: Matthew Turk <matthewturk(a)gmail.com>
Date: Fri, May 21, 2010 at 1:13 PM
To: Discussion of Numerical Python <numpy-discussion(a)scipy.org>
Hi all,
I have a possibly naive question. I don't really understand this
particular set of output:
In [1]: import numpy
In [2]: a1 = numpy.random.random((512,512,512)).astype("float32")
In [3]: a1.sum(axis=0).sum(axis=0).sum(axis=0)
Out[3]: 67110312.0
In [4]: a1.sum()
Out[4]: 16777216.0
I recognize that the intermediate sums may accumulate error
differently than a single call to .sum(), but I guess my concern is
that it's accumulating a lot faster than I anticipated. (Interesting
to note that a1.sum() returns 0.5*512^3, down to the decimal; is it
summing up the mean, which should be ~0.5?) However, with a 256^3
array:
In [1]: import numpy
In [2]: a1 = numpy.random.random((256,256,256)).astype("float32")
In [3]: a1.sum(axis=0).sum(axis=0).sum(axis=0)
Out[3]: 8389703.0
In [4]: a1.sum()
Out[4]: 8389245.0
The errors are much more reasonable. Is there an overflow or
something that occurs with the 512^3? These problems all go
completely away with a float64 array, but the issue originally showed
up when trying to normalize an on-disk float32 array of size 512^3,
where the normalization factor was off by a substantial factor (>2x)
depending on the mechanism used to sum. My suspicion is that perhaps
I have a naive misconception about intermediate steps in summations,
or there is a subtlety I'm missing here.
I placed a sample script I used to test this here:
http://pastebin.com/dGbHwFPK
Thanks for any insight anybody can provide,
Matt
----------
From: Robert Kern <robert.kern(a)gmail.com>
Date: Fri, May 21, 2010 at 1:26 PM
To: Discussion of Numerical Python <numpy-discussion(a)scipy.org>
It's not quite an overflow.
In [1]: from numpy import *
In [2]: x = float32(16777216.0)
In [3]: x + float32(0.9)
Out[3]: 16777216.0
You are accumulating your result in a float32. With the a.sum()
approach, you eventually hit a level where the next number to add is
always less than the relative epsilon of float32 precision. So the
result doesn't change. And will never change again as long as you only
add one number at a time. Summing along the other axes creates smaller
intermediate sums such that you are usually adding together numbers
roughly in the same regime as each other, so you don't lose as much
precision.
Use a.sum(dtype=np.float64) to use a float64 accumulator.
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
-- Umberto Eco
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion(a)scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
----------
From: Matthew Turk <matthewturk(a)gmail.com>
Date: Fri, May 21, 2010 at 1:32 PM
To: Discussion of Numerical Python <numpy-discussion(a)scipy.org>
Hi Robert,
Thank you very much for that explanation; that completely makes sense.
I didn't know about the dtype for accumulators/operators -- that did
just the trick.
Much obliged,
Matt

Hi guys,
On Monday I rewrote the (adaptive) projection backend to use Quad
Trees for point placement instead of the current system. As it
stands, the current system for projecting works pretty well, except
for a couple caveats:
- It has high peak memory usage
- It can only be decomposed in the image plane, so inline projections
can't be made, and load balancing AMR was poor
- It's complicated, but that complication comes from wanting things to be fast
Essentially, it's fast by wall-clock standards, but you need to run in
parallel for big datasets. Plus, if you want to run with big
datasets, you probably need to fully occupy nodes, because of the peak
memory usage.
Anyway, I was getting very frustrated with points #1 and #3 (but not
#2, oddly) and so I wrote a new backend using Quad Trees. We did
everything using int math anyway, but because joining grids was
O(NxM), things got a bit slow in some domains. The conversion wasn't
too bad.
I ran some simple benchmarks. These were all conducted on the Triton
resource, in serial, on a single node. They do not reflect time for
disk IO, but they should roughly reflect real memory load.
* For my 31 level PopIII binary star run (875 million cells, 8000
grids), it took 9 seconds with peak memory usage of 176 megs.
* For a large, 768^3 with (peak) 12 levels of refinement (874 million
cells in 125,000 grids), it takes 81 seconds with peak memory load of
840 megs.
* For the 512^3 L7 lightcone run, at a time when the data had 300,000
grids, it took 170 seconds with peak memory usage of 2.5 gigs.
In all cases, the results are correct according to the standard check
of projecting "Ones". These are crazy improvements, especially
considering that these are times for serial projections.
And important note here is that this *will* parallelize independent of
the image plane, although I have yet to see the need to parallelize
just yet. The load balancing may be better optimized based on image
plane decomposition, but the algorithm should work with independent
joins. Additionally in parallel the final join at the end of the
calculation may have higher peak memory usage if the decomposition is
not via image plane decomposition, but it should still work.
Unfortunately, right now my time is very constrained; so I am going to
put out there that if someone would be willing to help me break apart
the existing code and insert this code, then we can swap out the
engines. But in the absence of that, I'm not sure that it'll get
inserted into the main line before the Summer, or at least the late
Spring. As is, I can definitely provide a very simple and manual
interface to use it (and I intend to do so) but integrating it into
the mainline is not going to be my priority right now. But, again, I
will provide *an* interface, even if it's not a *final* interface and
likely not a *nice* interface.
Anyway, I'm pretty excited about this -- definitely a huge improvement
over the previous algorithm, and we still retain the *adaptive* nature
of the projections: project once, plot many.
-Matt

Hi all,
I've posted a first draft of a style guide on wave:
https://wave.google.com/wave/#restored:wave:googlewave.com!w%252B2ycf4bYPA
I'd appreciate comments. Feel free to reply below the wave or edit
inline, but I think this is a good first draft. There's quite a bit
of code that I wrote a while back that does not adhere to this, and
part of the reorganization effort (which keeps getting delayed,
although I think it is going to very rapidly gain momentum in the near
future) will be to increase consistency with this.
-Matt

Hi guys,
This paper:
http://arxiv.org/abs/1005.1648
is an impressive piece of work on simulated observations, using an AMR
RT code to solve non-LTE level states from regridded SPH data. I'm
going to take some time to digest it, but it's definitely very
interesting, and I thought I'd share here.
-Matt

Hi all,
I've just pushed to devel a new load-balancing implementation for parallelHF. The old way would repeatedly read data off disk during load-balancing. Instead, this one uses a random subset of all the particles and does the load-balancing in memory on the subset, only. This should be substantially faster, especially on slow disks. It turns out that using a very small percentage of the total number of particles can achieve decent load-balancing.
There is a new parameter that controls how many particles are used for load-balacing, called 'sample'. It gives the fraction of the total number of particles to be used for balancing. The default is 0.03, which means 3% of the total are used. I've experimented down to 0.3% and seen reasonable results there, too.
h = yparallelHF(pf, threshold=160.0, safety=1.5, \
dm_only=False,resize=True, fancy_padding=True, rearrange=True, sample=0.03)
Let me know if you run into any problems using this. I've done some testing and the results are identical to the old method, but something could be wrong, of course.
_______________________________________________________
sskory(a)physics.ucsd.edu o__ Stephen Skory
http://physics.ucsd.edu/~sskory/ _.>/ _Graduate Student
________________________________(_)_\(_)_______________

Hi all,
in the next 15 minutes or so (or whenever it finishes installing on the slow scratch disk), yt on Kraken will be updated to r1717 of trunk.
That is all.
_______________________________________________________
sskory(a)physics.ucsd.edu o__ Stephen Skory
http://physics.ucsd.edu/~sskory/ _.>/ _Graduate Student
________________________________(_)_\(_)_______________