Hi all,
I have prepared a Pull Request to change how yt processes arguments to
scripts. I just issued it, but I am emailing because I think
discussion of what it does warrants a bit more public hashing out.
The PR is not done yet, for the reasons I outline below, so please,
nobody accept it yet.
https://bitbucket.org/yt_analysis/yt/pull-request/38/overhaul-configuration…
This will directly affect you if you have:
1) Ever written "from yt.config import ytcfg; ytcfg[...."
2) Ever put your *own* command-line parser into yt.
3) Gotten annoyed with configuration files.
What I've done is create a new file, startup_tasks.py, that gets
imported whenever yt.mods gets imported, and only the first time that
happens. It sets up an argument parser (using argparse, which is in
the standard library only as of Python 2.7) that looks for:
--parallel
--paste
--paste-detailed
--detailed
--rpdb
One of the things this does is that it also provides --help, so you
can see what is available. Furthermore, I've added a --config option,
so that from the command line you can set configuration options. For
instance:
--config serialize=False
and so on. I think this is pretty cool and will go a long way toward
making things nicer. However, how it should work still leaves a few
open questions. There are basically two ways the parsing can work:
* Parse the entirety of sys.argv and accept all arguments that yt
finds, rejecting and throwing an error on unrecognized ones (i.e.,
typos or things you might pass in to a script you write on the
command line). This will be an exclusive operation.
* Parse *non-exclusively*, allowing unrecognized arguments to pass
through. However, the old arguments will still be there: so any
script that has issues with things like --parallel and whatnot will
now see them, whereas it did not before because yt (totally un-cool!)
stripped them out of the sys.argv variable. I don't want to strip
them anymore.
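To make the difference concrete, here is a minimal sketch using plain argparse. The parser below is only a stand-in for whatever startup_tasks.py builds: the flag names are taken from the list above, while --my-option and the key=value splitting for --config are assumptions for illustration.

```python
import argparse

# Stand-in for the parser yt would build; not the actual startup_tasks code.
parser = argparse.ArgumentParser(prog="my_script.py")
parser.add_argument("--parallel", action="store_true")
parser.add_argument("--config", action="append", default=[])

# "--my-option" is a hypothetical argument belonging to the user's script.
argv = ["--parallel", "--config", "serialize=False", "--my-option", "3"]

# Non-exclusive: unrecognized arguments are handed back, not rejected.
known, leftover = parser.parse_known_args(argv)

# Exclusive: the same list makes argparse print an error and exit.
try:
    parser.parse_args(argv)
    exclusive_ok = True
except SystemExit:
    exclusive_ok = False

# One possible way to turn "--config key=value" into overrides.
# Values stay strings here; converting them is left to the config layer.
overrides = dict(item.split("=", 1) for item in known.config)
```

With parse_known_args the script still gets its own arguments in `leftover`, but a typo such as --paralel would end up there too, without any error.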
The way I have implemented this for the yt command line tool is to set
a flag that says, "We're also inside the command line, so don't parse
anything, we'll handle adding new options to the parser and then we'll
parse everything at the end." This way you can pass both --parallel
and whatever option the yt command line utility wants. This works
because startup_tasks creates a "parser" object, adds arguments to
that parser object, then delays actually conducting the parsing until
all the arguments from the command line tool have been added.
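A rough sketch of that deferred-parsing idea, assuming a module-level parser and a simple boolean flag. The function name `startup`, the flag name and the --output option are hypothetical and not taken from the PR.

```python
import argparse

# Built once at import time, as startup_tasks is described as doing.
parser = argparse.ArgumentParser(description="yt options (illustrative)")
parser.add_argument("--parallel", action="store_true")

def startup(inside_command_line, argv):
    """Parse right away unless the command-line tool asked us to wait."""
    if inside_command_line:
        return None  # the tool parses later, after adding its own options
    return parser.parse_args(argv)

# The command-line tool's side: defer, register extra options, parse once.
pending = startup(inside_command_line=True, argv=[])
parser.add_argument("--output", default="plot.png")  # hypothetical tool option
args = parser.parse_args(["--parallel", "--output", "slice.png"])
```

Because parsing happens only once, after every option is registered, both --parallel and the tool's own options are accepted and typos are still rejected.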
There are four ways this can work. I have presented them in order of
my increasing preference. (Coincidentally, the astropy mailing list
discussed this very topic this week, just as I was sorting out my own
feelings on it, and they are moving away from parsing args in
the library; I think that works for them because AstroPy is designed
to be used much more inside larger frameworks, whereas yt is somewhat
more insular.)
1) Don't do any argument parsing if not called through a yt-specific
script runner. This means if you want to pass --parallel, you have to
run with something like "yt run my_script.py --parallel". Same for
--config and so on.
2) Parse all arguments any time yt.mods is imported, do not allow for
additional arguments. This breaks scripts that have their own
parsing.
3) Parse *some* of the arguments, but not all. Any typo would be
passed through silently, and this could lead to confusion for the user.
4) Provide a yt-specific mechanism for adding new arguments. So if
you want to add new arguments, you do it at the top of your script,
rather than the bottom, and at the bottom inside the construction "if
__name__ == '__main__'" you'd inspect the values.
Anyway, I'm inclined to go for #4, simply because it would be the
simplest mechanism for ensuring an explicit method of getting
arguments into user-written scripts.
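For the sake of discussion, option #4 could look something like the sketch below. Every name in it (`add_script_argument`, `parse_all`, the --field option) is hypothetical; none of this exists in yt or in the PR, it only shows the shape of the idea.

```python
import argparse

# Hypothetical shared parser; in the proposal this would live inside yt.
_parser = argparse.ArgumentParser()
_parser.add_argument("--parallel", action="store_true")

def add_script_argument(*names, **kwargs):
    """Hypothetical helper that a user script calls near its top."""
    _parser.add_argument(*names, **kwargs)

def parse_all(argv=None):
    """Parse exclusively, so unknown arguments and typos are rejected."""
    return _parser.parse_args(argv)

# --- top of the user script: declare the script's own arguments ---
add_script_argument("--field", default="Density")

# --- bottom of the user script: inspect the parsed values ---
def main(argv=None):
    args = parse_all(argv)
    return args.field, args.parallel

if __name__ == "__main__":
    print(main(["--field", "Temperature", "--parallel"]))
```

The point of declaring arguments at the top is that yt's options and the script's options end up in one parser, so a single exclusive parse covers both.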
Thoughts?
-Matt
Hey all (Matt, in particular),
I've got some dead time waiting for jobs to run so I'd like to A)
discuss this topic and B) make some "final" decisions about this so I
can go ahead and do some coding on this. Sorry about the length of
this!
A) Briefly, for those of you who aren't aware of this topic, the idea
is to replace the ~/.yt/parameter_files.csv text file with a SQLite
database file. This has many advantages over a text file, too many to
list here. But in particular, it has built-in locks for writes (*),
which is especially useful for multi-level parallelism. This is
something we're currently addressing in "official" examples with a
kludge [0]. I think everyone is in agreement that this is a good
thing, no?
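As a small illustration of the write-locking point, here is a sketch using Python's built-in sqlite3 module. The file name, table name and columns are assumptions, not an agreed schema, and a temporary directory stands in for ~/.yt/.

```python
import os
import sqlite3
import tempfile

# Hypothetical location; the real file would live under ~/.yt/.
db_path = os.path.join(tempfile.mkdtemp(), "parameter_files.db")

def record_dataset(path, ds_hash, name, location):
    # timeout: how long to wait on another writer's lock before giving up.
    conn = sqlite3.connect(path, timeout=30.0)
    try:
        with conn:  # one transaction: commit on success, roll back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS parameter_files "
                "(hash TEXT PRIMARY KEY, name TEXT, location TEXT)"
            )
            conn.execute(
                "INSERT OR REPLACE INTO parameter_files VALUES (?, ?, ?)",
                (ds_hash, name, location),
            )
    finally:
        conn.close()

record_dataset(db_path, "abc123", "DD0010", "/data/run1/DD0010")
# Same hash again: the existing row is replaced rather than duplicated.
record_dataset(db_path, "abc123", "DD0010", "/scratch/run1/DD0010")

conn = sqlite3.connect(db_path)
rows = conn.execute("SELECT hash, location FROM parameter_files").fetchall()
conn.close()
```

Each task opens the file, writes inside a single transaction and closes again, so concurrent writers wait on the lock instead of clobbering each other's lines as they can with a text file.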
The other big thing that this feeds into is a remote, centralized
storage point for a clone of this database. I've discussed this idea
before, sketched up a simple partially functional example, and made a
simple video cast of how it works. [1]
B) The final decisions that I'd like input on are these.
- What data fields should we include in the databases? There are three
ways to go with this.
#1. The same amount of data that is in the current csv (basically:
hash, name, location on disk, time, type). This is probably too few
data fields, so I think we can scratch it off immediately.
#2. Everything that can be gleaned from the dataset. This is
actually fine to do practically because of the database being binary
and searchable. However, because the fields in various datasets are so
different, this could result in a fairly unwieldy database with (in a
Chris Traeger voice) a literal ton of columns. This could be mitigated
by having a different database table for each type of dataset (Enzo,
Athena, etc...), but that really only swaps one kind of complexity for
another.
#3. A minimal set of "interesting" fields (redshift, box resolution,
cosmological parameters, etc.). This is more attractive than #2 in
that it's very unlikely anyone will want to search over every field in
a dataset, so it keeps things more streamlined. But then we have to
agree to a reasonable set of parameters to include, and it makes
future changes a bit more difficult.
What do we all think?
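To give a feel for option #3, here is a sketch of a table holding the current csv fields plus two "interesting" ones. The column choice (redshift, resolution) and the sample rows are made up for illustration; the actual set is exactly what needs to be agreed on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE datasets ("
    " hash TEXT PRIMARY KEY, name TEXT, location TEXT,"
    " time REAL, type TEXT,"               # fields from the current csv
    " redshift REAL, resolution INTEGER)"  # hypothetical 'interesting' fields
)
conn.executemany(
    "INSERT INTO datasets VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("a1", "DD0010", "/data/DD0010", 10.0, "enzo", 3.0, 128),
        ("b2", "DD0040", "/data/DD0040", 40.0, "enzo", 0.5, 128),
        ("c3", "DD0087", "/data/DD0087", 87.0, "enzo", 0.0, 256),
    ],
)

# Numeric search and sort over one of the extra columns.
low_z = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM datasets WHERE redshift < ? ORDER BY redshift",
        (1.0,),
    )
]
conn.close()
```

Adding a column later is possible with ALTER TABLE, but every existing row would then need to be backfilled, which is the "future changes" cost mentioned above.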
- Once we have the above settled, and working, I would like to extend
the functionality to the cloud bzzzzzzz. Get it? It's a buzz word. So
it buzzes. Thanks, I'll be here all week.
There are three (four) ways to do this that I can think of:
#1. Amazon Simple DB. The advantage of this is that it's offered
free to all up to 1GB of storage and some reasonable limit of
transactions per month. Each user sets up her own account on S3, and
no one else has to be involved. But the main disadvantage is that it
only supports storing things as strings, which makes numerical
searches and sorts less useful, more annoying, and slower.
#1.5. Amazon Relational DB. This is not free at any level, but it
offers all the usual DB functionality. Amazon does offer some
educational grants, so we could apply for that. This service is
targeted at usage levels that we will never reach, but if we get free
time, that's fine. I think in this case (and the next two) user
accounts on the database would have to be created for yt users by
"us".
#2. Google App Engine. Free right now in pre-beta invitation-only
phase. It will be similar to #1.5 above, as I understand things, and
not be free forever. Personally, I seriously doubt that we'd get in on
the pre-beta. I've looked at the application form [2] and I don't even
understand one of the questions.
#3. Host a MySQL (or similar) database on one of our own servers
(yt-project or similar). The advantage is that the cost should be no
more than Matt is paying now. The disadvantage is, again, we have to
set up accounts. Also, I don't know if Dreamhost (is that where
yt-project is still?) allows open MySQL databases. Another advantage
is that unlike #1.5 or #2 above, costs should never rise suddenly when
an educational grant or beta period ends.
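The string-only limitation mentioned under #1 is easy to see in plain Python, with no Amazon API involved. The redshift values are made up, and zero-padding is just one common workaround, shown here as an assumption about how we might cope rather than a recommendation.

```python
redshifts = [10.0, 2.5, 0.3]

# Stored as strings, values compare character by character.
as_strings = [str(z) for z in redshifts]
string_order = sorted(as_strings)

# What we actually want: numeric order.
numeric_order = sorted(redshifts)

# Workaround: fixed-width, zero-padded strings sort like numbers
# (non-negative values only, and the width must be chosen up front).
padded = ["%08.3f" % z for z in redshifts]
padded_order = sorted(padded)
```

Here `string_order` puts '10.0' ahead of '2.5', while the padded form sorts correctly at the cost of an extra encoding step on every read and write.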
Thanks for reading, and any and all comments are welcomed.
[0] http://yt-project.org/doc/advanced/parallel_computation.html#parallelizing-…
[1] http://vimeo.com/28797703
[2] https://docs.google.com/spreadsheet/viewform?formkey=dHBwRmpHV2VicFVVNi1PaF…
(*) There are issues with locks on parallel network file systems, but
most home partitions on supercomputers are NFS (not something like
Lustre) so this shouldn't be a problem.
--
Stephen Skory
s(a)skory.us
http://stephenskory.com/
510.621.3687 (google voice)
(We encourage you to forward this message to any other interested parties.)
Hello,
Just in time for the New Year, we’re happy to announce the release of
yt version 2.3! ( http://yt-project.org/ ) The new version includes
many new modules and enhancements, and the usual set of bug fixes over
the last point release. We encourage all users to upgrade to take
advantage of the changes.
yt is a community-developed analysis and visualization toolkit for
astrophysical simulation data. yt provides full support for Enzo,
Orion, Nyx, and FLASH codes, with preliminary support for the RAMSES
code (and a handful of others.) It can be used to create many common
types of data products, as well as serving as a library for developing
your own data reductions and processes.
Below is a non-comprehensive list of new features and enhancements:
* Improved and expanded documentation located at
http://yt-project.org/doc/.
* Boolean logic data containers (joins, intersections and nots) to
select arbitrary data regions.
* Multi-level parallelism for subgroups of MPI tasks.
* Extensive answer tests.
* Isocontouring and flux-over-surface calculations, with WebGL
interface.
* A reorganized field system.
* Adaptive resolution HEALpix-based all-sky volume rendering.
* Radial column density calculations.
* Memory usage, performance enhancements and bug fixes throughout the
code.
Everything, from installation, to development, to a cookbook, can be
found on the homepage: http://yt-project.org/
We have updated the libraries installed with the install script; for
more information, see the “Dependencies” section of the yt docs at
http://yt-project.org/doc/advanced/installing.html.
Development has been sponsored by the NSF, DOE, and various university
funding. We invite you to get involved with developing and using yt!
We’re also holding the FIRST YT WORKSHOP from January 24-26 at the
FLASH center in Chicago. See the workshop homepage for more
information! http://yt-project.org/workshop2012/
Please forward this announcement to interested parties.
Sincerely, The yt development team
Hi all,
As you may know, we're holding a yt workshop in late January in
Chicago. We're capping attendees, and approaching that limit, so if
you were putting off registering please go ahead and fill out the
form:
http://yt-project.org/workshop2012/
NOTE: if you filled out the survey of interest a couple months ago,
that does not count as a "binding" registration that secures your
spot. If you're not sure, email me directly and I'll confirm your
registration status; the registration form asks a few questions the
survey didn't, like "Which code do you use?" and "What's your
experience level with Python?" We're planning on putting all the
talks and materials online, so even if you can't make it you can still
participate.
-Matt
Hi all,
Just a reminder that yt-2.3 is set to be released on 15 December.
There are still some open tickets remaining:
https://bitbucket.org/yt_analysis/yt/issues?status=new&status=open&mileston…
all concerning documentation. Also, perhaps we should have another
google+ hangout soon to discuss what remains and any roadblocks. Let
me know if you're interested, and we'll try to set up a time.
thanks,
jeff
Hi all,
I think it escaped announcement, but the yt website repository is here:
https://bitbucket.org/yt_analysis/website/
If you want to make changes the easiest way is probably to fork it,
make your changes, and issue a PR. There have been a couple things
brought up recently that it could use:
* Better textual layout
* Better / different style
* More concise prose
* Removal of all the bitbucket wiki links
* Consolidation of information (i.e., develop & community?)
* Removal of unnecessary items or things that don't need to be on the
front page
Anyway, it's open season, so have at it if you want to see things
improved or changed.
-Matt
Hi All,
Per today's developer meeting, we decided on a 15 Dec 2011 release
date for 2.3. The only remaining outstanding tickets concern
documentation, and we want to ensure that the documentation undergoes
some kind of quality control before the release. Unlike the code
itself, which can (and will) be submitted to automated answer testing,
the documentation will need to be tested by actual humans. We decided
the best way to go about this would be to have a potential pull
request acceptor first test the documentation by reading it and
attempting to use the new feature before accepting. This shouldn't
take too long, and should ideally be done using the acceptor's own
data (to ensure that at least some modification of the script is
taking place).
If there are any questions or comments about this, let me know. If
people not in the meeting this morning are satisfied with this, we'll
just adopt it as we go to close the tickets Sam has added here:
https://bitbucket.org/yt_analysis/yt/issues?status=new&status=open&mileston…
thanks,
j