Over the last little while, I've been looking at an issue with how we
read data (https://github.com/yt-project/yt/issues/2412) and having
some thoughts I'd like to share.
Back when yt-3.0 was first set up, it was baked in that the arrays
were all pre-allocated before being read. (An important note: this
*pre-dates* the demeshening.) What this means is
that typically the process of reading data from any type of frontend
goes something like this:
* _identify_base_chunk -> figure out how big the whole thing is
* either read, or subdivide into chunks
When a chunk is read -- which includes the chunk style "all" that
reads all the data in a given data object in a single go -- the
destination buffer is preallocated. For grid objects, this can be
done without reading any data off of disk. The process is still
expensive, but we cache the most-recently-used grid mask every time we
count a given grid.
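To make that concrete, here is a minimal sketch of the pre-allocation
pattern described above. The helpers count_selected_cells() and
read_grid_field() are hypothetical stand-ins for yt's internals, not
real functions:

    import numpy as np

    def read_field_preallocated(grids, selector, field):
        # First pass: size the destination.  For grid data this needs
        # no disk IO, since the selection mask comes from grid
        # geometry alone (and the most-recently-used mask is cached).
        # count_selected_cells() is a hypothetical stand-in.
        size = sum(count_selected_cells(g, selector) for g in grids)

        # Allocate the entire destination buffer up front.
        dest = np.empty(size, dtype="float64")

        # Second pass: do the actual disk IO and copy each grid's
        # selected values into its slice of the destination.
        # read_grid_field() is likewise a hypothetical stand-in.
        offset = 0
        for g in grids:
            vals = read_grid_field(g, field, selector)
            dest[offset:offset + vals.size] = vals
            offset += vals.size
        return dest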
For particles, however, the case is different. Determining selections
of particles requires IO (our bitmap indices can only provide an upper
bound) and so calling identify_base_chunk, if it needs to size the
thing, will read data from disk. But, by the time we had implemented
a nice indexing scheme for particles, we had wedded ourselves to this,
and so it got implemented this way. It was one unrelated design
decision that was applied out of context.
Now, you might ask yourself, why do we do that anyway? Why know how
big something is? Well, it's because our old method (pre-3.0) was
along these lines:
* read each grid, one at a time (remember we only had grid support)
and then mask them
* at the end, concatenate the arrays all together
For reasonably sized data, this isn't so bad. The biggest problem is
that of fragmentation and copying -- we're making lots of little-ish
arrays in step 1, and in step 2, we make a single big array and copy
each one in, then de-allocate. This was the most painful when we read
a gigantic data object in all at once. We want to avoid the situation
where reading some 1024^3 dataset has a moment when memory usage
spikes to roughly twice the size of the data while the copies happen.
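To see where that spike comes from, here is a toy numpy version of
the old flow (the sizes are made up):

    import numpy as np

    # Step 1: many little-ish per-grid arrays (stand-ins for the
    # masked per-grid reads).
    grid_sizes = [1_000_000] * 16  # made-up grid cell counts
    chunks = [np.empty(n, dtype="float64") for n in grid_sizes]

    # Step 2: np.concatenate allocates a second full-size buffer and
    # copies every chunk into it, so until `chunks` is released the
    # process holds both the fragments and the final array -- roughly
    # 2x the data size.
    result = np.concatenate(chunks)
    del chunks  # only now does peak memory drop back down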
But, in contrast to before, now almost all of our operations are
chunked (what we used to call "lazy") and so don't need to be able to
read a gigantic array for most things. Mind you, if you do
ds.r[:]["something"] it'll still read it in, but that is much, much
less common than it was before.
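Concretely, the chunked pattern iterates over pieces of a data object
instead of materializing the whole field; something like the sketch
below, where the dataset and field names are placeholders:

    import yt

    ds = yt.load("IsolatedGalaxy/galaxy0030/galaxy0030")  # placeholder
    ad = ds.all_data()

    # Chunked ("lazy") pattern: only one chunk's worth of the field
    # is resident at a time, so no full-size buffer is ever needed.
    total = 0.0
    for chunk in ad.chunks([("gas", "density")], "io"):
        total += float(chunk[("gas", "density")].sum())

    # Eager pattern: this still reads the whole field into memory.
    rho = ds.r[:][("gas", "density")]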
So what I have explored a bit is what happens if, for particle
datasets, we get rid of this notion that we need to know how big
something is (or rather, *exactly* how big) before we do any IO? And,
it turns out, it does make a difference -- a non-negligible one, in
fact, for times when the cost of reallocating and copying is lower
than the cost of the extra pass we'd spend counting. (There are some
numbers on the issue I linked above.)
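One way to do this is the usual amortized-growth trick: over-allocate,
double when full, and trim at the end, so the total copying cost stays
proportional to the amount of data read instead of requiring an
up-front counting pass. A minimal sketch of the idea (not the actual
code in the branch):

    import numpy as np

    def read_without_precount(chunk_iter, dtype="float64"):
        # No up-front counting pass: start small, double as needed.
        buf = np.empty(1024, dtype=dtype)
        used = 0
        for vals in chunk_iter:  # one chunk's values at a time
            while used + vals.size > buf.size:
                bigger = np.empty(buf.size * 2, dtype=dtype)
                bigger[:used] = buf[:used]
                buf = bigger  # reallocate-and-copy, amortized O(n)
            buf[used:used + vals.size] = vals
            used += vals.size
        return buf[:used].copy()  # trim to the size actually read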
So that's a pretty long, rambly dive into things, but here's the thing
I wanted to bring up: I'd like to explore not pre-allocating memory or
pre-counting items in the chunking and IO systems. But it's
exploratory, and I don't know if it will pan out. So I'd like to
invite anyone who is interested to try it out with me; and if you have
objections, now would be a perfect time to raise them.
I'll post this email here as well:
https://github.com/yt-project/yt/issues/2412 so if you're interested,
subscribe to that issue.
If this turns out to be a worthwhile change, I will propose these
steps, all of which would be in the yt-4.0 branch:
1) Turn it off one frontend at a time and examine performance and
memory use in a few different cases
2) Once it has been disabled in all frontends, disable support for it
in the base chunking system itself, or at least make it optional and
consolidate references to it in the codebase
3) Update YTEP-0001 to reflect this new state of affairs
yt's performance needs to be improved, and this could be a good first
step at finding ways to do so: one that doesn't require a ton of
surgery and would overall *reduce* the complexity of yt.
I've been working on a PR (https://github.com/yt-project/yt/pull/2286) that converts yt's testing framework from nose to pytest. I believe that it's mostly ready to go, so I was writing to see what everyone's thoughts on this change are, if there is any feedback on my implementation, and if anyone would be willing to help review it, since there are quite a few changes. Thanks, and have a good weekend!
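For a sense of the flavor of the change: pytest does not support
nose's yield-style generator tests, so those get rewritten as
parametrized tests. A schematic (not an actual test from the PR):

    import pytest

    def check_field(field):
        # Stand-in for a real validation against a dataset.
        return isinstance(field, str)

    # nose style (unsupported under pytest):
    #
    # def test_fields():
    #     for field in ("density", "temperature"):
    #         yield check_field, field

    # pytest style: the same cases become explicit parameters.
    @pytest.mark.parametrize("field", ["density", "temperature"])
    def test_field(field):
        assert check_field(field)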
If anyone is interested in running GSOC, please see below. I'm happy to
help out with GSOC but don't have the bandwidth to run it. Also happy to
answer questions about it if you're interested. GSOC was very successful
for us a few years ago and it would be nice to do it again if we can.
---------- Forwarded message ---------
From: Nicole Foster <nicole(a)numfocus.org>
Date: Thu, Dec 12, 2019 at 2:51 PM
Subject: [NumFOCUS Projects] Google Summer of Code 2020
To: <projects(a)numfocus.org>, Affiliated Projects <affiliated(a)numfocus.org>
Hello, NumFOCUS project leaders!
NumFOCUS has participated in Google Summer of Code (GSoC) as a mentoring
organization since 2015 and will do so again for 2020.
*If you would like to participate in GSoC 2020 under the NumFOCUS umbrella*,
please email the NumFOCUS GSoC 2020 Coordinator: Mridul Seth
<seth.mridul(a)gmail.com>.
*NumFOCUS will submit one application with as many projects underneath us
as would like to participate.*
Applications for GSoC mentoring organizations open on January 14th, and the
deadline to apply is Feb 5th.
- - - - -
Below is some additional information about the program and considerations
for participating.
*GSoC Last Year*
In 2019, the following projects participated in GSoC under NumFOCUS:
- Data Retriever
AstroPy, SunPy, Julia, Shogun and SymPy participated separately or under
other partner organizations.
We ran a blog series last summer highlighting all student participants
which you can access here:
https://numfocus.org/blog/meet-our-2019-gsoc-students-part-1. It should
be enlightening in terms of student motivations and the sorts of projects
they take on through the program.
The GH repo that maintains information about the NumFOCUS process for GSoC
is here <https://github.com/numfocus/gsoc> and contains many details about
our internal processes and rules (not all of which are required by Google
but which we have found to improve outcomes for the program).
*Should My Project Participate?*
The main thing to keep in mind when considering whether to participate is
to ensure that you have a sufficient number of mentors with available
mentoring time. 2 or even 3 mentors per student helps to spread the work
around and keep things manageable for everyone.
Based on Matplotlib's experience taking on two students in 2017, they
recommend that projects that feel somewhat unsure about their mentoring
capacity be less ambitious in the number of students they accept.
Prior participants have generally had very good experiences with GSoC. It
is often cited as a primary driver behind finding new regular contributors
and eventual maintainers, so it is very good for the funnel of potential
future maintainers.
GSoC also offers a great opportunity to diversify the mix of your
contributors. Shogun, for example, has had great success in recruiting and
mentoring women through GSoC who then stayed on as project maintainers.
If you have any further questions about GSoC participation, please reach
out to Mridul Seth <seth.mridul(a)gmail.com>.
Executive Operations Administrator, NumFOCUS
This message is a follow-up of a discussion from yt-users.
I was having problems understanding the operation of
camera.rotate() (and camera.yaw(), which is essentially a thin
wrapper around rotate()). I took a look at the code again: no rotation
of any kind changes the camera.focus object, which again I
find counterintuitive. Focus appears to be set and used in e.g.
camera.__init__(), but then never used later.
Thus, my idea would be to update camera.__repr__() to show
something more informative, such as camera.lens.origin
and/or possibly some information from camera.orientation, and
then explain somewhere in the docs what the difference between
that and camera.position is. It is important for me to be able
to determine what the camera is doing because I am plotting
text labels in the 3D volume and I want to move the text labels
while moving the camera. Of course, I'm sure other people
have reasons for wanting the camera properties as well.
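For concreteness, here is the kind of helper I have in mind. I'm
assuming the camera exposes .position, .focus, .north_vector, and
.lens, so treat the attribute names as a sketch rather than the real
API:

    def describe_camera(camera):
        # Hypothetical stand-in for a more informative
        # Camera.__repr__: surface the state that actually
        # determines what gets rendered.
        return "<Camera position={}, focus={}, north_vector={}, lens={}>".format(
            camera.position, camera.focus, camera.north_vector,
            type(camera.lens).__name__)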
Also, if there's a standard projective geometry reference
which describes these things in the same language as
the yt implementation, I'd love to see it. I'm not
sure if modifications of the camera class are part of your
4.0 work or not...
Andrew W. Steiner
Joint Faculty Asst. Prof. at UTK/ORNL
It is my pleasure to announce that applications are now open for Python in Astronomy 2020, to be held 20 - 24 April 2020 at Trinity College, Dublin in Dublin, Ireland.
Though the application form will be open until 23:59 UTC on 6 January 2020, I encourage you to complete the form soon to make sure you don’t miss the deadline.
The application form is at: https://forms.gle/mtdm6QKENdY8Y1Ph9
More information about the conference, including links to past years, is available at: http://openastronomy.org/pyastro/2020/
Finally, a brief excerpt from the description of the conference:
In addition to sharing information about state-of-the-art Python Astronomy packages, the workshop will focus on improving interoperability between astronomical Python packages, mentoring current open-source contributors, and developing educational materials for Python in Astronomy. The meeting is therefore not only aimed at current developers, but also educators and research group leaders who are interested in being involved in these efforts.
Participant selection will be made with the goal of enhancing the Python in Astronomy community and we encourage requests to attend from all career levels. Effort will also be made to select participants who have contributed meaningfully to the Python in Astronomy ecosystem via providing educational materials, documentation, and/or code contributions. This conference is neither intended to be an introduction to Python nor only for expert-level Python developers.
On behalf of the SOC: Monica Bobra (co-chair), Andrew Leonard (co-chair), Will Barnes, Clara Brasseur, Juan Luis Cano, Rebecca Lange, Sophie Murray
I wanted to share with you a job posting to work with us at the
University of Illinois. It’s available at:
The short description is that we’re looking to invest in the yt
infrastructure, and bring it more closely in line with the modern
pydata ecosystem. This would mean working to utilize and interoperate
with libraries like dask and xarray, and will include designing and
documenting software changes and infrastructure. Plus, it will also
involve working with libraries used throughout both the scientific
ecosystem and in industry in modern data science. And, you’ll get to
work with fun people in both the yt community *and* the pydata community!
All the work will be open source and this project is committed to
contributing to the broader ecosystem *wherever* we can.
I hope you’ll consider either applying, or passing this along to
someone you know that might be interested. All of the information in
the job posting should be considered authoritative, rather than this
email. If you’ve got any questions (including about specifics of the
job, the application process, etc) please do reach out to the contact
person on the form!
The frontend for AMRVAC (http://amrvac.org/) was just merged, thanks to the
persistence of that frontend's authors. Both of them have been working with
us on GitHub and the yt Slack and have shown an interest in further
improving AMRVAC support and yt in general.
To empower that I'd like to nominate them as project members. We'll need
three other project members to reply with a +1. When that happens I'll add
them to the website and give them a commit bit.
What: yt user/developer workshop
Where: Edinburgh, UK
When: June 29 to July 3, 2020
I'm very pleased to announce that there will be a yt user/developer
workshop at the Higgs Centre for Theoretical Physics at University of
Edinburgh from June 29 to July 3, 2020. The workshop will begin with a
couple days of tutorials for new users and then transition into development
activities. This will be a good opportunity to meet and join the yt
community. More information to follow, but mark your calendars and diaries
now. I hope to see you next summer!
We used to have a weekly PR triage meeting, but then we stopped
having one! I have re-created one, set up for 9:30 AM Central time on
Fridays for the next couple of weeks. They worked really well while we
did them. I've also left it a bit long so that if there's any
non-PR-triage work we can or should do, we can do that too.
I know this isn't ideal for West Coast folks, but I'd be happy to set
up another one as well. Looking forward to seeing you! The
invitation is copied below, and there's a link to a URL that will add
it to your calendar. I'll try (but will likely forget) to send out
reminders each week.
Matthew Turk is inviting you to a scheduled Zoom meeting.
Topic: yt PR triage and co-working
Time: Sep 20, 2019 09:30 AM Central Time (US and Canada)
Every week on Fri, until Nov 1, 2019, 7 occurrence(s)
Sep 20, 2019 09:30 AM
Sep 27, 2019 09:30 AM
Oct 4, 2019 09:30 AM
Oct 11, 2019 09:30 AM
Oct 18, 2019 09:30 AM
Oct 25, 2019 09:30 AM
Nov 1, 2019 09:30 AM
Please download and import the following iCalendar (.ics) files to
your calendar system.
Join Zoom Meeting
One tap mobile
+19292056099,,344730807# US (New York)
+16699006833,,344730807# US (San Jose)
Dial by your location
+1 929 205 6099 US (New York)
+1 669 900 6833 US (San Jose)
+1 647 558 0588 Canada
+49 30 3080 6188 Germany
+49 30 5679 5800 Germany
+49 69 7104 9922 Germany
+82 2 6105 4111 South Korea
+82 2 6022 2322 South Korea
+44 203 051 2874 United Kingdom
+44 203 481 5237 United Kingdom
+44 203 966 3809 United Kingdom
+44 131 460 1196 United Kingdom
+81 524 564 439 Japan
+81 3 4578 1488 Japan
+61 8 7150 1149 Australia
+61 2 8015 6011 Australia
+52 554 161 4288 Mexico
+52 229 910 0061 Mexico
Meeting ID: 344 730 807
Find your local number: https://zoom.us/u/abzljuNmCN
As many of you know, the yt-project has been growing and evolving over the
past several years (to new domains, to new datasets, to support new
packages, etc.). To accompany that growth, we'd like to take the
opportunity to update our governance structure. We've been working on
updating our governance model for the yt project and I've submitted two
companion PRs with drafts of this updated governance:
Our existing governance structure is located at
The first PR is to a new governance repository, where our governance
documentation will live separately from the YTEPs. This will allow us to
maintain and make minor updates to our governance without having to update
a YTEP.
The second PR is to the YTEP repository and generally outlines the core
values and ideas we want our governance structure to reflect. Hopefully all
of the things I've listed in the YTEP are reflected in the governance docs.
I'd like to solicit feedback from all of you, as members of the community,
about these governance documents. Do these reflect our community values?
Should we add anything? Do you feel everything is clear? Is this too much
governance for our community right now? Is there something that's missing?
Feel free to reply here or comment on the pull requests! Our governance
will be better with your feedback.
PS - I tried to build a mentorship structure into our maintainer
structure to help with onboarding new maintainers. I'd especially like to
know if you all think this would be valuable to you or if it is adding
unnecessary constraints to our community.