I'm breaking Ranger...

Before I reply, do any of you have something intelligent to add that I can say? Because the .cpu files aren't spatially restricted, each thread of parallel HOP may have to access many of the .cpu files, which is how the MDS server gets pummeled.

_______________________________________________________
sskory@physics.ucsd.edu            o__  Stephen Skory
http://physics.ucsd.edu/~sskory/ _.>/ _Graduate Student
________________________________(_)_\(_)_______________

----- Forwarded Message ----
From: Tommy Minyard <minyard@tacc.utexas.edu>
To: "sskory@physics.ucsd.edu" <sskory@physics.ucsd.edu>
Sent: Thursday, April 30, 2009 12:54:49 PM
Subject: X1024 jobs on Ranger
Hello Stephen,
In our monitoring of Ranger, we've noticed that some of your recent jobs named x1024 seem to be causing an abnormally high load on the /scratch filesystem meta-data server (MDS). It appears that the MDS load goes way up when your job begins to run and stays high for the first 30 minutes to an hour, but then drops back down to a more reasonable level after the job has been running for a while.
Do you have any idea what may be triggering such a high load from the application you are running? It does not seem to cause any major problems or generate errors; however, filesystem access becomes much more sluggish when the MDS load is so high. If you could give us a few more details that might help explain the high load, or point us to the source code for your application, we would like to check and confirm that the MDS is acting as it should.
Thanks, Tommy
____________________________________________________________________
Tommy Minyard, Ph.D. - Assoc. Director      (512) 232-6578
Advanced Computing Systems Group            (512) 475-9445 (fax)
Texas Advanced Computing Center             http://www.tacc.utexas.edu
The University of Texas at Austin           minyard@tacc.utexas.edu
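Stephen's point about the .cpu files not being spatially restricted can be illustrated with a toy count of file opens. This is only a sketch under stated assumptions: the grid, file, and rank counts are made up, grids are assigned to files at random to mimic the lack of spatial locality, and each rank's subvolume is modeled as a contiguous block of grid ids. None of this is taken from the yt or HOP source.

```python
import random


def count_opens(n_grids, n_files, n_ranks, seed=0):
    """Return (naive, pooled) file-open counts summed over all ranks."""
    rng = random.Random(seed)
    # Assumption: a grid can live in any .cpu file, regardless of position.
    grid_file = [rng.randrange(n_files) for _ in range(n_grids)]
    # Assumption: each rank owns a contiguous block of grid ids,
    # standing in for one spatial subvolume.
    per_rank = n_grids // n_ranks
    naive = 0
    pooled = 0
    for rank in range(n_ranks):
        mine = grid_file[rank * per_rank:(rank + 1) * per_rank]
        naive += len(mine)        # one open per grid read
        pooled += len(set(mine))  # one open per distinct file touched
    return naive, pooled


naive, pooled = count_opens(n_grids=1000, n_files=16, n_ranks=4)
print(f"naive opens: {naive}, pooled opens: {pooled}")
```

Under these assumptions every rank still touches most of the files, but grouping reads by file bounds the opens per rank by the number of files rather than the number of grids.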
Hi Stephen,

Not sure what to tell you. However, I would like to note that HOP does not in any way pool file access. Projections did -- they do not any more as of yesterday or the day before -- but HOP does not. You can try to implement this via the preload command, which you will also see inside the profiling module. We could improve IO by making the DataQueue object aware of which grids will be accessed and then doing pool-on-demand, where you'd call 'preload', it'd know which grids to pool access to, and then when *one* is accessed, all the others in that CPU file would also get pulled.

Unfortunately, I cannot give my time to this today, but maybe you could start on that path and see what you can come up with?

-Matt
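The pool-on-demand idea Matt describes might look roughly like the sketch below. Everything here is hypothetical: `PooledReader`, its methods, the `read_file` callback, and the file names are illustrative stand-ins and not yt's actual DataQueue or preload API. The only behavior taken from the message is that `preload` records which grids will be needed, and the first access to one grid pulls every other announced grid in the same file.

```python
from collections import defaultdict


class PooledReader:
    """Hypothetical stand-in for a preload-aware queue (not yt's DataQueue)."""

    def __init__(self, grid_to_file, read_file):
        self.grid_to_file = grid_to_file  # grid id -> file name
        self.read_file = read_file        # callable(file name, grid ids) -> {grid id: data}
        self.pending = defaultdict(set)   # file name -> grid ids announced via preload
        self.cache = {}                   # grid id -> data already read

    def preload(self, grid_ids):
        """Announce which grids will be accessed; nothing is read yet."""
        for g in grid_ids:
            if g not in self.cache:
                self.pending[self.grid_to_file[g]].add(g)

    def get(self, grid_id):
        """Return one grid, reading all announced grids of its file at once."""
        if grid_id not in self.cache:
            fn = self.grid_to_file[grid_id]
            wanted = self.pending.pop(fn, set()) | {grid_id}
            self.cache.update(self.read_file(fn, sorted(wanted)))
        return self.cache[grid_id]


# Demo with a fake reader that records each file access.
opens = []


def fake_read(fn, grid_ids):
    opens.append(fn)
    return {g: f"data-{g}" for g in grid_ids}


# Hypothetical file names, for illustration only.
grid_to_file = {0: "a.cpu0000", 1: "a.cpu0000", 2: "a.cpu0001"}
reader = PooledReader(grid_to_file, fake_read)
reader.preload([0, 1, 2])
for g in (0, 1, 2):
    reader.get(g)
print(opens)
```

In this demo three grid accesses result in two file reads, one per file, because grid 1 is pulled along with grid 0.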
Matt,
I'll take a look to see if I can figure it out. I told him for now that I'd stay away unless I can come up with something better. I guess the MDS server doesn't like it when you read 440,000*(1+padding)^3 grids...

_______________________________________________________
sskory@physics.ucsd.edu            o__  Stephen Skory
http://physics.ucsd.edu/~sskory/ _.>/ _Graduate Student
________________________________(_)_\(_)_______________
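For a sense of scale, the expression 440,000*(1+padding)^3 can be evaluated for a few padding values. This is a rough sketch: it assumes padding is a fractional widening applied along each of the three dimensions, as the expression suggests, and the padding values tried here are illustrative rather than taken from the thread.

```python
def grids_read(n_grids, padding):
    """Estimate grids read when each dimension is widened by `padding`."""
    return n_grids * (1.0 + padding) ** 3


for padding in (0.0, 0.1, 0.2, 0.5, 1.0):
    print(f"padding={padding:.1f}: ~{round(grids_read(440_000, padding)):,} grids")
```

Because the factor is cubic, even modest padding inflates the number of grid reads noticeably, and each unpooled read can mean another file open.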
Yeah. The best we can do, I think, for spatially-decomposed systems, is to pool as many accesses as possible. We have the basics of the infrastructure in place, so it would not be the worst thing in the world to implement this.

-Matt
Participants (2): Matthew Turk, Stephen Skory