[Numpy-discussion] A better median function?

David Goldsmith d_l_goldsmith at yahoo.com
Fri Aug 21 12:55:48 EDT 2009


Ouch, didn't check my "To" address first, sorry!!!

DG

--- On Fri, 8/21/09, David Goldsmith <d_l_goldsmith at yahoo.com> wrote:

> From: David Goldsmith <d_l_goldsmith at yahoo.com>
> Subject: Re: [Numpy-discussion] A better median function?
> To: "Discussion of Numerical Python" <numpy-discussion at scipy.org>
> Date: Friday, August 21, 2009, 9:50 AM
> Not to make you regret your post ;-)
> but, you having readily furnished your email address, I'm
> taking the liberty of forwarding you my resume - I'm the guy
> who introduced himself yesterday by asking if you knew Don
> Hall - in case you have need of an experienced CCD data
> reduction programmer who knows Python, numpy, and
> matplotlib, as well as IDL, matlab, C/C++, and, from the
> "distant past" FORTRAN (not to mention advanced math and a
> little advanced physics, to boot).  Caveat: I'm not
> presently in a position to relocate. :-(  Thanks for
> your time and consideration,
> 
> David Goldsmith
> 
> DAVID GOLDSMITH
> 2036 Lakemoor Dr. SW
> Olympia, WA 98512
> 360-753-2318
> dgoldsmith_89 at alumni.brown.edu
> 
> Career Interests: Support of research possessing a strong
> component of one or more of the following: mathematics,
> statistics, programming, modeling, physical sciences,
> engineering, etc.
> 
> Desired salary rate: $75,000/yr.
> 
> 
> Skills
> 
>     Computer
> 
> Operating Systems: Windows, Macintosh, Unix 
> 
> Programming/Technical: Python, C/C++, SWIG, numpy,
> matplotlib, wxmpl, wxWidgets, SPE, Visual Studio .NET 2003,
> Trac, TortoiseSVN, RapidSVN, WinCVS, LAPACK, Matlab,
> Scientific Workplace, IDL, FORTRAN, Splus, Django (learning
> in progress).
> 
> Office: MS Word, Excel, PowerPoint, Outlook, Publisher,
> etc.; Page Maker; etc. 
> 
> Communications: Firefox, Thunderbird, VPN, MS Explorer,
> Netscape, NCSA Telnet, Fetch, WS FTP, telnet, ftp, lynx,
> pine, etc. 
> 
>     Other
> 
> Advanced mathematics, statistics, physics, fluid dynamics,
> engineering, etc.; technical documentation.
> 
> 
> Programming Employment
> 
> Technical Editor (Research Manager); June, 2009 to present;
> Planetary Sciences Group, Dept. of Physics, University of
> Central Florida, Orlando, FL (but working out of Olympia,
> WA).  Write and review a broad range of docstrings for
> NumPy, the standard Python module for numerical computing,
> and manage the 2009 NumPy Documentation Summer Marathon,
> including volunteer recruitment and coordination, project
> promotion, grant writing for perpetuation of the project,
> etc. 
> 
> Programming Mathematical Modeler (Functional Analyst II);
> June, 2004 through February, 2008; Emergency Response
> Division, National Oceanic and Atmospheric Administration,
> Seattle, WA (under contract with General Dynamics
> Information Technology, Fairfax, VA).  Develop 3D
> enhancements to existing 2D estuarine circulation codes and
> data visualization and analysis tools in Python and C++,
> using SWIG, numpy, C/LAPACK, ATLAS, matplotlib, wxmpl, SPE,
> Visual Studio/Visual C++, wxWidgets, RapidSVN, TortoiseSVN,
> WinCVS, etc. as development tools; confer regularly with
> other physical scientists, mathematicians, and programmers
> about these tools and other issues/projects related to
> hazardous material emergency response.
> 
> Programming Statistician (Research Associate V); May, 1999
> to September, 2001; Institute for Astronomy, University of
> Hawai`i, Hilo.  Developed IDL-based software for
> analysis of data obtained in development of solid-state
> sensor technology for the Next Generation Space Telescope,
> and other related computer activities.
> 
> Programming Research Assistant; September to December,
> 1997; Physics Dept., Univ. of Montana, Missoula. 
> Assisted in the development of a FORTRAN computational model
> for optimization of toroidal plasma confinement. 
> 
> Programming Research Assistant; June to August, 1997;
> Physics Dept., Univ. of Montana, Missoula.  Assisted in
> FORTRAN computer modeling of passive scalar transport in the
> stratosphere. 
> 
> Programming Research Assistant; June to August, 1997;
> Mathematical Sciences Dept., Univ. of Montana,
> Missoula.  Developed, in MATLAB, a
> cellular-automata-based simulation of flow around windmill
> turbine blades. 
> 
> Programming Consultant; April, 1995; Earth Justice Legal
> Defense Fund, Honolulu, Hawai`i.  Developed Excel
> spreadsheet to determine sewage discharge violations from
> municipal wastewater facility records.
> 
> Programming Research Assistant; June to August, 1985 and
> 1986; Plasma Physics Branch, Naval Research Laboratory,
> Washington, DC.  Assisted in FORTRAN computer modeling
> of plasma switching devices.
> 
> 
> Publications (abridged)
> 
> 2000, w/ D. Hall (1st author) et al., "Characterization of
> lambda_c ~ 5 micron Hg:Cd:Te Arrays for Low-Background
> Astronomy", Optical and IR Telescope Instrumentation and
> Detectors, Proceedings of SPIE, Vol. 4008, Part 2. 
> 
> 2000, w/ D. Hall (1st author) et al., "Molecular Beam
> Epitaxial Mercury Cadmium Telluride: A Quiet, Warm FPA For
> NGST", Astr. Soc. Pacific Conf. Ser., Vol. 207. 
> 
> 1997, w/ A. Ware (1st author) et al., "Stability of Small
> Aspect Ratio Toroidal Hybrid Devices", American Physical
> Society, Plasma Physics Section, Semi-annual meeting. 
> 
> 
> Education (abridged)
> 
> Master of Arts, Mathematical Sciences, University of
> Montana, Missoula, awarded May, 1998. GPA: 4.0. 
> 
> Master of Science, Aquacultural Engineering, University of
> Hawai`i, Manoa, awarded August, 1993. GPA: 3.72. 
> 
> Bachelor of Arts, Mathematics, Brown University,
> Providence, Rhode Island, awarded May, 1989. GPA: Unreported
> (Brown does not routinely calculate GPAs; unofficially:
> 3.83).
> 
> 
> References
> 
> Prof. Joseph Harrington, Ph.D., Department of Physics,
> University of Central Florida, 321-696-9914, jh at physics.ucf.edu
> 
> Debbie Payton, Branch Chief and Oceanographer, Emergency
> Response Division, NOAA, 206-526-6320, debbie.payton at noaa.gov
> 
> Glen Watabayashi, Operations Manager and Oceanographer,
> ERD, NOAA, 206-526-6324, glen.watabayashi at noaa.gov
> 
> Chris Barker, Ph.D., Oceanographer, ERD, NOAA,
> 206-526-6959, chris.barker at noaa.gov
> 
> Don Hall, Ph.D., Institute for Astronomy, University of
> Hawai`i, 808-932-2360, hall at ifa.hawaii.edu
> --- On Fri, 8/21/09, Mike Ressler <mike.ressler at alum.mit.edu>
> wrote:
> 
> > From: Mike Ressler <mike.ressler at alum.mit.edu>
> > Subject: [Numpy-discussion] A better median function?
> > To: "Discussion of Numerical Python" <numpy-discussion at scipy.org>
> > Date: Friday, August 21, 2009, 8:47 AM
> > I presented this during a lightning talk at the scipy conference
> > yesterday, so again, at the risk of painting myself as a flaming
> > idiot:
> > 
> > ---------------------
> > Wanted: A Better/Faster median() Function
> > 
> > numpy implementation uses simple sorting algorithm:
> > Sort all the data using the .sort() method
> > Return middle value (or mean of two middle values)
> > 
> > One doesn't have to sort all data - need only the middle value
> > 
> > Nicolas Devillard discusses several algorithms at
> > http://ndevilla.free.fr/median/median/index.html
> > 
> > Implemented Devillard's version of the Numerical Recipes select()
> > function using ctypes: 2 to 20 times faster on the large (> 10^6
> > points) arrays I tested
> > --- Caveat: I don't have all the bells and whistles of the built-in
> > median function (multiple dimensions, non-contiguous, etc.)
> > 
> > Any of the numpy developers interested in pursuing this further?
> > -----------------------
> > 
> > I got a fairly loud "yes" from the back of the room which a few of us
> > guessed was Robert Kern. I take that as generic interest at least in
> > checking this out.
> > 
> > The background on this is that I am doing some glitch finding
> > algorithms where I call median frequently. I think my ultimate problem
> > is not in median(), but how I loop through the data, but that is a
> > different discussion. What I noticed as I was investigating was what I
> > noted in the slide above. Returning the middle of a sorted vector is
> > not a bad thing to do (admit it, we've all done it at some point), but
> > it does too much work. Things that are lower or higher than the median
> > don't need to be in a perfectly sorted order if all we are after is
> > the median value.
> > 
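An illustrative aside on the paragraph above, not code from the thread: the idea of finding the median without fully sorting can be sketched with `np.partition`, which was added to NumPy after this post was written. The function name `median_by_selection` is hypothetical, and the sketch assumes numeric data without NaNs; multi-dimensional input is simply flattened.

```python
import numpy as np

def median_by_selection(values):
    """Median via partial partitioning instead of a full sort (hypothetical helper)."""
    a = np.asarray(values, dtype=float).ravel()
    n = a.size
    if n == 0:
        raise ValueError("median of an empty array is undefined")
    mid = n // 2
    if n % 2:
        # Odd length: only the element at sorted position `mid` is needed.
        return float(np.partition(a, mid)[mid])
    # Even length: put both middle elements in their sorted positions,
    # then average them. Everything else may stay unsorted.
    part = np.partition(a, [mid - 1, mid])
    return float(0.5 * (part[mid - 1] + part[mid]))

# Compare against the built-in median on random data.
rng = np.random.default_rng(0)
data = rng.normal(size=100_001)
assert np.isclose(median_by_selection(data), np.median(data))
```

Because `np.partition` returns a partitioned copy, the caller's array is left unchanged; the price is one extra allocation of the same size as the input.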
> > I did some googling and came up with the web page noted above. I used
> > his modified NumRec select() function as an excuse to learn ctypes,
> > and my initial weak attempts were successful. The speed ups depend
> > highly on the length of the data and the randomness - things that are
> > correlated or partially sorted already go quickly. My caveat is that
> > my select-based median is too simple; it must have 1-d contiguous data
> > of a predefined type. It also moves the data in place, affecting the
> > original variable. I have no idea how this will blow up if implemented
> > in a general purpose way.
> > 
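To make the in-place caveat above concrete, here is a minimal pure-Python sketch of a generic quickselect (Hoare-style selection). It is not the Numerical Recipes select() routine, Devillard's version of it, or Mike's ctypes wrapper; the names `quickselect_inplace` and `median_copy` are hypothetical. It shows that selection reorders its input, and that working on a copy is one simple way to protect the caller's data.

```python
def quickselect_inplace(a, k):
    """Reorder list `a` in place so that a[k] is its k-th smallest value."""
    lo, hi = 0, len(a) - 1
    while lo < hi:
        pivot = a[k]
        i, j = lo, hi
        # Hoare-style partition of a[lo..hi] around the pivot value.
        while i <= j:
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        # Continue only in the side that still contains index k.
        if k <= j:
            hi = j
        elif k >= i:
            lo = i
        else:
            break  # k lies between j and i, so a[k] equals the pivot
    return a[k]

def median_copy(values):
    """Median that leaves the caller's data untouched by selecting on a copy."""
    work = list(values)
    n = len(work)
    if n == 0:
        raise ValueError("median of an empty sequence is undefined")
    mid = n // 2
    upper = quickselect_inplace(work, mid)
    if n % 2:
        return upper
    # After selection, everything left of `mid` is <= work[mid],
    # so the lower middle value is the maximum of that left part.
    lower = max(work[:mid])
    return 0.5 * (lower + upper)

data = [7, 1, 5, 3, 9, 2]
print(median_copy(data))                   # 4.0; data keeps its order
quickselect_inplace(data, len(data) // 2)  # this call does reorder data
print(data)
```

The copy costs extra memory proportional to the input, which is presumably part of the trade-off a general-purpose implementation would have to weigh.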
> > Anyway, I'm not enough of a C-coder to have any hope of improving this
> > to the point where it can be included in numpy itself. However, if
> > someone is willing to take up the torch, I will volunteer to assist
> > with discussion, prototyping a few routines, and testing (I have lots
> > of real-world data). One could argue that the current median
> > implementation is good enough (and it probably is for 99% of all
> > usage), but I view this as a chance to add an industrial strength
> > routine to the numpy base.
> > 
> > Thanks for listening.
> > 
> > Mike
> > 
> > -- 
> > mike.ressler at alum.mit.edu
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> > 
> 
> 


      


