[Numpy-discussion] A better median function?

David Goldsmith d_l_goldsmith at yahoo.com
Fri Aug 21 12:50:00 EDT 2009


Not to make you regret your post ;-) but, since you readily furnished your email address, I'm taking the liberty of forwarding you my resume. (I'm the guy who introduced himself yesterday by asking if you knew Don Hall.) I'm sending it in case you have need of an experienced CCD data reduction programmer who knows Python, numpy, and matplotlib, as well as IDL, MATLAB, C/C++, and, from the "distant past," FORTRAN (not to mention advanced math and a little advanced physics, to boot).  Caveat: I'm not presently in a position to relocate. :-(  Thanks for your time and consideration,

David Goldsmith

DAVID GOLDSMITH
2036 Lakemoor Dr. SW
Olympia, WA 98512
360-753-2318
dgoldsmith_89 at alumni.brown.edu

Career Interests: Support of research possessing a strong component of one or more of the following: mathematics, statistics, programming, modeling, physical sciences, engineering, etc.

Desired salary rate: $75,000/yr.


Skills

    Computer

Operating Systems: Windows, Macintosh, Unix 

Programming/Technical: Python, C/C++, SWIG, numpy, matplotlib, wxmpl, wxWidgets, SPE, Visual Studio .NET 2003, Trac, TortoiseSVN, RapidSVN, WinCVS, LAPACK, Matlab, Scientific Workplace, IDL, FORTRAN, Splus, Django (learning in progress).

Office: MS Word, Excel, PowerPoint, Outlook, Publisher, etc.; PageMaker; etc.

Communications: Firefox, Thunderbird, VPN, MS Explorer, Netscape, NCSA Telnet, Fetch, WS FTP, telnet, ftp, lynx, pine, etc. 

    Other

Advanced mathematics, statistics, physics, fluid dynamics, engineering, etc.; technical documentation.


Programming Employment

Technical Editor (Research Manager); June, 2009 to present; Planetary Sciences Group, Dept. of Physics, University of Central Florida, Orlando, FL (but working out of Olympia, WA).  Write and review a broad range of docstrings for NumPy, the standard Python module for numerical computing, and manage the 2009 NumPy Documentation Summer Marathon, including volunteer recruitment and coordination, project promotion, grant writing for perpetuation of the project, etc. 

Programming Mathematical Modeler (Functional Analyst II); June, 2004 through February, 2008; Emergency Response Division, National Oceanic and Atmospheric Administration, Seattle, WA (under contract with General Dynamics Information Technology, Fairfax, VA).  Developed 3D enhancements to existing 2D estuarine circulation codes and data visualization and analysis tools in Python and C++, using SWIG, numpy, C/LAPACK, ATLAS, matplotlib, wxmpl, SPE, Visual Studio/Visual C++, wxWidgets, RapidSVN, TortoiseSVN, WinCVS, etc. as development tools; conferred regularly with other physical scientists, mathematicians, and programmers about these tools and other issues/projects related to hazardous material emergency response.

Programming Statistician (Research Associate V); May, 1999 to September, 2001; Institute for Astronomy, University of Hawai`i, Hilo.  Developed IDL-based software for analysis of data obtained in development of solid-state sensor technology for the Next Generation Space Telescope, and other related computer activities.

Programming Research Assistant; September to December, 1997; Physics Dept., Univ. of Montana, Missoula.  Assisted in the development of a FORTRAN computational model for optimization of toroidal plasma confinement. 

Programming Research Assistant; June to August, 1997; Physics Dept., Univ. of Montana, Missoula.  Assisted in FORTRAN computer modeling of passive scalar transport in the stratosphere. 

Programming Research Assistant; June to August, 1997; Mathematical Sciences Dept., Univ. of Montana, Missoula.  Developed, in MATLAB, a cellular-automata-based simulation of flow around windmill turbine blades. 

Programming Consultant; April, 1995; Earth Justice Legal Defense Fund, Honolulu, Hawai`i.  Developed Excel spreadsheet to determine sewage discharge violations from municipal wastewater facility records.

Programming Research Assistant; June to August, 1985 and 1986; Plasma Physics Branch, Naval Research Laboratory, Washington, DC.  Assisted in FORTRAN computer modeling of plasma switching devices.


Publications (abridged)

2000, w/ D. Hall (1st author) et al., "Characterization of lambda_c ~ 5 micron Hg:Cd:Te Arrays for Low-Background Astronomy", Optical and IR Telescope Instrumentation and Detectors, Proceedings of SPIE, Vol. 4008, Part 2. 

2000, w/ D. Hall (1st author) et al., "Molecular Beam Epitaxial Mercury Cadmium Telluride: A Quiet, Warm FPA For NGST", Astr. Soc. Pacific Conf. Ser., Vol. 207. 

1997, w/ A. Ware (1st author) et al., "Stability of Small Aspect Ratio Toroidal Hybrid Devices", American Physical Society, Plasma Physics Section, Semi-annual meeting. 


Education (abridged)

Master of Arts, Mathematical Sciences, University of Montana, Missoula, awarded May, 1998. GPA: 4.0. 

Master of Science, Aquacultural Engineering, University of Hawai`i, Manoa, awarded August, 1993. GPA: 3.72. 

Bachelor of Arts, Mathematics, Brown University, Providence, Rhode Island, awarded May, 1989. GPA: Unreported (Brown does not routinely calculate GPAs; unofficially: 3.83).


References

Prof. Joseph Harrington, Ph.D., Department of Physics, University of Central Florida, 321-696-9914, jh at physics.ucf.edu

Debbie Payton, Branch Chief and Oceanographer, Emergency Response Division, NOAA, 206-526-6320, debbie.payton at noaa.gov

Glen Watabayashi, Operations Manager and Oceanographer, ERD, NOAA, 206-526-6324, glen.watabayashi at noaa.gov

Chris Barker, Ph.D., Oceanographer, ERD, NOAA, 206-526-6959, chris.barker at noaa.gov

Don Hall, Ph.D., Institute for Astronomy, University of Hawai`i, 808-932-2360, hall at ifa.hawaii.edu
--- On Fri, 8/21/09, Mike Ressler <mike.ressler at alum.mit.edu> wrote:

> From: Mike Ressler <mike.ressler at alum.mit.edu>
> Subject: [Numpy-discussion] A better median function?
> To: "Discussion of Numerical Python" <numpy-discussion at scipy.org>
> Date: Friday, August 21, 2009, 8:47 AM
> I presented this during a lightning talk at the scipy conference
> yesterday, so again, at the risk of painting myself as a flaming
> idiot:
> 
> ---------------------
> Wanted: A Better/Faster median() Function
> 
> numpy implementation uses simple sorting algorithm:
> Sort all the data using the .sort() method
> Return middle value (or mean of two middle values)
> 
> One doesn’t have to sort all data – need only the middle value
> 
> Nicolas Devillard discusses several algorithms at
> http://ndevilla.free.fr/median/median/index.html
> 
> Implemented Devillard’s version of the Numerical Recipes select()
> function using ctypes: 2 to 20 times faster on the large (> 10^6
> points) arrays I tested
> --- Caveat: I don’t have all the bells and whistles of the built-in
> median function (multiple dimensions, non-contiguous, etc.)
> 
> Any of the numpy developers interested in pursuing this further?
> -----------------------
> 
> I got a fairly loud "yes" from the back of the room which a few of us
> guessed was Robert Kern. I take that as generic interest at least in
> checking this out.
> 
> The background on this is that I am doing some glitch finding
> algorithms where I call median frequently. I think my ultimate problem
> is not in median(), but how I loop through the data, but that is a
> different discussion. What I noticed as I was investigating was what I
> noted in the slide above. Returning the middle of a sorted vector is
> not a bad thing to do (admit it, we've all done it at some point), but
> it does too much work. Things that are lower or higher than the median
> don't need to be in a perfectly sorted order if all we are after is
> the median value.
> 
> I did some googling and came up with the web page noted above. I used
> his modified NumRec select() function as an excuse to learn ctypes,
> and my initial weak attempts were successful. The speed ups depend
> highly on the length of the data and the randomness - things that are
> correlated or partially sorted already go quickly. My caveat is that
> my select-based median is too simple; it must have 1-d contiguous data
> of a predefined type. It also moves the data in place, affecting the
> original variable. I have no idea how this will blow up if implemented
> in a general purpose way.
> 
> Anyway, I'm not enough of a C-coder to have any hope of improving this
> to the point where it can be included in numpy itself. However, if
> someone is willing to take up the torch, I will volunteer to assist
> with discussion, prototyping a few routines, and testing (I have lots
> of real-world data). One could argue that the current median
> implementation is good enough (and it probably is for 99% of all
> usage), but I view this as a chance to add an industrial strength
> routine to the numpy base.
> 
> Thanks for listening.
> 
> Mike
> 
> -- 
> mike.ressler at alum.mit.edu
> 

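For anyone skimming the quoted proposal, here is a minimal pure-Python sketch of the idea it describes: find the median by selection instead of by a full sort. To be clear about assumptions, this is NOT Devillard's or the Numerical Recipes select() routine, and it is not how numpy implements median(); it is a generic quickselect with a random pivot, and the function names (select_kth, median_by_selection, median_by_sorting) are made up for illustration. Unlike the in-place ctypes version Mike describes, it works on a copy, so the caller's data is left alone.

```python
import random


def median_by_sorting(values):
    """Baseline from the slide: sort everything, then take the middle."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def select_kth(values, k):
    """Return the k-th smallest element (0-based) without a full sort."""
    work = list(values)  # copy, so the caller's sequence is not reordered
    lo, hi = 0, len(work) - 1
    while True:
        if lo == hi:
            return work[lo]
        pivot = work[random.randint(lo, hi)]
        # Three-way partition of work[lo..hi] around the pivot value.
        lt, i, gt = lo, lo, hi
        while i <= gt:
            if work[i] < pivot:
                work[lt], work[i] = work[i], work[lt]
                lt += 1
                i += 1
            elif work[i] > pivot:
                work[i], work[gt] = work[gt], work[i]
                gt -= 1
            else:
                i += 1
        # Now work[lo:lt] < pivot, work[lt:gt + 1] == pivot, the rest > pivot.
        # Only the side that contains index k needs further work.
        if k < lt:
            hi = lt - 1
        elif k > gt:
            lo = gt + 1
        else:
            return pivot


def median_by_selection(values):
    """Median via selection; even-length input averages the two middle values."""
    n = len(values)
    if n == 0:
        raise ValueError("median of empty data")
    if n % 2 == 1:
        return select_kth(values, n // 2)
    lower = select_kth(values, n // 2 - 1)
    upper = select_kth(values, n // 2)
    return (lower + upper) / 2
```

Both functions should agree on any 1-d sequence of numbers; the selection version simply never puts the values on either side of the median into order. Whether that wins in practice depends on the data and on the implementation language, so the speedups quoted above should not be expected from this pure-Python sketch.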