[Matplotlib-devel] Units discussion...

Thu Feb 8 16:47:17 EST 2018

FYI for anyone interested: around the time the first units code was submitted in 2009, we also submitted a mock-up of our unit and time classes, with converters and tickers, which is located in matplotlib/testing/jpl_units/. It doesn't appear to be used in any tests anymore, but it was used in the original unit API testing and it's there if anyone wants to look at it.

I don't think we're in as much disagreement as it appears. The way MPL works right now, it's easy for devs who aren't familiar with units to write code that works and appears correct but fails in some cases, such as unitized data. And they won't know that until a user runs into that case. So we should work to improve this situation. The solution will most likely be some combination of code changes, clearer dev docs, and more and better test cases.

I think a big problem is that the plot methods have no defined internal data representation. Since Python is dynamically typed, it's easy to write code that works for one test case but fails for others you might not think of. It also means that inside a plot method, a developer really doesn't know which functions they're allowed to use. Is the data variable a list? Is it unitized? Is it integers? Floats? A numpy array?

That's why I'd propose that for any numeric data type, the unit converter must return a numpy array of floats. Then the plot code (and dev docs) can be very explicit about what functionality can be used, and you can be sure of the data type once the external->internal converter has run. If done properly, I think this actually makes the existing code simpler. We can have a sequence of converters that are tried on the input, including ones for "standard" types like lists and integers. So whether a user passes a Python list of integers or floats, a numpy array, or their own type, the developer knows that once the converter at the top of the method runs, they have a numpy array of floats to work with, and there is no guessing about which functions will or won't work.

If this works, then it can be "the one way" to write a plot function for numeric data, and every method can have the conversion as its first step.
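One way to make "conversion first" hard to forget is a decorator, sketched here under the assumption that `np.asarray(..., dtype=np.float64)` stands in for the full converter chain. `converts_inputs` and `data_limits` are hypothetical names, not existing Matplotlib helpers.

```python
import functools
import inspect

import numpy as np


def _to_float_array(data):
    # Stand-in for the real converter chain: anything NumPy can coerce
    # to float64 is accepted; anything else raises.
    return np.asarray(data, dtype=np.float64)


def converts_inputs(*names):
    """Hypothetical decorator: convert the named arguments before the body runs."""
    def deco(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            for name in names:
                bound.arguments[name] = _to_float_array(bound.arguments[name])
            return func(*bound.args, **bound.kwargs)
        return wrapper
    return deco


@converts_inputs("x", "y")
def data_limits(x, y):
    """Hypothetical plot-method body that computes data limits."""
    # x and y are float64 ndarrays here, so ndarray methods are safe even
    # if the caller passed lists or tuples of ints.
    return (float(x.min()), float(x.max())), (float(y.min()), float(y.max()))


print(data_limits([3, 1, 2], (10, 20, 30)))  # ((1.0, 3.0), (10.0, 30.0))
```

Without the decorator, `[3, 1, 2].min()` would raise `AttributeError`, which is exactly the kind of input-dependent failure described above.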

Ted
PS: I think this dev list is the best forum for this discussion unless you can arrange a conference where we can all meet up. I find Gitter too hard to follow unless you're watching it in real time. A forum thread would be better IMO, but we don't have that.

________________________________________
From: Nathan Goldbaum <nathan12343 at gmail.com>
Sent: Thursday, February 8, 2018 12:13 PM
To: Drain, Theodore R (392P)
Cc: matplotlib development list
Subject: Re: [Matplotlib-devel] Units discussion...

On Thu, Feb 8, 2018 at 1:08 PM, Drain, Theodore R (392P) <theodore.r.drain at jpl.nasa.gov> wrote:
Does numpy subclassing really matter? If the docs say the unit converter must convert from the external type to the internal type, then as long as the converter does that, it doesn't matter what the external type is or what it inherits from, right? The point is that the converter class is the only class manipulating the external data objects; MPL shouldn't care what they are or what they inherit from.

To make my statement more concrete, here's a matplotlib pull request that fixed a bug that only triggered for astropy and yt but not for pint:

https://github.com/matplotlib/matplotlib/pull/6622

In this case, the issue arose from a difference in how NumPy's masked arrays handle ndarray subclasses versus array wrapper classes.
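The general distinction (not the specific bug fixed in that PR) can be seen with plain NumPy: an ndarray subclass, the approach astropy and yt take, survives `np.asanyarray` along with its extra attributes, while a wrapper that holds an array and converts via `__array__`, roughly the pint approach, always comes back as a plain ndarray. `TaggedArray` and `WrappedArray` are toy stand-ins, not those libraries' classes.

```python
import numpy as np


class TaggedArray(np.ndarray):
    """Toy ndarray subclass carrying a unit attribute (subclass approach)."""

    def __new__(cls, values, unit):
        obj = np.asarray(values, dtype=np.float64).view(cls)
        obj.unit = unit
        return obj

    def __array_finalize__(self, obj):
        # Called for views and slices; copy the unit from the source array.
        self.unit = getattr(obj, "unit", None)


class WrappedArray:
    """Toy wrapper that holds an ndarray instead of subclassing it."""

    def __init__(self, values, unit):
        self.magnitude = np.asarray(values, dtype=np.float64)
        self.unit = unit

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this whenever it needs a real array from the wrapper.
        return np.asarray(self.magnitude, dtype=dtype)


sub = TaggedArray([1, 2, 3], "km")
wrap = WrappedArray([1, 2, 3], "km")

print(type(np.asarray(sub)).__name__)      # ndarray: subclass stripped
print(type(np.asanyarray(sub)).__name__)   # TaggedArray: subclass and .unit kept
print(type(np.asanyarray(wrap)).__name__)  # ndarray: wrapper converted via __array__
```

Code that calls `asanyarray` (or otherwise preserves subclasses) therefore sees unit-carrying objects for one family of libraries and bare floats for the other, so a bug can appear for only one of them.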

I think one issue is that data types are malleable in the API right now. Lists, tuples, numpy arrays, ints, floats, etc. are all possible inputs in many/most cases. IMO, the unit API should not be malleable at all. The unit converter API should say that the return type of external->internal conversion is always a specific value type (e.g. a list of floats, or a numpy float64 array).
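A sketch of a converter that honors a fixed return type, written against Matplotlib's existing `matplotlib.units.ConversionInterface`, `AxisInfo`, and `registry`. The `Distance` class, the km/miles table, and returning a 0-d array for scalar input are assumptions for illustration only.

```python
import numpy as np
import matplotlib.units as munits

# Hypothetical conversion table: size of each unit in kilometres.
KM_PER_UNIT = {"km": 1.0, "miles": 1.609344}


class Distance:
    """Hypothetical user type: a scalar distance tagged with a unit."""

    def __init__(self, value, unit):
        self.value = float(value)
        self.unit = unit

    def to(self, unit):
        return self.value * KM_PER_UNIT[self.unit] / KM_PER_UNIT[unit]


class DistanceConverter(munits.ConversionInterface):
    @staticmethod
    def convert(value, unit, axis):
        # Fixed return type: always a float64 ndarray (0-d for a scalar),
        # no matter which container the caller used.
        if isinstance(value, Distance):
            return np.asarray(value.to(unit), dtype=np.float64)
        return np.array([d.to(unit) for d in value], dtype=np.float64)

    @staticmethod
    def default_units(x, axis):
        # No xunits= given: take the unit of the (first) input value.
        first = x if isinstance(x, Distance) else next(iter(x))
        return first.unit

    @staticmethod
    def axisinfo(unit, axis):
        return munits.AxisInfo(label=unit)


munits.registry[Distance] = DistanceConverter()
```

With the converter registered, a call such as `ax.plot(distances, y, xunits="miles")` should route the `Distance` objects through `convert`, so the plotting code downstream only ever handles float64 arrays.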

Jody: IMO, your example should plot the data in inches in the first plot call, then convert the second input to inches and plot that. The plot call supports the xunits keyword argument, which tells the converter which unit to convert the values to before they become floats. If that keyword is not specified, the units default based on the type of the input. The example that needs clarifying is this one:

ax.plot(x1, y1, xunits="km")
ax.plot(x2, y2, xunits="miles")

IMO, the floats are either km or miles, not both. So either the first call locks the converter to km and the second xunits is ignored, or the second call overrides the first and requires that the first artists go back through a conversion to miles. Either is a reasonable behavior (but the first is much easier to implement).
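The first option (the first call locks the axis units) can be sketched with a toy axis. `ToyAxis` and its `plot(values, unit, xunits=None)` signature are hypothetical, with plain numbers plus a unit string standing in for a unitized type. The second option would additionally require re-converting every stored line whenever the units change, which is why it is harder to implement.

```python
import warnings

import numpy as np

# Hypothetical conversion table: size of each unit in kilometres.
KM_PER_UNIT = {"km": 1.0, "miles": 1.609344}


class ToyAxis:
    """Hypothetical axis illustrating 'first call locks the units'."""

    def __init__(self):
        self.units = None
        self.lines = []  # float64 arrays, all expressed in self.units

    def plot(self, values, unit, xunits=None):
        # values: plain numbers measured in `unit`.
        if self.units is None:
            # First call: explicit xunits wins, else use the data's own unit.
            self.units = xunits or unit
        elif xunits is not None and xunits != self.units:
            warnings.warn(
                f"axis already uses {self.units!r}; ignoring xunits={xunits!r}")
        factor = KM_PER_UNIT[unit] / KM_PER_UNIT[self.units]
        converted = np.asarray(values, dtype=np.float64) * factor
        self.lines.append(converted)
        return converted


ax = ToyAxis()
ax.plot([1.0, 2.0], "km", xunits="km")   # locks the axis to km
ax.plot([1.0], "miles", xunits="miles")  # warns; stored as ~1.609 km
print(ax.units)                          # km
```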