[Matplotlib-devel] Units discussion...

Drain, Theodore R (392P) theodore.r.drain at jpl.nasa.gov
Thu Feb 8 18:18:54 EST 2018


David,
What exactly is the proposal?  I'm not sure I fully understand it.  The existing unit system basically already does conversion to float; it's just not applied evenly, or in quite the same way, across all the methods.

But - I wonder if this is the wrong first step to take.  At this point, do we know which methods are working fine as is and which ones are not?  Maybe it would be better to start writing a comprehensive set of unit test cases that pass different unitized data to each Axes method.  That could serve to define what we expect to happen and would help identify methods that don't currently work.  Failing unit tests are a nice way to identify code that needs to change and would help others know exactly what the "right" behavior is supposed to be.  Then code changes could be made to start correcting those issues.
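
As a rough illustration of that idea, here is a minimal sketch of a test matrix that runs every method against every kind of input and reports the combinations that fail.  Everything in it is hypothetical: Meters, to_floats and the two plot_* functions are stand-ins invented for this sketch, not matplotlib code, and a real suite would call the actual Axes methods instead.

```python
class Meters:
    """Hypothetical unitized value; it supports no comparison or arithmetic."""

    def __init__(self, value):
        self.value = float(value)


def to_floats(data):
    """Hypothetical converter: unwrap Meters, pass plain numbers through."""
    return [d.value if isinstance(d, Meters) else float(d) for d in data]


def plot_converting(x):
    """Stand-in for an Axes method that converts its input before using it."""
    xs = to_floats(x)
    return min(xs), max(xs)


def plot_assuming_floats(x):
    """Stand-in for an Axes method that assumes its input is already numeric."""
    return min(x) - 0.5, max(x) + 0.5


METHODS = {
    "plot_converting": plot_converting,
    "plot_assuming_floats": plot_assuming_floats,
}
INPUTS = {
    "list of int": [1, 2, 3],
    "tuple of float": (1.0, 2.0, 3.0),
    "list of Meters": [Meters(1), Meters(2), Meters(3)],
}


def failing_combinations():
    """Run every method on every input kind and collect the pairs that raise."""
    failures = []
    for method_name, method in METHODS.items():
        for input_name, data in INPUTS.items():
            try:
                method(data)
            except TypeError:
                failures.append((method_name, input_name))
    return failures
```

Calling failing_combinations() here returns [("plot_assuming_floats", "list of Meters")], which is exactly the kind of list that would tell us where code changes are needed.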

Then a similar set of unit tests with unitized data could be written for artists and the same process could be repeated.  Once artists handle unitized data, it may simplify the plot methods as well - at least for the ones whose primary role is just to build an artist.

Ted

________________________________________
From: David Stansby <dstansby at gmail.com>
Sent: Thursday, February 8, 2018 2:11 PM
To: Drain, Theodore R (392P)
Cc: matplotlib development list
Subject: Re: [Matplotlib-devel] Units discussion...

I agree with everything you've said there.

I propose to have a go at implementing what I proposed in the next few weeks - on the surface it seems to me like it will simplify things a lot, but I guess I'll see as I go how hard it actually is! If it works it will be a bit of an upheaval for 3rd parties who use units at the moment, but should be worth it in the long run.

David

On 8 February 2018 at 21:47, Drain, Theodore R (392P) <theodore.r.drain at jpl.nasa.gov> wrote:
FYI for anyone interested - we already submitted (around the time of the first unit code submission in 2009) a mock-up of our unit and time classes, with converters and tickers, which is located in matplotlib/testing/jpl_units/.  It doesn't appear to be used in any tests anymore, but it's there if anyone wants to look at it, and it was used in the original unit API testing.

I think everyone isn't in as much disagreement as it appears.  The way MPL works right now, it's easy for devs who aren't familiar with units to write code that works and appears correct, but fails for some cases like units.  And they won't know that until a user runs into that case.  So we should work to improve this situation.  The solution will most likely be some combination of code changes, clearer dev docs, and more and better test cases.

I think a big problem is that the plots have no defined internal data representation.  Since Python is dynamically typed, it's easy to write code that works for one test case but fails for others you might not think of.  It also means that inside a plot method, a developer really doesn't know what functions they're allowed to use.  Is the data variable a list?  Is it unitized?  Is it integers? floats? a numpy array?

That's why I'd propose that for any numeric data type, the unit converter must return a numpy array of floats.  Then the plot code (and dev docs) can be very explicit about what functionality can be used and you can be sure that after the external->internal converter is run, you know what the data type is.  If done properly, I think this actually makes the existing code simpler.  We can have a sequence of converters that try to run on the input, which would include "standard" types like lists and integers.  So if a user puts in a Python list of integers or floats, a numpy array, their own type, etc, the developer knows that once the converter at the top of the method runs, they have a numpy array of floats to work with and there is no guessing as to what functions will or will not work.

If this works, then it can be "the one way" to write a plot function for numeric data and every method can have the conversion as the first step.
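
To make the idea concrete, here is a minimal sketch of a converter sequence whose result is always a float64 numpy array.  All of the names (Length, convert_lengths, convert_plain_numbers, to_float_array, plot_like_method) are invented for this sketch and are not part of matplotlib's unit API; it only assumes numpy is available.

```python
import numpy as np


class Length:
    """Hypothetical unitized value: a magnitude stored in meters."""

    def __init__(self, meters):
        self.meters = float(meters)


def convert_lengths(data):
    """Handle lists/tuples of Length; return None for anything else."""
    if isinstance(data, (list, tuple)) and data and all(
        isinstance(d, Length) for d in data
    ):
        return np.array([d.meters for d in data], dtype=np.float64)
    return None


def convert_plain_numbers(data):
    """Handle lists, tuples and arrays of ints or floats."""
    try:
        return np.asarray(data, dtype=np.float64)
    except (TypeError, ValueError):
        return None


# Converters are tried in order; the first one that accepts the input wins.
CONVERTERS = [convert_lengths, convert_plain_numbers]


def to_float_array(data):
    """External -> internal conversion: a float64 array, or a TypeError."""
    for converter in CONVERTERS:
        result = converter(data)
        if result is not None:
            return np.atleast_1d(result)
    raise TypeError(f"no converter accepts {type(data).__name__}")


def plot_like_method(x):
    """Stand-in for a plot method: convert first, then use numpy freely."""
    x = to_float_array(x)
    return x.min(), x.max()
```

With this shape, plot_like_method can be handed a tuple of ints, a numpy array, or a list of Length objects, and everything after its first line only ever sees a float64 array.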

Ted
ps: I think this dev list is the best forum for this discussion unless you can arrange a conference where we can all meet up.  I find gitter is too hard to follow unless you're watching it in real time.  A forum thread would be better IMO, but we don't have that.

________________________________________
From: Nathan Goldbaum <nathan12343 at gmail.com>
Sent: Thursday, February 8, 2018 12:13 PM
To: Drain, Theodore R (392P)
Cc: matplotlib development list
Subject: Re: [Matplotlib-devel] Units discussion...

On Thu, Feb 8, 2018 at 1:08 PM, Drain, Theodore R (392P) <theodore.r.drain at jpl.nasa.gov> wrote:
Does numpy subclassing really matter?  If the docs say the unit converter must convert from the external type to the internal type, then as long as the converter does that, it doesn't matter what the external type is or what it inherits from, right?  The point is that the converter class is the only class manipulating the external data objects - MPL shouldn't care what they are or what they inherit from.

To make my statement more concrete, here's a matplotlib pull request that fixed a bug that only triggered for astropy and yt but not for pint:

https://github.com/matplotlib/matplotlib/pull/6622

In this case it was an issue because of difference in how NumPy's masked array deals with ndarray subclasses versus array wrapper classes.
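
The general difference between the two kinds of class can be seen with a small sketch.  It does not reproduce the bug from the pull request; SubclassQuantity and WrapperQuantity are hypothetical types made up here to show how numpy itself treats an ndarray subclass versus an object that merely holds an array.

```python
import numpy as np


class SubclassQuantity(np.ndarray):
    """Hypothetical unit type that subclasses ndarray."""

    def __new__(cls, values, unit):
        obj = np.asarray(values, dtype=float).view(cls)
        obj.unit = unit
        return obj

    def __array_finalize__(self, obj):
        # Carry the unit over to views and slices.
        self.unit = getattr(obj, "unit", None)


class WrapperQuantity:
    """Hypothetical unit type that only holds an array as an attribute."""

    def __init__(self, values, unit):
        self.magnitude = np.asarray(values, dtype=float)
        self.unit = unit


sub = SubclassQuantity([1.0, 2.0], "km")
wrap = WrapperQuantity([1.0, 2.0], "km")

# The subclass passes isinstance checks against ndarray; the wrapper does not.
sub_is_array = isinstance(sub, np.ndarray)
wrap_is_array = isinstance(wrap, np.ndarray)

# np.asarray returns a plain base-class ndarray for the subclass (the unit
# attribute is gone).  The wrapper is not array-like, so numpy stores it as a
# single opaque object in a 0-d object array.
sub_as_array = np.asarray(sub)
wrap_as_array = np.asarray(wrap)
```

Code that branches on isinstance(x, np.ndarray) or calls np.asarray will therefore take different paths for the two kinds of unit class.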

I think one issue is that data types are malleable in the API right now.  Lists, tuples, numpy, ints, floats, etc are all possible inputs in many/most cases.  IMO, the unit API should not be malleable at all.  The unit converter API should say that the return type of external->internal conversion is always a specific value type (e.g. list of float, numpy float64 array).

Jody: IMO, your example should plot the data in inches in the first plot call, then convert the second input to inches and plot that.  The plot call supports the xunits keyword argument, which tells the converter what floating point unit conversion to apply.  If that keyword is not specified, then it defaults to the type of the input.  The example that needs to be clearer is if I do this:

ax.plot( x1, y1, xunits="km" )
ax.plot( x2, y2, xunits="miles" )

IMO, the floats are either km or miles, not both.  So either the first call locks the converter to km and the second xunits is ignored, or the second input overrides the first and requires that the first artists go back through a conversion to miles.  Either is a reasonable choice for behavior (but the first is much easier to implement).
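
The two behaviors described above can be sketched as follows.  This is an illustration only: AxisUnits, its policy names and its plot method are hypothetical and not how matplotlib stores units, and the unitized input is simplified to lengths already expressed in meters.

```python
# Conversion factors to meters (both are exact by definition).
METERS_PER_UNIT = {"km": 1000.0, "miles": 1609.344}


class AxisUnits:
    """Hypothetical per-axis unit state; not matplotlib's implementation."""

    def __init__(self, policy="first_wins"):
        self.policy = policy   # "first_wins" or "last_wins"
        self.unit = None       # unit the stored floats are expressed in
        self.artists = []      # each artist is a list of floats in self.unit

    def plot(self, meters, xunits):
        """Convert lengths given in meters to the axis unit and store them."""
        if self.unit is None:
            self.unit = xunits
        elif xunits != self.unit and self.policy == "last_wins":
            # Second option: existing artists go back through a conversion.
            factor = METERS_PER_UNIT[self.unit] / METERS_PER_UNIT[xunits]
            self.artists = [[v * factor for v in a] for a in self.artists]
            self.unit = xunits
        # First option ("first_wins"): a later, different xunits is ignored.
        floats = [m / METERS_PER_UNIT[self.unit] for m in meters]
        self.artists.append(floats)
        return floats


first = AxisUnits(policy="first_wins")
first.plot([1000.0, 2000.0], xunits="km")
first.plot([1609.344], xunits="miles")    # stored in km; "miles" is ignored

last = AxisUnits(policy="last_wins")
last.plot([1609.344, 3218.688], xunits="km")
last.plot([1609.344], xunits="miles")     # earlier artist re-expressed in miles
```

The "last_wins" branch has to touch every artist already on the axis, which is why the first option is the easier one to implement.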
_______________________________________________
Matplotlib-devel mailing list
Matplotlib-devel at python.org
https://mail.python.org/mailman/listinfo/matplotlib-devel


