Guidance regarding what counts as breaking backwards compatibility
Hi all,

Over on the Python-ideas list, there's a thread about the new statistics module, and as the author of that module, I'm looking for a bit of guidance regarding backwards compatibility. Specifically two issues:

(1) With numeric code, what happens if the module becomes more[1] accurate in the future? Does that count as breaking backwards compatibility?

E.g. Currently I use a particular algorithm for calculating variance. Suppose that for a particular data set, this algorithm is accurate to (say) seven decimal places:

# Python 3.4
variance(some_data) == 1.23456700001

Later, I find a better algorithm, which improves the accuracy of the result:

# Python 3.5 or 3.6
variance(some_data) == 1.23456789001

Would this count as breaking backwards compatibility? If so, how should I handle this? I don't claim that the current implementation of the statistics module is optimal as far as precision and accuracy are concerned. It may improve in the future.

Or would that count as a bug-fix? "Variance function was inaccurate, now less wrong", perhaps.

I suppose the math module has the same issue, except that it just wraps the C libraries, which are mature and stable and unlikely to change.

The random module has a similar issue:

http://docs.python.org/3/library/random.html#notes-on-reproducibility

(2) Mappings[2] are iterable. That means that functions which expect sequences or iterators may also operate on mappings by accident. For example, sum({1: 100, 2: 200}) returns 3. If one wanted to reserve the opportunity to handle mappings specifically in the future, without being locked in by backwards-compatibility, how should one handle it?

a) document that behaviour with mappings is unsupported and may change in the future;

b) raise a warning when passed a mapping, but still iterate over it;

c) raise an exception and refuse to iterate over the mapping;

d) something else?

Question (2) is of course a specific example of a more general question, to what degree is the library author responsible for keeping backwards compatibility under circumstances which are not part of the intended API, but just work by accident?

[1] Or, for the sake of the argument, less accurate.

[2] And sets.

-- Steven
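To make the accuracy question concrete, here is a minimal sketch, not the statistics module's actual implementation, showing how two mathematically equivalent variance formulas can give different floating-point answers. This is exactly the kind of difference a swapped-in algorithm would produce:

    # Illustration only -- not the algorithm used by statistics.variance.
    def variance_naive(data):
        # One-pass textbook formula E[x**2] - E[x]**2; suffers
        # catastrophic cancellation when the mean is large.
        n = len(data)
        mean = sum(data) / n
        return sum(x * x for x in data) / n - mean * mean

    def variance_two_pass(data):
        # Sum of squared deviations from the mean; numerically far safer.
        n = len(data)
        mean = sum(data) / n
        return sum((x - mean) ** 2 for x in data) / n

    data = [1e9 + x for x in (4, 7, 13, 16)]  # exact population variance: 22.5
    print(variance_naive(data))     # wildly wrong (e.g. 0.0 or negative)
    print(variance_two_pass(data))  # 22.5

Both functions are correct algebraically; only the error bounds differ, which is why it is tempting to treat swapping one for the other as a bug fix rather than a behaviour change.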
On 2/1/2014 8:06 PM, Steven D'Aprano wrote:
Hi all,
Over on the Python-ideas list, there's a thread about the new statistics module, and as the author of that module, I'm looking for a bit of guidance regarding backwards compatibility. Specifically two issues:
(1) With numeric code, what happens if the module becomes more[1] accurate in the future? Does that count as breaking backwards compatibility?
E.g. Currently I use a particular algorithm for calculating variance. Suppose that for a particular data set, this algorithm is accurate to (say) seven decimal places:
# Python 3.4
variance(some_data) == 1.23456700001
Later, I find a better algorithm, which improves the accuracy of the result:
# Python 3.5 or 3.6
variance(some_data) == 1.23456789001
Would this count as breaking backwards compatibility? If so, how should I handle this? I don't claim that the current implementation of the statistics module is optimal as far as precision and accuracy are concerned. It may improve in the future.
Or would that count as a bug-fix? "Variance function was inaccurate, now less wrong", perhaps.
That is my inclination.
I suppose the math module has the same issue, except that it just wraps the C libraries, which are mature and stable and unlikely to change.
Because C libraries differ, math results differ even in the same version, so they can certainly change (hopefully improve) in future versions. I think the better analogy is cmath, which I believe is more than just a wrapper.
The random module has a similar issue:
http://docs.python.org/3/library/random.html#notes-on-reproducibility
(2) Mappings[2] are iterable. That means that functions which expect sequences or iterators may also operate on mappings by accident.
I think 'accident' is the key. (Working with sets is not an accident.) Anyone who really wants the mean of keys should be explicit: mean(d.keys())
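A quick illustration of why the explicit call matters (dict iteration yields keys, so the implicit behaviour is easy to trigger by accident):

    d = {1: 100, 2: 200}
    sum(d)           # 3 -- iterating a mapping yields its keys
    sum(d.keys())    # 3 -- same result, but clearly intentional
    sum(d.values())  # 300 -- probably what a statistics user wanted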
For example, sum({1: 100, 2: 200}) returns 3. If one wanted to reserve the opportunity to handle mappings specifically in the future, without being locked in by backwards-compatibility, how should one handle it?
a) document that behaviour with mappings is unsupported and may change in the future;
I think the doc should in any case specify the proper domain. In this case, I think it should exclude mappings: 'non-empty non-mapping iterable of numbers', or 'an iterable of numbers that is neither empty nor a mapping'. That makes the behavior at best undefined and subject to change. There should also be a caveat about mixing types, especially Decimals, if not one already. Perhaps rewrite the above as 'an iterable that is neither empty nor a mapping of numbers that are mutually summable'.
b) raise a warning when passed a mapping, but still iterate over it;
c) raise an exception and refuse to iterate over the mapping;
This, if possible. An empty iterable will raise at '/ 0'. Most anything that is not an iterable of numbers will eventually raise at '/ n'. Testing both that an exception is raised and that it is the one we want is why unittest has assertRaises.
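For instance, a test along the lines Terry describes might look like this, assuming the 3.4 module's behaviour of raising statistics.StatisticsError when given too little data:

    import statistics
    import unittest

    class TestDomain(unittest.TestCase):
        def test_mean_rejects_empty_data(self):
            # assertRaises checks both that an exception is raised
            # and that it is the one we want.
            with self.assertRaises(statistics.StatisticsError):
                statistics.mean([])

        def test_variance_needs_two_points(self):
            with self.assertRaises(statistics.StatisticsError):
                statistics.variance([1])

    if __name__ == "__main__":
        unittest.main()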
Question (2) is of course a specific example of a more general question, to what degree is the library author responsible for keeping backwards compatibility under circumstances which are not part of the intended API, but just work by accident?
[1] Or, for the sake of the argument, less accurate.
[2] And sets.
-- Terry Jan Reedy
On 2 February 2014 02:11, Terry Reedy <tjreedy@udel.edu> wrote:
For example, sum({1: 100, 2: 200}) returns 3. If one wanted to reserve the opportunity to handle mappings specifically in the future, without being locked in by backwards-compatibility, how should one handle it?
a) document that behaviour with mappings is unsupported and may change in the future;
I think the doc should in any case specify the proper domain. In this case, I think it should exclude mappings: 'non-empty non-mapping iterable of numbers', or 'an iterable of numbers that is neither empty nor a mapping'. That makes the behavior at best undefined and subject to change. There should also be a caveat about mixing types, especially Decimals, if not one already. Perhaps rewrite the above as 'an iterable that is neither empty nor a mapping of numbers that are mutually summable'.
Generally, my view would be that users should not rely on undocumented behaviour. But as documentation is sometimes less than 100% precise, it's worth, in a case like this, documenting that a particular behaviour is not defined (yet). Then, picking and implementing the desired behaviour will be a new feature and hence totally acceptable.

Changing the way mappings are treated in a bugfix release (as opposed to a feature release) is unlikely to be acceptable no matter how you do it, as there's no way you can reasonably deliberately implement one behaviour now and claim it's a bug later :-)

Paul
On 2 February 2014 11:06, Steven D'Aprano <steve@pearwood.info> wrote:
Hi all,
Over on the Python-ideas list, there's a thread about the new statistics module, and as the author of that module, I'm looking for a bit of guidance regarding backwards compatibility. Specifically two issues:
(1) With numeric code, what happens if the module becomes more[1] accurate in the future? Does that count as breaking backwards compatibility?
E.g. Currently I use a particular algorithm for calculating variance. Suppose that for a particular data set, this algorithm is accurate to (say) seven decimal places:
# Python 3.4
variance(some_data) == 1.23456700001
Later, I find a better algorithm, which improves the accuracy of the result:
# Python 3.5 or 3.6
variance(some_data) == 1.23456789001
Would this count as breaking backwards compatibility? If so, how should I handle this? I don't claim that the current implementation of the statistics module is optimal as far as precision and accuracy are concerned. It may improve in the future.
For this kind of case, we tend to cover it in the "Porting to Python X.Y" section of the What's New guide. User code *shouldn't* care about this kind of change, but it *might*, so we split the difference and say "It's OK in a feature release, but not in a maintenance release". There have been multiple changes along these lines in our floating point handling as Tim Peters, Mark Dickinson et al have made various improvements to reduce platform dependent behaviour (especially around overflow handling, numeric precision, infinity and NaN handling, etc).

However, we also sometimes have module specific disclaimers - the decimal module, for example, has an explicit caveat that updates to the General Decimal Arithmetic Specification will be treated as bug fixes, even if they would normally not be allowed in maintenance releases.

For a non-math related example, a comment from Michael Foord at the PyCon US 2013 sprints made me realise that the implementation of setting the __wrapped__ attribute in functools was just flat out broken - when applied multiple times, it was supposed to create a chain of references that eventually terminated in a callable without the attribute set, but due to the bug every layer actually referred directly to the innermost callable (the one without the attribute set). Unfortunately, the docs I wrote for it were also ambiguous, so a lot of folks (including Michael) assumed it was working as intended. I have fixed the bug in 3.4, but there *is* a chance it will break introspection code that assumed the old behaviour was intentional and doesn't correctly unravel __wrapped__ chains.
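To sketch what "correctly unravelling" means here, with a hypothetical logged() decorator standing in for any functools.wraps-based wrapper (inspect.unwrap, new in 3.4, does the unwrapping for real; the loop below is just an illustration):

    import functools

    def logged(func):
        @functools.wraps(func)  # sets wrapper.__wrapped__ = func
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)
        return wrapper

    def unwrap(func):
        # Follow the whole __wrapped__ chain to the innermost callable
        # instead of assuming a single hop, guarding against loops.
        seen = set()
        while hasattr(func, "__wrapped__"):
            if id(func) in seen:
                raise ValueError("wrapper loop in __wrapped__ chain")
            seen.add(id(func))
            func = func.__wrapped__
        return func

    @logged
    @logged
    def add(a, b):
        return a + b

    assert unwrap(add)(2, 3) == 5  # two hops back to the original add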
Or would that count as a bug-fix? "Variance function was inaccurate, now less wrong", perhaps.
I suppose the math module has the same issue, except that it just wraps the C libraries, which are mature and stable and unlikely to change.
They may look that way *now*, but that's only after Tim, Mark et al did a lot of work on avoiding platform specific issues and inconsistencies.
The random module has a similar issue:
http://docs.python.org/3/library/random.html#notes-on-reproducibility
I think a disclaimer in the statistics module similar to the ones in the math module and this one in the random module would be appropriate - one of the key purposes of the library/language reference is to let us distinguish between "guaranteed behaviour user code can rely on" and "implementation details that user code should not assume will remain unchanged forever".

In this case, it would likely be appropriate to point out that the algorithms used internally may change over time, thus potentially changing the error bounds in the module output.
(2) Mappings[2] are iterable. That means that functions which expect sequences or iterators may also operate on mappings by accident. For example, sum({1: 100, 2: 200}) returns 3. If one wanted to reserve the opportunity to handle mappings specifically in the future, without being locked in by backwards-compatibility, how should one handle it?
a) document that behaviour with mappings is unsupported and may change in the future;
b) raise a warning when passed a mapping, but still iterate over it;
c) raise an exception and refuse to iterate over the mapping;
d) something else?
Question (2) is of course a specific example of a more general question, to what degree is the library author responsible for keeping backwards compatibility under circumstances which are not part of the intended API, but just work by accident?
In this particular case, I consider having a single API treat mappings differently from other iterables to be a surprising anti-pattern, and providing a separate API specifically for mappings is clearer (cf format() vs format_map()).

However, if you want to preserve maximum flexibility, the best near term option is typically c (just disallow the input you haven't decided how to handle yet entirely), but either a or b would also be an acceptable way of achieving the same end (they're just less user friendly, since they let people do something that you're already considering changing in the future).

In the case where you allow an API to escape into the wild without even a documented caveat that it isn't covered by the normal standard library backwards compatibility guarantees, then you're pretty much stuck. This is why you tend to see newer stdlib APIs exposing functions that return private object types, rather than exposing the object type itself: exposing a function just promises a callable() API, while exposing a class directly promises a lot more in terms of supporting inheritance, isinstance(), issubclass(), etc, which can make future evolution of that API substantially more difficult.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
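A minimal sketch of option c, using a hypothetical mean() rather than the statistics module's actual code:

    from collections.abc import Mapping

    def mean(data):
        # Option c: refuse mappings now, so their handling can be
        # defined properly in a later feature release.
        if isinstance(data, Mapping):
            raise TypeError("mean() does not support mappings; pass "
                            "data.keys() or data.values() explicitly")
        data = list(data)
        if not data:
            raise ValueError("mean() requires at least one data point")
        return sum(data) / len(data)

    print(mean([1, 2, 3]))          # 2.0
    print(mean({1: 100}.values()))  # 100.0 -- the caller chose explicitly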
On Sat, Feb 1, 2014 at 9:14 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 2 February 2014 11:06, Steven D'Aprano <steve@pearwood.info> wrote:
Hi all,
Over on the Python-ideas list, there's a thread about the new statistics module, and as the author of that module, I'm looking for a bit of guidance regarding backwards compatibility. Specifically two issues:
(1) With numeric code, what happens if the module becomes more[1] accurate in the future? Does that count as breaking backwards compatibility?
E.g. Currently I use a particular algorithm for calculating variance. Suppose that for a particular data set, this algorithm is accurate to (say) seven decimal places:
# Python 3.4
variance(some_data) == 1.23456700001
Later, I find a better algorithm, which improves the accuracy of the result:
# Python 3.5 or 3.6
variance(some_data) == 1.23456789001
Would this count as breaking backwards compatibility? If so, how should I handle this? I don't claim that the current implementation of the statistics module is optimal as far as precision and accuracy are concerned. It may improve in the future.
For this kind of case, we tend to cover it in the "Porting to Python X.Y" section of the What's New guide. User code *shouldn't* care about this kind of change, but it *might*, so we split the difference and say "It's OK in a feature release, but not in a maintenance release". There have been multiple changes along these lines in our floating point handling as Tim Peters, Mark Dickinson et al have made various improvements to reduce platform dependent behaviour (especially around overflow handling, numeric precision, infinity and NaN handling, etc).
I agree with Nick that it's a feature release change, not a bugfix one. I think the key way to rationalize this particular case is the use of the words "better" and "improves". Notice you never said "broken" or "wrong", just that you were making the estimate better. Since the previous behaviour was not fundamentally broken and was not going to cause errors in correct code, the change should only go in a feature release.

-Brett
participants (5):
- Brett Cannon
- Nick Coghlan
- Paul Moore
- Steven D'Aprano
- Terry Reedy