[Twisted-Python] Accepting/merging patches for unsupported platforms without CI
Hi,

What is the procedure for accepting patches for platforms for which we don't have a continuous testing system? This is a follow-up to the review in https://twistedmatrix.com/trac/ticket/9323. There is a patch for Cygwin, but AFAIK there is no current builder for running Twisted with Python on Cygwin.

My first reaction was that before accepting such patches we should set up a continuous testing system. With such a system in place we can run the tests to see that the reported bug is present in trunk and that it is fixed in a branch. If that can't be done, we need at least someone to manually run the tests to check the patch.

Twisted is supported on so many systems, and I don't think it is OK to ask reviewers to have each supported system at hand, ready to fire off a manual test.

Any comments?

Thanks,
-- Adi Roiban
To support a platform—i.e. to promise we will not break it in the future—we have to have a builder capable of running Twisted on that platform. To accept a patch, though, the important thing is to have test coverage.

If the change in question is a platform-specific feature that there’s no way to test in a different environment, we would indeed need to block on the availability of a builder. Quite often—as I believe is the case for the patch you’re referring to here—patches are adjusting behaviors which would be difficult to cleanly integration-test on the platform in question anyway, and the appropriate thing to do is to make some mock match the observed behavior of the platform in question.

In this case it would be nice if the reviewer could verify this, but if the patch submitter’s description of observed behavior is clear, and their test mock implements it correctly, I would say that we can give them the benefit of the doubt and accept such patches even if the reviewer doesn’t have such a platform available. Assuming all the changed lines are covered adequately for our supported platforms’ behavior as well, of course, and we aren’t risking breaking them.

How do you feel about this answer?

-g
On Sun, Nov 26, 2017 at 5:30 PM, Glyph <glyph@twistedmatrix.com> wrote:
Just because there's enough misunderstanding about how to do this kind of thing and the words that describe it, I wanted to point out that this isn't really a "mock" - and it's certainly not something you would do with a library with "mock" in its name.

For those who don't know (not Glyph), a mock is an object that somehow records an exact sequence of operations (like "my foo method was called with bar" or "my baz attribute was accessed"). These tend to result in very fragile tests because they operate at such a specific level. Quite often entirely legitimate implementation changes invalidate the expectations of some mock and force corresponding test suite maintenance.

A better approach is to make sure the platform-specific interface being tested has been reduced to its smallest possible scope and then create a verified fake which implements that interface. The verification is performed by a test suite which can run against both implementations (the fake and the real - with the real version only being tested in the necessary environment). All other related tests can use the verified fake instead of the real implementation and will run on all platforms.

Jean-Paul
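As a minimal sketch of the verified-fake pattern described above (this is not code from the Cygwin patch; the interface and class names such as PipeBehaviour and FakePipeBehaviour are invented for illustration, and it uses the stdlib unittest where Twisted itself would use trial), one behavioural contract suite is run against both the fake and the real implementation, with the real one exercised only in the environment where it can actually run:

```python
import sys
import unittest


class PipeBehaviour:
    """Hypothetical narrow, platform-specific interface under test."""

    def readAfterWriterClosed(self) -> bytes:
        raise NotImplementedError()


class RealPipeBehaviour(PipeBehaviour):
    """Would talk to the real platform; only meaningful on Cygwin."""

    def readAfterWriterClosed(self) -> bytes:
        return b""  # placeholder for the real platform-facing code


class FakePipeBehaviour(PipeBehaviour):
    """In-memory stand-in reproducing the behaviour the submitter observed."""

    def readAfterWriterClosed(self) -> bytes:
        return b""


class PipeBehaviourVerificationMixin:
    """Behavioural contract shared by both implementations."""

    def makeBehaviour(self) -> PipeBehaviour:
        raise NotImplementedError()

    def test_readAfterWriterClosed(self):
        # The same assertion verifies the fake on every builder and the real
        # implementation on its own platform, so the two cannot silently
        # drift apart without a test failing somewhere.
        self.assertEqual(self.makeBehaviour().readAfterWriterClosed(), b"")


class FakePipeBehaviourTests(PipeBehaviourVerificationMixin, unittest.TestCase):
    """Runs everywhere; all other tests reuse FakePipeBehaviour."""

    def makeBehaviour(self):
        return FakePipeBehaviour()


@unittest.skipUnless(sys.platform == "cygwin", "requires Cygwin")
class RealPipeBehaviourTests(PipeBehaviourVerificationMixin, unittest.TestCase):
    """Runs only where the real implementation makes sense, e.g. a future Cygwin builder."""

    def makeBehaviour(self):
        return RealPipeBehaviour()
```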
Martin Fowler has the authoritative glossary on this: https://www.martinfowler.com/bliki/TestDouble.html. I have my own entry in this genre which echoes and expands upon some of his definitions: https://thoughtstreams.io/glyph/test-patterns/.

However, I eventually acquiesced to the colloquial meaning of "mock" to mean "test double"; in most contexts where people are talking about "mocks" they're really talking about fakes, and the real actual mock-module-style mocks are increasingly universally understood to be technical debt.
> For those who don't know (not Glyph), a mock is an object that somehow records an exact sequence of operations (like "my foo method was called with bar" or "my baz attribute was accessed"). These tend to result in very fragile tests because they operate at such a specific level. Quite often entirely legitimate implementation changes invalidate the expectations of some mock and force corresponding test suite maintenance.
(As far as I know, we don't use the 'mock' module anywhere in Twisted, and we shouldn't start :).)
> A better approach is to make sure the platform-specific interface being tested has been reduced to its smallest possible scope and then create a verified fake which implements that interface. The verification is performed by a test suite which can run against both implementations (the fake and the real - with the real version only being tested in the necessary environment). All other related tests can use the verified fake instead of the real implementation and will run on all platforms.
The most important thing here is to reduce the number of duplicate fakes.

Completely verified fakes are unfortunately almost impossible to build. If the faked API has good enough testing support that you can programmatically provoke all of its failure modes, then the only reason to build a fake is performance, which is rarely our most pressing problem :-). So quite often even the best verified fakes will only be partially verified. Even partial verification of things like kernel APIs can be a major challenge. So it's often necessary to accept a fake which is not directly verified against the real API.

However, ad-hoc one-off fakes are at extremely high risk of drift. If you are trying to verify the system under test's response to behavior X in a test, and you build a fake that implements X, Y, and Z for the sake of the test's set-up, naturally that new fake's Y and Z will be flaky and half-implemented (at best: its X may be quite half-baked as well). Moreover, if the real X, Y or Z changes in the future, it's highly unlikely anyone will come along and update the mock for just that one test. If that fake were shared among a large number of tests, though, or even better, distributed along with the real X, Y, and Z, there is at least a fighting chance that someone might update the fake in the future and notice that the system under test has an issue.

I believe the most important principle here is that test code is just code like any other code, and following generally good code-organization principles like well-defined interfaces, planning for maintenance over time, "don't repeat yourself", clear locus of responsibility, etc etc, will prevent test suites and their attendant doubles from decaying into a sprawling and unmaintainable mess. We have generally been making good progress on this over time, but it's important for everybody involved with Twisted to continue trying to reduce duplication in fakes, and improve the verification of the fakes we have, as we maintain our test suite.

-glyph
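As an illustration of the shared, distributed fake described above, here is a hedged sketch; the module and class names (someproject.testing, MemoryFilesystem, loadVerbosity) are hypothetical and not part of Twisted. The fake ships alongside the real implementation so every test that needs a filesystem double imports the same one, and the contract suite from the earlier sketch could be pointed at both the fake and the real thing:

```python
# someproject/testing.py -- a hypothetical shared fake, distributed with the
# package, so every test that needs a filesystem double imports this one
# instead of growing its own half-implemented copy.
class MemoryFilesystem:
    def __init__(self):
        self._files = {}

    def read(self, path):
        return self._files[path]

    def write(self, path, data):
        self._files[path] = data


# someproject/config.py -- a hypothetical system under test that takes the
# filesystem as a parameter, so tests can pass in the shared fake.
def loadVerbosity(filesystem, path):
    return filesystem.read(path) == b"verbose=1"


# someproject/test/test_config.py (and any other test module) would then do
# `from someproject.testing import MemoryFilesystem`, so when the real
# filesystem behaviour changes there is exactly one fake to update, and
# drift is much more likely to be noticed.
import unittest


class LoadVerbosityTests(unittest.TestCase):
    def test_verboseEnabled(self):
        fs = MemoryFilesystem()
        fs.write("app.conf", b"verbose=1")
        self.assertTrue(loadVerbosity(fs, "app.conf"))
```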
On 26 November 2017 at 22:30, Glyph <glyph@twistedmatrix.com> wrote:
[snip]
Thanks for your comment; I think I now know what a reviewer should do for https://twistedmatrix.com/trac/ticket/9323:
A. Get a builder for Cygwin and use the existing tests.
B. Write new unit/integration tests with a fake Cygwin system.

I understand the high-level process, but I have questions regarding the details. How can a reviewer (who has neither experience with nor access to an unsupported platform) verify whether the fake/surrogate implementation is correct? That is, without putting in significant effort reading up on the details of the unsupported platform, and in the process becoming an expert in that platform?

-- Adi Roiban
At some level, I think it is OK to take the submitter's word for it. The important thing about code review is not, like, courtroom-style adversarial disbelief, but rather the idea that it is super easy to make mistakes while writing programs. If the submitter clearly describes the same behavior in tests, in their implementation, in their ticket description and in their NEWS fragment, and the reviewer can verify that they're all consistent, then I don't think they need to be able to personally reproduce the behavior if it's on an obscure platform.

Ideally, as Jean-Paul suggests, the submitter would provide a [verified fake](https://thoughtstreams.io/glyph/test-patterns/5952/) implementation. In practice they may just provide a well-documented fake, since verifying the real implementation may be impossible programmatically, or impractically difficult. (However, a genuine mock, or a dummy, is definitely insufficient for testing.)
In this case, the subtle difference in pipe-handling behavior, which is shared at least in part with other platforms, should not require becoming a total expert in Cygwin's runtime. If it does, we may in fact need to wait for a reviewer to come along who already has this expertise. -glyph
participants (3)

- Adi Roiban
- Glyph
- Jean-Paul Calderone