Best way to ensure user calls methods in correct order?

Sat Jun 24 05:53:45 EDT 2017

On 22/06/17 17:31, Thomas Nyberg wrote:
> Thanks for the response! I see the reasoning, but I can't entirely
> square it with what I'm thinking. Probably it's either because I was
> hiding some of my intention (sorry if I wasted your time due to lack of
> details) or that I'm naive and regardless of intention doing something
> silly.
> 
> On 06/22/2017 04:40 PM, Steve D'Aprano wrote:
>> On Thu, 22 Jun 2017 11:53 pm, Thomas Nyberg wrote:
>>
>> Don't do that. It's fragile and an anti-pattern. Your methods have too much
>> coupling. If c() relies on b() being called first, then either b() or c()
>> aren't good methods. They don't do enough:
>>
>> - calling b() alone doesn't do enough, so you have to call c() next to get the
>> job done;
>>
>> - calling c() alone doesn't do enough, because it relies on b() being called
>> first.
>>
>>
>> There are a very few exceptions to this rule of thumb, such as opening
>> connections to databases or files or similar. They are mostly enshrined from
>> long practice, or justified by low-level APIs (that's how file systems and
>> databases work, and it's not practical to change that). But you should try very
>> hard to avoid creating new examples.
>>
>> Of course, I'm talking about your *public* methods. Private methods can be a bit
>> more restrictive, since it's only *you* who suffers if you do it wrong.
>>
>> Ideally, your methods should be written in a functional style with as little
>> shared state as possible. (Shared state introduces coupling between components,
>> and excessive coupling is *the* ultimate evil in programming.)
>>  
> 
> This makes perfect sense in general. An important detail I left out
> earlier was what exactly I meant by "users" of my code. Really all I
> meant was myself calling this code as a library and avoiding making
> mistakes. Currently what I'm doing is kind of a data storage, data
> processing/conversion, data generation pipeline. The different
> variables at the different steps in the pipeline are specified in a
> script. The functions in this script have _no coupling at all_ for all
> the reasons you stated. In fact, the only reason I'm introducing a class
> is because I'm trying to force their ordering in the way I described.
> 
> The ultimate goal here is to instead put a basic http server wrapper
> around the whole process. The reason I want to do this is to allow
> others (internal to the company) to make changes in the process
> interactively at different steps in the process (the steps are "natural"
> for the problem at hand) without editing the python scripts explicitly.
> 
> Of course I could force them to execute everything in one go, but due to
> the time required for the different steps, it's nicer to allow for some
> eye-balling of results (and possible changes and re-running) before
> continuing. Before this the main way of enforcing this was all done by
> the few of us editing the scripts, but now I'd like to make it an error
> for corrupted data to propagate down through the pipeline. Having a
> run_all() method or something similar will definitely exist (usually
> once you figure out the "right" parameters you don't change them much on
> re-runs), but having things broken apart is still very nice.
> 
> Would you still say this is a bad way of going about it? Thanks for the
> feedback. It's very helpful.

If a() does some processing, and then b() does something else to the
result of a(), then the natural way of calling the functions is probably
c(b(a(initial_data))), rather than a sequence of method or function
calls that hide some internal state. If the user wants to jump in and
look at what's going on, they can:

a_result = a(initial_data)
# peruse the data
b_result = b(a_result)
# have some coffee
c_result = c(b_result)
# and so on.
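
As a rough, runnable sketch of that layout: the three stage functions
below are made up for illustration (I don't know what your real steps
do), and the data is just a short list of strings. The point is only
that each step takes the previous result as an argument, so there is no
hidden state to get out of order.

```python
# Hypothetical stages; the names a, b, c and their bodies are
# placeholders, not the real pipeline steps.

def a(initial_data):
    """Stage 1: parse raw strings into numbers."""
    return [float(x) for x in initial_data]


def b(a_result):
    """Stage 2: scale the numbers produced by stage 1."""
    return [x * 2 for x in a_result]


def c(b_result):
    """Stage 3: reduce the scaled numbers to a single total."""
    return sum(b_result)


initial_data = ["1", "2", "3"]

# Step by step, with room to inspect each intermediate result.
a_result = a(initial_data)
b_result = b(a_result)
c_result = c(b_result)

# The one-shot form gives the same answer.
assert c_result == c(b(a(initial_data)))
```

Because c() can only be called with something that looks like the
output of b(), calling the steps in the wrong order fails immediately
instead of silently using stale state.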

If the data is modified in-place, maybe it makes sense to use a class
like the one you have, but then it's a bit bizarre to make your methods
create an artificial side effect (self._a_dirty) - why don't you simply
check for the actual effect of a(), whatever it is? If a() did some
calculations and added a column to your dataset with the results, check
for the existence of that column in b()! If calling b() invalidates some
calculation results generated in c(), delete them! (In the functional
setup above, b() would refuse to process a dataset that it has already
processed.)
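
Something along these lines, as a minimal sketch of the in-place
variant. Everything here is an assumption for the sake of the example:
the Pipeline class, the dict-of-columns dataset, and the column names
"scaled", "kept" and "summary" are invented, and your real data
structure is probably different.

```python
class Pipeline:
    """Hypothetical in-place pipeline; the dataset is a dict of columns."""

    def __init__(self, raw):
        self.data = {"raw": list(raw)}

    def a(self):
        # The real effect of a(): it adds the "scaled" column.
        self.data["scaled"] = [x * 2 for x in self.data["raw"]]
        # Re-running a() makes everything downstream stale, so drop it.
        self.data.pop("kept", None)
        self.data.pop("summary", None)

    def b(self, threshold=0):
        # Check for the actual effect of a() instead of a separate flag.
        if "scaled" not in self.data:
            raise RuntimeError("no 'scaled' column; call a() first")
        self.data["kept"] = [x for x in self.data["scaled"] if x > threshold]
        # b() invalidates whatever c() computed earlier, so delete it.
        self.data.pop("summary", None)

    def c(self):
        if "kept" not in self.data:
            raise RuntimeError("no 'kept' column; call b() first")
        self.data["summary"] = sum(self.data["kept"])


p = Pipeline([1, 2, 3])
p.a()
p.b(threshold=2)
p.c()                 # summary is computed from the kept values
p.b(threshold=4)      # re-running b() removes the stale summary
```

The dataset itself records how far the pipeline has got, so there is
no extra flag that can drift out of sync with the data.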

-- Thomas