How to test?
Manfred Lotz
ml_news at posteo.de
Sun Apr 26 06:46:01 EDT 2020
On Sun, 26 Apr 2020 15:26:58 +1200
DL Neil <PythonList at DancesWithMice.info> wrote:
> On 25/04/20 7:53 PM, Manfred Lotz wrote:
> > On Sat, 25 Apr 2020 18:41:37 +1200
> > DL Neil <PythonList at DancesWithMice.info> wrote:
> >
> >> On 25/04/20 5:16 PM, Manfred Lotz wrote:
> >>> On Fri, 24 Apr 2020 19:12:39 -0300
> >>> Cholo Lennon <chololennon at hotmail.com> wrote:
> >>>
> >>>> On 24/4/20 15:40, Manfred Lotz wrote:
> >>>>> I have a command-line application which checks a directory tree
> >>>>> for certain things. If there are errors then messages will be
> >>>>> written to stdout.
>
> > What I do here specifically is to check directory trees' file
> > objects for user and group ownership as well as permissions
> > according to a given policy. There is a general policy, and there
> > can be exceptions which specify a specific policy for a certain
> > file.
>
> If I have understood correctly, the objective is to check a dir-tree
> to ensure that specific directory/file-permissions are in-effect/have
> not been changed. The specifications come from a .JSON file and may
> be overridden by command-line arguments. Correct?
>
Yes.
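
For each file object the check essentially compares its stat
information against the policy. Roughly, and with purely illustrative
names (this is only a sketch, not my actual code), it looks like this:

    import grp
    import os
    import pwd
    import stat

    def check_file(path, want_owner, want_group, want_mode):
        """Return a list of policy violations for one file (sketch)."""
        st = os.lstat(path)
        violations = []

        owner = pwd.getpwuid(st.st_uid).pw_name
        if owner != want_owner:
            violations.append(f"{path}: owner {owner} != {want_owner}")

        group = grp.getgrgid(st.st_gid).gr_name
        if group != want_group:
            violations.append(f"{path}: group {group} != {want_group}")

        mode = stat.S_IMODE(st.st_mode)
        if mode != int(want_mode, 8):   # policy gives the mode as e.g. "644"
            violations.append(f"{path}: mode {mode:o} != {want_mode}")

        return violations
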
> There must be a whole 'genre' of programs which inspect a
> directory-tree and do 'something' to the files contained. I had a
> few, and needing another, decided to write a generic 'scanner' which
> would then call a tailored-function to perform the particular
> 'something' - come to think of it, am not sure if that was ever quite
> finished. Sigh!
>
>
> >>>>> One idea was, for the error situations, to write the messages
> >>>>> to files and then later, when running the tests, compare the
> >>>>> error-message output to the previously saved output.
> >>>>>
> >>>>> Is there anything better?
>
> The next specification appears to be that you want a list of files
> and their stats, perhaps reporting only any exceptions which weren't
> there 'last time'.
>
I just want to report violations against the policy(ies) given in the
JSON file.
> In which case, an exception could be raised, or it might be simpler
> to call a reporting-function when a file should be 'reported'.
>
> I'm still a little confused about the difference between
> printing/logging/reporting some sort of exception, and the need to
> compare with 'history'.
>
There is no comparison with history; I just want to see current
violations. (I think I phrased things badly in my previous post, so
that it looked like I want to see history.)
>
> The problem with the "previously saved output" is the process of
> linking 'today's data' with that from the last run. I would use a
> database (but then, that's my background/bias) to store each file's
> stat.
>
As said above, I phrased things badly in my previous post.
> Alternately, if you only need to store exceptions, and that number is
> likely to be small, perhaps output only the exceptions from 'this
> run' to a .JSON/.yaml file - which would become input (and a dict for
> easy look-ups) next time?
>
>
> >>>> Maybe I am wrong because I don't understand your scenario: If
> >>>> your application is like a command, it has to return an error
> >>>> code to the system, a distinct number for each error condition.
> >>>> The error code is easier to test than the stdout/stderr.
>
> Either way, you might decrease the amount of data to be stored by
> reducing the file's stat to a code.
>
>
> >>>>> How to test this in the best way?
>
> The best way to prepare for unit-testing is to have 'units' of code
> (apologies!).
>
> An effective guide is to break the code into functions and methods,
> so that each performs exactly one piece/unit of work - and only the
> one. A better guide is that if you cannot name the procedure using
> one description and want to add an "and", an "or", or some other
> conjunction, perhaps there should be more than one procedure!
>
> For example:
>
> > for cat in cats:
> >     ...
> >     for d in scantree(cat.dir):
> >         # if `keep_fs` was specified then we must
> >         # make sure the file is on the same device
> >         if cat.keep_fs and devid != get_devid(d.path):
> >             continue
> >
> >         cat.check(d)
>
The above is part of the main() function, but I could factor that part
of main() out into its own function. Then the call that checks a
directory entry could be passed in as a function argument. For testing
I could reuse that function by providing a different function for the
check (currently it is cat.check(d)).
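
Roughly something like this (just a sketch; scantree() and get_devid()
are the helpers from my existing code):

    def scan_category(cat, action):
        """Scan one category's tree and apply `action` to every entry."""
        devid = get_devid(cat.dir)
        for d in scantree(cat.dir):
            # if `keep_fs` was specified then we must
            # make sure the file is on the same device
            if cat.keep_fs and devid != get_devid(d.path):
                continue
            action(d)

main() would call scan_category(cat, cat.check), while a test can pass
e.g. a list's append method as `action` and simply assert on which
entries were visited.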
> If this were a function, how would you name it? First of all we are
> processing every category, then we are scanning a dir-tree, and
> finally we are doing something to each directory found.
>
> If we split these into separate routines, eg (sub-setting the above):
>
>     for d in scantree(cat.dir):
>         do_something_with_directory(d)
>
> and you devise a (much) more meaningful name than mine, it will
> actually help readers (others - but also you!) to understand the
> steps within the logic.
>
> Now if we have a function which checks a single fileNM for
> 'whatever', the code will start something like:
>
> def stat_file( fileNM ):
>     '''Gather stat information for nominated file.'''
>     etc
>
> So, we can now write a test-function because we don't need any
> "categories", we don't need all the dir-tree, and we don't need to
> have performed a scan - all we need is a (valid and previously
> inspected) file-path. Thus (using Pytest):
>
> def test_stat_file( ... ):
>     '''Test file stat.'''
>     assert stat_file( "...file-path" ) == ...its known-stat
>
> def test_stat_dir( ... ):
>     '''Test file stat.'''
>     assert stat_file( "...dir-path" ) == ...its known-stat
>
> There is no need to test for a non-existent file, if you are the only
> user!
>
> In case you hadn't thought about it, make the test path part of your
> test directory - not part of 'the real world'!
>
>
> This is why TDD ("Test-Driven Development") theory says that one
> should devise tests 'first' (and the actual code, 'second'). You must
> design the logic first, and then start bottom-up, by writing tests to
> 'prove' the most simple functionality (and then the code to 'pass'
> those tests), gradually expanding the scope of the tests as single
> functions are combined in calls from larger/wider/controlling
> functions...
>
> This takes the previous time-honored practice of writing the code
> first, and then testing it (or even, 'throwing it over the wall' to a
> test department), and turns it on its head. (no wall necessary)
>
> I find it a very good way of making sure that when coding a system
> comprising many 'moving parts', when I make "a little change" to some
> component 'over here', the test-regime will quickly reveal any
> unintended consequences (effects, artifacts), etc, 'over there' -
> even though my (very) small brain hadn't realised the impact! ("it's
> just one, little, change!")
>
>
> > The policy file is a JSON file and could have different categories.
> > Each category defines a policy for a certain directory tree. Command
> > line args can override directory names, as well as user and
> > group. Or, if for example a directory is not specified in the JSON
> > file, I am required to specify it via the command line. Otherwise no
> > check can take place.
> > default":
> > {
> > "match_perms": "644",
> > "match_permsd": "755",
> > "match_permsx": "755",
> > "owner": "manfred",
> > "group": "manfred"
> > }
>
> Won't a directory-path be required, to tie 'policy' to 'tree'?
>
Yes, a directory path is required. If it is not present here, it must
be supplied via command-line arguments.
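
Roughly, I merge the two sources like this (a simplified sketch; the
key and argument names are illustrative, not my actual code):

    import json

    def load_policy(policy_file, cli_args):
        """Load the JSON policy; command-line args override it (sketch)."""
        with open(policy_file) as f:
            policy = json.load(f)

        default = policy["default"]
        # command-line values override or fill in what the JSON lacks
        if cli_args.directory:
            default["dir"] = cli_args.directory
        if cli_args.owner:
            default["owner"] = cli_args.owner
        if cli_args.group:
            default["group"] = cli_args.group

        if "dir" not in default:
            raise SystemExit("no directory given in policy file or on "
                             "the command line")
        return policy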
>
> Is that enough feedback to help you take a few more steps?
Yes, I think so. Thanks very much.
The comments and advice I got were pretty helpful. I have already
started to make improvements in my code to be able to test things
better.
For example, I have changed the check() method in the Policy class so
that it is called like this:

    def check(self, fpath, user, group, mode):

This means I can test the checks and their results without requiring
any external test data.
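
For example, a test can now look roughly like this (the Policy
constructor and the return value here are illustrative, not the final
interface):

    from policy import Policy      # module name is illustrative

    def test_check_reports_wrong_mode():
        pol = Policy(match_perms="644", owner="manfred", group="manfred")
        violations = pol.check("/some/file", "manfred", "manfred", 0o600)
        assert violations          # wrong mode must be reported

    def test_check_accepts_matching_file():
        pol = Policy(match_perms="644", owner="manfred", group="manfred")
        assert not pol.check("/some/file", "manfred", "manfred", 0o644)
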
Thanks a lot to all for your help.
--
Manfred