[Tutor] a few question about my evolving program

Steven D'Aprano steve at pearwood.info
Wed Aug 12 15:43:27 CEST 2015


On Tue, Aug 11, 2015 at 08:23:00PM -0700, Clayton Kirkwood wrote:

> Question 2:
> My current code:
> See "Look here" below.


There's an art to picking good variable names, neither too short nor too 
long, just descriptive enough without being too verbose.

Or to put it another way... 

The existence of a systematic application of knowledge or skill in 
effecting a desired result in such a way as to be in accordance with the 
actual state of reality is a factual statement when applied to the 
technique of making a agreeable selection between hypothetical candidate 
appellations for variables.

*wink*

I'm afraid that reading your code gives me a headache. You have 
master_directory_file_list (which is a dict, not a list) and 
directory_file_list; also target_filename_size (a dict, not a size), 
current_directory_path, target_directory, current_file_list, file_list, 
and file_file_path_list, although I admit that my head is spinning at 
the moment and that last one might be inaccurate :-)

I don't know how you can keep track of all those so-very-similar 
variable names, especially since some of them lie about what they are! 
(Claiming to be a list when they are actually dicts, etc.) Better names 
will help. 

It is rare that stating what *type* a variable is will be helpful. E.g. 
you normally wouldn't say "count_int" or "width_float", rather "count" 
and "width".

You need to find a balance between variable names which are too generic 
and non-descriptive, and those which are too verbose and detailed. That 
will come with experience, and from reading other people's code to see 
what works and what doesn't work.

It will also help if you can break your code up into small, 
self-contained functions which can be digested by the reader in 
isolation. For example, if I were writing your code, I might do 
something like this:


def name_is_seen_before(filename, already_seen):
    """Returns whether or not filename has already been seen.

    If the filename has not been seen before, it is added to the
    set of already seen filenames, and False is returned; otherwise
    True is returned.
    """
    if filename in already_seen:
        return True
    already_seen.add(filename)
    return False


Then use it something like this:

already_seen = set()
duplicates = set()
for name in bunch_of_names:
    if name_is_seen_before(name, already_seen):
         duplicates.add(name)


Obviously you have to get the bunch_of_names somehow first. To do that, 
you can also simplify the process of iterating over files with a filter 
that discards the files you don't care about:

def is_media_file(filename):
    extensions = ['jpg', 'gif', 'png']  # etc.
    base, ext = os.path.splitext(filename)
    return ext[1:].lower() in extensions

def filter_media(filenames):
    media_files = []
    for name in filenames:
        if is_media_file(name):
            media_files.append(name)
    return media_files


The advantage of this is that you can write these small functions 
independently of the rest of your loop, you can test them easily and 
satisfy yourself that they work correctly, and then forget all the 
internal details of how they work when you go to use them:


for dir_path, directories, filenames in os.walk(main_dir):
    filenames = filter_media(filenames)
    # ... continue processing ...


And you don't need to care that variable names are duplicated in 
different functions, because each function is self-contained. There's no 
confusion between variable "name" in one function and "name" in another, 
so there's no need to try to come up with distinctive names for them 
both.

The point being, the *details* of how the media files are filtered are 
not important. You can "look inside" the filter_media() function when 
you care about the details, and when you don't, you can just treat it as 
a black-box that takes a list of file names and returns only those which 
are media files.

Try redesigning your code to be a bit more hierarchical in this way, and 
see if that makes it easier for you to solve the problem.



-- 
Steve


More information about the Tutor mailing list