[Tutor] a few question about my evolving program
Steven D'Aprano
steve at pearwood.info
Wed Aug 12 15:43:27 CEST 2015
On Tue, Aug 11, 2015 at 08:23:00PM -0700, Clayton Kirkwood wrote:
> Question 2:
> My current code:
> See "Look here" below.
There's an art to picking good variable names, neither too short nor too
long, just descriptive enough without being too verbose.
Or to put it another way...
The existence of a systematic application of knowledge or skill in
effecting a desired result in such a way as to be in accordance with the
actual state of reality is a factual statement when applied to the
technique of making a agreeable selection between hypothetical candidate
appellations for variables.
*wink*
I'm afraid that reading your code gives me a headache. You have
master_directory_file_list (which is a dict, not a list) and
directory_file_list; also target_filename_size (a dict, not a size),
current_directory_path, target_directory, current_file_list, file_list,
and file_file_path_list, although I admit that my head is spinning at
the moment and that last one might be inaccurate :-)
I don't know how you can keep track of all those so-very-similar
variable names, especially since some of them lie about what they are!
(Claiming to be a list when they are actually dicts, etc.) Better names
will help.
It is rare that stating what *type* a variable is will be helpful. E.g.
you normally wouldn't say "count_int" or "width_float", rather "count"
and "width".
You need to find a balance between variable names which are too generic
and non-descriptive, and those which are too verbose and detailed. That
will come with experience, and from reading other people's code to see
what works and what doesn't work.
It will also help if you can break your code up into small,
self-contained functions which can be digested by the reader in
isolation. For example, if I were writing your code, I might do
something like this:
def name_is_seen_before(filename, already_seen):
"""Returns whether or not filename has already been seen.
If the filename has not been seen before, it is added to the
set of already seen filenames, and False is returned; otherwise
True is returned.
"""
if filename in already_seen:
return True
already_seen.add(filename)
return False
Then use it something like this:
already_seen = set()
duplicates = set()
for name in bunch_of_names:
if name_is_seen_before(name, already_seen):
duplicates.add(name)
Obviously you have to get the bunch_of_names somehow first. To do that,
you can also simplify the process of iterating over files with a filter
that discards the files you don't care about:
def is_media_file(filename):
extensions = ['jpg', 'gif', 'png'] # etc.
base, ext = os.path.splitext(filename)
return ext[1:].lower() in extensions
def filter_media(filenames):
media_files = []
for name in filenames:
if is_media_file(name):
media_files.append(name)
return media_files
The advantage of this is that you can write these small functions
independently of the rest of your loop, you can test them easily and
satisfy yourself that they work correctly, and then forget all the
internal details of how they work when you go to use them:
for dir_path, directories, filenames in os.walk(main_dir):
filenames = filter_media(filenames)
# ... continue processing ...
And you don't need to care that variable names are duplicated in
different functions, because each function is self-contained. There's no
confusion between variable "name" in one function and "name" in another,
so there's no need to try to come up with distinctive names for them
both.
The point being, the *details* of how the media files are filtered are
not important. You can "look inside" the filter_media() function when
you care about the details, and when you don't, you can just treat it as
a black-box that takes a list of file names and returns only those which
are media files.
Try redesigning your code to be a bit more hierarchical in this way, and
see if that makes it easier for you to solve the problem.
--
Steve
More information about the Tutor
mailing list