Help me improve my Python Skills, please

Sat Jan 15 05:11:50 EST 2000

tarude at whitestar.localdomain (Trebor A. Rude) writes:

> I just started picking up Python a few days ago, and am rather
> enjoying it.  To help me learn it, I converted one of my old TCL
> scripts into Python, which I've attached below. However, this being my 
> first serious python script, I'm pretty sure I'm doing some things
> less efficiently than necessary, through lack of familiarity with the
> library and language in general. So I'd like to enlist the group's
> help in improving the code. Logic, style, better ways of doing things, 
> whatever.  Thanks in advance.
> 
> A little explanation before the code.  The script recursively descends 
> a directory structure, building a .jdb (Jukebox database) file for
> each directory based on the mp3 tag info of the mp3s in that
> directory and all its subdirectories. It also repairs damaged comment
> fields caused by a slightly buggy encoder program.

First imporessions: you're reinventing the wheel a lot. This is fair
enough, as you don't yet know what kind of wheels come with Python,
but after a while you should get the idea of the sort of things that
come built in to the standard libraries.

Also, you're using re where string would do. This is also pretty
common... remember the Tom Christiansen (I think) quote:

"When faced with a problem, soem people think 'I'll use regular
expressions.' Now they have two problems."

Also, if I were writing this, I'd break the task up more and have
fewer smaller functions. (or methods. see below).

> #!/usr/bin/env python
> 
> # Does this look strange? My basic philosophy on
> # using "import" vs. "from <foo> import <bar>" is
> # based on my conception of how easy it might be
> # to misinterpret the meaning of the function. Also,
> # if I need a bunch of names from a package, I'll
> # tend toward "import".
> 
> import os
> import re
> import select
> 
> from os.path import join, basename, isdir, splitext

There's string.join as well, which is one reason to leave
out-of-module references qualified.

> from glob import glob
> from fnmatch import fnmatch
> from operator import concat
> from commands import getoutput
> 
> # Only reason this is a function is because it gets called recursively.
> 
> def Make_Playlist ():
>     pwd = os.getcwd()
>     filename = join(pwd, basename(pwd) + '.jbd')
> 
>     try:
>         listfile = open(filename, 'w')
>     except IOError:
>         sys.exit("Cannot open " + filename + " for write: " + IOError)
> 
>     # Since sort() returns None, need to go through these extra
>     # steps instead of just putting all on the for line.

Hmm; if you don't care about the order records turn up in the file,
why sort the list? OTOH, you process directories mixed up with files,
so the order's going to be a little odd anyway.

>     files = glob(join(pwd, '*'))

There's os.listdir(os.curdir), which is probably easier...

>     files.sort()
>     
>     for file in files:
>         if isdir(file):   # Recursively descend directory.
>             os.chdir(file)
>             Make_Playlist()
>             os.chdir(pwd)

This use of chdir is yucky (IMHO). I'd make the directory a parameter
to Make_Playlist.

>             # Concatente the just-created .jbd onto the current one.
>             
>             infile = open(join(file, basename(file) + '.jbd'))
>             listfile.writelines(infile.readlines())
>             infile.close()

This makes me cringe a bit too.

>         elif fnmatch(file, '*.mp3'):
>             newfilename = re.sub(' ', '_', basename(file))

newfilename = string.replace(basename(file),' ', '_')

>             if (newfilename != basename(file)):
>                 os.rename(file, newfile)
>                 file = join(pwd, newfile)
> 
>             # Don't really need this since getting a better encoder,
>             # but still might want to make it optional.
>             
>             #artist = re.sub('_', ' ', basename(pwd))
>             #title = re.sub('_', ' ', splitext(basename(file))[0])
>             #os.system("mp3info -t " + title + " -a " + artist \
>             #+ " -g 1 " + file)
> 
>             # Yes, I really do have filenames which use all the
>             # "bad for the shell" characters below.
>             
>             escaped_filename = re.sub('(["\'()& ])', r'\\\1', file)

In these circumstances I'd consider using one of the
os.exec{l,v}{p,}{e,} functions and not letting a shell get near the
filenames. This is more work (more mucking with filehandles than it's
worth, maybe).

You've forgotten '{}$' btw. Or are these not a problem?

You can make string.translate do this, but it's a bit unwieldy.

>             # Deleted the actual format because it was too long.
>             # If anyone's interested, just ask.

This would be why the next line is a syntax error, yes?

>             mp3info = getoutput('mp3info -f "..." + escaped_filename)
> 
>             # The encoder program I use is slightly buggy and tends to 
>             # leave spaces followed by one random control character at 
>             # the end of the comment field of the mp3 tag. Generally,
>             # this isn't a problem, but it IS if that control
>             # character is ^J, so I fix it here.
> 
>             # Notice the (\B|\b) construct.  What I really wanted to
>             # do was make the whole of \1 optional, but then it causes 
>             # an error if \1 doesn't match, which is bad because the
>             # point is to delete the stuff after it.  Is there a
>             # better way to do this?
> 
>             fixedinfo = re.sub(\
>             r'\|((\B|\b)([A-Za-z \'/.]*[A-Za-z])?) +[\1-\26]\|', \
>             r'|\1|', \
>             reduce(concat, mp3info))

Hello? mp3info at this point is a string. So reduce(concat,mp3info) is
a no-op. If by something you've omitted mp3info is a list of strings,
"string.join(mp3info)" is a better bet (clearer & faster - what more
could you want?).

Also string.rstrip *might* do what you want.

> 
>             # Hmm, come to think of it, I might want to repair the
>             # actual mp3 file here.  I'll have to look into the
>             # "Matcher" objects. (Would that be the right way to save
>             # \1 from above?)
> 
>             listfile.write(fixedinfo)
>             listfile.write('\n')
>             
>     listfile.close()
> 
> # os.path.walk() might simplfy things mightily, but would require some
> # information to be saved outside the function to keep track of
> # just-constructed .jbd files for inclusion in the current one.

That's what classes and bound methods are for...

> Make_Playlist()
> 

You're aware of the 

if __name__ = '__main__':
    main_func()

idiom? You never know, you might want to use this script as a module
one day...

Let's have a go at this.

A little top-down design:

The top function:

def descend_directory(dir): ...

I think I'll have this return a list of info strings.

We want a function to extract the info (fixed if necessary) from an
.mp3:

def get_info(mp3file): ...

I'll let you write that one.

Then descend_directory looks somewhat like this:

def descend_directory(dir):
    mp3s = glob.glob(os.path.join(dir,"*.mp3"))

    info = []

    listfile = open(os.path.join(dir, 
                                 os.path.basename(dir) + '.jbd'),"w")

    for mp3 in mp3s:
        info.append(get_info(mp3file))

    dirs = filter(os.path.isdir,os.listdir(dir))

    for ndir in dirs:
        info = info + descend_directory(ndir)

    listfile.writelines(info)

    return info

Top your module with:

import os,glob

and tail it with:

if __name__ = '__main__':
    descend_directory(os.curdir)

and you should be sorted!

I know I've written a lot, but I think you're doing pretty well!

HTH,
Michael