[Tutor] decorators

Fri Jul 23 21:33:27 CEST 2010

On Sat, 24 Jul 2010 04:23:41 am Mary Morris wrote:
> I'm trying to compile a list of decorators from the source code at my
> office.
> I did this by doing a
>
> candidate_line.find("@")
>
> because all of our decorators start with the @ symbol.  The problem
> I'm having is that the email addresses that are included in the
> comments are getting included in the list that is getting returned.

First of all, to solve this problem *properly* you will need a proper 
parser to walk over the code and look for decorators, ignoring 
comments, skipping over strings, and similar. But that's hard, or at 
least I have no idea how to do it, so the alternative is a basic filter 
like you are doing.

If you're using Linux, Mac or some other Unix, the fastest solution 
would be to use grep. But ignoring that, think about what a decorator 
line is. You suggest above that a candidate line is a decorator if it 
has an @ sign anywhere in it. But that's incorrect. This is not a decorator:

    # send an email to steve@something.net or george@example.gov.au

But this might be:

    @decorator

So let's start with a simple little generator that yields a line as a 
candidate decorator only if it *starts* with an @ sign:

def find_decorators(lines):
    """Return likely decorators from lines of text."""
    for line in lines:
        line = line.strip()  # drop indentation and the trailing newline
        if line.startswith('@'):
            yield line

That's still not fool-proof; only a proper Python parser will be 
fool-proof. This will be fooled by the *second* line in something like:

instructions = """If you have a problem with this, please call Fred
    @ accounts and tell him to reset the modem, then try again.
    If it still doesn't work blah blah blah """

So, not fool-proof, but it does the job.

You use find_decorators like this:

# Process them one at a time.
with open("source.py") as f:
    for decorator_line in find_decorators(f):
        print(decorator_line)

To get them all at once, use:

with open("source.py") as f:
    list_of_decorators = list(find_decorators(f))
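
If grep isn't available (say, on Windows) and you want to scan a whole 
tree of files rather than one source.py, here is a rough Python stand-in 
for something like `grep -rn '^[[:space:]]*@' --include='*.py' .` It is 
only a sketch: grep_file and grep_tree are made-up names, and it assumes 
the files are UTF-8 text. It applies the same "line starts with @" test 
as find_decorators, but also reports the file and line number:

```python
import re
from pathlib import Path

# Same test as grep '^[[:space:]]*@': optional leading whitespace, then '@'.
DECORATOR_LINE = re.compile(r'^\s*@')

def grep_file(lines):
    """Yield (line number, stripped line) for each line that looks like a decorator.

    `lines` can be an open file or any iterable of strings.
    """
    for lineno, line in enumerate(lines, start=1):
        if DECORATOR_LINE.match(line):
            yield lineno, line.strip()

def grep_tree(root):
    """Yield (path, line number, stripped line) for every .py file under root."""
    for path in sorted(Path(root).rglob('*.py')):
        # errors='replace' keeps one badly encoded file from stopping the scan
        with open(path, encoding='utf-8', errors='replace') as f:
            for lineno, line in grep_file(f):
                yield path, lineno, line
```

Something like `for path, lineno, line in grep_tree("."): print(path, lineno, line)` 
would then list every candidate in the current directory and below. It is 
fooled by exactly the same cases as find_decorators.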

How can we improve this? At the moment, find_decorators happily returns 
a line like this:

@decorator # This is a comment

but you probably don't care about the comment. So let's make a second 
filter to throw it away:

def remove_comments(lines):
    """Strip a trailing '#' comment, and the spaces before it, from each line."""
    for line in lines:
        p = line.find('#')
        if p > -1:
            # Keep characters up to but not including p,
            # ignoring trailing spaces
            yield line[:p].rstrip()
        else:
            yield line

And now apply this filter only to decorator lines:

with open("source.py") as f:
    for decorator in remove_comments(find_decorators(f)):
        print(decorator)

To get them all at once:

with open("source.py") as f:
    results = list(remove_comments(find_decorators(f)))
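
Since the original goal was to compile a list of the decorators in use, 
one possible next step is to tally the cleaned-up lines by decorator name. 
This is a sketch with a made-up helper, decorator_names, and it assumes 
the name is the dotted identifier right after the @ (true for 
@property or @app.route(...), but not for every expression that 
Python 3.9+ allows after an @):

```python
import re
from collections import Counter

# The dotted name right after '@', e.g. '@app.route("/")' -> 'app.route'.
_NAME_RE = re.compile(r'@\s*([A-Za-z_][\w.]*)')

def decorator_names(decorator_lines):
    """Count how often each decorator name occurs in lines such as
    those produced by remove_comments(find_decorators(f))."""
    counts = Counter()
    for line in decorator_lines:
        match = _NAME_RE.match(line.strip())
        if match:  # skip lines whose '@' isn't followed by a name
            counts[match.group(1)] += 1
    return counts
```

Calling `decorator_names(results).most_common()` would then give the 
decorators ordered from most to least used.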

Again, this is not foolproof. If you have a decorator like this:

@decorator("this takes a string argument with a # inside it")

the filter will return:

@decorator("this takes a string argument with a
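
For this particular failure there is a middle ground short of a full 
parser: the standard library's tokenize module already knows where a 
string ends and a comment begins. The sketch below (remove_comments_tokenize 
is a made-up name) tokenizes each line on its own and cuts it at the 
first COMMENT token. It assumes each decorator fits on one line; if a 
line can't be tokenized on its own, the line is kept whole:

```python
import io
import tokenize

def remove_comments_tokenize(lines):
    """Strip trailing comments, leaving a '#' inside a string literal alone."""
    for line in lines:
        cut = len(line)
        try:
            for tok in tokenize.generate_tokens(io.StringIO(line).readline):
                if tok.type == tokenize.COMMENT:
                    cut = tok.start[1]  # column where the comment starts
                    break
        except (tokenize.TokenError, SyntaxError):
            # e.g. an unclosed bracket or quote: leave the line as it is
            pass
        yield line[:cut].rstrip()
```

It is a drop-in replacement for remove_comments in the pipeline above, 
but it does nothing about the triple-quoted-string problem.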

But, and I repeat myself like a broken record, if you want fool-proof, 
you need a proper parser, and that's hard.
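
For anyone who wants to try the proper-parser route anyway, here is a 
sketch using the standard library's ast module. It assumes the source 
files are valid Python that the running interpreter can parse, and 
Python 3.8 or later for ast.get_source_segment; find_decorators_ast is 
a made-up name. Because it works on the parse tree, comments, email 
addresses and @ signs inside strings never show up at all:

```python
import ast

def find_decorators_ast(source):
    """Return sorted (line number, decorator text) pairs for a string of
    Python source. The text is the decorator expression without its '@'."""
    tree = ast.parse(source)  # raises SyntaxError if the source isn't valid
    found = []
    # ast.walk visits every node, so nested functions and methods are included.
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            for dec in node.decorator_list:
                found.append((dec.lineno, ast.get_source_segment(source, dec)))
    return sorted(found)
```

You would call it on a whole file at once, e.g. 
`find_decorators_ast(open("source.py").read())`, rather than line by 
line. The catch is that a file with a syntax error (or written for a 
different Python version) can't be parsed at all.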

-- 
Steven D'Aprano