[Tutor] decorators
Steven D'Aprano
steve at pearwood.info
Fri Jul 23 21:33:27 CEST 2010
On Sat, 24 Jul 2010 04:23:41 am Mary Morris wrote:
> I'm trying to compile a list of decorators from the source code at my
> office.
> I did this by doing a
>
> candidate_line.find("@")
>
> because all of our decorators start with the @ symbol. The problem
> I'm having is that the email addresses that are included in the
> comments are getting included in the list that is getting returned.
First of all, to solve this problem *properly* you will need a proper
parser to walk over the code and look for decorators, ignoring
comments, skipping over strings, and similar. But that's hard, or at
least I have no idea how to do it, so the alternative is a basic filter
like you are doing.
If you're using Linux, Mac or some other Unix, the fastest solution
would be to use grep. But ignoring that, think about what a decorator
line is. You suggest above that a candidate line is a decorator if it
has an @ sign in it. But that's incorrect. This is not a decorator:
# send an email to steve@something.net or george@example.gov.au
But this might be:
@decorator
So let's start with a simple little generator that returns a line as a
candidate decorator only if it *starts* with an @ sign:
def find_decorators(lines):
    """Return likely decorators from lines of text."""
    for line in lines:
        line = line.lstrip()  # ignore leading spaces
        if line.startswith('@'):
            yield line
That's still not fool-proof; only a proper Python parser will be
fool-proof. This will be fooled by the *second* line in something like:
instructions = """If you have a problem with this, please call Fred
@ accounts and tell him to reset the modem, then try again.
If it still doesn't work blah blah blah """
So, not fool-proof, but it does the job.
You use find_decorators like this:
# Process them one at a time.
for decorator_line in find_decorators(open("source.py")):
    print(decorator_line)
To get them all at once, use:
list_of_decorators = list(find_decorators(open("source.py")))
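If you want to try the filter without a real source file, here is a
small self-contained sketch that feeds it a hard-coded list of lines.
The sample lines and the names "sample" and "found" are made up for
illustration; find_decorators is repeated so the snippet runs on its own.

```python
def find_decorators(lines):
    """Return likely decorators from lines of text."""
    for line in lines:
        line = line.lstrip()  # ignore leading spaces
        if line.startswith('@'):
            yield line

# Hypothetical sample input; any iterable of strings will do,
# which is why an open file works too.
sample = [
    "# send an email to steve@something.net\n",
    "@decorator\n",
    "def spam():\n",
    "    @staticmethod\n",
    "    def eggs(): pass\n",
]

# Strip the trailing newline from each match so the list prints cleanly.
found = [line.rstrip() for line in find_decorators(sample)]
print(found)
```

The comment containing the email address is skipped because it does not
start with @, while the indented @staticmethod line is still found.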
How can we improve this? At the moment, find_decorators happily returns
a line like this:
@decorator # This is a comment
but you probably don't care about the comment. So let's make a second
filter to throw it away:
def remove_comments(lines):
    for line in lines:
        p = line.find('#')
        if p > -1:
            # Keep characters up to but not including p,
            # ignoring trailing spaces
            yield line[:p].rstrip()
        else:
            yield line
And now apply this filter only to decorator lines:
f = open("source.py")
for decorator in remove_comments(find_decorators(f)):
    print(decorator)
To get them all at once:
f = open("source.py")
results = list(remove_comments(find_decorators(f)))
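To see the two filters chained without touching the disk, here is a
rough sketch that uses io.StringIO as a stand-in for the open file. The
fake file contents and the name "fake_file" are assumptions for the
demonstration; both generators are repeated so the snippet runs by
itself.

```python
import io

def find_decorators(lines):
    """Return likely decorators from lines of text."""
    for line in lines:
        line = line.lstrip()  # ignore leading spaces
        if line.startswith('@'):
            yield line

def remove_comments(lines):
    """Cut each line at the first '#', if there is one."""
    for line in lines:
        p = line.find('#')
        if p > -1:
            yield line[:p].rstrip()
        else:
            yield line

# io.StringIO iterates line by line, just like a file object does.
fake_file = io.StringIO(
    "@decorator  # This is a comment\n"
    "def spam():\n"
    "    pass\n"
    "@other\n"
)

# Lines without a comment keep their newline, so strip it here.
results = [line.rstrip() for line in remove_comments(find_decorators(fake_file))]
print(results)
```

Because both functions are generators, the lines flow through one at a
time, and nothing is collected until the list is built at the end.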
Again, this is not foolproof. If you have a decorator like this:
@decorator("this takes a string argument with a # inside it")
the filter will return:
@decorator("this takes a string argument with a
But, and I repeat myself like a broken record, if you want fool-proof,
you need a proper parser, and that's hard.
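For what it's worth, Python's standard library does ship a parser in the
ast module, so here is a rough sketch of what the parser-based approach
might look like. Treat it as an illustration under assumptions, not a
tested solution for your code base: the function name
find_decorators_ast and the sample source are made up, it needs
Python 3.8 or later for ast.get_source_segment, and it only works on
source that parses under the Python version running it.

```python
import ast

def find_decorators_ast(source):
    """Yield the source text of each decorator in a string of Python code."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        # Only function and class definitions can carry decorators.
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            for dec in node.decorator_list:
                # The node covers the expression after the @, so add it back.
                yield "@" + ast.get_source_segment(source, dec)

# Hypothetical sample containing the cases that fool the simple filters.
sample = '''
# send an email to steve@something.net
@decorator("this takes a string argument with a # inside it")
def spam():
    """Call Fred
    @ accounts and tell him to reset the modem."""

class Ham:
    @staticmethod
    def eggs():
        pass
'''

found = list(find_decorators_ast(sample))
for item in found:
    print(item)
```

Since the parser knows what is a comment and what is a string, the email
address and the "@ accounts" line inside the docstring are ignored, and
the # inside the string argument no longer truncates the decorator.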
--
Steven D'Aprano