RFC: Assignment as expression (pre-PEP)
Gabriel Genellina
gagsl-py2 at yahoo.com.ar
Thu Apr 5 19:25:04 EDT 2007
On Thu, 05 Apr 2007 18:08:46 -0300, darklord at timehorse.com
<TimeHorse at gmail.com> wrote:
> I am trying to write a parser for a text string. Specifically, I am
> trying to take a filename that contains meta-data about the content of
> the A/V file (mpg, mp3, etc.).
>
> I first split the filename into fields separated by spaces and dots.
>
> Then I have a series of regular expression matches. I like
> Cartesian's 'event-based' parser approach though the event table gets a
> bit unwieldy as it grows. Also, I would prefer to have the 'action'
> result in a variable assignment specific to the test. E.g.
>
> def parseName(name):
>     fields = sd.split(name)
>     fields, ext = fields[:-1], fields[-1]
>     year = ''
>     capper = ''
>     series = None
>     episodeNum = None
>     programme = ''
>     episodeName = ''
>     past_title = False
>     for f in fields:
>         if year_re.match(f):
>             year = f
>             past_title = True
>         else:
>             my_match = capper_re.match(f)
>             if my_match:
>                 capper = my_match.group(1)
>                 if capper == 'JJ' or capper == 'JeffreyJacobs':
>                     capper = 'Jeffrey C. Jacobs'
>                 past_title = True
>             else:
>                 my_match = epnum_re.match(f)
>                 if my_match:
>                     series, episodeNum = my_match.group('series', 'episode')
>                     past_title = True
>                 else:
>                     # If I think of other parse elements, they go here.
>                     # Otherwise, name is part of a title; check for
>                     # capitalization
>                     if (f[0] >= 'a' and f[0] <= 'z'
>                             and f not in do_not_capitalize):
>                         f = f.capitalize()
>                     if past_title:
>                         if episodeName: episodeName += ' '
>                         episodeName += f
>                     else:
>                         if programme: programme += ' '
>                         programme += f
>
>     return programme, series, episodeName, episodeNum, year, capper, ext
>
> Now, the problem with this code is that it assumes only 2 pieces of
> free-form meta-data in the name (i.e. Programme Name and Episode
> Name). Also, although this is not directly adaptable to Cartesian's
> approach, you COULD rewrite it using a dictionary in the place of
> local variable names so that the event lookup could consist of 3
> properties per event: compiled_re, action_method, dictionary_string.
> But even with that, in the case of the epnum match, two assignments
> are required, so it would take a convoluted scheme: if
> dictionary_string is a list, then for each of the values returned by
> action_method, bind the result to the corresponding ith dictionary
> element named in dictionary_string. And
> the fall-through case is state-dependent since the 'unrecognized
> field' should be shuffled into a different variable dependent on
> state. Still, if there is a better approach I am certainly up for
> it. I love event-based parsers so I have no problem with that
> approach in general.
Maybe it's worth using a class instance. Define methods to handle each
matching regex, and keep state in the instance.

class NameParser:

    def handle_year(self, field, match):
        self.year = field
        self.past_title = True

    def handle_capper(self, field, match):
        capper = match.group(1)
        if capper == 'JJ' or capper == 'JeffreyJacobs':
            capper = 'Jeffrey C. Jacobs'
        self.capper = capper
        self.past_title = True

    def parse(self, name):
        fields = sd.split(name)
        for field in fields:
            # The first regex that matches the field wins
            for regex, handler in self.handlers:
                match = regex.match(field)
                if match:
                    handler(self, field, match)
                    break

You have to build the handlers list, containing (regex, handler) items;
the "unknown" case might be a match-all expression at the end.
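
One possible way to build that list, as a minimal sketch only. The patterns assigned to sd, year_re and capper_re are assumptions made up for illustration (the original post does not show them), and title_words is a hypothetical simplification of the programme/episodeName logic. The list is created at the end of the class body, where the handlers are still plain functions, so handler(self, field, match) works as written above.

```python
import re

# Assumed stand-ins; the real patterns are not shown in the thread.
sd = re.compile(r"[ .]+")              # split on spaces and dots
year_re = re.compile(r"\(?\d{4}\)?$")  # hypothetical year pattern
capper_re = re.compile(r"\[(\w+)\]$")  # hypothetical capper pattern


class NameParser:
    def __init__(self):
        self.year = ''
        self.capper = ''
        self.past_title = False
        self.title_words = []  # hypothetical: collects unrecognized fields

    def handle_year(self, field, match):
        self.year = field
        self.past_title = True

    def handle_capper(self, field, match):
        capper = match.group(1)
        if capper in ('JJ', 'JeffreyJacobs'):
            capper = 'Jeffrey C. Jacobs'
        self.capper = capper
        self.past_title = True

    def handle_unknown(self, field, match):
        self.title_words.append(field)

    # (regex, handler) items, tried in order; the match-all goes last.
    handlers = [
        (year_re, handle_year),
        (capper_re, handle_capper),
        (re.compile(r".*"), handle_unknown),
    ]

    def parse(self, name):
        for field in sd.split(name):
            for regex, handler in self.handlers:
                match = regex.match(field)
                if match:
                    handler(self, field, match)
                    break


parser = NameParser()
parser.parse("Some Show 2007 [JJ]")
```

The order of the items matters: since ".*" matches every field, anything listed after it would never be reached.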
Well, after playing a bit with decorators I got this:

import re

# sd and do_not_capitalize are the names used in the original function above


class NameParser:

    year = ''
    capper = ''
    series = None
    episodeNum = None
    programme = ''
    episodeName = ''
    past_title = False
    handlers = []

    def __init__(self, name):
        self.name = name
        self.parse()

    def handle_this(regex, handlers=handlers):
        # A decorator; associates the function to the regex
        # (Not intended to be used as a normal method! not even a static
        # method!)
        def register(function, regex=regex):
            handlers.append((re.compile(regex), function))
            return function
        return register

    @handle_this(r"\(?\d+\)?")
    def handle_year(self, field, match):
        self.year = field
        self.past_title = True

    @handle_this(r"(expression)")
    def handle_capper(self, field, match):
        capper = match.group(1)
        if capper == 'JJ' or capper == 'JeffreyJacobs':
            capper = 'Jeffrey C. Jacobs'
        self.capper = capper
        self.past_title = True

    # Registered last, so it only sees fields nothing else matched
    @handle_this(r".*")
    def handle_unknown(self, field, match):
        if (field[0] >= 'a' and field[0] <= 'z'
                and field not in do_not_capitalize):
            field = field.capitalize()
        if self.past_title:
            if self.episodeName: self.episodeName += ' '
            self.episodeName += field
        else:
            if self.programme: self.programme += ' '
            self.programme += field

    def parse(self):
        fields = sd.split(self.name)
        for field in fields:
            for regex, handler in self.handlers:
                match = regex.match(field)
                if match:
                    handler(self, field, match)
                    break

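
The registration trick can be seen in isolation in the following reduced sketch. FieldClassifier, classify and the year pattern are hypothetical names and values chosen for illustration; they are not part of the parser above. It shows that handlers end up in the list in definition order, which is why the match-all handler must be defined last.

```python
import re


class FieldClassifier:
    handlers = []  # (compiled regex, function) pairs, in definition order

    def handle_this(regex, handlers=handlers):
        # Decorator factory, usable only while the class body executes
        def register(function):
            handlers.append((re.compile(regex), function))
            return function
        return register

    @handle_this(r"\d{4}$")  # assumed year pattern, for illustration
    def handle_year(self, field, match):
        return 'year'

    @handle_this(r".*")      # match-all: must be registered last
    def handle_unknown(self, field, match):
        return 'unknown'

    del handle_this  # optional: keep the helper out of the finished class

    def classify(self, field):
        # First matching regex wins
        for regex, handler in self.handlers:
            match = regex.match(field)
            if match:
                return handler(self, field, match)


classifier = FieldClassifier()
registered = [function.__name__ for _, function in FieldClassifier.handlers]
```

The default argument handlers=handlers is evaluated while the class body runs, so the decorator appends to the same list object that later becomes the class attribute.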
--
Gabriel Genellina