RFC: Assignment as expression (pre-PEP)
Steven Bethard
steven.bethard at gmail.com
Thu Apr 5 18:51:51 EDT 2007
darklord at timehorse.com wrote:
> On Apr 5, 4:22 pm, Duncan Booth <duncan.bo... at invalid.invalid> wrote:
>> Can you come up with a real example where this happens and which cannot be
>> easily rewritten to provide better, clearer code without the indentation?
>>
>> I'll admit to having occasionally had code not entirely dissimilar to this
>> when first written, but I don't believe it has ever survived more than a
>> few minutes before being refactored into a cleaner form. I would claim that
>> it is a good thing that Python makes it obvious that code like this should
>> be refactored.
>
> I am trying to write a parser for a text string. Specifically, I am
> trying to take a filename that contains meta-data about the content of
> the A/V file (mpg, mp3, etc.).
>
> I first split the filename into fields separated by spaces and dots.
>
> Then I have a series of regular expression matches. I like
> Cartesian's 'event-based' parser approach though the event table gets a
> bit unwieldy as it grows. Also, I would prefer to have the 'action'
> result in a variable assignment specific to the test. E.g.
>
> def parseName(name):
>     fields = sd.split(name)
>     fields, ext = fields[:-1], fields[-1]
>     year = ''
>     capper = ''
>     series = None
>     episodeNum = None
>     programme = ''
>     episodeName = ''
>     past_title = False
>     for f in fields:
>         if year_re.match(f):
>             year = f
>             past_title = True
>         else:
>             my_match = capper_re.match(f)
>             if my_match:
>                 capper = my_match.group(1)
>                 if capper == 'JJ' or capper == 'JeffreyJacobs':
>                     capper = 'Jeffrey C. Jacobs'
>                 past_title = True
>             else:
>                 my_match = epnum_re.match(f)
>                 if my_match:
>                     series, episodeNum = my_match.group('series', 'episode')
>                     past_title = True
>                 else:
>                     # If I think of other parse elements, they go here.
>                     # Otherwise, name is part of a title; check for capitalization
>                     if f[0] >= 'a' and f[0] <= 'z' and f not in do_not_capitalize:
>                         f = f.capitalize()
>                     if past_title:
>                         if episodeName: episodeName += ' '
>                         episodeName += f
>                     else:
>                         if programme: programme += ' '
>                         programme += f
>
>     return programme, series, episodeName, episodeNum, year, capper, ext

Why can't you combine your regular expressions into a single expression,
e.g. something like::

    >>> import re
    >>> exp = r'''
    ...     (?P<year>\d{4})
    ...     |
    ...     by\[(?P<capper>.*)\]
    ...     |
    ...     S(?P<series>\d\d)E(?P<episode>\d\d)
    ... '''
    >>> matcher = re.compile(exp, re.VERBOSE)
    >>> matcher.match('1990').groupdict()
    {'series': None, 'capper': None, 'episode': None, 'year': '1990'}
    >>> matcher.match('by[Jovev]').groupdict()
    {'series': None, 'capper': 'Jovev', 'episode': None, 'year': None}
    >>> matcher.match('S01E12').groupdict()
    {'series': '01', 'capper': None, 'episode': '12', 'year': None}

Then your code above would look something like::

    for f in fields:
        match = matcher.match(f)
        if match is not None:
            year = match.group('year')
            capper = match.group('capper')
            if capper == 'JJ' or capper == 'JeffreyJacobs':
                capper = 'Jeffrey C. Jacobs'
            series = match.group('series')
            episodeNum = match.group('episode')
            past_title = True
        else:
            if 'a' <= f[0] <= 'z' and f not in do_not_capitalize:
                f = f.capitalize()
            if past_title:
                if episodeName:
                    episodeName += ' '
                episodeName += f
            else:
                if programme:
                    programme += ' '
                programme += f

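
One caveat with the loop as sketched: every match assigns all four groups, so a field that matches one alternative resets the values found in earlier fields to None. A rough, self-contained variant that copies only the groups that actually matched might look like the following. The names parse_fields, DO_NOT_CAPITALIZE and CAPPER_ALIASES are placeholders of mine, the contents of the stop-word set are a guess, and the field formats are the same assumed ones as in the combined pattern:

```python
import re

# Combined pattern from the message; the field formats are assumptions.
FIELD_RE = re.compile(r'''
    (?P<year>\d{4})
    |
    by\[(?P<capper>.*)\]
    |
    S(?P<series>\d\d)E(?P<episode>\d\d)
''', re.VERBOSE)

# Hypothetical stand-ins for names the original post does not define.
DO_NOT_CAPITALIZE = {'a', 'an', 'the', 'of', 'and'}
CAPPER_ALIASES = {'JJ': 'Jeffrey C. Jacobs',
                  'JeffreyJacobs': 'Jeffrey C. Jacobs'}


def parse_fields(fields):
    """Return a dict of metadata parsed from already-split filename fields."""
    info = {'year': None, 'capper': None, 'series': None, 'episode': None}
    programme, episode_name = [], []
    past_title = False
    for f in fields:
        match = FIELD_RE.match(f)
        if match is not None:
            # Copy only the groups of the alternative that matched, so values
            # found in earlier fields are not overwritten with None.
            info.update((k, v) for k, v in match.groupdict().items()
                        if v is not None)
            past_title = True
            continue
        # f[:1] rather than f[0] so an empty field does not raise IndexError.
        if 'a' <= f[:1] <= 'z' and f not in DO_NOT_CAPITALIZE:
            f = f.capitalize()
        # Title words go to the programme until the first metadata field.
        (episode_name if past_title else programme).append(f)
    # Normalise known capper aliases; unknown names (and None) pass through.
    info['capper'] = CAPPER_ALIASES.get(info['capper'], info['capper'])
    info['programme'] = ' '.join(programme)
    info['episode_name'] = ' '.join(episode_name)
    return info
```

Called as parse_fields(['doctor', 'who', 'S01E12', 'the', 'long', 'game', '2005', 'by[JJ]']), this keeps the series and episode numbers even though the year and capper fields match later. Note that match() only anchors at the start of the field, so a field such as '1990s' would still be taken as a year.
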
STeVe