RFC: Assignment as expression (pre-PEP)

Thu Apr 5 19:09:43 EDT 2007

Steven Bethard wrote:
> darklord at timehorse.com wrote:
>> On Apr 5, 4:22 pm, Duncan Booth <duncan.bo... at invalid.invalid> wrote:
>>> Can you come up with a real example where this happens and which cannot be
>>> easily rewritten to provide better, clearer code without the indentation?
>>>
>>> I'll admit to having occasionally had code not entirely dissimilar to this
>>> when first written, but I don't believe it has ever survived more than a
>>> few minutes before being refactored into a cleaner form. I would claim that
>>> it is a good thing that Python makes it obvious that code like this should
>>> be refactored.
>>
>> I am trying to write a parser for a text string.  Specifically, I am
>> trying to take a filename that contains meta-data about the content of
>> the A/V file (mpg, mp3, etc.).
>>
>> I first split the filename into fields separated by spaces and dots.
>>
>> Then I have a series of regular expression matches.  I like
>> Cartesian's 'event-based' parser approach though the event table gets a
>> bit unwieldy as it grows.  Also, I would prefer to have the 'action'
>> result in a variable assignment specific to the test.  E.g.
>>
>> def parseName(name):
>>     fields = sd.split(name)
>>     fields, ext = fields[:-1], fields[-1]
>>     year = ''
>>     capper = ''
>>     series = None
>>     episodeNum = None
>>     programme = ''
>>     episodeName = ''
>>     past_title = False
>>     for f in fields:
>>         if year_re.match(f):
>>             year = f
>>             past_title = True
>>         else:
>>             my_match = capper_re.match(f)
>>             if my_match:
>>                 capper = my_match.group(1)
>>                 if capper == 'JJ' or capper == 'JeffreyJacobs':
>>                     capper = 'Jeffrey C. Jacobs'
>>                 past_title = True
>>             else:
>>                 my_match = epnum_re.match(f)
>>                 if my_match:
>>                     series, episodeNum = my_match.group('series', 'episode')
>>                     past_title = True
>>                 else:
>>                     # If I think of other parse elements, they go here.
>>                     # Otherwise, name is part of a title; check for
>>                     # capitalization.
>>                     if 'a' <= f[0] <= 'z' and f not in do_not_capitalize:
>>                         f = f.capitalize()
>>                     if past_title:
>>                         if episodeName: episodeName += ' '
>>                         episodeName += f
>>                     else:
>>                         if programme: programme += ' '
>>                         programme += f
>>
>>     return programme, series, episodeName, episodeNum, year, capper, ext
> 
> Why can't you combine your regular expressions into a single expression, 
> e.g. something like::
> 
>     >>> exp = r'''
>     ... (?P<year>\d{4})
>     ... |
>     ... by\[(?P<capper>.*)\]
>     ... |
>     ... S(?P<series>\d\d)E(?P<episode>\d\d)
>     ... '''
>     >>> matcher = re.compile(exp, re.VERBOSE)
>     >>> matcher.match('1990').groupdict()
>     {'series': None, 'capper': None, 'episode': None, 'year': '1990'}
>     >>> matcher.match('by[Jovev]').groupdict()
>     {'series': None, 'capper': 'Jovev', 'episode': None, 'year': None}
>     >>> matcher.match('S01E12').groupdict()
>     {'series': '01', 'capper': None, 'episode': '12', 'year': None}
> 
> Then your code above would look something like::
> 
>     for f in fields:
>         match = matcher.match(f)
>         if match is not None:
>             year = match.group('year')
>             capper = match.group('capper')
>             if capper == 'JJ' or capper == 'JeffreyJacobs':
>                 capper = 'Jeffrey C. Jacobs'
>             series = match.group('series')
>             episodeNum = match.group('episode')
>             past_title = True

I guess you need to be a little more careful here not to overwrite your 
old values, e.g. something like::

     year = match.group('year') or year
     capper = match.group('capper') or capper
     ...
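
Putting the pieces together, a minimal sketch of the whole loop might look
something like the following. The function name parse_fields, the
CAPPER_ALIASES table and the two word lists are placeholders for illustration,
not names from the original code, and the capitalization step is left out to
keep it short:

```python
import re

# Combined pattern from earlier in this message. re.VERBOSE lets the
# alternatives sit on separate lines; the whitespace is ignored.
matcher = re.compile(r'''
    (?P<year>\d{4})
    |
    by\[(?P<capper>.*)\]
    |
    S(?P<series>\d\d)E(?P<episode>\d\d)
    ''', re.VERBOSE)

# Hypothetical alias table standing in for the 'JJ' / 'JeffreyJacobs' test.
CAPPER_ALIASES = {
    'JJ': 'Jeffrey C. Jacobs',
    'JeffreyJacobs': 'Jeffrey C. Jacobs',
}


def parse_fields(fields):
    year = capper = ''
    series = episode_num = None
    past_title = False
    programme_words = []   # title words seen before the first metadata field
    episode_words = []     # title words seen after it
    for f in fields:
        match = matcher.match(f)
        if match is None:
            # Not metadata, so the field is part of one of the two titles.
            if past_title:
                episode_words.append(f)
            else:
                programme_words.append(f)
            continue
        # Groups that did not take part in the match are None, so
        # "or" falls back to whatever was found on an earlier field.
        year = match.group('year') or year
        capper = match.group('capper') or capper
        capper = CAPPER_ALIASES.get(capper, capper)
        series = match.group('series') or series
        episode_num = match.group('episode') or episode_num
        past_title = True
    return (' '.join(programme_words), series, ' '.join(episode_words),
            episode_num, year, capper)
```

Note that "or" also keeps the old value when a group matches the empty string
(e.g. 'by[]'), which seems harmless here but is worth knowing about.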

STeVe