RFC: Assignment as expression (pre-PEP)

Thu Apr 5 18:51:51 EDT 2007

darklord at timehorse.com wrote:
> On Apr 5, 4:22 pm, Duncan Booth <duncan.bo... at invalid.invalid> wrote:
>> Can you come up with a real example where this happens and which cannot be
>> easily rewritten to provide better, clearer code without the indentation?
>>
>> I'll admit to having occasionally had code not entirely dissimilar to this
>> when first written, but I don't believe it has ever survived more than a
>> few minutes before being refactored into a cleaner form. I would claim that
>> it is a good thing that Python makes it obvious that code like this should
>> be refactored.
> 
> I am trying to write a parser for a text string.  Specifically, I am
> trying to take a filename that contains meta-data about the content of
> the A/V file (mpg, mp3, etc.).
> 
> I first split the filename into fields separated by spaces and dots.
> 
> Then I have a series of regular expression matches.  I like
> Cartesian's 'event-based' parser approach though the event table gets a
> bit unwieldy as it grows.  Also, I would prefer to have the 'action'
> result in a variable assignment specific to the test.  E.g.
> 
> def parseName(name):
>     fields = sd.split(name)
>     fields, ext = fields[:-1], fields[-1]
>     year = ''
>     capper = ''
>     series = None
>     episodeNum = None
>     programme = ''
>     episodeName = ''
>     past_title = False
>     for f in fields:
>         if year_re.match(f):
>             year = f
>             past_title = True
>         else:
>             my_match = capper_re.match(f)
>             if my_match:
>                 capper = my_match.group(1)
>                 if capper == 'JJ' or capper == 'JeffreyJacobs':
>                     capper = 'Jeffrey C. Jacobs'
>                 past_title = True
>             else:
>                 my_match = epnum_re.match(f)
>                 if my_match:
>                     series, episodeNum = my_match.group('series', 'episode')
>                     past_title = True
>                 else:
>                     # If I think of other parse elements, they go here.
>                     # Otherwise, name is part of a title; check for capitalization
>                     if f[0] >= 'a' and f[0] <= 'z' and f not in do_not_capitalize:
>                         f = f.capitalize()
>                     if past_title:
>                         if episodeName: episodeName += ' '
>                         episodeName += f
>                     else:
>                         if programme: programme += ' '
>                         programme += f
> 
>     return programme, series, episodeName, episodeNum, year, capper, ext

Why can't you combine your regular expressions into a single expression, 
e.g. something like::

     >>> import re
     >>> exp = r'''
     ... (?P<year>\d{4})
     ... |
     ... by\[(?P<capper>.*)\]
     ... |
     ... S(?P<series>\d\d)E(?P<episode>\d\d)
     ... '''
     >>> matcher = re.compile(exp, re.VERBOSE)
     >>> matcher.match('1990').groupdict()
     {'series': None, 'capper': None, 'episode': None, 'year': '1990'}
     >>> matcher.match('by[Jovev]').groupdict()
     {'series': None, 'capper': 'Jovev', 'episode': None, 'year': None}
     >>> matcher.match('S01E12').groupdict()
     {'series': '01', 'capper': None, 'episode': '12', 'year': None}
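
As the output above suggests, only the alternative that matched fills in its groups; the rest come back as None. A minimal sketch of how one might find out which alternative fired and keep only its groups is below. The helper name `classify` is hypothetical and not part of the original post; it relies on the match object's `lastgroup` attribute, which names the last capturing group that took part in the match.

```python
import re

# Same combined pattern as in the session above.
matcher = re.compile(r'''
    (?P<year>\d{4})
    |
    by\[(?P<capper>.*)\]
    |
    S(?P<series>\d\d)E(?P<episode>\d\d)
    ''', re.VERBOSE)

def classify(field):
    """Hypothetical helper: return (last matched group name, matched groups)."""
    match = matcher.match(field)
    if match is None:
        return None, {}
    # Drop the groups belonging to alternatives that did not match.
    groups = {k: v for k, v in match.groupdict().items() if v is not None}
    return match.lastgroup, groups
```

For the series/episode alternative, `lastgroup` reports 'episode' because that is the last group closed in that branch, so it is probably more robust to look at the filtered dictionary than at the name alone.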

Then your code above would look something like::

     for f in fields:
         match = matcher.match(f)
         if match is not None:
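             # Only one alternative matches per field, so the other groups
             # are None; keep the previous value in that case.
             year = match.group('year') or year
             capper = match.group('capper') or capper
             if capper == 'JJ' or capper == 'JeffreyJacobs':
                 capper = 'Jeffrey C. Jacobs'
             series = match.group('series') or series
             episodeNum = match.group('episode') or episodeNum
             past_title = True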
         else:
             if 'a' <= f[0] <= 'z' and f not in do_not_capitalize:
                 f = f.capitalize()
             if past_title:
                 if episodeName:
                     episodeName += ' '
                 episodeName += f
             else:
                 if programme:
                     programme += ' '
                 programme += f
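
Putting the pieces together, a self-contained sketch might look like the following. It updates only the groups that actually matched, since groups from the other alternatives come back as None. Note the assumptions: the separator pattern `sd` and the contents of `do_not_capitalize` never appear in the original post, so the values here are guesses for illustration, and the function name `parse_name` and the `found` dictionary are my own.

```python
import re

# Assumption: the post only says fields are "separated by spaces and dots".
sd = re.compile(r'[ .]+')
# Assumption: the real word list is not shown in the post.
do_not_capitalize = {'a', 'an', 'and', 'of', 'the'}

matcher = re.compile(r'''
    (?P<year>\d{4})
    |
    by\[(?P<capper>.*)\]
    |
    S(?P<series>\d\d)E(?P<episode>\d\d)
    ''', re.VERBOSE)

def parse_name(name):
    fields = sd.split(name)
    fields, ext = fields[:-1], fields[-1]
    found = {'year': '', 'capper': '', 'series': None, 'episode': None}
    programme, episode_name = [], []
    past_title = False
    for f in fields:
        if not f:
            continue  # skip empty fields from leading separators
        match = matcher.match(f)
        if match is not None:
            # Keep only the groups that took part in this match.
            found.update((k, v) for k, v in match.groupdict().items()
                         if v is not None)
            past_title = True
        else:
            if 'a' <= f[0] <= 'z' and f not in do_not_capitalize:
                f = f.capitalize()
            # Words before the first metadata field belong to the programme.
            (episode_name if past_title else programme).append(f)
    if found['capper'] in ('JJ', 'JeffreyJacobs'):
        found['capper'] = 'Jeffrey C. Jacobs'
    return (' '.join(programme), found['series'], ' '.join(episode_name),
            found['episode'], found['year'], found['capper'], ext)
```

Collecting the title words in lists and joining them at the end avoids the "add a space if non-empty" bookkeeping, but that is a stylistic choice rather than anything the approach requires.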

STeVe