[Tutor] Iterating over a long list with regular expressions and changing each item?

Mon May 4 18:17:53 CEST 2009

Original:
 'case_def_gen':['case_def','gen','null'],
 'nsuff_fem_pl':['nsuff','null', 'null'],
 'abbrev': ['abbrev, null, null'],
 'adj': ['adj, null, null'],
 'adv': ['adv, null, null'],}

Note the values for 'abbrev', 'adj' and 'adv' are not three-item lists, but
one-item lists, each holding a single comma-separated string.

Should be:
 'case_def_gen':['case_def','gen','null'],
 'nsuff_fem_pl':['nsuff','null', 'null'],
 'abbrev': ['abbrev', 'null', 'null'],
 'adj': ['adj', 'null', 'null'],
 'adv': ['adv', 'null', 'null'],}
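
A quick way to see the difference is to run the join on both forms.  This is
only a sketch: the names `broken` and `fixed` are made up for illustration,
only one entry is shown, and I'm assuming the values end up in "\t".join().

```python
# Hypothetical names; a single entry is enough to show the problem.
broken = {'abbrev': ['abbrev, null, null']}       # one string inside a list
fixed = {'abbrev': ['abbrev', 'null', 'null']}    # three separate strings

print(len(broken['abbrev']))   # 1
print(len(fixed['abbrev']))    # 3

# Joining a one-item list adds no tabs at all; the commas stay in the string.
print(repr("\t".join(broken['abbrev'])))   # 'abbrev, null, null'
print(repr("\t".join(fixed['abbrev'])))    # 'abbrev\tnull\tnull'
```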

For much of my own code, I find lists of string literals tedious to enter,
and it is easy to drop a ' character.  This style is a little easier on the
eyes, and harder to screw up:

 'case_def_gen':'case_def gen null'.split(),
 'nsuff_fem_pl':'nsuff null null'.split(),
 'abbrev': 'abbrev null null'.split(),
 'adj': 'adj null null'.split(),
 'adv': 'adv null null'.split(),}
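
As a sanity check (just a sketch, with made-up variable names), split() with
no arguments returns the same list you would have typed by hand.  Since it
already returns a list, it does not need another pair of brackets around it:

```python
by_hand = ['case_def', 'gen', 'null']
by_split = 'case_def gen null'.split()
print(by_split == by_hand)    # True

# Wrapping the call in brackets would nest the list one level too deep.
nested = ['case_def gen null'.split()]
print(nested)                 # [['case_def', 'gen', 'null']]
```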

Since all that your code does at runtime with the values is "\t".join()
them, you might as well initialize the dict with these computed values, for
at least some small gain in runtime performance:

 T = lambda s : "\t".join(s.split())
 'case_def_gen' : T('case_def gen null'),
 'nsuff_fem_pl' : T('nsuff null null'),
 'abbrev' :       T('abbrev null null'),
 'adj' :          T('adj null null'),
 'adv' :          T('adv null null'),}
 del T
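
Filled out into something you can paste and run, that might look like the
following.  The dict name `tag_table` is my own placeholder (substitute
whatever yours is called), and only the five entries shown in this message
are included.

```python
# T is a throwaway helper: split on whitespace, rejoin with tabs.
T = lambda s: "\t".join(s.split())

tag_table = {   # placeholder name, not from the original code
    'case_def_gen': T('case_def gen null'),
    'nsuff_fem_pl': T('nsuff null null'),
    'abbrev': T('abbrev null null'),
    'adj': T('adj null null'),
    'adv': T('adv null null'),
}
del T   # the helper is only needed while building the dict

print(repr(tag_table['case_def_gen']))   # 'case_def\tgen\tnull'
```

The values are now plain strings, so the code that used to call "\t".join()
on each value can use the value directly.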

(Yes, I know PEP8 says *not* to add spaces to line up assignments or other
related values, but I think there are isolated cases where it does help to
see what's going on.  You could even write this as:

 T = lambda s : "\t".join(s.split())
 'case_def_gen' : T('case_def  gen  null'),
 'nsuff_fem_pl' : T('nsuff     null null'),
 'abbrev' :       T('abbrev    null null'),
 'adj' :          T('adj       null null'),
 'adv' :          T('adv       null null'),}
 del T

and the extra spaces help you to see the individual subtags more easily,
with no change in the resulting values, since split() with no arguments
treats a run of whitespace the same as a single space.)
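
If you want to convince yourself of that, here is a minimal check (variable
names are just for illustration).  The one thing to watch for is passing an
explicit separator, which behaves differently:

```python
padded = 'nsuff     null null'
plain = 'nsuff null null'

# With no arguments, runs of whitespace collapse into one separator.
print(padded.split() == plain.split())   # True

# With an explicit ' ' separator, every single space counts, so the
# padded string yields empty strings between the real items.
print(padded.split(' ') == plain.split(' '))   # False
```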

Of course you could simply code the tabs into the strings yourself:

 'case_def_gen' : 'case_def\tgen\tnull',
 'nsuff_fem_pl' : 'nsuff\tnull\tnull',
 'abbrev' :       'abbrev\tnull\tnull',
 'adj' :          'adj\tnull\tnull',
 'adv' :          'adv\tnull\tnull',}

But I think readability definitely suffers here; I would probably go with
the penultimate version.

-- Paul