Cleaning up conditionals
Peter Otten
__peter__ at web.de
Sat Dec 31 21:16:00 EST 2016
Deborah Swanson wrote:
> Peter Otten wrote:
>> Deborah Swanson wrote:
>>
>> > Here I have a real mess, in my opinion:
>>
>> [corrected code:]
>>
>> > if len(l1[st]) == 0:
>> >     if len(l2[st]) > 0:
>> >         l1[st] = l2[st]
>> > elif len(l2[st]) == 0:
>> >     if len(l1[st]) > 0:
>> >         l2[st] = l1[st]
>>
>> > Anybody know or see an easier (more pythonic) way to do this? I need
>> > to do it for four fields, and needless to say, that's a really long
>> > block of ugly code.
>>
>> By "four fields", do you mean four values of st, or four pairs of l1, l2,
>> or more elif-s with l3 and l4 -- or something else entirely?
>>
>> Usually the most obvious way to avoid repetition is to write a function,
>> and to make the best suggestion a bit more context is necessary.
>>
>
> I did write a function for this, and welcome any suggestions for
> improvement.
>
> The context is comparing 2 adjacent rows of data (in a list of real
> estate listings sorted by their webpage titles and dates) with the
> assumption that if the webpage titles are the same, they're listings for
> the same property. This assumption is occasionally bad, but in far less
> than one per 1000 unique listings. I'd rather just hand edit the data in
> those cases so one webpage title is slightly different, than writing and
> executing all the code needed to find and handle these corner cases.
> Maybe that will be a future refinement, but right now I don't really
> need it.
>
> Once two rows of listing data have been identified as different dates
> for the same property, there are 4 fields that will be identical for
> both rows. There can be up to 10 (or even more) listings identical
> except for the date, but typically I'm just adding a new one and want to
> copy the field data from its previous siblings, so the copying is just
> from the last listing to the new one.
>
> Here's the function I have so far:
>
> def comprows(l1,l2,st,ki,no):
>     ret = ''
>     labels = {st: 'st/co', ki: 'kind', no: 'notes'}
>     for v in (st,ki,no):
>         if len(l1[v]) == 0 and len(l2[v]) != 0:
>             l1[v] = l2[v]
>         elif len(l2[v]) == 0 and len(l1[v]) != 0:
>             l2[v] = l1[v]
>         elif l1[v] != l2[v]:
>             ret += ", " + labels[v] + " diff" if len(ret) > 0 else labels[v] + " diff"
>     return ret
>
> The 4th field is a special case and easily dispatched in one line of
> code before this function is called for the other 3.
>
> l1 and l2 are the 2 adjacent rows of listing data, with st,ki,no holding
> codes for state/county, kind (of property) and notes. I want the
> checking and copying to go both ways because sometimes I'm backfilling
> old listings that I didn't pick up in my nightly copies on their given
> dates, but came across them later.
>
> ret is returned to a field with details to look at when I save the list
> to csv and open it in Excel. The noted diffs will need to be reconciled.
>
> I tried to use Jussi Piitulainen's suggestion to chain the conditionals,
> but just couldn't make it work for choosing list elements to assign to,
> although the approach is perfect if you're computing a value.
>
> Hope this is enough context... ;)

At least the code into which I translate your description differs from the
suggestions you have got so far. The main differences:

- Look at the whole group, not just two lines
- If there is more than one non-empty value in the group don't change any
  value.

from collections import defaultdict


def get_title(row):
    return row[...]


def complete(group, label):
    """For every row in the group set row[label] to a non-empty value
    if there is exactly one such value.

    Returns True if values can be set consistently.

    group is supposed to be a list of dicts.

    >>> def c(g):
    ...     gg = [{"whatever": value} for value in g]
    ...     if not complete(gg, "whatever"):
    ...         print("fixme", end=" ")
    ...     return [row["whatever"] for row in gg]
    >>> c(["", "a", ""])
    ['a', 'a', 'a']
    >>> c(["", "a", "a"])
    ['a', 'a', 'a']
    >>> c(["", "a", "b"])
    fixme ['', 'a', 'b']
    >>> c(["a"])
    ['a']
    >>> c([''])
    fixme ['']
    """
    values = {row[label] for row in group}
    has_empty = not min(values, key=len)
    if len(values) - has_empty != 1:
        # no value or multiple values; manual intervention needed
        return False
    elif has_empty:
        for row in group:
            row[label] = max(values, key=len)
    return True


if __name__ == "__main__":
    # read rows
    rows = ...

    # group rows by title
    # (only defaultdict is imported above, so use the bare name)
    groups = defaultdict(list)
    for row in rows:
        groups[get_title(row)].append(row)

    LABELS = ['st/co', 'kind', 'notes']

    # add missing values
    for group in groups.values():
        for label in LABELS:
            complete(group, label)

    # write rows
    ...
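
To try the whole flow without real data, here is a minimal sketch with made-up rows. The "title" key, the sample values and the fill_groups() helper are assumptions for illustration only; how rows are read and how get_title() looks depends on the actual listing data. complete() is repeated (without its doctests) so that the snippet runs on its own.

```python
from collections import defaultdict

LABELS = ["st/co", "kind", "notes"]


def complete(group, label):
    """Set row[label] for all rows if the group has exactly one
    non-empty value; return False if manual intervention is needed."""
    values = {row[label] for row in group}
    has_empty = not min(values, key=len)
    # has_empty is a bool, so it counts as 0 or 1 in the subtraction
    if len(values) - has_empty != 1:
        return False
    elif has_empty:
        for row in group:
            row[label] = max(values, key=len)
    return True


def fill_groups(rows, get_title):
    """Hypothetical helper: group rows by title, complete every label,
    and collect the (title, label) pairs that need a manual look."""
    groups = defaultdict(list)
    for row in rows:
        groups[get_title(row)].append(row)
    problems = []
    for title, group in groups.items():
        for label in LABELS:
            if not complete(group, label):
                problems.append((title, label))
    return problems


# Made-up sample data; "title" is an assumed key name.
rows = [
    {"title": "A", "st/co": "WA/King", "kind": "", "notes": ""},
    {"title": "A", "st/co": "", "kind": "house", "notes": ""},
    {"title": "B", "st/co": "OR/Lane", "kind": "lot", "notes": "x"},
    {"title": "B", "st/co": "OR/Linn", "kind": "", "notes": "x"},
]

problems = fill_groups(rows, lambda row: row["title"])
print(problems)  # [('A', 'notes'), ('B', 'st/co')]
```

Group "A" gets its st/co and kind filled in from the sibling row, but is reported for notes because no row has a value. Group "B" is reported for st/co because the two rows disagree, and those values are left untouched.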