Namedtuples: some unexpected inconveniences
MRAB
python at mrabarnett.plus.com
Fri Apr 14 17:18:30 EDT 2017
On 2017-04-14 20:34, Deborah Swanson wrote:
> Peter,
>
> Retracing my steps to rewrite the getattr(row, label) code, this is what
> sent me down the rabbit hole in the first place. (I changed your 'rows'
> to 'records' just to use the same name everywhere, but all else is the
> same as you gave me.) I'd like you to look at it and see if you still
> think complete(group, label) should work. Perhaps seeing why it fails
> will clarify some of the difficulties I'm having.
>
> I ran into problems with values and has_empty. values has a problem
> because row[label] gets a TypeError. has_empty has a problem because a
> list of field values will be shorter with missing values than a full
> list, but a namedtuple with missing values will be the same length as a
> full namedtuple since missing values have '' placeholders. Two more
> unexpected inconveniences.
>
In the line:
    values = {row[label] for row in group}
'group' is a list of records; row is a record (namedtuple).
You can get the members of a namedtuple (or of a 'normal' tuple) by
numeric index, e.g. row[0], but the point of a namedtuple is that you can
also get them by name, as an attribute, e.g. row.Location.
As the name of the attribute isn't fixed, but is passed in as a string
(label), use getattr(row, label) instead:
    values = {getattr(row, label) for row in group}
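A minimal sketch of the difference, assuming a made-up Record type with only title and Location fields (the real one is built from the CSV header, and the values here are invented):

```python
from collections import namedtuple

# Hypothetical record type and data, for illustration only.
Record = namedtuple("Record", ["title", "Location"])
group = [Record("Sofa", "Garage"), Record("Sofa", "")]
label = "Location"

# Indexing a (named)tuple with a string raises TypeError.
try:
    values = {row[label] for row in group}
except TypeError as exc:
    error = str(exc)

# getattr looks the field up by its name at run time.
values = {getattr(row, label) for row in group}
```

Here values ends up holding both 'Garage' and the '' placeholder, one entry per distinct field value in the group.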
As for the values:
    # Remove the missing value, if present.
    values.discard('')

    if len(values) == 1:
        # There's only 1 value left, so fill in the empty places.
        ...
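To see why discard fits here, a small sketch with invented field values: because the '' placeholder is removed from the set first, the number of fields in each namedtuple no longer matters, only the number of distinct non-empty values does.

```python
# Made-up values collected from one group; '' marks a missing field.
values = {"Garage", ""}

# discard() removes '' if present and does nothing otherwise
# (unlike remove(), which raises KeyError when the item is absent).
values.discard("")
values.discard("")  # second call is harmless

# Exactly one real value left means it can be used as the fill value.
fill = next(iter(values)) if len(values) == 1 else None
```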
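The next point is that namedtuples, like normal tuples, are immutable.
You can't change the value of an attribute, so an assignment such as
row[label] = ... can't work either; you have to create a new namedtuple
with the changed value instead.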
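One way to work within that restriction is namedtuple's _replace method, which returns a new tuple with the given fields changed. The following is only a sketch of how complete() might be rewritten along those lines, not necessarily what Peter had in mind; the Record fields and sample data are made up:

```python
from collections import namedtuple

# Hypothetical record type; the real one comes from the CSV header.
Record = namedtuple("Record", ["title", "Location"])

def complete(group, label):
    """Fill empty `label` fields in the list `group`.

    Returns True if the group has exactly one non-empty value for label.
    """
    values = {getattr(row, label) for row in group}
    values.discard("")
    if len(values) != 1:
        # no value or multiple values; manual intervention needed
        return False
    (value,) = values
    for i, row in enumerate(group):
        if getattr(row, label) == "":
            # Build a new tuple and store it back in the list.
            group[i] = row._replace(**{label: value})
    return True

row = Record("Sofa", "")
try:
    row.Location = "Garage"  # attributes are read-only
except AttributeError:
    immutable = True

group = [Record("Sofa", "Garage"), row]
ok = complete(group, "Location")
```

Note that this replaces the entries in the group list only; any other list that still holds the old tuples (such as records) would keep the unfilled versions, so you would need to rebuild it from the groups afterwards.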
> A short test csv is at the end, for you to read in and attempt to
> execute the following code, and I'm still working on reconstructing the
> lost getattr(row, label) code.
>
> import csv
> from collections import namedtuple, defaultdict
>
> def get_title(row):
>     return row.title
>
> def complete(group, label):
>     values = {row[label] for row in group}
>     # get "TypeError: tuple indices must be integers, not str"
>     has_empty = not min(values, key=len)
>     if len(values) - has_empty != 1:
>         # no value or multiple values; manual intervention needed
>         return False
>     elif has_empty:
>         for row in group:
>             row[label] = max(values, key=len)
>     return True
>
> infile = open("E:\\Coding projects\\Pycharm\\Moving\\Moving 2017 in - test.csv")
> rows = csv.reader(infile)
> fieldnames = next(rows)
> Record = namedtuple("Record", fieldnames)
> records = [Record._make(fieldnames)]
> records.extend(Record._make(row) for row in rows)
>
> # group rows by title
> groups = defaultdict(list)
> for row in records:
>     groups[get_title(row)].append(row)
>
> LABELS = ['Location', 'Kind', 'Notes']
>
> # add missing values
> for group in groups.values():
>     for label in LABELS:
>         complete(group, label)
>
> Moving 2017 in - test.csv:
> (If this doesn't come through the mail system correctly, I've also
> uploaded the file to
> http://deborahswanson.net/python/Moving%202017%20in%20-%20test.csv.
> Permissions should be set correctly, but let me know if you run into
> problems downloading the file.)
>
[snip]