
On 26 May 2007, at 12.06, Steve Howell wrote:
Here is the Python code that I wrote:
    def groupDictBy(lst, keyField):
        dct = {}
        for item in lst:
            keyValue = item[keyField]
            if keyValue not in dct:
                dct[keyValue] = []
            dct[keyValue].append(item)
        return dct
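I think this is basically equivalent to itertools.groupby, with one important caveat: groupby only groups consecutive items with equal keys, so the list has to be sorted by the same key first. You could do:

    for name, events in itertools.groupby(sorted(events, key=lambda e: e['name']),
                                          lambda e: e['name']):
        ...

Untested, but I'm pretty sure that's what you're doing here. The main difference is that it returns an iterable of 2-tuples instead of a dictionary, and that it requires you to pass it a key function (here a simple lambda) instead of a dictionary key. You can build a dictionary from the result, but not by passing it straight to dict(): each group is an iterator that stops being usable as soon as groupby moves on to the next key, so the groups have to be copied into lists as you go, e.g. dict((k, list(g)) for k, g in ...).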
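To make the consecutive-items caveat concrete, here is a small self-contained comparison. The `events` sample data is made up for illustration, and `groupDictBy` is the function quoted above rewritten with `collections.defaultdict`; it should behave the same, but it is not the original code.

```python
import itertools
from collections import defaultdict

def groupDictBy(lst, keyField):
    # Map each key value to the list of items that have it,
    # keeping the items in their original order.
    dct = defaultdict(list)
    for item in lst:
        dct[item[keyField]].append(item)
    return dict(dct)

# Hypothetical sample data: the two 'a' events are not adjacent.
events = [
    {'name': 'a', 'bucket': 'setup', 'charge': '10.00'},
    {'name': 'b', 'bucket': 'install', 'charge': '5.50'},
    {'name': 'a', 'bucket': 'install', 'charge': '2.25'},
]

def by_name(e):
    return e['name']

# groupby on the unsorted list yields three groups: a, b, a.
unsorted_keys = [k for k, _ in itertools.groupby(events, by_name)]

# Sorting by the same key first gives one group per name. Each group's
# iterator is copied into a list before groupby advances to the next key.
grouped = dict((k, list(g))
               for k, g in itertools.groupby(sorted(events, key=by_name), by_name))

print(unsorted_keys)                           # ['a', 'b', 'a']
print(grouped == groupDictBy(events, 'name'))  # True
```

Because sorted() is stable, the items inside each group keep their original relative order, which is why the two results compare equal.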
    dct = groupDictBy(events, 'name')
    for name in dct:
        events = dct[name]
        charges = {}
        for bucket in ('setup', 'install'):
            charges[bucket] = sum(
                [float(event['charge']) for event in events
                 if event['bucket'] == bucket])
I'm not quite sure what this is meant to do. You group events by the 'name' field into dct, but on each iteration of the loop you set `charges` anew, throwing away the value produced by any previous iteration. That's not intended, is it?

Second, it looks as if the inner loop could also be an itertools.groupby call, grouping by 'bucket'. Ignoring the previous paragraph, I think it would be something like this (again sorting first, since groupby only groups adjacent items):

    for name, group in itertools.groupby(sorted(events, key=lambda e: e['name']),
                                         lambda e: e['name']):
        rows = sorted(group, key=lambda e: e['bucket'])
        charges = dict([(bucket, sum([float(event['charge']) for event in v]))
                        for bucket, v in itertools.groupby(rows, lambda e: e['bucket'])])

One difference from your version: this gives a total for every bucket that occurs, not just 'setup' and 'install', and a bucket with no events is simply missing instead of getting 0.

And of course if you meant 'charges' to be a list or a dict or something, with a value for each iteration of the outer loop, then this could be made into a one-liner by changing the for loop into a list comprehension. It's not as terse as SQL, but it gets the job done.
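A runnable sketch of that one-liner idea, keeping one charges dict per name instead of overwriting it. It assumes, as the quoted code does, that each event is a dict with string 'charge' values; the sample data and the name `charges_by_name` are made up for illustration.

```python
import itertools
from operator import itemgetter

# Hypothetical sample data in the shape the quoted code expects.
events = [
    {'name': 'a', 'bucket': 'setup', 'charge': '10.00'},
    {'name': 'b', 'bucket': 'install', 'charge': '5.50'},
    {'name': 'a', 'bucket': 'install', 'charge': '2.25'},
    {'name': 'a', 'bucket': 'setup', 'charge': '1.00'},
]

by_name = itemgetter('name')
by_bucket = itemgetter('bucket')

# One entry per name, mapping each bucket that occurs for that name to the
# sum of its charges. Sorting once by (name, bucket) makes both levels of
# groupby see adjacent runs of equal keys.
charges_by_name = dict(
    (name, dict((bucket, sum(float(e['charge']) for e in bucket_events))
                for bucket, bucket_events in itertools.groupby(name_events, by_bucket)))
    for name, name_events in itertools.groupby(
        sorted(events, key=itemgetter('name', 'bucket')), by_name))

print(charges_by_name)
# {'a': {'install': 2.25, 'setup': 11.0}, 'b': {'install': 5.5}}
```

Sorting on the compound key up front avoids re-sorting each name's events inside the loop; each inner dict is fully built before the outer groupby advances, so no group iterator is read after it has been invalidated.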
Comments are welcome on improving the code itself, but I wonder if Python 3k (or 4k?) couldn't have some native, SQL-like way of manipulating lists and dictionaries. I don't have a proposal myself; I just wonder whether others have felt this kind of pain, and maybe it will spark a Pythonic solution.
I don't really have an opinion for now, but if this does happen, I think the best way would be to have a new 'table' type supporting operations like this. (I think a stdlib module would do; I don't think this is quite important enough to warrant a global builtin type. And filling `dict` itself with these methods would clutter it.) I started writing a paragraph here about what types of methods it could have, returning various types of iterators, but then I realized that I was basically describing SQLAlchemy. So I think we shouldn't reinvent the wheel there; it would be an interesting exercise to see if SQLAlchemy could be made to operate on native Python data structures, though.
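For comparison with the SQL being wished for, the standard library's `sqlite3` module can already run such a query over an in-memory copy of a list of dicts. This is only a sketch of that workaround, not a proposal for the 'table' type; the sample data is made up, and the table name `events` and its columns are hypothetical choices mirroring the dict keys in the quoted code.

```python
import sqlite3

# Hypothetical sample data, same shape as in the quoted code.
events = [
    {'name': 'a', 'bucket': 'setup', 'charge': '10.00'},
    {'name': 'b', 'bucket': 'install', 'charge': '5.50'},
    {'name': 'a', 'bucket': 'install', 'charge': '2.25'},
    {'name': 'a', 'bucket': 'setup', 'charge': '1.00'},
]

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE events (name TEXT, bucket TEXT, charge REAL)')
# Named placeholders are filled from each dict; the string charges are
# converted with float() first so the column holds numbers.
conn.executemany(
    'INSERT INTO events VALUES (:name, :bucket, :charge)',
    [dict(e, charge=float(e['charge'])) for e in events])

# The per-name, per-bucket totals computed by the quoted loop.
rows = conn.execute(
    "SELECT name, bucket, SUM(charge) FROM events "
    "WHERE bucket IN ('setup', 'install') "
    "GROUP BY name, bucket ORDER BY name, bucket").fetchall()
conn.close()

for name, bucket, total in rows:
    print(name, bucket, total)
```

The cost is copying the data into the database first, which is part of why a native structure operating directly on Python objects is attractive.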