[Python-ideas] SQL-like way to manipulate Python data structures

Adam Atlas adam at atlas.st
Tue May 29 02:29:49 CEST 2007


On 26 May 2007, at 12.06, Steve Howell wrote:
> Here is the Python code that I wrote:
>
>     def groupDictBy(lst, keyField):
>         dct = {}
>         for item in lst:
>             keyValue = item[keyField]
>             if keyValue not in dct:
>                 dct[keyValue] = []
>             dct[keyValue].append(item)
>         return dct

I think this is basically equivalent to itertools.groupby. You could do:

   by_name = lambda e: e['name']
   for name, group in itertools.groupby(sorted(events, key=by_name), by_name): ...

Untested, but I'm pretty sure that's what you're doing here. The main
differences are that it only groups adjacent items (so the input has
to be sorted by the key first), that it returns an iterable of
(key, group-iterator) 2-tuples instead of a dictionary (you can build
a dict from it if need be, as long as you turn each group into a list
as you go), and that it requires you to pass it a key function (here
a simple lambda) instead of a dictionary key.
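
To make the comparison concrete, here is a minimal sketch putting the two approaches side by side. The sample `events` list is made up for illustration; only `groupDictBy` and the field names come from the quoted code.

```python
import itertools
from operator import itemgetter

# Hypothetical sample data; the field names follow the quoted code.
events = [
    {'name': 'alice', 'bucket': 'setup', 'charge': '10.0'},
    {'name': 'bob', 'bucket': 'setup', 'charge': '5.0'},
    {'name': 'alice', 'bucket': 'install', 'charge': '2.5'},
]

def groupDictBy(lst, keyField):
    # The function from the quoted message, unchanged.
    dct = {}
    for item in lst:
        keyValue = item[keyField]
        if keyValue not in dct:
            dct[keyValue] = []
        dct[keyValue].append(item)
    return dct

by_name = itemgetter('name')

# groupby only merges adjacent items, so on unsorted input 'alice' shows up twice.
unsorted_keys = [key for key, _ in itertools.groupby(events, by_name)]

# Sorting first and turning each group into a list reproduces the dict.
grouped = {key: list(group)
           for key, group in itertools.groupby(sorted(events, key=by_name), by_name)}
```

Because sorted() is stable, the items within each group keep their original relative order, so `grouped` matches what `groupDictBy(events, 'name')` returns.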

>     dct = groupDictBy(events, 'name')
>     for name in dct:
>         events = dct[name]
>         charges = {}
>         for bucket in ('setup', 'install'):
>             charges[bucket] = sum(
>                     [float(event['charge']) for
>                     event in events
>                     if event['bucket'] == bucket])

I'm not quite sure what the intended function of this is. You group
events by the 'name' field, but on each iteration of the loop you're
setting the `charges` variable anew, basically throwing away the
value produced by any previous iteration. That's not intended, is it?

Second, it looks almost like the inner loop could also be an
itertools.groupby call, grouping by 'bucket'. Ignoring the previous
paragraph, I think it would be something like this:

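by_name = lambda e: e['name']
by_bucket = lambda e: e['bucket']
# groupby only groups adjacent items, so sort by the key before each call
for name, group in itertools.groupby(sorted(events, key=by_name), by_name):
     charges = dict((bucket, sum(float(event['charge']) for event in v))
         for bucket, v in itertools.groupby(sorted(group, key=by_bucket), by_bucket))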

And of course if you meant to have 'charges' be a list or a dict or
something, with a value for each iteration of the outer loop, then
this could be made into a one-liner by changing the for loop into a
list comprehension. It's not as terse as SQL, but it gets the job done.
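
If the intent was to keep one `charges` dict per name, a sketch along those lines might look like the following. The function name `charges_by_name` and the sample data are hypothetical, and it sorts before each groupby call because groupby only groups adjacent items.

```python
import itertools
from operator import itemgetter

def charges_by_name(events):
    """Return {name: {bucket: total charge}} for a list of event dicts."""
    by_name = itemgetter('name')
    by_bucket = itemgetter('bucket')
    return {
        name: {
            # Sum the charges of all events in this name's bucket.
            bucket: sum(float(e['charge']) for e in in_bucket)
            for bucket, in_bucket in itertools.groupby(
                sorted(group, key=by_bucket), by_bucket)
        }
        for name, group in itertools.groupby(sorted(events, key=by_name), by_name)
    }

# Hypothetical sample data; the field names follow the quoted code.
events = [
    {'name': 'alice', 'bucket': 'setup', 'charge': '10.0'},
    {'name': 'bob', 'bucket': 'setup', 'charge': '5.0'},
    {'name': 'alice', 'bucket': 'install', 'charge': '2.5'},
    {'name': 'alice', 'bucket': 'setup', 'charge': '1.5'},
]
totals = charges_by_name(events)
```

One difference from the quoted loop: a bucket with no events for a given name is simply absent here, rather than present with a total of 0.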

> Comments are welcome on improving the code itself, but
> I wonder if Python 3k (or 4k?) couldn't have some kind
> of native SQL-like ways of manipulating lists and
> dictionaries.  I don't have a proposal myself, just
> wonder if others have felt this kind of pain, and
> maybe it will spark a Pythonic solution.

I don't really have an opinion for now, but if this does happen, I  
think the best way would be to have a new 'table' type supporting  
operations like this. (I think a stdlib module would do; I don't  
think this is quite important enough to warrant a global builtin  
type. And filling `dict` itself with these methods could get  
cluttersome.)

I started writing a paragraph here about what types of methods it  
could have, returning various types of iterators, but then I realized  
that I was basically describing SQLAlchemy. So I think we shouldn't  
reinvent the wheel there; it would be an interesting exercise to see  
if SQLAlchemy could be made to operate on native Python data  
structures, though.
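
This is not SQLAlchemy, but as a rough illustration of running actual SQL over native Python data, here is a sketch that loads a list of dicts into an in-memory table with the stdlib sqlite3 module. The table name, column types, and sample data are assumptions made for the example.

```python
import sqlite3

# Hypothetical sample data; the field names follow the quoted code.
events = [
    {'name': 'alice', 'bucket': 'setup', 'charge': '10.0'},
    {'name': 'bob', 'bucket': 'setup', 'charge': '5.0'},
    {'name': 'alice', 'bucket': 'install', 'charge': '2.5'},
    {'name': 'alice', 'bucket': 'setup', 'charge': '1.5'},
]

conn = sqlite3.connect(':memory:')
# The REAL column affinity converts the numeric-looking charge strings to floats.
conn.execute('CREATE TABLE events (name TEXT, bucket TEXT, charge REAL)')
# Named placeholders pull the values out of each dict by key.
conn.executemany(
    'INSERT INTO events VALUES (:name, :bucket, :charge)', events)
rows = conn.execute(
    'SELECT name, bucket, SUM(charge) FROM events '
    'GROUP BY name, bucket ORDER BY name, bucket').fetchall()
conn.close()
```

The GROUP BY does the grouping and summing in one statement, at the cost of copying the data into a table first.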


