SQL-like way to manipulate Python data structures
![](https://secure.gravatar.com/avatar/b60683e6dacd4556ae15430a7c9b6f13.jpg?s=120&d=mm&r=g)
I find myself often writing somewhat tedious code in Python that is just slicing/aggregating fairly simple data structures like a list of dictionaries. Here is something that I'm trying to express, roughly: charges = sum float(charge) from events on setup, install group by name Here is the Python code that I wrote: def groupDictBy(lst, keyField): dct = {} for item in lst: keyValue = item[keyField] if keyValue not in dct: dct[keyValue] = [] dct[keyValue].append(item) return dct dct = groupDictBy(events, 'name') for name in dct: events = dct[name] charges = {} for bucket in ('setup', 'install'): charges[bucket] = sum( [float(event['charge']) for event in events if event['bucket'] == bucket]) Comments are welcome on improving the code itself, but I wonder if Python 3k (or 4k?) couldn't have some kind of native SQL-like ways of manipulating lists and dictionaries. I don't have a proposal myself, just wonder if others have felt this kind of pain, and maybe it will spark a Pythonic solution. ____________________________________________________________________________________Sick sense of humor? Visit Yahoo! TV's Comedy with an Edge to see what's on, when. http://tv.yahoo.com/collections/222
![](https://secure.gravatar.com/avatar/b350a41f20160c20ff0bddf629f5955d.jpg?s=120&d=mm&r=g)
On 26 May 2007, at 17:06, Steve Howell wrote:
I find myself often writing somewhat tedious code in Python that is just slicing/aggregating fairly simple data structures like a list of dictionaries.
Here is something that I'm trying to express, roughly:
charges = sum float(charge) from events on setup, install group by name
I don't really understand that :)
Here is the Python code that I wrote:
def groupDictBy(lst, keyField): dct = {} for item in lst: keyValue = item[keyField] if keyValue not in dct: dct[keyValue] = [] dct[keyValue].append(item) return dct
isn't that itertools.groupby?
dct = groupDictBy(events, 'name') for name in dct: events = dct[name] charges = {} for bucket in ('setup', 'install'): charges[bucket] = sum( [float(event['charge']) for event in events if event['bucket'] == bucket])
from itertools import groupby from operator import itemgetter charges = {} for name, n_evts in groupby(events, itemgetter('name')): charges[name] = dict((bucket, sum(float(e['charge']) for e in b_evts) for bucket, b_evts in groupby(n_evts, itemgetter('bucket')) if bucket in ('setup', 'install')) The last line can be omitted if the only two existing buckets are 'setup' and 'install'. Untested! I've changed your code slightly as in your snippet 'charges' is reset to {} at each iteration of the loop. -- Arnaud
![](https://secure.gravatar.com/avatar/b60683e6dacd4556ae15430a7c9b6f13.jpg?s=120&d=mm&r=g)
--- Arnaud Delobelle <arno@marooned.org.uk> wrote:
On 26 May 2007, at 17:06, Steve Howell wrote:
I find myself often writing somewhat tedious code in Python that is just slicing/aggregating fairly simple data structures like a list of dictionaries.
Here is something that I'm trying to express, roughly:
charges = sum float(charge) from events on setup, install group by name
I don't really understand that :)
Let me try again: select name, bucket, sum(float(charge)) from events where bucket in ('install', 'setup') group by name, bucket The idea here is that events is a list of Python dictionaries keyed by fields "name", "bucket", and "charge", and I only care about install/setup charges. Here is a rough Python statement of what I'm trying to do: charges = {} for name, n_evts in groupby(events, itemgetter('name')): print name print dict([(bucket, sum([float(e['charge']) for e in b_evts])) for bucket, b_evts in groupby(n_evts, itemgetter('bucket')) if bucket in ('setup', 'install')])
Here is the Python code that I wrote:
def groupDictBy(lst, keyField): dct = {} for item in lst: keyValue = item[keyField] if keyValue not in dct: dct[keyValue] = [] dct[keyValue].append(item) return dct
isn't that itertools.groupby?
The problem with itertools.groupby is that it assumes sorted lists. So the code below does not behave how you might expect. It reports Joe's charges twice, and it loses Steve's $50 install charge. from itertools import groupby from operator import itemgetter events = [ {'name': 'Joe', 'bucket': 'install', 'charge': 100}, {'name': 'Joe', 'bucket': 'setup', 'charge': 20}, {'name': 'Steve', 'bucket': 'install', 'charge': 50}, {'name': 'Steve', 'bucket': 'setup', 'charge': 30}, {'name': 'Steve', 'bucket': 'install', 'charge': 1000}, {'name': 'Steve', 'bucket': 'upgrade', 'charge': 440}, {'name': 'Joe', 'bucket': 'setup', 'charge': 40}, ] charges = {} for name, n_evts in groupby(events, itemgetter('name')): print name print dict([(bucket, sum([float(e['charge']) for e in b_evts])) for bucket, b_evts in groupby(n_evts, itemgetter('bucket')) if bucket in ('setup', 'install')]) ____________________________________________________________________________________Pinpoint customers who are looking for what you sell. http://searchmarketing.yahoo.com/
![](https://secure.gravatar.com/avatar/7e4e7569d64e14de784aca9f9a8fffb4.jpg?s=120&d=mm&r=g)
On Sat, May 26, 2007, Steve Howell wrote:
Comments are welcome on improving the code itself, but I wonder if Python 3k (or 4k?) couldn't have some kind of native SQL-like ways of manipulating lists and dictionaries. I don't have a proposal myself, just wonder if others have felt this kind of pain, and maybe it will spark a Pythonic solution.
What's wrong with sqlite? -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "Look, it's your affair if you want to play with five people, but don't go calling it doubles." --John Cleese anticipates Usenet
![](https://secure.gravatar.com/avatar/b60683e6dacd4556ae15430a7c9b6f13.jpg?s=120&d=mm&r=g)
--- Aahz <aahz@pythoncraft.com> wrote:
What's wrong with sqlite?
Nothing, but I want Python itself to give me the SQL syntax to manipulate lists-of-dictionaries like lists of dictionaries, without having to go through some a module. In the example that I gave earlier in the thread, what would the sqlite solution look like? To transform any particular list of dicitionaries into another list of dictionaries, would I have to do a create-table in sqlite, or does it already have some kind of mode where it can work directly on a Python dataset? ____________________________________________________________________________________ Be a PS3 game guru. Get your game face on with the latest PS3 news and previews at Yahoo! Games. http://videogames.yahoo.com/platform?platform=120121
![](https://secure.gravatar.com/avatar/7e4e7569d64e14de784aca9f9a8fffb4.jpg?s=120&d=mm&r=g)
On Mon, May 28, 2007, Steve Howell wrote:
--- Aahz <aahz@pythoncraft.com> wrote:
What's wrong with sqlite?
Nothing, but I want Python itself to give me the SQL syntax to manipulate lists-of-dictionaries like lists of dictionaries, without having to go through some a module.
In the example that I gave earlier in the thread, what would the sqlite solution look like? To transform any particular list of dicitionaries into another list of dictionaries, would I have to do a create-table in sqlite, or does it already have some kind of mode where it can work directly on a Python dataset?
You'd have to convert your data into sqlite tables, of course, but: * It's not much work * Your specific needs for operating on data are likely to be different from everyone else's -- the reason SQL works is precisely because it imposes constraints on data format To the extent that Python provides anything roughly comparable to SQL, use listcomps and genexps. Think of it this way: if Python requires you to use a library for regex manipulation, what makes SQL-like data processing sufficiently special that it belongs in the language itself? -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "Look, it's your affair if you want to play with five people, but don't go calling it doubles." --John Cleese anticipates Usenet
![](https://secure.gravatar.com/avatar/b60683e6dacd4556ae15430a7c9b6f13.jpg?s=120&d=mm&r=g)
--- Aahz <aahz@pythoncraft.com> wrote:
Think of it this way: if Python requires you to use a library for regex manipulation, what makes SQL-like data processing sufficiently special that it belongs in the language itself?
There is a difference. Regexes would never leverage python expressions *inside* the regex, even if they were more directly integrated into the language like Perl. (And, for the record, I prefer Python's philosophy to make regexes a module, but I do appreciate the short name, re, and the fact that it's battery included). In the case of SQL, I can imagine something like this: def convert_to_euros(salary): ... result_set = [[[ select deptno, total(convert_to_euros(salary)) from emp where country = 'France' group by deptno ]]] # where emp is a list of dictionaries Obviously, I understand that you can solve this problem in Python now, but you either have to do this: 1) Use more bulky way to describe the expression: sql(..., criteria=lambda row: row['country'] == 'France') 2) Rely on some library to parse the SQL for you, which is good enough for the year 2007, but I wonder why the Python interpreter itself couldn't interpret the SQL to bytecode (at startup time) just as easily as, say, SQLAlchemy or sqlite, does at runtime. ____________________________________________________________________________________Pinpoint customers who are looking for what you sell. http://searchmarketing.yahoo.com/
![](https://secure.gravatar.com/avatar/dc7c3f884a2a10c0b32b95dd00dae51e.jpg?s=120&d=mm&r=g)
Steve Howell schrieb:
--- Aahz <aahz@pythoncraft.com> wrote:
What's wrong with sqlite?
Nothing, but I want Python itself to give me the SQL syntax to manipulate lists-of-dictionaries like lists of dictionaries, without having to go through some a module.
Are there any non-SQL use cases for SQL syntax? And how would all those DBs adapt? With yet another interface? And if you allow me to add, I consider all your examples kinda unpythonic.
![](https://secure.gravatar.com/avatar/b60683e6dacd4556ae15430a7c9b6f13.jpg?s=120&d=mm&r=g)
--- Stargaming <stargaming@gmail.com> wrote:
Are there any non-SQL use cases for SQL syntax?
Isn't that kind of a silly question? Of course not. SQL syntax only applies to relational data. Of course, such data doesn't have to exist in a "database" for the syntax to be useful. It just has to be relational. I would ask the opposite question--are there any cases where you manipulate basically relational data without the power of SQL? There are many answers there, and one them is Python. ____________________________________________________________________________________ Park yourself in front of a world of choices in alternative vehicles. Visit the Yahoo! Auto Green Center. http://autos.yahoo.com/green_center/
![](https://secure.gravatar.com/avatar/b60683e6dacd4556ae15430a7c9b6f13.jpg?s=120&d=mm&r=g)
--- Stargaming <stargaming@gmail.com> wrote:
--- Aahz <aahz@pythoncraft.com> wrote:
What's wrong with sqlite?
Nothing, but I want Python itself to give me the SQL syntax to manipulate lists-of-dictionaries like
Steve Howell schrieb: lists
of dictionaries, without having to go through some a module.
And how would all those DBs adapt? With yet another interface?
Where did I ever propose eliminating the interfaces that DB adapters already have? Or where did I propose any requirement to interface to the SQL syntax? The DB adapters would continue to move data from external/local databases to lists-of-dictionaries, objects, etc., as they do now. I'm just saying that once you pull data into Python, you should continue to have the option to manipulate relational data with SQL. Presumbably most people who use SQL databases already have to understand the syntax anyway. And those that don't can simply ignore it. ____________________________________________________________________________________ Moody friends. Drama queens. Your life? Nope! - their life, your story. Play Sims Stories at Yahoo! Games. http://sims.yahoo.com/
![](https://secure.gravatar.com/avatar/144a61ea3d5d30f93e5c67e54fe51d63.jpg?s=120&d=mm&r=g)
Steve Howell wrote:
Presumbably most people who use SQL databases already have to understand the syntax anyway. And those that don't can simply ignore it.
That simply will never be true. If there is new syntax added to Python, than the burden is on everyone to understand other people's code that uses such a feature. I'm very familiar with SQL and I am still not convinced that I want to see it in my Python. First of all, the concept of a relational table is ill-defined. I gather from your code that you believe this to be a list containing dictionaries, but why could this not be a tuple of tuples? For that matter, any iterable of iterables? Or how about using tuple'd indices, and so on.. It would be impossible to graft the SQL language onto all of the variations on the idea of a "table" in Python. Python does not have a relational table data structure, thus there cannot be a relational language used to address such a structure. It would not be wise to create a syntax that relies on a particular composition of data structures. -- Scott Dial scott@scottdial.com scodial@cs.indiana.edu
![](https://secure.gravatar.com/avatar/b60683e6dacd4556ae15430a7c9b6f13.jpg?s=120&d=mm&r=g)
--- Scott Dial <scott+python-ideas@scottdial.com> wrote:
Steve Howell wrote:
Presumbably most people who use SQL databases already have to understand the syntax anyway. And those that don't can simply ignore it.
That simply will never be true. If there is new syntax added to Python, than the burden is on everyone to understand other people's code that uses such a feature. I'm very familiar with SQL and I am still not convinced that I want to see it in my Python.
I fully concede to working under the premise that most Python programmers know SQL, and you're a perfect example, I guess, but I realize there are exceptions, and I don't deny that as a tradeoff, just like I don't deny that any expansion of a language's functionality reduces comprehension for average users. Written any decorators lately?
First of all, the concept of a relational table is ill-defined.
SQL's been around for at least 20 years, as has the concept of a relational table. So it's not exactly ill-defined. Start here, if you disagree: http://en.wikipedia.org/wiki/Relational_model
I gather from your code that you believe this to be a list containing dictionaries, but why could this not be a tuple of tuples?
In the code that I posted, I don't "believe" that I'm dealing with a list of dictionaries, I "know" because I wrote it. You can check it out yourself. But I agree with your implied premise that the Python interpreter doesn't know what it's dealing with, and I fully accept in my proposal that Python, not some third-party library, would properly throw an exception when it found my data not to fit the relational model, just as generator expressions sometimes throw run-time expressions, just as lambdas throw run-time expressions, etc. I'm ok with that.
For that matter, any iterable of iterables? Or how about using tuple'd indices, and so on.. It would be impossible to graft the SQL language onto all of the variations on the idea of a "table" in Python.
I'm not proposing to graft SQL on to all variations of the "table" in Python, just the ones that conform to a relational model.
Python does not have a relational table data structure, thus there cannot be a relational language used to address such a structure.
Python certainly expresses data structures that don't fit the relational model, but please explain to me how a list of dictionaries doesn't fit the relational model. And don't bore me with the explanation that dictionaries can be heterogeneous; I'd just have Python throw an exception there. ____________________________________________________________________________________ Expecting? Get great news right away with email Auto-Check. Try the Yahoo! Mail Beta. http://advision.webevents.yahoo.com/mailbeta/newmail_tools.html
![](https://secure.gravatar.com/avatar/49e6a1325c24b53539004a0b7984bf55.jpg?s=120&d=mm&r=g)
On 26 May 2007, at 12.06, Steve Howell wrote:
Here is the Python code that I wrote:
def groupDictBy(lst, keyField): dct = {} for item in lst: keyValue = item[keyField] if keyValue not in dct: dct[keyValue] = [] dct[keyValue].append(item) return dct
I think this is basically equivalent to itertools.groupby. You could do: for name, events in itertools.groupby(events, lambda e: e ['name']): ... Untested, but I'm pretty sure that's what you're doing here. The main difference is that it returns an iterable of 2-tuples instead of a dictionary (but you can just pass it straight to dict() if need be), and that it requires you to pass it a key function (here a simple lambda) instead of a dictionary key.
dct = groupDictBy(events, 'name') for name in dct: events = dct[name] charges = {} for bucket in ('setup', 'install'): charges[bucket] = sum( [float(event['charge']) for event in events if event['bucket'] == bucket])
I'm not quite sure what the intended function of this is. You group dct by the 'name' field, but for each iteration of the loop, you're setting the `charges` variable anew, basically throwing away the value produced by any previous iteration. That's not intended, is it? Second, it looks almost like the inner loop could also be an itertools.groupby call. Grouping by 'bucket'. Ignoring the previous paragraph, I think it would be something like this: for name, events in itertools.groupby(events, lambda e: e['name']): charges = dict([(bucket, sum([float(event['charge']) for event in v])) for bucket, v in itertools.groupby(events, lambda e: e ['bucket'])]) And of course if you meant to have 'charges' be an array or a dict or something, with a value for each iteration of the outer loop, then this could be made into a one-liner by changing the for loop into a list comprehension. It's not as terse as SQL, but it gets the job done.
Comments are welcome on improving the code itself, but I wonder if Python 3k (or 4k?) couldn't have some kind of native SQL-like ways of manipulating lists and dictionaries. I don't have a proposal myself, just wonder if others have felt this kind of pain, and maybe it will spark a Pythonic solution.
I don't really have an opinion for now, but if this does happen, I think the best way would be to have a new 'table' type supporting operations like this. (I think a stdlib module would do; I don't think this is quite important enough to warrant a global builtin type. And filling `dict` itself with these methods could get cluttersome.) I started writing a paragraph here about what types of methods it could have, returning various types of iterators, but then I realized that I was basically describing SQLAlchemy. So I think we shouldn't reinvent the wheel there; it would be an interesting exercise to see if SQLAlchemy could be made to operate on native Python data structures, though.
participants (6)
-
Aahz
-
Adam Atlas
-
Arnaud Delobelle
-
Scott Dial
-
Stargaming
-
Steve Howell