Mailman 3 REG: Scope of a better API for JSON parser - Python-ideas

REG: Scope of a better API for JSON parser

Anoop Thomas Mathew

5 Apr 2014

7:58 p.m.

I went through the state of the CPython JSON parser and found that there is scope for improvement in the parser interface.

I have been thinking of implementing a class called QuickParse inside the json module, which would tidy up the process of finding a node and *cut down a number of for loops, if ... else and try ... except blocks.*

For example:

    sample_json = {
        'name': 'John Doe',
        'age': 45,
        'level': 'Expert',
        'languages': ['C', 'C++', 'Python', 'Clojure'],
        'account details': {
            'account number': 1233312312,
            'account balance': 1000000000.00
        }
    }

sample_json can be a file or a string.

    json_doc = QuickParse(sample_json)

    json_doc.get(["name"])
    'John Doe'

    json_doc.get(["languages", 2])
    'Python'

    json_doc.get(["account details", "account balance"])
    1000000000.00

    json_doc.get(["notpresent"])
    None

This is something I've been using in a number of projects, given the complexity of JSON documents these days. There is also a plan to provide an option to use functions for iterating over the document, as well as selection ranges for expected values.

Looking forward to hearing everyone's views on this.

regards,
ATM
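The proposal shows only usage, not an implementation. Below is a minimal sketch of what a QuickParse-like wrapper might look like. It assumes the constructor accepts an already-decoded dict, a string of JSON text, or a file-like object, and that get() takes a list of keys/indices and returns a default (None) when any step fails. QuickParse is hypothetical here; nothing like it exists in the json module.

```python
import json
from typing import Any, Iterable


class QuickParse:
    """Hypothetical sketch of the proposed API; nothing like this exists in the json module."""

    def __init__(self, source: Any) -> None:
        if isinstance(source, str):
            # Treat a string as JSON text (not as a file path).
            self._data = json.loads(source)
        elif hasattr(source, "read"):
            # Treat anything with .read() as a file-like object containing JSON.
            self._data = json.load(source)
        else:
            # Otherwise assume the data is already decoded, e.g. a dict.
            self._data = source

    def get(self, path: Iterable[Any], default: Any = None) -> Any:
        """Follow `path` one key or index at a time; return `default` if any step fails."""
        node = self._data
        for step in path:
            try:
                node = node[step]
            except (LookupError, TypeError):
                # LookupError: missing key or index out of range.
                # TypeError: the current node cannot be indexed that way (e.g. an int).
                return default
        return node


doc = QuickParse('{"name": "John Doe", "languages": ["C", "C++", "Python", "Clojure"]}')
print(doc.get(["name"]))            # John Doe
print(doc.get(["languages", 1]))    # C++
print(doc.get(["notpresent"]))      # None
```

Catching TypeError as well as LookupError is a choice made for this sketch only; it also hides genuine type mix-ups in the path.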


Andrew Barnert

5 Apr

10:26 p.m.

From: Anoop Thomas Mathew
Sent: Saturday, April 5, 2014 7:58 PM

> I went through the state of the CPython JSON parser and found that there is scope for improvement in the parser interface.
>
> I have been thinking of implementing a class called QuickParse inside the json module, which would tidy up the process of finding a node and cut down a number of for loops, if ... else and try ... except blocks.
>
> For example:
>
> sample_json = {
>     'name': 'John Doe',
>     'age': 45,
>     'level': 'Expert',
>     'languages': ['C', 'C++', 'Python', 'Clojure'],
>     'account details': {
>         'account number': 1233312312,
>         'account balance': 1000000000.00
>     }
> }

That's not a JSON object, that's just a plain old Python dict. It doesn't even have anything to do with JSON (except the variable name, which is misleading).

And you don't need to "parse" a dict; you just use [] or get on it.

Even if you actually had a JSON string or file serializing this dict, you could get exactly the dict by just calling json.loads or json.load on it. It's hard to imagine anything simpler than that.

Compare your desired code to the equivalents that already work today:

> json_doc = QuickParse(sample_json)
>
> json_doc.get(["name"])
> 'John Doe'

    sample_json['name']

... or ...

    sample_json.get('name')

Note that, besides the simple and more flexible access, you already didn't need that extra step of building a QuickParse object.

> json_doc.get(["languages", 2])
> 'Python'

    sample_json['languages'][2]

> json_doc.get(["account details", "account balance"])
> 1000000000.00

    sample_json['account details']['account balance']

> json_doc.get(["notpresent"])
> None

    sample_json.get('notpresent')
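To make the json.loads point concrete, here is a small sketch that decodes a JSON string holding the same data as the sample dict (the string literal is written out for illustration) and then uses the plain indexing equivalents listed above. The result of json.loads is an ordinary dict, so no wrapper object is involved.

```python
import json

# JSON text equivalent to the sample dict from the original message.
text = """
{
    "name": "John Doe",
    "age": 45,
    "level": "Expert",
    "languages": ["C", "C++", "Python", "Clojure"],
    "account details": {
        "account number": 1233312312,
        "account balance": 1000000000.00
    }
}
"""

sample_json = json.loads(text)  # a plain dict

print(sample_json['name'])                                # John Doe
print(sample_json['languages'][2])                        # Python (index 2; 'C++' is index 1)
print(sample_json['account details']['account balance'])  # 1000000000.0
print(sample_json.get('notpresent'))                      # None
```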

> This is something I've been using in a number of projects, given the complexity of JSON documents these days. There is also a plan to provide an option to use functions for iterating over the document, as well as selection ranges for expected values.

You can already iterate over a dict, and again, I can't imagine how anything you design will be simpler than what you can already do.

Also, at the start, you suggested that this would be an improvement to the JSON _parser_. But the stuff you showed has nothing to do with the parser. Unless the result of calling QuickParse is not JSON parsed into something dict-like, but rather something that holds the original JSON and re-parses it for each "get" request?

I'm going to assume that the whole bit about parsing is a red herring, and the part you actually want is the ability to call the "get" function with multiple keys and recursively walk the structure. I'm sure you can find recipes for that on ActiveState or PyPI, but it's also very trivial to write yourself:

    def get(obj, *keys, default=None):
        for key in keys:
            try:
                obj = obj[key]
            except LookupError:
                return default
        return obj

Now, without needing any kind of special object, you can write this:

    get(sample_json, 'languages', 2)
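A runnable copy of that helper with a few checks follows. A single `except LookupError` is enough because LookupError is the common base class of KeyError (raised by dicts for a missing key) and IndexError (raised by lists for an out-of-range index), so one handler covers both. The sample data is a trimmed copy of the dict from the original message.

```python
def get(obj, *keys, default=None):
    """Follow `keys` into nested containers; return `default` if any lookup fails."""
    for key in keys:
        try:
            obj = obj[key]
        except LookupError:  # base class of both KeyError and IndexError
            return default
    return obj


sample_json = {
    'name': 'John Doe',
    'languages': ['C', 'C++', 'Python', 'Clojure'],
    'account details': {'account balance': 1000000000.00},
}

print(get(sample_json, 'languages', 2))                        # Python
print(get(sample_json, 'account details', 'account balance'))  # 1000000000.0
print(get(sample_json, 'languages', 10))                       # None (IndexError caught)
print(get(sample_json, 'notpresent', default='n/a'))           # n/a (KeyError caught)
```

As written, the helper does not catch TypeError, so stepping into a non-container still raises: `get(sample_json, 'name', 'x')` evaluates `'John Doe'['x']` and fails loudly rather than returning the default.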

Anoop Thomas Mathew

6 Apr

2:55 a.m.

On 6 April 2014 10:56, Andrew Barnert wrote:

> From: Anoop Thomas Mathew
> Sent: Saturday, April 5, 2014 7:58 PM
>
> > I went through the state of the CPython JSON parser and found that there is scope for improvement in the parser interface.
> >
> > I have been thinking of implementing a class called QuickParse inside the json module, which would tidy up the process of finding a node and cut down a number of for loops, if ... else and try ... except blocks.
> >
> > For example:
> >
> > sample_json = {
> >     'name': 'John Doe',
> >     'age': 45,
> >     'level': 'Expert',
> >     'languages': ['C', 'C++', 'Python', 'Clojure'],
> >     'account details': {
> >         'account number': 1233312312,
> >         'account balance': 1000000000.00
> >     }
> > }
>
> That's not a JSON object, that's just a plain old Python dict. It doesn't even have anything to do with JSON (except the variable name, which is misleading).
>
> And you don't need to "parse" a dict; you just use [] or get on it.
>
> Even if you actually had a JSON string or file serializing this dict, you could get exactly the dict by just calling json.loads or json.load on it. It's hard to imagine anything simpler than that.
>
> Compare your desired code to the equivalents that already work today:
>
> > json_doc = QuickParse(sample_json)
> >
> > json_doc.get(["name"])
> > 'John Doe'
>
>     sample_json['name']
>
> ... or ...
>
>     sample_json.get('name')
>
> Note that, besides the simple and more flexible access, you already didn't need that extra step of building a QuickParse object.
>
> > json_doc.get(["languages", 2])
> > 'Python'
>
>     sample_json['languages'][2]
>
> > json_doc.get(["account details", "account balance"])
> > 1000000000.00
>
>     sample_json['account details']['account balance']
>
> > json_doc.get(["notpresent"])
> > None
>
>     sample_json.get('notpresent')

I completely agree with your point. But what if the document looks something like this:

    person['country']['state']['city']['county']

> > This is something I've been using in a number of projects, given the complexity of JSON documents these days. There is also a plan to provide an option to use functions for iterating over the document, as well as selection ranges for expected values.
>
> You can already iterate over a dict, and again, I can't imagine how anything you design will be simpler than what you can already do.
>
> Also, at the start, you suggested that this would be an improvement to the JSON _parser_. But the stuff you showed has nothing to do with the parser. Unless the result of calling QuickParse is not JSON parsed into something dict-like, but rather something that holds the original JSON and re-parses it for each "get" request?
>
> I'm going to assume that the whole bit about parsing is a red herring, and the part you actually want is the ability to call the "get" function with multiple keys and recursively walk the structure. I'm sure you can find recipes for that on ActiveState or PyPI, but it's also very trivial to write yourself:
>
>     def get(obj, *keys, default=None):
>         for key in keys:
>             try:
>                 obj = obj[key]
>             except LookupError:
>                 return default
>         return obj

I apologize for making my previous mail complicated. You have pinned down what I was suggesting. My suggestion is to integrate something similar to what you suggested above into the CPython json module. Wouldn't it be a good feature for developers to be able to write code without worrying about whether it is going to fail, and without writing many try ... except or if ... else blocks to handle those edge cases? I definitely feel it would be a great feature, since this need is common to any program that parses multi-level JSON documents.

> Now, without needing any kind of special object, you can write this:
>
>     get(sample_json, 'languages', 2)

Andrew Barnert

3:33 a.m.

From: Anoop Thomas Mathew
Sent: Sunday, April 6, 2014 2:55 AM

> On 6 April 2014 10:56, Andrew Barnert wrote:
>
> > From: Anoop Thomas Mathew

[snip]

> > > json_doc.get(["account details", "account balance"])
> > > 1000000000.00
> >
> >     sample_json['account details']['account balance']
>
> I completely agree with your point. But what if the document looks something like this:
>
>     person['country']['state']['city']['county']

So what? Do you not understand that? It's simple, explicit, readable, and identical to code you could write in other languages like JavaScript, Ruby, etc. This is a familiar, easily-recognizable pattern from code dealing with complex heterogeneous objects in all of these languages. Do you really want to do away with that for the sake of saving a few keystrokes?

[snip]

> > I'm going to assume that the whole bit about parsing is a red herring, and the part you actually want is the ability to call the "get" function with multiple keys and recursively walk the structure. I'm sure you can find recipes for that on ActiveState or PyPI, but it's also very trivial to write yourself:
> >
> >     def get(obj, *keys, default=None):
> >         for key in keys:
> >             try:
> >                 obj = obj[key]
> >             except LookupError:
> >                 return default
> >         return obj
>
> I apologize for making my previous mail complicated. You have pinned down what I was suggesting.
>
> My suggestion is to integrate something similar to what you suggested above into the CPython json module.

Again: Why would you put it in the json module? You don't have any JSON anything anywhere; you have a dict with dicts inside it. If you want a simple function that works on any indexable object full of indexable objects, that function will work the same way whether you give it something you parsed from a JSON string you pulled off the wire, something you built up yourself out of database queries, a multi-dimensional array, a tree, etc.
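To illustrate that such a helper has nothing to do with JSON, here is the same get function from earlier in the thread applied to a hand-built list of lists (a 2-D grid) and to a small tree of dicts and tuples. Both data structures are illustrative assumptions, not taken from the thread.

```python
def get(obj, *keys, default=None):
    """Same helper as earlier in the thread: walk nested indexables, return default on failure."""
    for key in keys:
        try:
            obj = obj[key]
        except LookupError:
            return default
    return obj


# A multi-dimensional "array" built as a list of lists.
grid = [[1, 2, 3],
        [4, 5, 6]]

# A small tree: dict nodes whose children are stored in tuples.
tree = {'value': 'root',
        'children': ({'value': 'left', 'children': ()},
                     {'value': 'right', 'children': ()})}

print(get(grid, 1, 2))                                          # 6
print(get(grid, 5, 0))                                          # None
print(get(tree, 'children', 1, 'value'))                        # right
print(get(tree, 'children', 0, 'children', 0, default='leaf'))  # leaf
```

Nothing in the function refers to JSON; it relies only on `obj[key]` and the LookupError family, so it works on any mix of mappings and sequences.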

> Wouldn't it be a good feature for developers to be able to write code without worrying about whether it is going to fail, and without writing many try ... except or if ... else blocks to handle those edge cases?

No, because Python already has features for that.

First, you don't need many try/excepts, just a single one, no matter how many possibly-raising lookups you're doing:

    >>> try:
    ...     val = d['stuff']['things'][7]
    ... except LookupError:
    ...     val = None

It doesn't matter whether d['stuff'] was missing, or d['stuff']['things'], or d['stuff']['things'][7]; they'll all be handled by a single try.

And if you don't want to worry about try/except (or if/else, but you really shouldn't ever be using if/else for cases like this), dict already has a get method:

    >>> d = {'a': 1, 'b': 2}
    >>> d['c']
    Traceback (most recent call last):
      ...
    KeyError: 'c'
    >>> print(d.get('c'))
    None
    >>> d.get('c', 0)
    0

Also, different projects are likely to need different variations on the same theme. Sometimes you want a Cocoa-KV-style dotted string of keys, or a Windows-registry-style slashed string, instead of an iterable. Sometimes you only want to handle recursive mappings or recursive sequences, sometimes recursive mappings-and-sequences. Sometimes you want to return a default value other than None, maybe even one that's dependent on where the lookup failed. Sometimes you want to look up multiple key-paths at once, à la operator.itemgetter. And so on.

All of these are dead-simple to write yourself, so any project that needs some version of this should just include the version that it needs.
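As a sketch of how small these project-specific variants are, here are two of the ones listed: a lookup driven by a dotted string of keys, and a helper that fetches several key-paths at once, analogous to operator.itemgetter. The names get_path, get_dotted and pathgetter are hypothetical, as is the sample record. The dotted version assumes keys never contain dots and that all-digit segments are meant as list indices.

```python
from typing import Any, Callable, Sequence


def get_path(obj: Any, keys: Sequence[Any], default: Any = None) -> Any:
    """Walk nested containers along `keys`; return `default` on any lookup failure."""
    for key in keys:
        try:
            obj = obj[key]
        except LookupError:
            return default
    return obj


def get_dotted(obj: Any, path: str, default: Any = None) -> Any:
    """Look up a dotted path such as 'account.balance' or 'languages.2'."""
    # Assumption: all-digit segments index into sequences; everything else is a key.
    keys = [int(part) if part.isdigit() else part for part in path.split('.')]
    return get_path(obj, keys, default)


def pathgetter(*paths: Sequence[Any], default: Any = None) -> Callable[[Any], tuple]:
    """Return a callable that fetches several key-paths at once, like operator.itemgetter."""
    def fetch(obj: Any) -> tuple:
        return tuple(get_path(obj, path, default) for path in paths)
    return fetch


# Hypothetical sample record for illustration.
person = {
    'name': 'John Doe',
    'languages': ['C', 'C++', 'Python', 'Clojure'],
    'account': {'number': 1233312312, 'balance': 1000000000.00},
}

print(get_dotted(person, 'languages.2'))                       # Python
print(get_dotted(person, 'account.balance'))                   # 1000000000.0
print(get_dotted(person, 'account.owner', default='unknown'))  # unknown

name_and_number = pathgetter(['name'], ['account', 'number'])
print(name_and_number(person))                                 # ('John Doe', 1233312312)
```

One deliberate difference from operator.itemgetter: itemgetter returns a bare value when given a single item, whereas this pathgetter always returns a tuple. Each of these choices is exactly the kind of detail that differs between projects.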
