JSON Object to CSV File Troubleshooting
Sahlusar
ahlusar.ahluwalia at gmail.com
Thu Jun 18 21:47:30 EDT 2015
Good Evening,
I have a conundrum regarding JSON objects and converting them to CSV:
Context
I am converting XML files to a JSON object (please see snippet below) and then finally producing a CSV file. Here is a an example JSON object:
"PAC": {
"Account": [{
"PC": "0",
"CMC": "0",
"WC": "0",
"DLA": "0",
"CN": null,
"FC": {
"Int32": ["0",
"0",
"0",
"0",
"0"]
},
"F": {
"Description": null,
"Code": "0"
}
In general, when I convert any of the files from JSON to CSV, I have been successful when using the following strategy (credit to Peter Otten):
import csv
import json
import sys
def hook(obj):
return obj
def flatten(obj):
for k, v in obj:
if isinstance(v, list):
yield from flatten(v)
else:
yield k, v
if __name__ == "__main__":
with open("somefileneame.json") as f:
data = json.load(f, object_pairs_hook=hook)
pairs = list(flatten(data))
writer = csv.writer(sys.stdout)
header = writer.writerow([k for k, v in pairs])
row = writer.writerow([v for k, v in pairs]) #writer.writerows for any other iterable object
However with the example JSON object (above) i receive the following error when applying this function:
ValueError: too many values to unpack
Here are some more samples.
"FC": {"Int32": ["0","0","0","0","0","0"]}
"PBA": {"Double": ["0","0","0","0","0","0","0","0"]}
3. "PBDD": {
"DateTime": ["1/1/0001 12:00:00 AM",
"1/1/0001 12:00:00 AM",
"1/1/0001 12:00:00 AM",
"1/1/0001 12:00:00 AM",
"1/1/0001 12:00:00 AM",
"1/1/0001 12:00:00 AM",
"1/1/0001 12:00:00 AM",
"1/1/0001 12:00:00 AM"]
},
In the above example, I would like to remove the keys Int32, Double and DateTime. I am wondering if there is a function or methodology that would allow me to remove such nested keys and reassign the new keys to the outer key (in this case above FC, PBA and PBDD) as column headers in a CSV and concatenate all of the values within the list (as corresponding fields).
Also, here is how I strategized my XML to CSV conversion (if this is of any use):
import xml.etree.cElementTree as ElementTree
from xml.etree.ElementTree import XMLParser
import json
import csv
import tokenize
import token
try:
from collections import OrderedDict
import json
except ImportError:
from ordereddict import OrderedDict
import simplejson as json
import itertools
import six
import string
from csvkit import CSVKitWriter
class XmlListConfig(list):
def __init__(self, aList):
for element in aList:
if element:
# treat like dict
if len(element) == 1 or element[0].tag != element[1].tag:
self.append(XmlDictConfig(element))
# treat like list
elif element[0].tag == element[1].tag:
self.append(XmlListConfig(element))
elif element.text:
text = element.text.strip()
if text:
self.append(text)
class XmlDictConfig(dict):
'''
Example usage:
>>> tree = ElementTree.parse('your_file.xml')
>>> root = tree.getroot()
>>> xmldict = XmlDictConfig(root)
Or, if you want to use an XML string:
>>> root = ElementTree.XML(xml_string)
>>> xmldict = XmlDictConfig(root)
And then use xmldict for what it is..a dictionary.
'''
def __init__(self, parent_element):
if parent_element.items():
self.update(dict(parent_element.items()))
for element in parent_element:
if element:
# treat like dict - we assume that if the first two tags
# in a series are different, then they are all different.
if len(element) == 1 or element[0].tag != element[1].tag:
aDict = XmlDictConfig(element)
# treat like list - we assume that if the first two tags
# in a series are the same, then the rest are the same.
else:
# here, we put the list in dictionary; the key is the
# tag name the list elements all share in common, and
# the value is the list itself
aDict = {element[0].tag: XmlListConfig(element)}
# if the tag has attributes, add those to the dict
if element.items():
aDict.update(dict(element.items()))
self.update({element.tag: aDict})
# this assumes that if you've got an attribute in a tag,
# you won't be having any text. This may or may not be a
# good idea -- time will tell. It works for the way we are
# currently doing XML configuration files...
elif element.items():
self.update({element.tag: dict(element.items())})
# finally, if there are no child tags and no attributes, extract
# the text
else:
self.update({element.tag: element.text})
def main():
#Lines 88-89stantiate the class Elementree
#and applies the method to recursively traverse from the root node
#XmlDictConfig is instantiated in line 90
with open('C:\\Users\\wynsa2\\Desktop\\Python Folder\\PCSU\\Trial2_PCSU\\2-Response.xml', 'r', encoding='utf-8') as f:
xml_string = f.read()
xml_string= xml_string.replace('�', '')
root = ElementTree.XML(xml_string)
xmldict = XmlDictConfig(root)
json_str = json.dumps(xmldict, sort_keys=True, indent=4, separators=(',', ': '))
newly_formatted_data = json.loads(json_str) #encode into JSON
with open('data2.json', 'w') as f: #writing JSON file
json.dump(newly_formatted_data, f)
I hope that I was clear in my description. Thank you all for your help.
Sincerely,
Saran
More information about the Python-list
mailing list