To write headers once with different values in separate row in CSV
kbtyo
ahlusar.ahluwalia at gmail.com
Thu Jun 25 12:37:12 EDT 2015
Okay, so I have gone back to the drawing board and have the following predicament (my apologies, in advance for the indentation):
Here is my sample:
<Response ID="24856-775" RequestType="Moverview">
<MonthDayCount>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
</MonthDayCount>
<FeeCount>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
<Int32>0</Int32>
</FeeCount>
<PaymentBucketAmount>
<Double>0</Double>
<Double>0</Double>
<Double>0</Double>
<Double>0</Double>
<Double>0</Double>
<Double>0</Double>
<Double>0</Double>
<Double>0</Double>
</PaymentBucketAmount>
<PaymentBucketDueDate>
<DateTime>1/1/0001 12:00:00 AM</DateTime>
<DateTime>1/1/0001 12:00:00 AM</DateTime>
<DateTime>1/1/0001 12:00:00 AM</DateTime>
<DateTime>1/1/0001 12:00:00 AM</DateTime>
<DateTime>1/1/0001 12:00:00 AM</DateTime>
<DateTime>1/1/0001 12:00:00 AM</DateTime>
<DateTime>1/1/0001 12:00:00 AM</DateTime>
<DateTime>1/1/0001 12:00:00 AM</DateTime>
</PaymentBucketDueDate>
<Warnings />
<SList />
<LList />
<PA>False</PA>
<PL>False</PL>
<PC>False</PC>
<PCs>False</PCs>
<PJ>False</PJ>
<OITC>0</OITC>
<MG />
<R />
<CCGoods />
</Response>
Using this:
import xml.etree.cElementTree as ElementTree
from xml.etree.ElementTree import XMLParser
import csv
def flatten_list(aList, prefix=''):
for i, element in enumerate(aList, 1):
eprefix = "{}{}".format(prefix, i)
if element:
# treat like dict
if len(element) == 1 or element[0].tag != element[1].tag:
yield from flatten_dict(element, eprefix)
# treat like list
elif element[0].tag == element[1].tag:
yield from flatten_list(element, eprefix)
elif element.text:
text = element.text.strip()
if text:
yield eprefix[:].rstrip('.'), element.text
def flatten_dict(parent_element, prefix=''):
prefix = prefix + parent_element.tag
if parent_element.items():
for k, v in parent_element.items():
yield prefix + k, v
for element in parent_element:
eprefix = prefix + element.tag
if element:
# treat like dict - we assume that if the first two tags
# in a series are different, then they are all different.
if len(element) == 1 or element[0].tag != element[1].tag:
yield from flatten_dict(element, prefix=prefix)
# treat like list - we assume that if the first two tags
# in a series are the same, then the rest are the same.
else:
# here, we put the list in dictionary; the key is the
# tag name the list elements all share in common, and
# the value is the list itself
yield from flatten_list(element, prefix=eprefix)
# if the tag has attributes, add those to the dict
if element.items():
for k, v in element.items():
yield eprefix+k
# this assumes that if you've got an attribute in a tag,
# you won't be having any text. This may or may not be a
# good idea -- time will tell. It works for the way we are
# currently doing XML configuration files...
elif element.items():
for k, v in element.items():
yield eprefix+k
# finally, if there are no child tags and no attributes, extract
# the text
else:
yield eprefix, element.text
def makerows(pairs):
headers = []
columns = {}
for k, v in pairs:
if k in columns:
columns[k].extend((v,))
else:
headers.append(k)
columns[k] = [k, v]
m = max(len(c) for c in columns.values())
for c in columns.values():
c.extend(' ' for i in range(len(c), m))
L = [columns[k] for k in headers]
rows = list(zip(*L))
return rows
def main():
with open('sample.xml', 'r', encoding='utf-8') as f:
xml_string = f.read()
xml_string= xml_string.replace('�', '') #optional to remove ampersands.
root = ElementTree.XML(xml_string)
for key, value in flatten_dict(root):
key = key.rstrip('.').rsplit('.', 1)[-1]
print(key,value)
if __name__ == "__main__":
main()
I receive this output:
ResponseRequestType Moverview
ResponseID 24856-775
ResponseMonthDayCount1 0
ResponseMonthDayCount2 0
ResponseMonthDayCount3 0
ResponseMonthDayCount4 0
ResponseMonthDayCount5 0
ResponseMonthDayCount6 0
ResponseMonthDayCount7 0
ResponseMonthDayCount8 0
ResponseMonthDayCount9 0
ResponseMonthDayCount10 0
ResponseMonthDayCount11 0
ResponseMonthDayCount12 0
ResponseMonthDayCount13 0
ResponseMonthDayCount14 0
ResponseMonthDayCount15 0
ResponseMonthDayCount16 0
ResponseMonthDayCount17 0
ResponseMonthDayCount18 0
ResponseMonthDayCount19 0
ResponseMonthDayCount20 0
ResponseMonthDayCount21 0
ResponseMonthDayCount22 0
ResponseMonthDayCount23 0
ResponseMonthDayCount24 0
ResponseMonthDayCount25 0
ResponseFeeCount1 0
ResponseFeeCount2 0
ResponseFeeCount3 0
ResponseFeeCount4 0
ResponseFeeCount5 0
ResponseFeeCount6 0
ResponsePaymentBucketAmount1 0
ResponsePaymentBucketAmount2 0
ResponsePaymentBucketAmount3 0
ResponsePaymentBucketAmount4 0
ResponsePaymentBucketAmount5 0
ResponsePaymentBucketAmount6 0
ResponsePaymentBucketAmount7 0
ResponsePaymentBucketAmount8 0
ResponsePaymentBucketDueDate1 1/1/0001 12:00:00 AM
ResponsePaymentBucketDueDate2 1/1/0001 12:00:00 AM
ResponsePaymentBucketDueDate3 1/1/0001 12:00:00 AM
ResponsePaymentBucketDueDate4 1/1/0001 12:00:00 AM
ResponsePaymentBucketDueDate5 1/1/0001 12:00:00 AM
ResponsePaymentBucketDueDate6 1/1/0001 12:00:00 AM
ResponsePaymentBucketDueDate7 1/1/0001 12:00:00 AM
ResponsePaymentBucketDueDate8 1/1/0001 12:00:00 AM
ResponseWarnings None
ResponseSList None
ResponseLList None
ResponsePA False
ResponsePL False
ResponsePC False
ResponsePCs False
ResponsePJ False
ResponseOITC 0
ResponseMG None
ResponseR None
ResponseCCGoods None
When I write it out to the CSV, using :
writer = csv.writer(open("try2.csv", 'wt'))
writer.writerows(makerows(flatten_dict(root)))`
I still receive the headers with the **Response** chained to the sub-elements of the root with the tags' text as the values (which is just fine). My goal is to only have the sublements as the headers (along with their values), sans the Response (unless it is the tag name). Thanks you all for your feedback! For example, ResponsePaymentBucketAmount1 0, should just be PaymentBucketAmount1 as the header.
More information about the Python-list
mailing list