[Tutor] python: extracting nested json object from multiple files, write to separate text files
Gary LaRose
garylarose at outlook.com
Fri Oct 4 14:32:36 EDT 2019
Thank you Cameron, this works nicely, and thanks for pointing me to the os.path.splitext and repr functions.
The 'with open...as' ran faster on my local machine.
Best regards
-----Original Message-----
From: Cameron Simpson <cs at cskk.id.au>
Sent: October 3, 2019 7:27 PM
To: Gary LaRose <garylarose at outlook.com>
Cc: tutor at python.org
Subject: Re: [Tutor] python: extracting nested json object from multiple files, write to separate text files
On 03Oct2019 22:57, Gary LaRose <garylarose at outlook.com> wrote:
>Thank you for your guidance.
>I am attempting to extract a nested json object from multiple json files and write each one to an individual text file.
>I have been able to get a non-nested element ['text'] from the json files and write to text files using:
>
>import os, json
>import glob
>
>filelist = glob.glob('./*.json')
No need for the leading "./" here. "*.json" will do.
>for fname in filelist:
> FI = open(fname, 'r', encoding = 'UTF-8')
> FO = open(fname.replace('json', 'txt'), 'w', encoding = 'UTF-8')
Minor remark: this is not robust; consider the filename "some-json-in-here.json", where replace() would change every "json" in the name, not just the extension. Have a glance at the os.path.splitext function.
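To illustrate the remark above, here is a small sketch of building the output name from the extension only. The helper name txt_name_for is made up for this example; os.path.splitext is the standard library function.

```python
import os.path

def txt_name_for(json_name):
    # Hypothetical helper: split off only the final extension,
    # then attach ".txt" to what is left.
    base, ext = os.path.splitext(json_name)
    return base + ".txt"

# splitext only touches the extension:
print(txt_name_for("some-json-in-here.json"))            # some-json-in-here.txt
# str.replace changes every occurrence of "json":
print("some-json-in-here.json".replace("json", "txt"))   # some-txt-in-here.txt
```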
> json_object = json.load(FI)
> FO.write(json_object['text'])
>
>FI.close()
>FO.close()
Second minor remark: these are better written:
    with open(fname, 'r', encoding = 'UTF-8') as FI:
        json_object = json.load(FI)
    with open(fname.replace('json', 'txt'), 'w', encoding = 'UTF-8') as FO:
        FO.write(json_object['text'])
which do the closes for you (even if an exception happens).
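Put together, the whole loop might look something like the sketch below. It assumes every file has a top-level 'text' key holding a string, as in your original code; the function name convert_all and its pattern parameter are invented for this example, and the output name is built with os.path.splitext rather than replace().

```python
import glob
import json
import os.path

def convert_all(pattern="*.json"):
    """Write the 'text' field of each matching JSON file to a .txt file.

    Hypothetical helper; returns the list of output filenames written.
    """
    written = []
    for fname in glob.glob(pattern):
        # The input file is closed as soon as this block ends.
        with open(fname, "r", encoding="UTF-8") as fi:
            json_object = json.load(fi)
        # Replace only the extension, not every "json" in the name.
        out_name = os.path.splitext(fname)[0] + ".txt"
        with open(out_name, "w", encoding="UTF-8") as fo:
            fo.write(json_object["text"])
        written.append(out_name)
    return written
```

Because each file is opened and closed inside the loop, only a couple of files are open at any one time, however many files the pattern matches.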
>I have set the working directory to the folder that contains the json files.
>Below is example json file. For each file (2,900), I need to extract 'entities' and write to a separate text file:
>
>{'author': 'Reuters Editorial',
>'crawled': '2018-02-02T12:58:39.000+02:00',
>'entities': {'locations': [{'name': 'sweden', 'sentiment': 'none'},
> {'name': 'sweden', 'sentiment': 'none'},
> {'name': 'gothenburg', 'sentiment': 'none'}],
> 'organizations': [{'name': 'reuters', 'sentiment': 'negative'},
> {'name': 'skanska ab', 'sentiment': 'negative'},
> {'name': 'eikon', 'sentiment': 'none'}],
> 'persons': [{'name': 'anna ringstrom', 'sentiment': 'none'}]},
[...]
Well, the entities come in from the JSON as a dictionary mapping str to list. Thus:
entities = json_object['entities']
For example, with the example data above, the expression entities['locations'] has the value:
[
{'name': 'sweden', 'sentiment': 'none'},
{'name': 'sweden', 'sentiment': 'none'},
{'name': 'gothenburg', 'sentiment': 'none'}
]
Which is just a list of dictionaries. You just need to access whatever you need as required. When you went:
FO.write(json_object['text'])
that has the advantage that json_object['text'] is a simple string. If you need to write out the values from entities then you _likely_ want to print it in some more meaningful way. However, just to get off the ground you would go:
FO.write(repr(entities))
as a proof of concept. When happy, write something more elaborate to get the actual output format you desire.
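As one possible "more elaborate" format, the sketch below walks the dictionary of lists and produces one tab-separated line per entity. The function name format_entities and the category/name/sentiment layout are only assumptions for illustration; pick whatever layout suits what you do with the files next. The sample data is copied from your example.

```python
def format_entities(entities):
    """Return one 'category<TAB>name<TAB>sentiment' line per entity.

    Hypothetical helper; assumes each entity dict has 'name' and
    'sentiment' keys, as in the example data.
    """
    lines = []
    for category, items in entities.items():
        for item in items:
            lines.append(f"{category}\t{item['name']}\t{item['sentiment']}\n")
    return "".join(lines)

# Sample data taken from the example file in the question.
entities = {
    'locations': [{'name': 'sweden', 'sentiment': 'none'},
                  {'name': 'sweden', 'sentiment': 'none'},
                  {'name': 'gothenburg', 'sentiment': 'none'}],
    'organizations': [{'name': 'reuters', 'sentiment': 'negative'},
                      {'name': 'skanska ab', 'sentiment': 'negative'},
                      {'name': 'eikon', 'sentiment': 'none'}],
    'persons': [{'name': 'anna ringstrom', 'sentiment': 'none'}],
}

print(format_entities(entities), end="")
```

In the loop you would then write FO.write(format_entities(entities)) in place of FO.write(repr(entities)).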
Cheers,
Cameron Simpson <cs at cskk.id.au>