Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API

Left Right olegsivokon at gmail.com
Mon Sep 30 15:30:06 EDT 2024


> Streaming won't work because the file is gzipped.  You have to receive
> the whole thing before you can unzip it. Once unzipped it will be even
> larger, and all in memory.

Gzip is specifically designed to be streamed, so that's not a problem
(in principle). You would need a streaming gzip decompressor, though.
A quick search on PyPI turned up this package:
https://pypi.org/project/gzip-stream/ .
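
To illustrate the principle, here is a minimal sketch using only the
standard library's zlib module rather than the package above. The
helper names (iter_decompressed, read_in_chunks) and the chunk size
are my own choices for the example, and it assumes a single-member
gzip stream. It only shows that the compressed bytes can be consumed
piece by piece; the JSON itself would still need a streaming parser
on top of this.

```python
import gzip
import io
import zlib


def iter_decompressed(chunks):
    """Yield decompressed bytes for an iterable of gzip-compressed chunks.

    Hypothetical helper: assumes a single-member gzip stream.
    """
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
    for chunk in chunks:
        data = decomp.decompress(chunk)
        if data:
            yield data
    # Emit whatever is still buffered once the input is exhausted.
    tail = decomp.flush()
    if tail:
        yield tail


def read_in_chunks(fileobj, size=64 * 1024):
    """Yield fixed-size blocks from a binary file-like object."""
    while True:
        block = fileobj.read(size)
        if not block:
            break
        yield block


if __name__ == "__main__":
    # Stand-in for the real download: a small gzipped payload in memory.
    payload = b'{"id": 1}\n' * 1000
    compressed = io.BytesIO(gzip.compress(payload))
    total = 0
    for part in iter_decompressed(read_in_chunks(compressed, 128)):
        total += len(part)  # a real consumer would feed a JSON parser here
    print(total)
```

Any iterable of bytes objects works as input, so the chunks could just
as well come from an HTTP response body as from a file on disk. At no
point is the whole compressed or decompressed stream held in memory.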

On Mon, Sep 30, 2024 at 6:20 PM Thomas Passin via Python-list
<python-list at python.org> wrote:
>
> On 9/30/2024 11:30 AM, Barry via Python-list wrote:
> >
> >
> >> On 30 Sep 2024, at 06:52, Abdur-Rahmaan Janhangeer via Python-list <python-list at python.org> wrote:
> >>
> >>
> >> import polars as pl
> >> pl.read_json("file.json")
> >>
> >>
> >
> > This is not going to work unless the computer has a lot more than 60 GiB of RAM.
> >
> > As later suggested, a streaming parser is required.
>
> Streaming won't work because the file is gzipped.  You have to receive
> the whole thing before you can unzip it. Once unzipped it will be even
> larger, and all in memory.

