Add pathlib.Path.write_json and pathlib.Path.read_json
Hi guys,

What do you think about adding methods pathlib.Path.write_json and pathlib.Path.read_json, similar to write_text, write_bytes, read_text, read_bytes?

This would make writing / reading JSON to a file a one-liner instead of a two-line with clause.

Thanks, Ram.
Oh, and also it saves you from having to import json.
On Mon, Mar 27, 2017 at 2:50 PM, Ram Rachum
Hi guys,
What do you think about adding methods pathlib.Path.write_json and pathlib.Path.read_json , similar to write_text, write_bytes, read_text, read_bytes?
This would make writing / reading JSON to a file a one liner instead of a two-line with clause.
Thanks, Ram.
It was enough of a benefit for text (and I never forget the argument order for writing text to a file, unlike json.dump(file_or_data?, data_or_file?)).

+1

Top-posted from my Windows Phone

-----Original Message-----
From: "Paul Moore"
This would make writing / reading JSON to a file a one liner instead of a two-line with clause.
That hardly seems like a significant benefit... Paul _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
-1, should we also include write_ini, write_yaml, etc?
A class cannot account for everyone who wants to use it in different ways.
On Mar 27, 2017 17:07, "Steve Dower"
It was enough of a benefit for text (and I never forget the argument order for writing text to a file, unlike json.dump(file_or_data?, data_or_file?) )
+1
Top-posted from my Windows Phone

------------------------------
From: Paul Moore
Sent: 3/27/2017 5:57
To: Ram Rachum
Cc: python-ideas
Subject: Re: [Python-ideas] Add pathlib.Path.write_json and pathlib.Path.read_json

On 27 March 2017 at 13:50, Ram Rachum
wrote: This would make writing / reading JSON to a file a one liner instead of a two-line with clause.
That hardly seems like a significant benefit...
Paul
On 27 Mar 2017, at 15:08, Markus Meskanen
wrote: -1, should we also include write_ini, write_yaml, etc?
Markus,

You illustrate why this is a bad design pattern to implement. It does not scale.

I attended a talk at PYCON UK that spoke to the point of using object composition rather than rich interfaces. I cannot recall the term that was used to cover this idea.

I also think that it's a mistake to open a text file from pathlib.

-1

A pattern that allows pathlib.Path to be composed with content handling is an interesting idea. Maybe that should be explored? But that should be a separate topic.

Barry
I attended a talk at PYCON UK that talked to the point of using object composition rather than rich interfaces. I cannot recall the term that was used to cover this idea.
Separating things by concern/abstraction (the storage vs. the serialization) results in easier-to-learn code, *especially* incrementally, as you can (for example) plug reading from a file, a socket, a database into the same JSON, INI, XML... functions. Learn N ways to read data, M ways to transform the data, and you can do N*M things with N+M knowledge. If the libraries start tightly coupling everything, you need to start going through N*M methods, then do it yourself anyways, because reader X doesn't support new-hotness-format Y directly.

Perhaps less code could result from making objects "quack" alike, so instead of you doing the plumbing, the libraries themselves would. I recently was satisfied by being able to exchange

    with open('dump.txt') as f:
        for line in f:
            ...

with

    import gzip
    with gzip.open('dump.gz', 'rt') as f:
        for line in f:
            ...

and it just worked through the magic of file-like objects and context managers.

Nick
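To make the N+M point concrete, here is a minimal sketch in which one JSON-reading function is fed by three different sources. The helper name load_config and the file names are made up for illustration; only json.load, gzip.open and io.StringIO are real stdlib calls.

```python
import gzip
import io
import json
import tempfile
from pathlib import Path


def load_config(stream):
    """Parse JSON from any text-mode file-like object (hypothetical helper)."""
    return json.load(stream)


data = {"name": "dump", "size": 3}

with tempfile.TemporaryDirectory() as tmp:
    plain = Path(tmp) / "dump.json"
    packed = Path(tmp) / "dump.json.gz"

    # Write the same document twice: once as plain text, once gzip-compressed.
    plain.write_text(json.dumps(data), encoding="utf-8")
    with gzip.open(packed, "wt", encoding="utf-8") as f:
        json.dump(data, f)

    # The reader does not care where the text comes from.
    with plain.open(encoding="utf-8") as f:
        from_plain = load_config(f)
    with gzip.open(packed, "rt", encoding="utf-8") as f:
        from_gzip = load_config(f)

# An in-memory buffer works the same way.
from_memory = load_config(io.StringIO(json.dumps(data)))
```

The storage side (plain file, gzip, memory) and the format side (JSON) stay independent, so adding a new source does not require touching the parser.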
On Mar 27, 2017, at 8:50 AM, Ram Rachum
wrote: What do you think about adding methods pathlib.Path.write_json and pathlib.Path.read_json , similar to write_text, write_bytes, read_text, read_bytes?
-1, I also think that write_* and read_* were mistakes to begin with. — Donald Stufft
On 27 March 2017 at 15:33, Donald Stufft
What do you think about adding methods pathlib.Path.write_json and pathlib.Path.read_json , similar to write_text, write_bytes, read_text, read_bytes?
-1, I also think that write_* and read_* were mistakes to begin with.
Text is (much) more general-use than JSON. Paul
Another idea: Maybe make json.load and json.dump support Path objects? On Mon, Mar 27, 2017 at 4:36 PM, Paul Moore
On 27 March 2017 at 15:33, Donald Stufft
wrote: What do you think about adding methods pathlib.Path.write_json and pathlib.Path.read_json , similar to write_text, write_bytes, read_text, read_bytes?
-1, I also think that write_* and read_* were mistakes to begin with.
Text is (much) more general-use than JSON. Paul
Another idea: Maybe make json.load and json.dump support Path objects?

Much better. Or maybe add json.load_path and dump_path
On 3/27/17 10:40 AM, Ram Rachum wrote:
Another idea: Maybe make json.load and json.dump support Path objects?
json.dump requires open file objects, not strings or Paths representing filenames.

But does this not already do what you want:

    Path('foo.json').write_text(json.dumps(obj))

?

Eric.
On 27 March 2017 at 15:48, Eric V. Smith
On 3/27/17 10:40 AM, Ram Rachum wrote:
Another idea: Maybe make json.load and json.dump support Path objects?
json.dump requires open file objects, not strings or Paths representing filenames.
But does this not already do what you want:
Path('foo.json').write_text(json.dumps(obj)) ?
Indeed. There have now been a few posts quoting ways of reading and writing JSON, all of which are pretty short (if that matters). Do we *really* need another way? Paul
On 27 March 2017 at 15:40, Ram Rachum
Another idea: Maybe make json.load and json.dump support Path objects?
If they currently supported filenames, I'd say that's a reasonable extension. Given that they don't, it still seems like more effort than it's worth to save a few characters

    with path.open('w') as f:
        json.dump(obj, f)

    with path.open() as f:
        obj = json.load(f)

Paul
On Mon, Mar 27, 2017 at 7:59 AM, Paul Moore
On 27 March 2017 at 15:40, Ram Rachum
wrote: Another idea: Maybe make json.load and json.dump support Path objects?
If they currently supported filenames, I'd say that's a reasonable extension. Given that they don't, it still seems like more effort than it's worth to save a few characters
Sure, but they probably should -- it's a REALLY common (most common) use-case to read and write JSON from a file. And many APIs support "filename or open file-like object".

I'd love to see that added, and, of course, support for Path objects as well.

-CHB

--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov
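For illustration only, the "filename or open file-like object" convention could look roughly like the sketch below. The helpers load_json and dump_json are hypothetical and are not part of the json module; they just wrap json.load and json.dump, and they assume Python 3.6+ for os.PathLike.

```python
import io
import json
import os
import tempfile
from pathlib import Path


def load_json(source, encoding="utf-8"):
    """Hypothetical helper: read JSON from a path or an open text file."""
    if isinstance(source, (str, os.PathLike)):
        with open(source, encoding=encoding) as f:
            return json.load(f)
    return json.load(source)


def dump_json(obj, target, encoding="utf-8", **kwargs):
    """Hypothetical helper: write JSON to a path or an open text file."""
    if isinstance(target, (str, os.PathLike)):
        with open(target, "w", encoding=encoding) as f:
            json.dump(obj, f, **kwargs)
    else:
        json.dump(obj, target, **kwargs)


obj = {"answer": 42}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "foo.json"
    dump_json(obj, path)              # pathlib.Path
    from_path = load_json(path)
    from_str = load_json(str(path))   # plain string filename

buffer = io.StringIO()
dump_json(obj, buffer)                # already-open file-like object
buffer.seek(0)
from_file = load_json(buffer)
```

Whether this dispatch belongs in the stdlib or in user code is exactly what the thread is debating; the sketch only shows that it is a few lines either way.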
On Mon, Mar 27, 2017 at 10:34 AM, Chris Barker
On Mon, Mar 27, 2017 at 7:59 AM, Paul Moore
wrote: On 27 March 2017 at 15:40, Ram Rachum
wrote: Another idea: Maybe make json.load and json.dump support Path objects?
If they currently supported filenames, I'd say that's a reasonable extension. Given that they don't, it still seems like more effort than it's worth to save a few characters
Sure, but they probably should -- it's a REALLY common (most common) use-case to read and write JSON from a file. And many APIs support "filename or open file-like object".
I'd love to see that added, and, or course, support for Path objects as well.
https://docs.python.org/2/library/json.html#encoders-and-decoders

    import codecs
    import collections
    import json
    import pathlib
    from collections import OrderedDict

    # https://docs.python.org/2/library/json.html#json.JSONEncoder
    class PathJSONEncoder(json.JSONEncoder):
        def default(self, obj):
            if isinstance(obj, pathlib.Path):
                # Alternative: return unicode(obj)  # ? (what about bytes)
                return OrderedDict((
                    ('@type', 'pydatatypes:pathlib.Path'),  # JSON-LD
                    ('path', unicode(obj)),
                ))
            return json.JSONEncoder.default(self, obj)

    # https://docs.python.org/2/library/json.html#json.JSONDecoder
    def as_pathlib_Path(obj):
        if obj.get('@type') == 'pydatatypes:pathlib.Path':
            return pathlib.Path(obj.get('path'))
        return obj

    # read_json and write_json are sketched as if they were Path methods
    def read_json(self, **kwargs):
        object_pairs_hook = kwargs.pop('object_pairs_hook',
                                       collections.OrderedDict)  # OrderedDefaultDict
        object_hook = kwargs.pop('object_hook', as_pathlib_Path)
        encoding = kwargs.pop('encoding', 'utf8')
        with codecs.open(self, 'r', encoding=encoding) as _file:
            return json.load(_file,
                             object_pairs_hook=object_pairs_hook,
                             object_hook=object_hook,
                             **kwargs)

    def write_json(self, obj, **kwargs):
        kwargs['cls'] = kwargs.pop('cls', PathJSONEncoder)
        encoding = kwargs.pop('encoding', 'utf8')
        with codecs.open(self, 'w', encoding=encoding) as _file:
            return json.dump(obj, _file, **kwargs)

    def test_pathlib_json_encoder_decoder():
        p = pathlib.Path('./test.json')
        obj = dict(path=p, _path=str(unicode(p)))
        p.write_json(obj)
        obj2 = p.read_json()
        assert obj['path'] == obj2['path']
        assert isinstance(obj['path'], pathlib.Path)

https://github.com/jaraco/path.py/blob/master/path.py#L735

- open()
- bytes()
- chunks()
- write_bytes()
- text()
- def write_text(self, text, encoding=None, errors='strict', linesep=os.linesep, append=False):
- lines()
- write_lines()
- read_hash()
- read_md5()
- read_hexhash()
On Mon, Mar 27, 2017 at 4:27 PM, Wes Turner
On Mon, Mar 27, 2017 at 10:34 AM, Chris Barker
wrote: On Mon, Mar 27, 2017 at 7:59 AM, Paul Moore
wrote: On 27 March 2017 at 15:40, Ram Rachum
wrote: Another idea: Maybe make json.load and json.dump support Path objects?
If they currently supported filenames, I'd say that's a reasonable extension. Given that they don't, it still seems like more effort than it's worth to save a few characters
Sure, but they probably should -- it's a REALLY common (most common) use-case to read and write JSON from a file. And many APIs support "filename or open file-like object".
I'd love to see that added, and, or course, support for Path objects as well.
https://docs.python.org/2/library/json.html#encoders-and-decoders

    # https://docs.python.org/2/library/json.html#json.JSONEncoder
    class PathJSONEncoder(json.JSONEncoder):
        def default(self, obj):
            if isinstance(obj, pathlib.Path):
                # Alternative: return unicode(obj)  # ? (what about bytes)
                return OrderedDict((
                    ('@type', 'pydatatypes:pathlib.Path'),  # JSON-LD
                    ('path', unicode(obj)),
                ))
            return json.JSONEncoder.default(self, obj)

    # https://docs.python.org/2/library/json.html#json.JSONDecoder
    def as_pathlib_Path(obj):
        if obj.get('@type') == 'pydatatypes:pathlib.Path':
            return pathlib.Path(obj.get('path'))
        return obj

def as_pathlib_Path(obj):
    if hasattr(obj, 'get') and obj.get('@type') == 'pydatatypes:pathlib.Path':
        return pathlib.Path(obj.get('path'))
    return obj

    def read_json(self, **kwargs):
        object_pairs_hook = kwargs.pop('object_pairs_hook',
                                       collections.OrderedDict)  # OrderedDefaultDict
        object_hook = kwargs.pop('object_hook', as_pathlib_Path)
        encoding = kwargs.pop('encoding', 'utf8')
        with codecs.open(self, 'r', encoding=encoding) as _file:
            return json.load(_file,
                             object_pairs_hook=object_pairs_hook,
                             object_hook=object_hook,
                             **kwargs)

    def write_json(self, obj, **kwargs):
        kwargs['cls'] = kwargs.pop('cls', PathJSONEncoder)
        encoding = kwargs.pop('encoding', 'utf8')
        with codecs.open(self, 'w', encoding=encoding) as _file:
            return json.dump(obj, _file, **kwargs)

    def test_pathlib_json_encoder_decoder():
        p = pathlib.Path('./test.json')
        obj = dict(path=p, _path=str(unicode(p)))
        p.write_json(obj)
        obj2 = p.read_json()
        assert obj['path'] == obj2['path']
        assert isinstance(obj['path'], pathlib.Path)

should it be 'self' or 'obj'?

    https://github.com/jaraco/path.py/blob/master/path.py#L735

    - open()
    - bytes()
    - chunks()
    - write_bytes()
    - text()
    - def write_text(self, text, encoding=None, errors='strict', linesep=os.linesep, append=False):
    - lines()
    - write_lines()
    - read_hash()
    - read_md5()
    - read_hexhash()
Ram Rachum
Another idea: Maybe make json.load and json.dump support Path objects?
yes, all string-path expecting stdlib APIs should support PEP 519 https://www.python.org/dev/peps/pep-0519/
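As a rough sketch of what PEP 519 support means in practice: an API that wants a filename calls os.fspath() on its argument, and any object implementing __fspath__ is accepted. The ConfigLocation class and the as_filename function below are invented for the example; os.fspath and the __fspath__ protocol are the real PEP 519 pieces (Python 3.6+).

```python
import os
from pathlib import PurePosixPath


class ConfigLocation:
    """Hypothetical path-like object implementing the PEP 519 protocol."""

    def __init__(self, name):
        self.name = name

    def __fspath__(self):
        return self.name + ".json"


def as_filename(path):
    """Normalize str, pathlib paths and other path-like objects to a filename."""
    return os.fspath(path)


from_str = as_filename("foo.json")
from_pathlib = as_filename(PurePosixPath("conf") / "foo.json")
from_custom = as_filename(ConfigLocation("foo"))

# Objects that are not path-like are rejected with TypeError.
try:
    as_filename(42)
    rejected = False
except TypeError:
    rejected = True
```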
On Mar 27, 2017, at 10:36 AM, Paul Moore
wrote: On 27 March 2017 at 15:33, Donald Stufft
<donald@stufft.io> wrote: What do you think about adding methods pathlib.Path.write_json and pathlib.Path.read_json , similar to write_text, write_bytes, read_text, read_bytes?
-1, I also think that write_* and read_* were mistakes to begin with.
Text is (much) more general-use than JSON.
Sure. I also think touch() and all the others are the same :) I think they’re just an unfortunate detritus of a time before PathLike and that it’s super weird to have some operations you do to a file path (compared to things you do to generate, modify, or resolve a path) be hung off of the Path object and every other be an independent thing that takes it as an input. I’d find it equally weird if dictionary objects supported a print() or a .json() method. — Donald Stufft
On 27.03.17 15:50, Ram Rachum wrote:
Hi guys,
What do you think about adding methods pathlib.Path.write_json and pathlib.Path.read_json , similar to write_text, write_bytes, read_text, read_bytes?
This would make writing / reading JSON to a file a one liner instead of a two-line with clause.
Good try, but you have published this idea 5 days ahead of schedule.
On Mon, Mar 27, 2017 at 02:50:38PM +0200, Ram Rachum wrote:
Hi guys,
What do you think about adding methods pathlib.Path.write_json and pathlib.Path.read_json , similar to write_text, write_bytes, read_text, read_bytes?
This would make writing / reading JSON to a file a one liner instead of a two-line with clause.
Reading/writing JSON is already a one liner, for people who care about writing one liners:

    obj = json.load(open("foo.json"))
    json.dump(obj, open("foo.json", "w"))

Pathlib exists as an OO interface to low-level path and file operations. It understands how to read and write to files, but it doesn't understand the content of those files. I don't think it should.

Of course pathlib can already read JSON, or for that matter ReST text or JPG binary files. It can read anything as text or bytes, including JSON:

    some_path.write_text(json.dumps(obj))
    json.loads(some_path.read_text())

I don't think it should be pathlib's responsibility to deal with the file format (besides text). Today you want to add JSON support. What about XML and plists and ini files? Tomorrow you'll ask for HTML support, next week someone will want pathlib to support .wav files as a one liner, and before you know it pathlib is responsible for a hundred different file formats with separate read_* and write_* methods.

That's not pathlib's responsibility, and there is nothing wrong with writing two lines of code.

-- Steve
On 03/27/2017 08:04 AM, Steven D'Aprano wrote:
On Mon, Mar 27, 2017 at 02:50:38PM +0200, Ram Rachum wrote:
What do you think about adding methods pathlib.Path.write_json and pathlib.Path.read_json , similar to write_text, write_bytes, read_text, read_bytes?
That's not pathlib's responsibility, and there is nothing wrong with writing two lines of code.
+1
2017-03-27 17:04 GMT+02:00 Steven D'Aprano
Of course pathlib can already read JSON, or for that matter ReST text or JPG binary files. It can read anything as text or bytes, including JSON:
    some_path.write_text(json.dumps(obj))
    json.loads(some_path.read_text())

Note: You should specify the encoding:

    some_path.write_text(json.dumps(obj), encoding='utf8')
    json.loads(some_path.read_text(encoding='utf8'))
I don't think it should be pathlib's responsibility to deal with the file format (besides text).
Right. Victor
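A small sketch of why the explicit encoding matters. With the default ensure_ascii=True, json.dumps escapes non-ASCII characters, so the output is plain ASCII; once ensure_ascii=False is used, the bytes on disk depend on the encoding passed to write_text. The sample data and file name are made up.

```python
import json
import tempfile
from pathlib import Path

obj = {"city": "Zürich", "tags": ["café", "naïve"]}

with tempfile.TemporaryDirectory() as tmp:
    some_path = Path(tmp) / "foo.json"

    # ensure_ascii=False keeps the non-ASCII characters literal in the file,
    # so reader and writer must agree on the encoding.
    some_path.write_text(json.dumps(obj, ensure_ascii=False), encoding="utf8")

    raw = some_path.read_bytes()
    loaded = json.loads(some_path.read_text(encoding="utf8"))

# With the default ensure_ascii=True the text is pure ASCII.
ascii_only = json.dumps(obj)
```

Without the encoding argument, write_text and read_text fall back to the locale's preferred encoding, which may differ between machines.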
I'm not in favor of this idea for the reason mentioned by many of the other
posters. BUT ... this does bring up something missing from json readers: *the
ability to read one json object from the input rather than reading the
entire input* and attempting to interpret it as one object. For my use
case, it would be sufficient to read whole lines only but I can imagine
other use cases.
The basic rule would be to read as much of the input as necessary (and no
more) to read a single json object, ignoring leading white space.
In practical terms:
- if the first character is [ or { or " read to the matching ] or } or "
- otherwise if the first character is a digit or '-' read as many
characters as possible to parse a number
- otherwise attempt to match 'true', 'false' or 'null'
- otherwise fail
--- Bruce
Check out my puzzle book and get it free here:
http://J.mp/ingToConclusionsFree (available on iOS)
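For text that is already in memory, something close to the rule above can be sketched with the existing json.JSONDecoder.raw_decode method, which parses one value and returns the index where it stopped. The generator name iter_json_values is invented for this example, and this is not an incremental stream reader: it assumes the whole input is available as a str.

```python
import json


def iter_json_values(text):
    """Yield successive JSON values from a string holding several of them."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < end and text[pos] in " \t\r\n":
            pos += 1
        if pos >= end:
            return
        # Parse exactly one value and remember where it ended.
        value, pos = decoder.raw_decode(text, pos)
        yield value


stream = ' {"a": 1} [2, 3]\n"four" 5 true null'
values = list(iter_json_values(stream))
```

Invalid input still raises json.JSONDecodeError from raw_decode, which matches the "otherwise fail" branch of the list.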
On Mon, Mar 27, 2017 at 9:43 AM, Bruce Leban
I'm not in favor of this idea for the reason mentioned by many of the other posters. BUT ... this does bring up something missing from json readers: *the ability to read one json object from the input rather than reading the entire input* and attempting to interpret it as one object.
I can't tell from the JSON spec (at least not quickly), but is it possible to have more than one object at the top level?

Experimenting with the python json module seems to indicate that it is not -- you can only have one "thing" in a JSON file -- either an "object" or an array. Then, of course, you can arbitrarily nest stuff inside that top-level container.

Since the nesting is arbitrary, I'm not sure it's clear how a one-object-at-a-time reader would work in the general case?

-CHB
For my use case, it would be sufficient to read whole lines only but I can imagine other use cases.
The basic rule would be to read as much of the input as necessary (and no more) to read a single json object, ignoring leading white space.
In practical terms:
- if the first character is [ or { or " read to the matching ] or } or "
- otherwise if the first character is a digit or '-' read as many characters as possible to parse a number
- otherwise attempt to match 'true', 'false' or 'null'
- otherwise fail
On 27 March 2017 at 17:43, Bruce Leban
the ability to read one json object from the input rather than reading the entire input
Is this a well-defined idea? From a quick read of the JSON spec (which is remarkably short on details of how JSON is stored in files, etc) the only reference I can see is to a "JSON text" which is a JSON representation of a single value. There's nothing describing how multiple values would be stored in the same file/transmitted in the same stream.

It's not unreasonable to assume "read one object, then read another" but without an analysis of the grammar, it's not 100% clear if the grammar supports that (you sort of have to assume that when you hit "the end of the object" you skip some whitespace then start on the next - but the spec doesn't say anything like that). Alternatively, it's just as reasonable to assume that json.load/json.loads expect to be passed a single "JSON text" as defined by the spec.

If the spec was clear on how multiple objects in a single stream should be handled, then yes the json module should support that. But without anything explicit in the spec, it's not as obvious. What do other languages do?

Paul
The format JSON lines (http://jsonlines.org/) is pretty widely used, but is an extension of JSON itself. Basically, it's the idea that you can put one object per physical line to allow incremental reading or spending of objects.

It's a good idea, and I think the `json` module should support it. But it definitely doesn't belong in `pathlib`.
This is a better link: https://en.m.wikipedia.org/wiki/JSON_Streaming
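For reference, the one-value-per-line convention needs very little code on top of the existing json module. The function names below are invented for the sketch; it relies on the fact that json.dumps with the default indent=None never emits a raw newline.

```python
import io
import json


def write_json_lines(objs, stream):
    """Write each value on its own line (hypothetical helper)."""
    for obj in objs:
        stream.write(json.dumps(obj) + "\n")


def read_json_lines(stream):
    """Lazily yield one parsed value per non-empty line (hypothetical helper)."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


records = [{"id": 1}, {"id": 2, "tags": ["a", "b"]}, [3, 4]]

buffer = io.StringIO()
write_json_lines(records, buffer)

buffer.seek(0)
round_tripped = list(read_json_lines(buffer))
```

Because the reader is a generator over the file's lines, only one record needs to be held in memory at a time.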
FWIW, pyline could produce streaming JSON w/ json.dumps(indent=None), but because indent>0, there are newlines.

    pydoc json | pyline '{"a":l} if "json" in l.lower() else None' -O json
    pydoc json | pyline -r '.*JSON.*' 'rgx and line' -O json

It's a similar issue: what are good default JSON encoding/decoding settings?

    # loads/JSONDecoder
    file.encoding  # UTF-8
    object_pairs_hook
    object_hook

    # dumps/JSONEncoder
    file.encoding  # UTF-8
    cls
    separators
    indent

- [ ] ENH: pyline: add 'jsonlines' as an {output,} format
From https://twitter.com/raymondh/status/842777864193769472 :
#python tip: Set separators=(',', ':') to dump JSON more compactly.
    >>> json.dumps({'a':1, 'b':2}, separators=(',',':'))
    '{"a":1,"b":2}'
Paul Moore wrote:
Is this a well-defined idea? ... There's nothing describing how multiple values would be stored in the same file/transmitted in the same stream.
I think this is something that's outside the scope of the spec. But since the grammar makes it clear when you've reached the end of a value, it seems entirely reasonable for a parser to just stop reading from the stream at that point, and leave whatever remains for the application to deal with as it sees fit. The application can then choose to immediately read another value from the same stream if it wants. -- Greg
On 28.03.17 02:35, Greg Ewing wrote:
Paul Moore wrote:
Is this a well-defined idea? ... There's nothing describing how multiple values would be stored in the same file/transmitted in the same stream.
I think this is something that's outside the scope of the spec.
But since the grammar makes it clear when you've reached the end of a value, it seems entirely reasonable for a parser to just stop reading from the stream at that point, and leave whatever remains for the application to deal with as it sees fit. The application can then choose to immediately read another value from the same stream if it wants.
You can determine the end of an integer literal only after reading a character past the end of the literal. Since there is no way to put back a character, it will be lost to subsequent readers.

And currently json.load() is implemented by reading all the file content at once and passing it to json.loads(). A different implementation would be much more complex (if we don't want to lose performance).
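The behaviour described here can be observed with a short experiment, sketched below with an in-memory stream holding two values. It assumes the CPython implementation mentioned above, where json.load() calls fp.read() and hands the result to json.loads().

```python
import io
import json

# Two JSON values back to back in one stream.
stream = io.StringIO('{"a": 1} {"b": 2}')

try:
    json.load(stream)
    error_message = None
except json.JSONDecodeError as exc:
    # The parser sees the whole text and complains about the second value.
    error_message = exc.msg

# Nothing is left for a following reader: the stream was consumed entirely.
remaining = stream.read()
```

So even though the first value parsed cleanly, the caller gets an error and cannot resume reading at the second value from the same stream.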
participants (18)

- Barry Scott
- Bruce Leban
- Chris Barker
- David Mertz
- Donald Stufft
- Eric V. Smith
- Ethan Furman
- Greg Ewing
- Markus Meskanen
- Nick Timkovich
- Paul Moore
- Philipp A.
- Ram Rachum
- Serhiy Storchaka
- Steve Dower
- Steven D'Aprano
- Victor Stinner
- Wes Turner