[Python-checkins] cpython (merge 3.4 -> default): Issue #21514: The documentation of the json module now refers to new JSON RFC

serhiy.storchaka python-checkins at python.org
Thu Nov 27 18:51:14 CET 2014


https://hg.python.org/cpython/rev/aced2548345a
changeset:   93626:aced2548345a
parent:      93622:167d51a54de2
parent:      93625:89bb4384f1e1
user:        Serhiy Storchaka <storchaka at gmail.com>
date:        Thu Nov 27 19:45:31 2014 +0200
summary:
  Issue #21514: The documentation of the json module now refers to new JSON RFC
7159 instead of obsoleted RFC 4627.

files:
  Doc/library/json.rst |  111 ++++++++++++++++++++----------
  Misc/NEWS            |    3 +
  2 files changed, 75 insertions(+), 39 deletions(-)


diff --git a/Doc/library/json.rst b/Doc/library/json.rst
--- a/Doc/library/json.rst
+++ b/Doc/library/json.rst
@@ -7,9 +7,11 @@
 .. sectionauthor:: Bob Ippolito <bob at redivi.com>
 
 `JSON (JavaScript Object Notation) <http://json.org>`_, specified by
-:rfc:`4627`, is a lightweight data interchange format based on a subset of
-`JavaScript <http://en.wikipedia.org/wiki/JavaScript>`_ syntax (`ECMA-262 3rd
-edition <http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%203rd%20edition,%20December%201999.pdf>`_).
+:rfc:`7159` (which obsoletes :rfc:`4627`) and by
+`ECMA-404 <http://www.ecma-international.org/publications/standards/Ecma-404.htm>`_,
+is a lightweight data interchange format inspired by
+`JavaScript <http://en.wikipedia.org/wiki/JavaScript>`_ object literal syntax
+(although it is not a strict subset of JavaScript [#rfc-errata]_ ).
 
 :mod:`json` exposes an API familiar to users of the standard library
 :mod:`marshal` and :mod:`pickle` modules.
@@ -467,18 +469,18 @@
                 mysocket.write(chunk)
 
 
-Standard Compliance
--------------------
+Standard Compliance and Interoperability
+----------------------------------------
 
-The JSON format is specified by :rfc:`4627`.  This section details this
-module's level of compliance with the RFC.  For simplicity,
-:class:`JSONEncoder` and :class:`JSONDecoder` subclasses, and parameters other
-than those explicitly mentioned, are not considered.
+The JSON format is specified by :rfc:`7159` and by
+`ECMA-404 <http://www.ecma-international.org/publications/standards/Ecma-404.htm>`_.
+This section details this module's level of compliance with the RFC.
+For simplicity, :class:`JSONEncoder` and :class:`JSONDecoder` subclasses, and
+parameters other than those explicitly mentioned, are not considered.
 
 This module does not comply with the RFC in a strict fashion, implementing some
 extensions that are valid JavaScript but not valid JSON.  In particular:
 
-- Top-level non-object, non-array values are accepted and output;
 - Infinite and NaN number values are accepted and output;
 - Repeated names within an object are accepted, and only the value of the last
   name-value pair is used.
@@ -490,8 +492,8 @@
 Character Encodings
 ^^^^^^^^^^^^^^^^^^^
 
-The RFC recommends that JSON be represented using either UTF-8, UTF-16, or
-UTF-32, with UTF-8 being the default.
+The RFC requires that JSON be represented using either UTF-8, UTF-16, or
+UTF-32, with UTF-8 being the recommended default for maximum interoperability.
 
 As permitted, though not required, by the RFC, this module's serializer sets
 *ensure_ascii=True* by default, thus escaping the output so that the resulting
@@ -499,34 +501,20 @@
 
 Other than the *ensure_ascii* parameter, this module is defined strictly in
 terms of conversion between Python objects and
-:class:`Unicode strings <str>`, and thus does not otherwise address the issue
-of character encodings.
+:class:`Unicode strings <str>`, and thus does not otherwise directly address
+the issue of character encodings.
 
+The RFC prohibits adding a byte order mark (BOM) to the start of a JSON text,
+and this module's serializer does not add a BOM to its output.
+The RFC permits, but does not require, JSON deserializers to ignore an initial
+BOM in their input.  This module's deserializer raises a :exc:`ValueError`
+when an initial BOM is present.
 
-Top-level Non-Object, Non-Array Values
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-The RFC specifies that the top-level value of a JSON text must be either a
-JSON object or array (Python :class:`dict` or :class:`list`).  This module's
-deserializer also accepts input texts consisting solely of a
-JSON null, boolean, number, or string value::
-
-   >>> just_a_json_string = '"spam and eggs"'  # Not by itself a valid JSON text
-   >>> json.loads(just_a_json_string)
-   'spam and eggs'
-
-This module itself does not include a way to request that such input texts be
-regarded as illegal.  Likewise, this module's serializer also accepts single
-Python :data:`None`, :class:`bool`, numeric, and :class:`str`
-values as input and will generate output texts consisting solely of a top-level
-JSON null, boolean, number, or string value without raising an exception::
-
-   >>> neither_a_list_nor_a_dict = "spam and eggs"
-   >>> json.dumps(neither_a_list_nor_a_dict)  # The result is not a valid JSON text
-   '"spam and eggs"'
-
-This module's serializer does not itself include a way to enforce the
-aforementioned constraint.
+The RFC does not explicitly forbid JSON strings which contain byte sequences
+that don't correspond to valid Unicode characters (e.g. unpaired UTF-16
+surrogates), but it does note that they may cause interoperability problems.
+By default, this module accepts and outputs (when present in the original
+:class:`str`) codepoints for such sequences.
 
 
 Infinite and NaN Number Values
@@ -556,7 +544,7 @@
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 The RFC specifies that the names within a JSON object should be unique, but
-does not specify how repeated names in JSON objects should be handled.  By
+does not mandate how repeated names in JSON objects should be handled.  By
 default, this module does not raise an exception; instead, it ignores all but
 the last name-value pair for a given name::
 
@@ -566,6 +554,42 @@
 
 The *object_pairs_hook* parameter can be used to alter this behavior.
 
+
+Top-level Non-Object, Non-Array Values
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The old version of JSON specified by the obsolete :rfc:`4627` required that
+the top-level value of a JSON text must be either a JSON object or array
+(Python :class:`dict` or :class:`list`), and could not be a JSON null,
+boolean, number, or string value.  :rfc:`7159` removed that restriction, and
+this module does not and has never implemented that restriction in either its
+serializer or its deserializer.
+
+Regardless, for maximum interoperability, you may wish to voluntarily adhere
+to the restriction yourself.
+
+
+Implementation Limitations
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Some JSON deserializer implementations may set limits on:
+
+* the size of accepted JSON texts
+* the maximum level of nesting of JSON objects and arrays
+* the range and precision of JSON numbers
+* the content and maximum length of JSON strings
+
+This module does not impose any such limits beyond those of the relevant
+Python datatypes themselves or the Python interpreter itself.
+
+When serializing to JSON, beware any such limitations in applications that may
+consume your JSON.  In particular, it is common for JSON numbers to be
+deserialized into IEEE 754 double precision numbers and thus subject to that
+representation's range and precision limitations.  This is especially relevant
+when serializing Python :class:`int` values of extremely large magnitude, or
+when serializing instances of "exotic" numerical types such as
+:class:`decimal.Decimal`.
+
 .. highlight:: bash
 .. module:: json.tool
 
@@ -627,3 +651,12 @@
 .. cmdoption:: -h, --help
 
    Show the help message.
+
+
+.. rubric:: Footnotes
+
+.. [#rfc-errata] As noted in `the errata for RFC 7159
+   <http://www.rfc-editor.org/errata_search.php?rfc=7159>`_,
+   JSON permits literal U+2028 (LINE SEPARATOR) and
+   U+2029 (PARAGRAPH SEPARATOR) characters in strings, whereas JavaScript
+   (as of ECMAScript Edition 5.1) does not.
diff --git a/Misc/NEWS b/Misc/NEWS
--- a/Misc/NEWS
+++ b/Misc/NEWS
@@ -1300,6 +1300,9 @@
 Documentation
 -------------
 
+- Issue #21514: The documentation of the json module now refers to new JSON RFC
+  7159 instead of obsoleted RFC 4627.
+
 - Issue #21777: The binary sequence methods on bytes and bytearray are now
   documented explicitly, rather than assuming users will be able to derive
   the expected behaviour from the behaviour of the corresponding str methods.

-- 
Repository URL: https://hg.python.org/cpython


More information about the Python-checkins mailing list