[docs] Fixed the documentation of the mapping codec APIs (issue 28749)

Thu Nov 24 06:08:21 EST 2016

Reviewers: xiang.zhang,


http://bugs.python.org/review/28749/diff/19217/Doc/c-api/unicode.rst
File Doc/c-api/unicode.rst (right):

http://bugs.python.org/review/28749/diff/19217/Doc/c-api/unicode.rst#newcode1413
Doc/c-api/unicode.rst:1413: decode characters.  The mapping objects provided must only support the
On 2016/11/24 10:35:22, xiang.zhang wrote:
> I don't understand what the 'only' means. It looks to me except '__getitem__' no
> other methods could be defined.

This phrase was in the old documentation, and it is still true. It means
that the mapping object only needs to define the __getitem__ method; no
other methods are required.
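
To illustrate the point, here is a minimal Python-level sketch. It assumes
that codecs.charmap_decode() goes through the same mapping lookup as
PyUnicode_DecodeCharmap; the OnlyGetItem class is hypothetical and exists
only for this example.

```python
import codecs


class OnlyGetItem:
    """Hypothetical mapping: defines __getitem__ and nothing else."""

    def __getitem__(self, byte):
        # The decoder looks up each input byte as an integer (0..255).
        if byte < 128:
            return chr(byte).upper()
        # Any LookupError subclass marks the byte as an undefined mapping.
        raise KeyError(byte)


text, consumed = codecs.charmap_decode(b"abc", "strict", OnlyGetItem())
print(text, consumed)
```

The object has no __len__, keys() or __contains__, yet decoding succeeds,
which is what "must only support" is meant to convey.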

http://bugs.python.org/review/28749/diff/19217/Doc/c-api/unicode.rst#newcode1430
Doc/c-api/unicode.rst:1430: cause an error.
On 2016/11/24 10:35:22, xiang.zhang wrote:
> Cause an error or not depends on *errors*.

Is this a question? All these cases mean "undefined mapping". In each
case the error handler named by *errors* is called, if one is provided,
as in other decoders.
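
A short sketch of the cases in question, again assuming that the Python-level
codecs.charmap_decode() mirrors the C function. The table and the decode()
helper are made up for this example.

```python
import codecs

# Hypothetical table: only b"A" has a usable mapping.
table = {
    0x41: "a",       # defined
    0x42: None,      # undefined: mapped to None
    0x43: 0xFFFE,    # undefined: mapped to the integer 0xFFFE
    0x44: "\ufffe",  # undefined: mapped to the string '\ufffe'
    # 0x45 is absent, so the lookup raises KeyError (a LookupError)
}


def decode(data, errors):
    # charmap_decode returns (text, bytes_consumed); keep only the text.
    return codecs.charmap_decode(data, errors, table)[0]


strict_error = None
try:
    decode(b"B", "strict")
except UnicodeDecodeError as exc:
    strict_error = exc

replaced = decode(b"ABCDE", "replace")
ignored = decode(b"ABCDE", "ignore")
print(repr(strict_error), repr(replaced), repr(ignored))
```

With "strict" the undefined byte raises UnicodeDecodeError. With "replace"
or "ignore" all four undefined cases are passed to the handler in the same
way, so whether an exception reaches the caller depends on the handler.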

http://bugs.python.org/review/28749/diff/19217/Doc/c-api/unicode.rst#newcode1442
Doc/c-api/unicode.rst:1442: ``None`` are  treated as "undefined mapping" and cause an error.
On 2016/11/24 10:35:22, xiang.zhang wrote:
>                      ^ redundant space
> 
> One more question about this API, why not allow *mapping* be NULL and default to
> Latin1? PyUnicode_EncodeCharmap and PyUnicode_DecodeCharmap both get this
> behaviour.

PyUnicode_EncodeCharmap and PyUnicode_DecodeCharmap have this behavior
for historical reasons: Latin1 was the default encoding in Python 2.
There is no need for this case in Python 3.
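
For reference, the existing fallback can be observed from Python. This sketch
assumes that passing None as the mapping to codecs.charmap_decode() and
codecs.charmap_encode() corresponds to passing NULL at the C level.

```python
import codecs

# With no mapping, both helpers fall back to Latin-1.
decoded, _ = codecs.charmap_decode(b"caf\xe9", "strict", None)
encoded, _ = codecs.charmap_encode("caf\xe9", "strict", None)

print(repr(decoded), repr(encoded))
```

Both results match what the latin-1 codec itself would produce for the same
input.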



Please review this at http://bugs.python.org/review/28749/

Affected files:
  Doc/c-api/unicode.rst


diff -r 75fe67538905 Doc/c-api/unicode.rst

--- a/Doc/c-api/unicode.rst	Sun Nov 20 12:13:44 2016 +0200
+++ b/Doc/c-api/unicode.rst	Sun Nov 20 12:29:14 2016 +0200
@@ -1410,59 +1410,72 @@ Character Map Codecs
 This codec is special in that it can be used to implement many different codecs
 (and this is in fact what was done to obtain most of the standard codecs
 included in the :mod:`encodings` package). The codec uses mapping to encode and
-decode characters.
-
-Decoding mappings must map single string characters to single Unicode
-characters, integers (which are then interpreted as Unicode ordinals) or ``None``
-(meaning "undefined mapping" and causing an error).
-
-Encoding mappings must map single Unicode characters to single string
-characters, integers (which are then interpreted as Latin-1 ordinals) or ``None``
-(meaning "undefined mapping" and causing an error).
-
-The mapping objects provided must only support the __getitem__ mapping
-interface.
-
-If a character lookup fails with a LookupError, the character is copied as-is
-meaning that its ordinal value will be interpreted as Unicode or Latin-1 ordinal
-resp. Because of this, mappings only need to contain those mappings which map
-characters to different code points.
+decode characters.  The mapping objects provided must only support the
+:meth:`__getitem__` mapping interface; dictionaries and sequences work well.
 
 These are the mapping codec APIs:
 
-.. c:function:: PyObject* PyUnicode_DecodeCharmap(const char *s, Py_ssize_t size, \
+.. c:function:: PyObject* PyUnicode_DecodeCharmap(const char *data, Py_ssize_t size, \
                               PyObject *mapping, const char *errors)
 
-   Create a Unicode object by decoding *size* bytes of the encoded string *s* using
-   the given *mapping* object.  Return *NULL* if an exception was raised by the
-   codec. If *mapping* is *NULL* latin-1 decoding will be done. Else it can be a
-   dictionary mapping byte or a unicode string, which is treated as a lookup table.
-   Byte values greater that the length of the string and U+FFFE "characters" are
-   treated as "undefined mapping".
+   Create a Unicode object by decoding *size* bytes of the encoded string *s*
+   using the given *mapping* object.  Return *NULL* if an exception was raised
+   by the codec.
+
+   If *mapping* is *NULL* Latin-1 decoding will be done.  Else it must map
+   bytes (integers in the range from 0 to 255) to Unicode strings, integers
+   (which are then interpreted as Unicode ordinals) or ``None``.  Unmapped
+   bytes (ones which cause a :exc:`LookupError`) as well as mapped to
+   ``None``, ``0xFFFE`` or ``'\ufffe'`` are treated as "undefined mapping" and
+   cause an error.
 
 
 .. c:function:: PyObject* PyUnicode_AsCharmapString(PyObject *unicode, PyObject *mapping)
 
-   Encode a Unicode object using the given *mapping* object and return the result
-   as Python string object.  Error handling is "strict".  Return *NULL* if an
+   Encode a Unicode object using the given *mapping* object and return the
+   result as a bytes object.  Error handling is "strict".  Return *NULL* if an
    exception was raised by the codec.
 
+   The *mapping* object must map Unicode ordinal integers to bytes objects,
+   integers in the range from 0 to 255 or ``None``.  Unmapped character
+   ordinals (ones which cause a :exc:`LookupError`) as well as mapped to
+   ``None`` are  treated as "undefined mapping" and cause an error.
+
+
+.. c:function:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, \
+                              PyObject *mapping, const char *errors)
+
+   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using the given
+   *mapping* object and return the result as a bytes object.  Return *NULL* if
+   an exception was raised by the codec.
+
+   .. deprecated-removed:: 3.3 4.0
+      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
+      :c:func:`PyUnicode_AsCharmapString` or
+      :c:func:`PyUnicode_AsEncodedString`.
+
+
 The following codec API is special in that maps Unicode to Unicode.
 
+.. c:function:: PyObject* PyUnicode_Translate(PyObject *unicode, \
+                              PyObject *mapping, const char *errors)
+
+   Translate a Unicode object using the given *mapping* object and return the
+   resulting Unicode object.  Return *NULL* if an exception was raised by the
+   codec.
+
+   The *mapping* object must map Unicode ordinal integers to Unicode strings,
+   integers (which are then interpreted as Unicode ordinals) or ``None``
+   (causing deletion of the character).  Unmapped character ordinals (ones
+   which cause a :exc:`LookupError`) are left untouched and are copied as-is.
+
 
 .. c:function:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, \
-                              PyObject *table, const char *errors)
+                              PyObject *mapping, const char *errors)
 
    Translate a :c:type:`Py_UNICODE` buffer of the given *size* by applying a
-   character mapping *table* to it and return the resulting Unicode object.  Return
-   *NULL* when an exception was raised by the codec.
-
-   The *mapping* table must map Unicode ordinal integers to Unicode ordinal
-   integers or ``None`` (causing deletion of the character).
-
-   Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
-   and sequences work well.  Unmapped character ordinals (ones which cause a
-   :exc:`LookupError`) are left untouched and are copied as-is.
+   character *mapping* table to it and return the resulting Unicode object.
+   Return *NULL* when an exception was raised by the codec.
 
    .. deprecated-removed:: 3.3 4.0
       Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
@@ -1470,19 +1483,6 @@ The following codec API is special in th
       <codec-registry>`
 
 
-.. c:function:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, \
-                              PyObject *mapping, const char *errors)
-
-   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using the given
-   *mapping* object and return a Python string object. Return *NULL* if an
-   exception was raised by the codec.
-
-   .. deprecated-removed:: 3.3 4.0
-      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
-      :c:func:`PyUnicode_AsCharmapString` or
-      :c:func:`PyUnicode_AsEncodedString`.
-
-
 MBCS codecs for Windows
 """""""""""""""""""""""