Mailman 3 Re: [docs] 'codecs' module docs improvements (issue 19548) - docs

22 Dec 2014

      I definitely like the proposed changes overall, especially splitting
CodecInfo out into a separately documented class.

Lots more detailed feedback inline, including some genuien questions
where I don't know the answer off the top of my head myself, and it
isn't a quick check at the interactive prompt to confirm the current
behaviour.

http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst
File Doc/library/codecs.rst (right):

http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod...
Doc/library/codecs.rst:24: bytes to bytes.
I like the idea of expanding on this paragraph. I suggest we make "text
encoding" a defined term in the glossary, and adjust this addition to
be:

Most standard codecs are :term:`text encodings <text encoding>`, which
encode text to bytes, but there are also codecs that encode text to
text, and bytes to bytes. Custom codecs beyond those provided in the
standard library may convert between arbitrary types."

http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod...
Doc/library/codecs.rst:54: is not reversible, which doesn't play well
with module reloading.
I suggest pulling this caveat out into a "note". As far as phrasing
goes, something a little more formal may be good. Perhaps:

"Search function registration is not currently reversible, which may
cause problems in some cases, such as unit testing or module reloading"

http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod...
Doc/library/codecs.rst:198: suitable replacement marker such as ``b'?'``
in or ``'\ufffd'``.
With the new defined term, this error handling function should gain a
sentence to indicate that it's only applicable to text encodings. (There
may actually be cases where it's usable for bytes->bytes or text->text,
but if that's actually the case, we can figure out how to fix the issue
later. For the time being, I think it's more important to clearly
document the text encoding case)

The substitution can then be clarified to say that b'?' is produced when
encoding, and '\ufffd' when decoding (we can also embed the actual
encoded code point in the documentation here, in addition to the \u
escape).

http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod...
Doc/library/codecs.rst:210: unencodable character is replaced by an
appropriate XML character reference.
Also text encoding specific.

http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod...
Doc/library/codecs.rst:216: unencodable character is replaced by a
backslashed escape sequence.
Also text encoding specific.

http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod...
Doc/library/codecs.rst:221: unencodable character is replaced by a
``\N{...}`` escape sequence.
Also text encoding specific.

http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod...
Doc/library/codecs.rst:238: No automatic conversion of ``'\n'`` is done
on reading and writing.
We should also explicitly note here that *this* open uses the
StreamReader and StreamWriter classes provided by the CodecInfo module,
*not* the general purpose IO classes provided by the io module.

This means that 'encoding' here can be any arbitrary registered codec,
rather than necessarily being a text encoding as it is for the io module
(and hence the builtin open function)

http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod...
Doc/library/codecs.rst:241: Only the codecs whose encoded form is bytes
are supported.
It's worth explicitly noting here that the types supported by the
file-like object will depend on the codec used.

http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod...
Doc/library/codecs.rst:261: a text stream.
I'm going to have to look at the implementation to confirm whether this
more specific version is accurate, as I suspect it isn't quite right in
the case where data_encoding == file_encoding.

For the case where they're different, we may want to use the word
"transcoding".

http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod...
Doc/library/codecs.rst:267: error occurs.
Since this factory function produces a StreamRecoder instance, a cross
reference may be useful.

http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod...
Doc/library/codecs.rst:329: To simplify and standardize error handling,
I suggest giving this section on error handlers its own subheading and
label, so it can be referenced directly.

http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod...
Doc/library/codecs.rst:332: and implemented by all standard Python
codecs:
It may be worthwhile adding cross references back to the functions
implementing each error handler.

For example, finishing with: "(Implemented in :func:`strict_errors`)"

http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod...
Doc/library/codecs.rst:337: | Value                   | Meaning         
                             |
I believe this table needs to be split in two, with replace through to
surrogateescape being restricted in applicability to text encodings.

http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod...
Doc/library/codecs.rst:368: In addition, the following error handlers
are specific to Unicode encoding
With the above table split in two, the phrasing in this introductory
paragraph may need tweaking.

http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod...
Doc/library/codecs.rst:469: by providing the *errors* keyword argument.
These parameters are predefined:
Given the new link target for the error handler tables, I'd replace this
bulleted list with a cross-reference.

http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod...
Doc/library/codecs.rst:489: :func:`register_error`.
This paragraph can move to the shared section on error handlers.

http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod...
Doc/library/codecs.rst:540: by providing the *errors* keyword argument.
These parameters are predefined:
As above - it's likely worth replacing the repeated list with a
cross-reference.

http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod...
Doc/library/codecs.rst:554: :func:`register_error`.
With this paragraph moved to the common section, it can be dropped here.

http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod...
Doc/library/codecs.rst:617: *stream* must be a file-like object open for
writing.
I suggest finishing this sentence as "... writing text or binary data,
as appropriate for the specific codec".

http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod...
Doc/library/codecs.rst:620: providing the *errors* keyword argument.
These parameters are predefined:
As with the Incremental* APIs, I'm a fan of consolidating the error
handler documentation, and linking to it where appropriate.

http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod...
Doc/library/codecs.rst:650: the :meth:`write` method). Does not work on
byte encoders.
The new caveat needs to be clarified: what does "Does not work" mean?
What counts as a "byte encoder" in this context?

If the issue is "May not be present for codecs other than text
encodings", then that's what we should say.

http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod...
Doc/library/codecs.rst:684: *stream* must be a file-like object open for
reading.
See the StreamWriter comments for suggested phrasing here.

http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod...
Doc/library/codecs.rst:708: characters or bytes to return. The
:func:`read` method will
characters -> code points (due to complexities in the definition of what
actually constitutes a Unicode characters, str's are code point arrays
rather than character arrays)

http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod...
Doc/library/codecs.rst:713: number of encoded bytes or characters to
read
characters -> code points again.

http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod...
Doc/library/codecs.rst:760: In addition to the above methods, the
:class:`StreamReader` must also inherit
Not a new issue in the patch, but "inherit" doesn't strike me as being
the right word here (I'm also not sure the statement is even true any
more).

http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod...
Doc/library/codecs.rst:1242: |                    |         | decoding
``b''``          |
This should be clear on what b'' gets decoded to.

http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod...
Doc/library/codecs.rst:1250: |                    |         | Latin-1
source code.      |
This description should be caveated with the fact that Python 3 uses
UTF-8 by default, and relying on that is preferable to unicode_escape in
most cases where Python 2 source compatibility isn't needed and a utf-8
encoding cookie can't be used for some reason.

http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod...
Doc/library/codecs.rst:1256: |                    |         | ..
deprecated:: 3.3       |
May be worth cross referencing PEP 393 here to explain this deprecation.

http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod...
Doc/library/codecs.rst:1265: to :class:`bytes` mappings.  They are not
supported by :meth:`bytes.decode`.
Perhaps append "(which is expected to produce :class:`str` output)" to
the new sentence?

http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod...
Doc/library/codecs.rst:1320: mapping.  It is not supported by
:meth:`str.encode`.
Perhaps append "(which is expected to produce :class:`bytes` output)"?

http://bugs.python.org/review/19548/

Re: [docs] 'codecs' module docs improvements (issue 19548)

ncoghlan＠gmail.com

tags

participants (1)