Re: [docs] 'codecs' module docs improvements (issue 19548)
I definitely like the proposed changes overall, especially splitting CodecInfo out into a separately documented class. Lots more detailed feedback inline, including some genuien questions where I don't know the answer off the top of my head myself, and it isn't a quick check at the interactive prompt to confirm the current behaviour. http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst File Doc/library/codecs.rst (right): http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod... Doc/library/codecs.rst:24: bytes to bytes. I like the idea of expanding on this paragraph. I suggest we make "text encoding" a defined term in the glossary, and adjust this addition to be: Most standard codecs are :term:`text encodings <text encoding>`, which encode text to bytes, but there are also codecs that encode text to text, and bytes to bytes. Custom codecs beyond those provided in the standard library may convert between arbitrary types." http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod... Doc/library/codecs.rst:54: is not reversible, which doesn't play well with module reloading. I suggest pulling this caveat out into a "note". As far as phrasing goes, something a little more formal may be good. Perhaps: "Search function registration is not currently reversible, which may cause problems in some cases, such as unit testing or module reloading" http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod... Doc/library/codecs.rst:198: suitable replacement marker such as ``b'?'`` in or ``'\ufffd'``. With the new defined term, this error handling function should gain a sentence to indicate that it's only applicable to text encodings. (There may actually be cases where it's usable for bytes->bytes or text->text, but if that's actually the case, we can figure out how to fix the issue later. For the time being, I think it's more important to clearly document the text encoding case) The substitution can then be clarified to say that b'?' is produced when encoding, and '\ufffd' when decoding (we can also embed the actual encoded code point in the documentation here, in addition to the \u escape). http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod... Doc/library/codecs.rst:210: unencodable character is replaced by an appropriate XML character reference. Also text encoding specific. http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod... Doc/library/codecs.rst:216: unencodable character is replaced by a backslashed escape sequence. Also text encoding specific. http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod... Doc/library/codecs.rst:221: unencodable character is replaced by a ``\N{...}`` escape sequence. Also text encoding specific. http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod... Doc/library/codecs.rst:238: No automatic conversion of ``'\n'`` is done on reading and writing. We should also explicitly note here that *this* open uses the StreamReader and StreamWriter classes provided by the CodecInfo module, *not* the general purpose IO classes provided by the io module. This means that 'encoding' here can be any arbitrary registered codec, rather than necessarily being a text encoding as it is for the io module (and hence the builtin open function) http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod... Doc/library/codecs.rst:241: Only the codecs whose encoded form is bytes are supported. It's worth explicitly noting here that the types supported by the file-like object will depend on the codec used. http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod... Doc/library/codecs.rst:261: a text stream. I'm going to have to look at the implementation to confirm whether this more specific version is accurate, as I suspect it isn't quite right in the case where data_encoding == file_encoding. For the case where they're different, we may want to use the word "transcoding". http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod... Doc/library/codecs.rst:267: error occurs. Since this factory function produces a StreamRecoder instance, a cross reference may be useful. http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod... Doc/library/codecs.rst:329: To simplify and standardize error handling, I suggest giving this section on error handlers its own subheading and label, so it can be referenced directly. http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod... Doc/library/codecs.rst:332: and implemented by all standard Python codecs: It may be worthwhile adding cross references back to the functions implementing each error handler. For example, finishing with: "(Implemented in :func:`strict_errors`)" http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod... Doc/library/codecs.rst:337: | Value | Meaning | I believe this table needs to be split in two, with replace through to surrogateescape being restricted in applicability to text encodings. http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod... Doc/library/codecs.rst:368: In addition, the following error handlers are specific to Unicode encoding With the above table split in two, the phrasing in this introductory paragraph may need tweaking. http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod... Doc/library/codecs.rst:469: by providing the *errors* keyword argument. These parameters are predefined: Given the new link target for the error handler tables, I'd replace this bulleted list with a cross-reference. http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod... Doc/library/codecs.rst:489: :func:`register_error`. This paragraph can move to the shared section on error handlers. http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod... Doc/library/codecs.rst:540: by providing the *errors* keyword argument. These parameters are predefined: As above - it's likely worth replacing the repeated list with a cross-reference. http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod... Doc/library/codecs.rst:554: :func:`register_error`. With this paragraph moved to the common section, it can be dropped here. http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod... Doc/library/codecs.rst:617: *stream* must be a file-like object open for writing. I suggest finishing this sentence as "... writing text or binary data, as appropriate for the specific codec". http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod... Doc/library/codecs.rst:620: providing the *errors* keyword argument. These parameters are predefined: As with the Incremental* APIs, I'm a fan of consolidating the error handler documentation, and linking to it where appropriate. http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod... Doc/library/codecs.rst:650: the :meth:`write` method). Does not work on byte encoders. The new caveat needs to be clarified: what does "Does not work" mean? What counts as a "byte encoder" in this context? If the issue is "May not be present for codecs other than text encodings", then that's what we should say. http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod... Doc/library/codecs.rst:684: *stream* must be a file-like object open for reading. See the StreamWriter comments for suggested phrasing here. http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod... Doc/library/codecs.rst:708: characters or bytes to return. The :func:`read` method will characters -> code points (due to complexities in the definition of what actually constitutes a Unicode characters, str's are code point arrays rather than character arrays) http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod... Doc/library/codecs.rst:713: number of encoded bytes or characters to read characters -> code points again. http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod... Doc/library/codecs.rst:760: In addition to the above methods, the :class:`StreamReader` must also inherit Not a new issue in the patch, but "inherit" doesn't strike me as being the right word here (I'm also not sure the statement is even true any more). http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod... Doc/library/codecs.rst:1242: | | | decoding ``b''`` | This should be clear on what b'' gets decoded to. http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod... Doc/library/codecs.rst:1250: | | | Latin-1 source code. | This description should be caveated with the fact that Python 3 uses UTF-8 by default, and relying on that is preferable to unicode_escape in most cases where Python 2 source compatibility isn't needed and a utf-8 encoding cookie can't be used for some reason. http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod... Doc/library/codecs.rst:1256: | | | .. deprecated:: 3.3 | May be worth cross referencing PEP 393 here to explain this deprecation. http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod... Doc/library/codecs.rst:1265: to :class:`bytes` mappings. They are not supported by :meth:`bytes.decode`. Perhaps append "(which is expected to produce :class:`str` output)" to the new sentence? http://bugs.python.org/review/19548/diff/13481/Doc/library/codecs.rst#newcod... Doc/library/codecs.rst:1320: mapping. It is not supported by :meth:`str.encode`. Perhaps append "(which is expected to produce :class:`bytes` output)"? http://bugs.python.org/review/19548/
participants (1)
-
ncoghlan@gmail.com